test by HappenLee · Pull Request #60989 · apache/doris

HappenLee · 2026-03-03T10:00:04Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

When total tablets are much larger than pipeline capacity, one-tablet-per-instance planning creates excessive BE concurrency pressure in query-cache workloads.

Trigger partition-based planning when:
  total_tablets > parallel_pipeline_task_num * participating_be_num

Before: instance_num ~= total_tablets
After: instance_num ~= partitions_on_each_be

Per-BE planning example:
  BE1 tablets: p1[t1,t2], p2[t3] -> instances: [p1:t1,t2], [p2:t3]
  BE2 tablets: p1[t4], p2[t5,t6] -> instances: [p1:t4], [p2:t5,t6]

This keeps tablets from the same partition in one instance and separates different partitions into different instances. If partition mapping is incomplete or partition planning fails, fall back to default planning for correctness.

Tests:
- partition-based planning path
- fallback-to-default path (incomplete mapping)
- non-query-cache default planning path
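The grouping rule described above can be sketched as follows. This is only an illustration of the idea, written in C++ for consistency with the BE snippets in this review; the actual change lives in the FE (Java), and every name here (`should_plan_by_partition`, `plan_instances_for_be`, the `TabletId`/`PartitionId` aliases) is hypothetical rather than taken from the PR.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical aliases, not Doris types.
using TabletId = int64_t;
using PartitionId = int64_t;

// Trigger condition quoted in the description:
// total_tablets > parallel_pipeline_task_num * participating_be_num
bool should_plan_by_partition(int64_t total_tablets, int64_t parallel_pipeline_task_num,
                              int64_t participating_be_num) {
    return total_tablets > parallel_pipeline_task_num * participating_be_num;
}

// Groups the tablets of one BE into one instance per partition.
// Returns std::nullopt when a tablet has no known partition, so the caller
// can fall back to default planning.
std::optional<std::vector<std::vector<TabletId>>> plan_instances_for_be(
        const std::vector<TabletId>& tablets_on_be,
        const std::map<TabletId, PartitionId>& tablet_to_partition) {
    std::map<PartitionId, std::vector<TabletId>> grouped;
    for (TabletId tablet : tablets_on_be) {
        auto it = tablet_to_partition.find(tablet);
        if (it == tablet_to_partition.end()) {
            return std::nullopt; // incomplete mapping: signal fallback
        }
        grouped[it->second].push_back(tablet);
    }
    std::vector<std::vector<TabletId>> instances;
    instances.reserve(grouped.size());
    for (auto& entry : grouped) {
        instances.push_back(std::move(entry.second));
    }
    return instances;
}
```

With BE1 holding p1[t1,t2] and p2[t3], the sketch yields two instances, [t1,t2] and [t3], matching the per-BE example in the description.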

hello-stephen · 2026-03-03T10:00:11Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Copilot

Pull request overview

This PR extends query-cache execution to support multiple tablets per instance (when appropriate) and updates FE planning to group tablets by partition to reduce BE instance concurrency/pressure while preserving cache correctness constraints.

Changes:

- FE: Add “best-effort” partition-based instance assignment for query-cache scans of a single OLAP table, with fallback to default planning.
- BE: Update query cache key construction to support multiple scan ranges (multiple tablets) in one instance and update cache source profiling accordingly.
- Tests: Add FE unit tests for partition-based assignment/fallback, and expand BE unit tests for multi-tablet cache-key behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

File	Description
fe/fe-core/src/main/java/.../UnassignedScanSingleOlapTableJob.java	Adds partition-based instance assignment for query-cache scans with safe fallback.
fe/fe-core/src/test/java/.../UnassignedScanSingleOlapTableJobTest.java	Adds unit tests covering partition grouping, fallback, and non-query-cache behavior.
be/src/pipeline/query_cache/query_cache.h	Enhances cache key building to handle multiple tablets per instance and validates consistency constraints.
be/src/pipeline/exec/cache_source_operator.cpp	Updates profiling output to reflect multiple tablets used for cache lookup.
be/test/pipeline/exec/query_cache_test.cpp	Adds/updates tests for multi-tablet cache-key construction and failure cases.

Copilot · 2026-03-03T10:06:53Z

be/src/pipeline/query_cache/query_cache.h

+        std::vector<int64_t> tablet_ids;
+        tablet_ids.reserve(scan_ranges.size());
+        for (const auto& scan_range : scan_ranges) {
+            auto tablet_id = scan_range.scan_range.palo_scan_range.tablet_id;
+            tablet_ids.push_back(tablet_id);
+        }
+        std::sort(tablet_ids.begin(), tablet_ids.end());
+


QueryCache::build_cache_key now uses std::sort / std::find_if (and std::from_chars) in this header, but the header doesn't include <algorithm> (and should also explicitly include <charconv> if not guaranteed elsewhere). Depending on include order this can break compilation; please add the required standard headers here.

Copilot · 2026-03-03T10:06:53Z

be/src/pipeline/exec/cache_source_operator.cpp

+            tablet_ids_str += ",";
+        }
+    }
+    custom_profile()->add_info_string("CacheTabletId", tablet_ids_str);


CacheTabletId profile entry now contains a comma-separated list of multiple tablet ids. Consider renaming the profile key to something plural (e.g. CacheTabletIds) to avoid confusion when debugging profiles.

Suggested change

-    custom_profile()->add_info_string("CacheTabletId", tablet_ids_str);
+    custom_profile()->add_info_string("CacheTabletIds", tablet_ids_str);
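For context, a comma-separated tablet list like the one written to the profile could be built as below. This is a minimal standalone sketch, not the code from `cache_source_operator.cpp`; the helper name `join_tablet_ids` is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: joins tablet ids as "1,2,3" with no trailing comma.
std::string join_tablet_ids(const std::vector<int64_t>& tablet_ids) {
    std::string out;
    for (std::size_t i = 0; i < tablet_ids.size(); ++i) {
        if (i > 0) {
            out += ","; // separator only between elements
        }
        out += std::to_string(tablet_ids[i]);
    }
    return out;
}
```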

Copilot · 2026-03-03T10:06:54Z

be/test/pipeline/exec/query_cache_test.cpp

+        TPaloScanRange palp_scan_range1;
+        palp_scan_range1.__set_tablet_id(1);
+        palp_scan_range1.__set_version("100");


The variable name palp_scan_range* looks like a typo (should likely be palo_scan_range* to match the Thrift type TPaloScanRange). Renaming would improve readability and avoid confusion in future test edits.

Copilot · 2026-03-03T10:06:54Z

...he/doris/nereids/trees/plans/distribute/worker/job/UnassignedScanSingleOlapTableJobTest.java

+        ConnectContext connectContext = new ConnectContext();
+        connectContext.setThreadLocalInfo();
+        connectContext.setQueryId(new TUniqueId(1, 1));
+        connectContext.getSessionVariable().parallelPipelineTaskNum = 1;
+        StatementContext statementContext = new StatementContext(


These tests set the thread-local ConnectContext via setThreadLocalInfo() but never clear it (e.g. ConnectContext.remove()). Many other FE tests clean up the thread-local to avoid cross-test interference; please add an @AfterEach (or finally block) to remove the context.

Copilot · 2026-03-03T10:06:54Z

be/src/pipeline/query_cache/query_cache.h

+        for (const auto& scan_range : scan_ranges) {
+            auto tablet_id = scan_range.scan_range.palo_scan_range.tablet_id;
+            tablet_ids.push_back(tablet_id);
+        }


QueryCache::build_cache_key reads scan_range.scan_range.palo_scan_range.* without checking scan_range.scan_range.__isset.palo_scan_range (the previous DCHECK is gone). If a non-palo scan range reaches here this will build an invalid key / version; please validate the field is set for every range and return a clear error instead of assuming it.
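One way to read this suggestion is sketched below. The structs are simplified stand-ins that only mimic the shape of the Thrift-generated types (the generated code names the flag member `__isset`; it is called `isset` here because double-underscore identifiers are reserved in hand-written C++). The function name and the `std::optional` return are assumptions; the real code would presumably return a `Status` error instead.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for the Thrift-generated scan range types.
struct PaloScanRange {
    int64_t tablet_id = 0;
    std::string version;
};
struct ScanRangeIsset {
    bool palo_scan_range = false;
};
struct ScanRange {
    PaloScanRange palo_scan_range;
    ScanRangeIsset isset; // Thrift calls this member __isset
};
struct ScanRangeParams {
    ScanRange scan_range;
};

// Collects tablet ids, refusing any range whose palo_scan_range is not set.
// Returns std::nullopt on the first invalid range.
std::optional<std::vector<int64_t>> collect_tablet_ids(
        const std::vector<ScanRangeParams>& scan_ranges) {
    std::vector<int64_t> tablet_ids;
    tablet_ids.reserve(scan_ranges.size());
    for (const auto& range : scan_ranges) {
        if (!range.scan_range.isset.palo_scan_range) {
            return std::nullopt; // not an OLAP scan range: reject, do not guess
        }
        tablet_ids.push_back(range.scan_range.palo_scan_range.tablet_id);
    }
    return tablet_ids;
}
```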

Copilot · 2026-03-03T10:06:54Z

be/src/pipeline/query_cache/query_cache.h

+            std::from_chars(scan_range_iter->scan_range.palo_scan_range.version.data(),
+                            scan_range_iter->scan_range.palo_scan_range.version.data() +
+                                    scan_range_iter->scan_range.palo_scan_range.version.size(),
+                            current_version);


std::from_chars(...) parse result is ignored. If version is empty/non-numeric/partially-numeric, current_version can remain -1 (or parse partially) and still be treated as valid, leading to wrong cache versioning. Please check the returned std::from_chars_result (ec/ptr) and fail fast on parse errors.

Suggested change

-            std::from_chars(scan_range_iter->scan_range.palo_scan_range.version.data(),
-                            scan_range_iter->scan_range.palo_scan_range.version.data() +
-                                    scan_range_iter->scan_range.palo_scan_range.version.size(),
-                            current_version);
+            const auto& version_str = scan_range_iter->scan_range.palo_scan_range.version;
+            const char* version_begin = version_str.data();
+            const char* version_end = version_begin + version_str.size();
+            auto parse_result = std::from_chars(version_begin, version_end, current_version);
+            if (parse_result.ec != std::errc() || parse_result.ptr != version_end) {
+                return Status::InternalError("tablet version is invalid, plan error");
+            }
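The same strict-parse check can be tried in isolation with the standalone sketch below. The helper name `parse_version_strict` and the `std::optional` return type are assumptions chosen to keep the example self-contained; the suggestion itself returns `Status::InternalError`.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Parses the whole string as a base-10 integer.
// Rejects empty input, non-numeric input, and trailing characters ("12abc").
std::optional<int64_t> parse_version_strict(std::string_view text) {
    int64_t value = 0;
    const char* begin = text.data();
    const char* end = begin + text.size();
    auto result = std::from_chars(begin, end, value);
    if (result.ec != std::errc() || result.ptr != end) {
        return std::nullopt; // parse error or only a prefix was consumed
    }
    return value;
}
```

Checking both `ec` and `ptr` matters: `std::from_chars` reports success for `"12abc"` after consuming `"12"`, so only the `ptr != end` comparison catches the partial parse.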

Copilot · 2026-03-03T10:06:55Z

be/src/pipeline/query_cache/query_cache.h

+            auto scan_range_iter =
+                    std::find_if(scan_ranges.begin(), scan_ranges.end(),
+                                 [&tablet_id](const TScanRangeParams& range) {
+                                     return range.scan_range.palo_scan_range.tablet_id == tablet_id;
+                                 });


build_cache_key does a std::find_if(scan_ranges.begin(), scan_ranges.end(), ...) inside a loop over tablet_ids, making it O(n^2) per instance. For large instances this adds avoidable CPU overhead; consider a single pass that builds a map (tablet_id -> parsed version) or collects (tablet_id, version, range) pairs and then sorts.
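A rough sketch of the single-pass alternative follows. The input is modelled as plain (tablet_id, version string) pairs rather than `TScanRangeParams`, and the names `TabletVersion` and `collect_sorted_versions` are hypothetical. It parses each version once while collecting, then sorts by tablet id, which is O(n log n) instead of a linear search per tablet.

```cpp
#include <algorithm>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical record: one tablet and its already-parsed version.
struct TabletVersion {
    int64_t tablet_id;
    int64_t version;
};

// One pass over the ranges to parse versions, then one sort by tablet id.
// Returns std::nullopt if any version string is not a complete integer.
std::optional<std::vector<TabletVersion>> collect_sorted_versions(
        const std::vector<std::pair<int64_t, std::string>>& ranges) {
    std::vector<TabletVersion> out;
    out.reserve(ranges.size());
    for (const auto& range : ranges) {
        const std::string& version_str = range.second;
        int64_t version = -1;
        const char* begin = version_str.data();
        const char* end = begin + version_str.size();
        auto result = std::from_chars(begin, end, version);
        if (result.ec != std::errc() || result.ptr != end) {
            return std::nullopt;
        }
        out.push_back(TabletVersion{range.first, version});
    }
    std::sort(out.begin(), out.end(), [](const TabletVersion& a, const TabletVersion& b) {
        return a.tablet_id < b.tablet_id;
    });
    return out;
}
```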

Copilot · 2026-03-03T10:06:55Z

be/src/pipeline/query_cache/query_cache.h

+
+            auto find_tablet = cache_param.tablet_to_range.find(tablet_id);
+            if (find_tablet == cache_param.tablet_to_range.end()) {
+                return Status::InternalError("Not find tablet in partition_to_tablets, plan error");


The error message says partition_to_tablets, but the code is actually checking cache_param.tablet_to_range. Please update the message to reference the correct field to make diagnosing plan issues easier.

Suggested change

-                return Status::InternalError("Not find tablet in partition_to_tablets, plan error");
+                return Status::InternalError("Not find tablet in tablet_to_range, plan error");

924060929 and others added 3 commits March 3, 2026 13:46

fix

b06fdee

[opt](query cache) Support multiple tablets in cache key building

40f64dd

Copilot AI review requested due to automatic review settings March 3, 2026 10:00

Copilot started reviewing on behalf of HappenLee March 3, 2026 10:00 View session

HappenLee wants to merge 3 commits into apache:master from HappenLee:pr-60974

4 participants
