
branch-4.0: [opt](maxcompute) Optimize split generation for LIMIT queries with partition equality predicates #60895 #60973

Open
github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-60895-branch-4.0


@github-actions
Contributor

@github-actions github-actions bot commented Mar 3, 2026

Cherry-picked from #60895

…rtition equality predicates (#60895)

### What problem does this PR solve?

When a MaxCompute query contains only partition equality predicates and
a LIMIT clause, use the row_offset split strategy to read only the
required number of rows instead of generating splits for all data. This
reduces the split count from potentially many to exactly one, improving
query latency for common LIMIT patterns like
`SELECT * FROM t WHERE pt='x' LIMIT N`.

Key changes:
- Add `checkOnlyPartitionEqualityPredicate()` to detect eligible queries
- Add `getSplitsWithLimitOptimization()` using SplitByRowOffset with
  crossPartition=false, reading min(limit, totalRowCount) rows
- Add session variable `enable_mc_limit_split_optimization` (default
  off)
- Add timing logs for split generation phases to aid performance
  diagnosis
- Add unit tests for predicate check and limit optimization logic
- Add regression tests covering single/multi-partition tables, JOINs,
  aggregations, subqueries, window functions, and edge cases
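
The eligibility check and row-count calculation described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `LimitSplitSketch` class, the `Predicate` type, the method names, and the convention that a negative `limit` means "no LIMIT clause" are all hypothetical, and treating an empty predicate list as eligible is an assumption. Only the `min(limit, totalRowCount)` rule and the session-variable gate come from the description above.

```java
import java.util.List;

// Hypothetical sketch; names and types are NOT taken from the Doris code base.
final class LimitSplitSketch {

    // Simplified predicate of the form: column <op> literal.
    static final class Predicate {
        final String column;
        final String op;
        final String literal;

        Predicate(String column, String op, String literal) {
            this.column = column;
            this.op = op;
            this.literal = literal;
        }
    }

    // Returns true when every predicate is an equality on a partition column.
    // Assumption: an empty predicate list is treated as eligible.
    static boolean onlyPartitionEquality(List<Predicate> predicates,
                                         List<String> partitionColumns) {
        for (Predicate p : predicates) {
            boolean isEquality = "=".equals(p.op);
            boolean onPartitionColumn = partitionColumns.contains(p.column);
            if (!isEquality || !onPartitionColumn) {
                return false;
            }
        }
        return true;
    }

    // Number of rows the single row-offset split should cover.
    static long rowsToRead(long limit, long totalRowCount) {
        return Math.min(limit, totalRowCount);
    }

    // The optimization applies only when the session variable is on,
    // a LIMIT is present (assumed here: limit >= 0), and the predicates qualify.
    static boolean eligible(boolean sessionFlagEnabled, long limit,
                            List<Predicate> predicates,
                            List<String> partitionColumns) {
        return sessionFlagEnabled
                && limit >= 0
                && onlyPartitionEquality(predicates, partitionColumns);
    }
}
```

Under these assumptions, a query such as `SELECT * FROM t WHERE pt='x' LIMIT 10` on a table partitioned by `pt` would be eligible once `enable_mc_limit_split_optimization` is turned on, while adding a filter on a non-partition column would fall back to regular split generation.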
@github-actions github-actions bot requested a review from yiguolei as a code owner March 3, 2026 04:51
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Mar 3, 2026
@dataroaring dataroaring reopened this Mar 3, 2026
@hello-stephen
Contributor

run buildall

