perf: early termination for right semi/anti hash joins#21775
perf: early termination for right semi/anti hash joins#21775SubhamSinghal wants to merge 1 commit intoapache:mainfrom
Conversation
|
run benchmark tpch10 |
|
run benchmark tpch_mem |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing semi-anti-early-termination (f436877) to 64619a6 (merge-base) diff using: tpch10 File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing semi-anti-early-termination (f436877) to 64619a6 (merge-base) diff using: tpch_mem File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch_mem — base (merge-base)
tpch_mem — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
File an issue against this benchmark runner |
|
TPC-H Benchmark @SubhamSinghal this looks too fast to be correct? |
|
@Dandandan might be some mistake on my side for calculating benchmark. but the optimisation looks fine right. Planning to file follow-up PR if direction looks right. |
|
run benchmark tpcds |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing semi-anti-early-termination (f436877) to 64619a6 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
Which issue does this PR close?
Rationale for this change
For RightSemi/RightAnti hash joins, the probe phase only needs to know whether at least one matching build row exists per probe row. However, the current code:
(probe, build)pairequal_rows_arron all pairs (take()+eq_dyn_null(), allocating intermediate arrays)get_semi_indicesto keep just the first match per probe rowFor skewed data with N duplicate build rows per key, this produces N pairs per probe row only to keep 1 — O(N) wasted work in both chain traversal and equality verification.
What changes are included in this PR?
New
get_first_matchmethod onJoinHashMapTypetrait (join_hash_map.rs):Option<u64>— the matching build row index, orNoneget_first_match_impl, reused byJoinHashMapU32,JoinHashMapU64, andPruningJoinHashMapNew
process_probe_batch_right_semi_antifunction (stream.rs):get_first_matchwith an equality predicate built fromJoinKeyComparator::is_equal(zero-allocation per-row comparison)build_batch_from_indicesAre these changes tested?
Covered by existing tests:
No new tests added — this is a pure optimization that produces identical results. The existing test suite comprehensively covers RightSemi/RightAnti join.
Are there any user-facing changes?
No.
TPC-H Benchmark result: