feat: statistics-driven TopK optimization for parquet (file reorder + RG reorder + threshold init + cumulative prune) #21580
run benchmarks
🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) using tpch
🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) using tpcds
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR improves TopK performance for Parquet scans when sort pushdown is Inexact by enabling row-group reordering based on statistics, so likely “best” row groups are read earlier and dynamic filters can tighten sooner.
Changes:
- Thread an optional `LexOrdering` from `ParquetSource::try_pushdown_sort` through `ParquetMorselizer` to the access-plan preparation step.
- Add `PreparedAccessPlan::reorder_by_statistics` to reorder `row_group_indexes` using Parquet statistics.
- Add unit tests covering reorder/skip behavior for multiple edge cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| datafusion/datasource-parquet/src/source.rs | Plumbs sort ordering into the file source for later row-group reordering. |
| datafusion/datasource-parquet/src/opener.rs | Carries optional sort ordering into the opener and applies reorder_by_statistics during plan preparation. |
| datafusion/datasource-parquet/src/access_plan.rs | Implements row-group reordering by statistics and adds focused unit tests. |
```rust
let sort_order = LexOrdering::new(order.iter().cloned());
let mut new_source = self.clone().with_reverse_row_groups(true);
new_source.sort_order_for_reorder = sort_order;
```
LexOrdering::new(...) appears to return a Result<LexOrdering, _> (as used with .unwrap() in the new unit tests), but here it’s assigned directly without ?/unwrap, and then assigned to sort_order_for_reorder: Option<LexOrdering> without wrapping in Some(...). This should be changed to construct a LexOrdering with error propagation and store it as Some(sort_order) (or skip setting the field on error). Otherwise this won’t compile.
Suggested change:

```rust
let sort_order = LexOrdering::new(order.iter().cloned())?;
let mut new_source = self.clone().with_reverse_row_groups(true);
new_source.sort_order_for_reorder = Some(sort_order);
```
```rust
// LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
let first_sort_expr = sort_order.first();
```
sort_order.first() (if LexOrdering is Vec-like) returns Option<&PhysicalSortExpr>, but the code uses it as if it were &PhysicalSortExpr (first_sort_expr.expr...). This is likely a compile error. A concrete fix is to obtain the first element via iteration and handle the empty case (e.g., early-return Ok(self) if no sort expressions), then use the returned &PhysicalSortExpr.
Suggested change:

```rust
let first_sort_expr = match sort_order.iter().next() {
    Some(expr) => expr,
    None => {
        debug!("Skipping RG reorder: empty sort order");
        return Ok(self);
    }
};
```
```rust
let descending = first_sort_expr.options.descending;
```
For DESC ordering, reordering by min values is often a poor proxy for “row group likely contains the largest values first”; typically you want to sort by max when descending == true (and by min when ascending). This can significantly reduce the intended TopK benefit (and can even choose a worse first row group when ranges overlap). Consider switching to row_group_maxs(...) for descending order, and update the doc comment (currently mentions “min/max”) and the DESC unit test accordingly.
```rust
// Get min values for the selected row groups
let rg_metadata: Vec<&RowGroupMetaData> = self
    .row_group_indexes
    .iter()
    .map(|&idx| file_metadata.row_group(idx))
    .collect();

let min_values = match converter.row_group_mins(rg_metadata.iter().copied()) {
    Ok(vals) => vals,
    Err(e) => {
        debug!("Skipping RG reorder: cannot get min values: {e}");
        return Ok(self);
    }
};

// Sort indices by min values
let sort_options = arrow::compute::SortOptions {
    descending,
    nulls_first: first_sort_expr.options.nulls_first,
};
let sorted_indices = match arrow::compute::sort_to_indices(
    &min_values,
```
Suggested change:

```rust
// Get values for the selected row groups: mins for ASC, maxs for DESC
let rg_metadata: Vec<&RowGroupMetaData> = self
    .row_group_indexes
    .iter()
    .map(|&idx| file_metadata.row_group(idx))
    .collect();
let sort_values = match if descending {
    converter.row_group_maxs(rg_metadata.iter().copied())
} else {
    converter.row_group_mins(rg_metadata.iter().copied())
} {
    Ok(vals) => vals,
    Err(e) => {
        debug!("Skipping RG reorder: cannot get min/max values: {e}");
        return Ok(self);
    }
};
// Sort indices by the statistics that best match the requested order
let sort_options = arrow::compute::SortOptions {
    descending,
    nulls_first: first_sort_expr.options.nulls_first,
};
let sorted_indices = match arrow::compute::sort_to_indices(
    &sort_values,
```
Yes, this is a good point.
```rust
let sorted_indices = match arrow::compute::sort_to_indices(
    &min_values,
    Some(sort_options),
    None,
) {
```
If multiple row groups share the same min (or max) statistic, sort_to_indices may not guarantee a deterministic/stable tie-breaker across platforms/versions. Since row-group order can affect scan reproducibility and performance debugging, consider adding a stable secondary key (e.g., original row group index) when statistics are equal.
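To make the suggestion concrete, here is a minimal sketch (plain `i64` values standing in for the per-row-group min statistics; all names illustrative) of reordering row-group indexes with the original position as a stable secondary key:

```rust
// Illustrative sketch: deterministic reordering of row-group indexes by a
// statistic, with the original position as a tie-breaker. `mins[i]` stands in
// for the min statistic of the row group at position i in `row_group_indexes`.
fn reorder_stable(row_group_indexes: &[usize], mins: &[i64], descending: bool) -> Vec<usize> {
    let mut order: Vec<usize> = (0..row_group_indexes.len()).collect();
    order.sort_by(|&a, &b| {
        let primary = if descending {
            mins[b].cmp(&mins[a])
        } else {
            mins[a].cmp(&mins[b])
        };
        // Tie-break on the original position so equal statistics keep a
        // deterministic, platform-independent order.
        primary.then(a.cmp(&b))
    });
    order.into_iter().map(|i| row_group_indexes[i]).collect()
}
```

With equal mins (positions 0 and 2 below), the lower original position always comes first, regardless of sort-algorithm internals.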
```rust
/// - 0 or 1 row groups (nothing to reorder)
/// - Sort expression is not a simple column reference
/// - Statistics are unavailable
pub(crate) fn reorder_by_statistics(
```
I think @adriangb had the great idea to also order by grouping keys which can
- reduce cardinality within partitions (partition-local state can be smaller)
- allow for better cache locality (row groups with more equal keys are grouped together)
Doesn't have to be in this PR but perhaps we can think about how it fits in.
Thanks @Dandandan for the review! That's a great extension. The `reorder_by_statistics` method is generic enough to take any `LexOrdering` — it doesn't need to be tied to TopK specifically. So extending this for GROUP BY should be a matter of:
- Computing a preferred RG ordering from grouping keys in the aggregate planner
- Passing it through to `ParquetSource::sort_order_for_reorder`
Happy to track this as a follow-up issue. Will open one after this PR lands.
Thanks @Dandandan! Created #21581 to track this. The existing infrastructure from this PR should be directly reusable — mainly needs the aggregate planner to populate sort_order_for_reorder from grouping keys.
run benchmarks

🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) using tpcds
🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) using tpch
🤖 Benchmark completed (GKE): tpcds — base (merge-base) vs branch
🤖 Benchmark completed (GKE): clickbench_partitioned — base (merge-base) vs branch
run benchmark clickbench_partitioned clickbench_extended

🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) using clickbench_extended
🤖 Benchmark completed (GKE): tpcds — base (merge-base) vs branch
🤖 Benchmark completed (GKE): clickbench_partitioned — base (merge-base) vs branch
I wonder if the ordering should be done before the files / row groups are assigned to partitions, so they are more globally sorted instead of just per partition? It seems now they are sorted within each partition, which should help, but perhaps not nearly as much as it would if all the partitions contained the optimal row groups. This would also help in the case of #21581
🤖 Benchmark completed (GKE): clickbench_extended — base (merge-base) vs branch
Great point @Dandandan — you're right that global reorder is much more effective than per-partition reorder. With global reorder + round-robin distribution, each partition's first RG is close to the global optimum.
The current per-partition reorder is limited because even after sorting, partition 0's "best" RG might be much worse than the global best (which may have landed in partition 2). Moving to global reorder would require changes at the planning / EnforceDistribution layer to load RG statistics and redistribute RGs across partitions. I'd prefer to keep this PR as an incremental step (per-partition) and track global reorder as a follow-up — it would benefit both #21317 and #21581. Does this make sense?
Sure, makes sense.
…lly exclusive Previously reorder_by_statistics and reverse_row_groups were mutually exclusive (else-if). This meant DESC queries on unsorted data could only get one optimization. Now they compose: reorder always sorts RGs by min ASC, then reverse flips for DESC. This ensures correct results for both sorted and unsorted inputs without regression. Also removes prepare_with_optimizer in favor of calling optimize() directly on each optimizer, and simplifies reorder_by_statistics to always use min ASC (direction handled by reverse).
The previous jitter formula only added overlap between adjacent RGs but kept the overall RG order ascending by min values. This meant reorder_by_statistics was a no-op — there was nothing to reorder. Fix by bucketing rows into 60 chunks, sorting within each chunk (with jitter for overlap), then scrambling chunk order using a deterministic permutation. This produces RGs that are individually sorted but appear in scrambled order in the file, so reorder_by_statistics has real work to do.
Add FileSource::reorder_files() trait method and ParquetSource implementation that sorts files by column statistics before placing them in the shared work queue. For DESC queries, files with the highest min value come first; for ASC, lowest max first. This ensures the first file read by any partition is the globally optimal one for TopK threshold convergence, complementing the intra-file RG reorder.
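The file-ordering rule above (DESC: highest min first; ASC: lowest max first) can be sketched on plain integers — `(min, max)` pairs stand in for each file's statistics on the sort column, and the function name is illustrative, not the actual trait method:

```rust
// Sketch: compute the order in which files should be read for a TopK query.
// stats[i] = (min, max) of the sort column in file i (hypothetical stand-in
// for real per-file Parquet statistics).
fn file_read_order(stats: &[(i64, i64)], descending: bool) -> Vec<usize> {
    let mut order: Vec<usize> = (0..stats.len()).collect();
    if descending {
        // DESC: the file whose min is highest is most likely to hold the top-K
        order.sort_by(|&a, &b| stats[b].0.cmp(&stats[a].0));
    } else {
        // ASC: the file whose max is lowest is most likely to hold the top-K
        order.sort_by(|&a, &b| stats[a].1.cmp(&stats[b].1));
    }
    order
}
```

Because `sort_by` is stable, files with equal statistics keep their original relative order, which keeps the work-queue order deterministic.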
Before reading any parquet data, scan row group min/max statistics to compute an initial threshold for TopK's dynamic filter. This allows row-level filtering to benefit immediately from the first file opened, rather than waiting until TopK processes enough rows to build a threshold organically.
Algorithm (single-column sort):
- DESC LIMIT K: threshold = max(min) across RGs with num_rows >= K; filter: col > threshold
- ASC LIMIT K: threshold = min(max) across RGs with num_rows >= K; filter: col < threshold
Sort direction is read from sort_options on DynamicFilterPhysicalExpr, which is now set by SortExec::create_filter() for TopK queries. This makes the optimization work for ALL TopK queries on parquet, not just those with sort pushdown. The DynamicFilterPhysicalExpr is shared across all partitions, so each file's threshold update is visible to subsequent files globally.
Graceful fallback: skips initialization when sort_options is absent, statistics are unavailable, the column is not found, or the sort is multi-column.
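The max(min)/min(max) rule above can be sketched on plain integers — `(min, max, num_rows)` triples stand in for per-row-group statistics, and the function name is illustrative:

```rust
// Sketch of statistics-based TopK threshold initialization.
// stats[i] = (min, max, num_rows) of row group i for the sort column.
fn init_topk_threshold(stats: &[(i64, i64, usize)], k: usize, descending: bool) -> Option<i64> {
    // Only row groups with at least K rows can bound the top-K set on their own.
    let eligible = stats.iter().filter(|&&(_, _, rows)| rows >= k);
    if descending {
        // DESC LIMIT K: some single RG holds K values all >= its min, so the
        // largest such min is a valid lower bound for the Kth value.
        eligible.map(|&(min, _, _)| min).max()
    } else {
        // ASC LIMIT K: symmetric — the smallest max over eligible RGs is a
        // valid upper bound for the Kth value.
        eligible.map(|&(_, max, _)| max).min()
    }
}
```

Returning `None` when no row group has at least K rows is the graceful-fallback case: the threshold cannot be bounded by a single RG, so initialization is skipped.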
Previously file reorder and RG reorder only worked with sort pushdown (Inexact path, WITH ORDER). Now they extract sort info from DynamicFilterPhysicalExpr.sort_options in the predicate, which is set by SortExec for ALL TopK queries regardless of WITH ORDER. This means ORDER BY col DESC LIMIT K on any parquet table benefits from file reorder (best file first in shared queue), RG reorder (best RG first within file), and stats init (threshold before I/O).
Move try_init_topk_threshold() from build_stream() to prune_row_groups(), before prune_by_statistics(). This way:
- File 1: stats init sets the threshold from ALL its RG statistics, then prune_by_statistics uses it to prune file 1's own RGs. Only the best RG(s) are read; the rest are skipped with zero I/O.
- File 2+: the dynamic filter already has a tight threshold from file 1, so most RGs are pruned immediately.
This effectively achieves dynamic RG pruning without needing morsel-level scheduling — the threshold is computed from statistics (no data read), then used to prune RGs in the same file.
RG reorder and reverse must only trigger when sort pushdown is active (sort_order_for_reorder is set). Applying them to non-sort-pushdown TopK queries changes the RG read order, which alters tie-breaking for equal values (e.g. NULLs) and causes non-deterministic results. File reorder and stats init remain enabled for ALL TopK queries since they only affect pruning (which rows are skipped), not the relative order of rows within remaining RGs. Fixes fuzz_cases::topk_filter_pushdown::test_fuzz_topk_filter_pushdown
…wrap Critical fix: PruningPredicate compiles the expression at build time, so the DynamicFilterPhysicalExpr must be updated BEFORE the predicate is built. Previously stats init ran after, making RG pruning ineffective for the current file.
Also fixes:
- Unwrap CastExpr to find the inner Column (projection may add casts)
- Use limit=1 default when the scan limit is None (TopK fetch is at the SortExec level, not pushed to the scan)
- Only init the threshold in the sort pushdown path to avoid tie-breaking changes for non-sort-pushdown TopK queries
Local benchmark (single file with 61 sorted RGs, DESC LIMIT): baseline 22-25ms per query; feature 0.4-1.2ms per query (20-58x faster)
Add null-aware filter for NULLS FIRST sort: `col IS NULL OR col > threshold` ensures RGs with NULLs are not incorrectly pruned. Stats init remains restricted to sort pushdown path because pruning changes tie-breaking for equal values across RGs, which causes non-deterministic results in non-sort-pushdown TopK queries. The null-aware filter is still useful for sort pushdown DESC NULLS FIRST.
Stats init now fires for all TopK queries, not just sort pushdown path. The null-aware filter (IS NULL OR col > threshold for NULLS FIRST) ensures correctness when NULLs are present. Fix fuzz test: add remaining columns as ASC NULLS LAST tiebreakers to ORDER BY, making the sort fully deterministic. This is the correct approach since SQL doesn't guarantee tie-breaking order, and any optimization that changes RG read order may produce different but equally valid results for tied rows.
Stats init with max(min) threshold can over-prune for non-sorted data: the threshold may exceed the actual Kth value when rows are distributed across multiple RGs. This caused output_rows=0 in explain_analyze tests. Restrict stats init to sort pushdown path where data ordering guarantees the threshold is a valid lower bound. Keep fuzz test tiebreaker fix as it's independently correct (SQL doesn't guarantee tie-breaking order).
The max(min)/min(max) algorithm is only a valid threshold bound when RGs are non-overlapping (guaranteed by sorted data with sort pushdown). For overlapping RGs, top-K values may span multiple RGs and the threshold can over-prune, producing fewer results than expected. Keep stats init restricted to sort pushdown path. Keep fuzz test tiebreaker fix (independently correct).
Stats init now fires for ALL TopK queries where the predicate is only the DynamicFilterPhysicalExpr (no WHERE clause combined). This is safe because without WHERE, raw RG statistics accurately represent the qualifying rows. For TopK + WHERE queries, stats init remains restricted to sort pushdown path because the WHERE clause narrows qualifying rows below what raw statistics suggest, making the threshold potentially unsafe. Also adds surviving-rows safety check: after computing threshold, verify that remaining RGs have enough total rows (>= K) before applying. This prevents over-pruning when top-K values span multiple overlapping RGs.
Stats init is only safe when:
1. Sort pushdown is active (sorted, non-overlapping RGs)
2. The predicate is the DynamicFilter only (no WHERE clause)
A WHERE clause narrows qualifying rows below what raw statistics suggest, making the threshold potentially unsafe even on sorted data. Type coercion (CastExpr) issues also need resolution for general TopK support — tracked as follow-up work. Includes the type cast fix (parquet stats type → column type) and the surviving-rows safety check for future use.
Instead of threshold-based pruning (which fails with WHERE clauses due to unknown qualifying row counts), use cumulative row counting: after reorder + reverse, accumulate rows from the front until we have enough for the TopK fetch limit (K), then prune the rest. This works for sort pushdown with or without WHERE because it only depends on row counts + RG ordering, not threshold values or types. Adds fetch field to DynamicFilterPhysicalExpr (set by SortExec) so the parquet reader knows the TopK K value. Keeps stats init for the no-WHERE sort pushdown case (20-58x speedup) as a complementary optimization that also helps cross-file pruning via the shared DynamicFilter.
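The cumulative row-counting step above can be sketched in a few lines — row counts are plain `usize` values here, and the signature is a simplified stand-in for the real method:

```rust
// Sketch of cumulative RG pruning: after reorder (+ reverse for DESC), keep
// row groups from the front until their row counts cover the fetch limit K.
// rg_indexes is the post-reorder read order; row_counts[i] is the row count
// of row group i in file order.
fn truncate_row_groups(rg_indexes: &mut Vec<usize>, row_counts: &[usize], k: usize) {
    let mut covered = 0;
    let mut keep = 0;
    for &idx in rg_indexes.iter() {
        keep += 1;
        covered += row_counts[idx];
        if covered >= k {
            // Enough rows to satisfy LIMIT K; everything after this can be skipped.
            break;
        }
    }
    rg_indexes.truncate(keep);
}
```

Note this depends only on row counts and RG ordering — no threshold values or types — which is why it composes with WHERE clauses where threshold init does not.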
Extend RG reorder, reverse, and cumulative pruning beyond sort pushdown to ALL TopK queries via DynamicFilterPhysicalExpr sort_options. For non-sort-pushdown TopK, cumulative pruning is guarded by a non-overlap check: after reorder, verify adjacent RGs satisfy max[i] <= min[i+1]. Only prune when RGs are non-overlapping (guarantees top-K values are in the first N RGs). Sort pushdown path skips the overlap check (sorted data is guaranteed non-overlapping).
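The non-overlap guard described above (adjacent RGs must satisfy max[i] <= min[i+1] after reorder) reduces to a one-line check — `(min, max)` pairs stand in for per-RG statistics, and the function name is illustrative:

```rust
// Sketch of the non-overlap guard for cumulative pruning: after reordering by
// min, pruning is only safe if adjacent row groups do not overlap, i.e. every
// RG's max is <= the next RG's min.
fn row_groups_non_overlapping(ranges: &[(i64, i64)]) -> bool {
    ranges.windows(2).all(|w| w[0].1 <= w[1].0)
}
```

When ranges overlap, the top-K values may span multiple RGs, so pruning by cumulative row count could drop qualifying rows — hence pruning is skipped in that case.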
… regression) Reverse and cumulative pruning from DynamicFilter now only trigger when reorder_optimizer is Some (the sort column was found in parquet stats). For GROUP BY + ORDER BY queries, the sort column is an aggregate output not in parquet — reorder bails out, so reverse and cumulative prune should also skip. Previously, reverse ran regardless, changing I/O patterns with no benefit (Q23 2x slower in ClickBench).
…s PR
- benchmarks/bench.sh: data generation fix belongs in apache#21711
- listing_table_partitions.slt: unrelated change from another branch
Fuzz test tiebreaker retained (needed for RG reorder on all TopK).
…LT tests Remove try_init_topk_threshold and compute_best_threshold_from_stats. Stats init had multiple issues:
- Gt vs GtEq boundary (excluded valid top-K values)
- Conflicted with cumulative prune when K spans multiple RGs
- Type coercion (CastExpr) and WHERE clause interaction
Cumulative RG pruning is strictly better: works with WHERE, no threshold computation, no type issues. After reorder + reverse, just count rows from the front until >= K and truncate the rest.
Add comprehensive SLT tests:
- Test I: WITH ORDER + DESC LIMIT (stats init + cumulative prune)
- Test J: non-overlapping RGs without WITH ORDER (DynamicFilter path)
- Test K: overlapping RGs (cumulative prune must NOT trigger)
Bring back stats init with all issues fixed:
- GtEq/LtEq instead of Gt/Lt (include boundary values)
- Use df.fetch() as the limit (TopK K value, not scan limit). When K > single RG rows, stats init skips → cumulative prune handles it
- Cast the threshold to the column data type (parquet vs table schema mismatch)
- Null-aware filter for NULLS FIRST
- Generation check prevents overwrite by later partitions
- Restricted to sort pushdown + pure DynamicFilter (no WHERE)
Stats init and cumulative prune are complementary:
- Stats init: updates PruningPredicate → prunes at the RG statistics level
- Cumulative prune: truncates after reorder + reverse → prunes by row count
Both work together without conflict when using df.fetch().
create_filter() was called before new_sort.fetch was set, so DynamicFilterPhysicalExpr.fetch was always 0 (or None from old self). Fix by setting fetch before creating the filter. This was the root cause of stats init and cumulative prune not triggering on CI — fetch=0 meant "no rows needed" → skip.
For GROUP BY + ORDER BY queries, the TopK sort column is an aggregate output (e.g. COUNT(*)) that doesn't exist in the parquet file schema. Previously we still created ReorderByStatistics which tried to look up the column in statistics — wasted work. Now check column existence in file schema before creating the optimizer. This eliminates overhead for non-scan-level TopK queries (ClickBench Q40-Q42 regression fix).
- truncate_row_groups now skips when row_selection is present to preserve page-level pruning state (xudong review feedback)
- Remove incomplete DynamicRgPruner exploration code
Multi-key ORDER BY: use the first sort key for RG-level optimizations. Secondary keys only affect tie-breaking within RGs, not RG decisions.
truncate_row_groups: skip truncation when row_selection exists to preserve page-level pruning state (xudong review).
Tests:
- Test L: multi-key DESC/ASC LIMIT (6 sub-tests)
- truncate unit tests: basic, row_selection skip, no-op overflow
Test M: file declared WITH ORDER (id ASC, value ASC), multi-key queries testing:
- M.1: EXPLAIN showing reverse_row_groups=true for a fully reversed match
- M.2: DESC, DESC LIMIT 3 — correct results
- M.3: larger LIMIT spanning multiple RGs
- M.4: ASC, ASC (same direction = Exact, sort elimination)
- M.5: partial match (first key reversed, second key same) — NOT Inexact
- M.6: full sort, data integrity check
Which issue does this PR close?
Closes #21691
Partial fix for #21399
Rationale for this change
TopK queries (`ORDER BY col DESC/ASC LIMIT K`) on parquet data have several inefficiencies: among them, the dynamic TopK filter starts as `lit(true)`, so early RGs are never pruned.
What changes are included in this PR?
A chain of composable optimizations that minimize I/O for TopK queries:
1. Global file reorder (`FileSource::reorder_files`)
Sort files in the shared work queue by column statistics. DESC: highest min first; ASC: lowest max first. Works for ALL TopK via `DynamicFilterPhysicalExpr.sort_options`. Bails fast when the sort column is not in the file schema (GROUP BY + ORDER BY).

2. RG reorder within file (`reorder_by_statistics`)
Reorder row groups by min values (ASC). Works for all TopK via DynamicFilter `sort_options` (with file schema check). Combined with reverse for DESC queries.

3. TopK threshold init from statistics (`try_init_topk_threshold`)
Before reading data, compute the threshold from RG min/max stats. Runs BEFORE the `PruningPredicate` build so the threshold is compiled into the predicate. Uses `GtEq`/`LtEq` to include boundary values. Null-aware filter for NULLS FIRST. Uses `df.fetch()` (the TopK K value) so stats init skips when K spans multiple RGs. Restricted to sort pushdown + no WHERE (pure DynamicFilter predicate).

4. Cumulative RG pruning (`truncate_row_groups`)
After reorder + reverse, accumulate rows from the front until >= K and prune the rest. For non-sort-pushdown TopK, guarded by a non-overlap check (`max(i) <= min(i+1)`). Only when the predicate is pure DynamicFilter (no WHERE).

5. Compose reorder + reverse
Sequential steps instead of mutually exclusive. Reverse only triggers when reorder succeeds (sort column found in file schema).
How they work together
Coverage matrix
Local benchmark (single file, 61 sorted RGs, DESC LIMIT, 1 partition)
Key bug fix: `SortExec.fetch` ordering
`create_filter()` was called before `new_sort.fetch` was set, so `DynamicFilterPhysicalExpr.fetch` was always 0. Fixed by setting fetch before creating the filter.

Changes to `DynamicFilterPhysicalExpr`:
- `sort_options: Option<Vec<SortOptions>>` — sort direction for each child
- `fetch: Option<usize>` — TopK K value for cumulative pruning
- `new_with_sort_options()` constructor, `sort_options()` and `fetch()` getters
- Set by `SortExec::create_filter()` for all TopK queries

Are these changes tested?
- Unit tests in `datafusion-datasource-parquet` (all pass)
- `test_fuzz_topk_filter_pushdown` — updated with tiebreaker columns for deterministic ORDER BY

Are there any user-facing changes?
No. Transparent optimization — same results, faster TopK on parquet with statistics.