
feat(parquet): row-group morselization for sibling FileStream stealing#21766

Draft
Dandandan wants to merge 8 commits into apache:main from Dandandan:row-group-morselization


@Dandandan
Contributor

Which issue does this PR close?

Follow-up to #21351 (Dynamic work scheduling in FileStream), which closed #20529 and explicitly deferred "splitting files into smaller units (e.g. across row groups)" as future work. This PR implements that.

  • Closes #.

Rationale for this change

With #21351, sibling FileStreams already steal whole files from a SharedWorkSource queue. But a single large parquet file still bottlenecks on one worker — the other N−1 sibling partitions sit idle even though each row group is independently readable. This shows up on single-file queries (ClickBench-style) and on the long-tail large-file case in multi-file scans.

This PR adds row-group granularity: the worker that pops a file donates its other row groups back to the shared queue, where idle siblings can steal them.

What changes are included in this PR?

Donation path (datafusion/datasource-parquet/src/opener.rs):

  • New `ParquetOpenState::SplitAndDonate` state between `LoadMetadata` and `PrepareFilters`. After metadata load, the donor keeps the first eligible row group; each remaining one is pushed to the front of the shared queue as a `PartitionedFile` clone whose `range` is a one-byte `FileRange` at that row group's starting offset.
  • The existing `prune_by_range` path matches that offset and scopes the stealer to exactly that row group — no new extension types, no metadata carried through `PartitionedFile.extensions`, no access-plan donation.
  • If the caller pre-narrowed the scan with a `file_range` that still spans multiple row groups (byte-range file partitioning), splitting stays inside that range: donated ranges remain subsets of the caller's.
  • Guards:
    • Caller-supplied `ParquetAccessPlan` in `extensions` → respected as-is, no donation.
    • Single row group in scope (whole file, or caller range isolating one RG) → no donation.
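To make the donation rules above concrete, here is a minimal, self-contained Rust model of them. It is a sketch, not the code in `opener.rs`: `ByteRange`, `SplitPlan`, `plan_split`, and `selected_row_groups` are hypothetical names, row groups are reduced to their starting byte offsets, and range matching on the stealer side is assumed to be a half-open `[start, end)` check against that offset. Under that assumption, a one-byte range at a row group's start selects exactly that row group.

```rust
/// Hypothetical stand-in for a file byte range: half-open `[start, end)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteRange {
    start: i64,
    end: i64,
}

impl ByteRange {
    /// Assumed matching rule: a row group belongs to the range if its
    /// starting offset falls inside `[start, end)`.
    fn contains(&self, offset: i64) -> bool {
        offset >= self.start && offset < self.end
    }
}

/// Outcome of the split step: the row group the donor keeps and the
/// one-byte ranges it pushes back for siblings.
#[derive(Debug, PartialEq, Eq)]
struct SplitPlan {
    kept: usize,
    donated: Vec<ByteRange>,
}

/// Hypothetical planner. `rg_starts[i]` is the starting offset of row group
/// `i`, `caller_range` models a pre-narrowed `file_range`, and
/// `has_caller_access_plan` models a `ParquetAccessPlan` in `extensions`.
/// Returns `None` when donation is suppressed.
fn plan_split(
    rg_starts: &[i64],
    caller_range: Option<ByteRange>,
    has_caller_access_plan: bool,
) -> Option<SplitPlan> {
    // Guard: a caller-supplied access plan is respected as-is.
    if has_caller_access_plan {
        return None;
    }
    // Row groups in scope: all of them, or only those that start inside
    // the caller's range (so donated ranges stay within it).
    let in_scope: Vec<(usize, i64)> = rg_starts
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, off)| caller_range.map_or(true, |r| r.contains(off)))
        .collect();
    // Guard: zero or one row group in scope means nothing to donate.
    if in_scope.len() < 2 {
        return None;
    }
    // Keep the first eligible row group; donate each remaining one as a
    // one-byte range at its starting offset.
    let kept = in_scope[0].0;
    let donated = in_scope[1..]
        .iter()
        .map(|&(_, off)| ByteRange { start: off, end: off + 1 })
        .collect();
    Some(SplitPlan { kept, donated })
}

/// Stealer side: the row groups a range selects under the assumed rule.
fn selected_row_groups(rg_starts: &[i64], range: ByteRange) -> Vec<usize> {
    rg_starts
        .iter()
        .enumerate()
        .filter(|&(_, &off)| range.contains(off))
        .map(|(i, _)| i)
        .collect()
}
```

For example, with four row groups starting at the illustrative offsets 4, 1000, 2000 and 3000, the donor keeps row group 0 and donates `[1000, 1001)`, `[2000, 2001)` and `[3000, 3001)`, each of which selects exactly one row group when a stealer opens it. A caller range of `[900, 2500)` covers row groups 1 and 2, so the donor keeps row group 1 and donates only `[2000, 2001)`.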

Shared queue plumbing:

  • `SharedWorkSource` is now `pub`; gains `push_front(items)`, `pop_front()`, and `Default`.
  • `FileSource::create_morselizer` takes an extra `Option` parameter so format-specific morselizers can participate in donation. Non-parquet sources ignore it.
  • `row_group_start_offset` helper is extracted into `row_group_filter.rs` and reused by both `prune_by_range` and the new donation path.
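If donated chunks are meant to be popped in file order (as the description of the first new test suggests), a batch `push_front` has to prepend the items while keeping their relative order. A minimal sketch, assuming a `Mutex<VecDeque<_>>` backing store; `SharedQueue` and its `push_back` method are hypothetical and do not describe the actual internals of `SharedWorkSource`:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-in for a shared work queue (not DataFusion's
/// `SharedWorkSource`). In a scan, `T` would be a `PartitionedFile`.
struct SharedQueue<T> {
    inner: Mutex<VecDeque<T>>,
}

// Implemented by hand: `#[derive(Default)]` would add a `T: Default` bound.
impl<T> Default for SharedQueue<T> {
    fn default() -> Self {
        Self { inner: Mutex::new(VecDeque::new()) }
    }
}

impl<T> SharedQueue<T> {
    /// Append one item at the back (how whole files might be enqueued).
    fn push_back(&self, item: T) {
        self.inner.lock().unwrap().push_back(item);
    }

    /// Prepend `items` as a batch, preserving their relative order so the
    /// next `pop_front` returns `items[0]`.
    fn push_front(&self, items: Vec<T>) {
        let mut queue = self.inner.lock().unwrap();
        // Each `VecDeque::push_front` prepends, so walk the batch in reverse.
        for item in items.into_iter().rev() {
            queue.push_front(item);
        }
    }

    /// Take the next unit of work, if any.
    fn pop_front(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_front()
    }
}
```

In this sketch the whole batch is inserted under a single lock acquisition, so a concurrent sibling never observes a partially donated batch, and front insertion means idle siblings pick up donated row groups before any whole files still waiting in the queue.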

Trade-offs (v1):

  • Stealers re-read the parquet footer for their chunk. Object stores typically cache the range so this is cheap; carrying loaded metadata across siblings is left for a follow-up.
  • If a sibling drains the shared queue before the donor has donated, that sibling terminates (it observes an empty queue at `scan_state.rs`'s `ScanAndReturn::Done`) and does not pick up the later donations. Accepted for v1; fixing this requires splitter handles or a queue wakeup and can be added separately.
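One way to picture the "splitter handles / queue wakeup" idea is a queue that refuses to report "done" while any worker that might still donate is outstanding. The sketch below is a hypothetical illustration, not code from this PR or from DataFusion: `SplitAwareQueue`, `pop_or_wait`, and `finish_split` are invented names, and it blocks on `std::sync::Condvar`, whereas an async `FileStream` would need an async wakeup instead. The key detail is that popping an item and registering as a potential donor happen under the same lock, so there is no window in which the queue looks empty with no donors pending.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

struct State<T> {
    queue: VecDeque<T>,
    /// Workers that popped an item and have not yet finished donating.
    active_splitters: usize,
}

/// Hypothetical queue that lets siblings wait for in-flight donations.
struct SplitAwareQueue<T> {
    state: Mutex<State<T>>,
    cond: Condvar,
}

impl<T> SplitAwareQueue<T> {
    fn new(items: Vec<T>) -> Self {
        Self {
            state: Mutex::new(State { queue: items.into(), active_splitters: 0 }),
            cond: Condvar::new(),
        }
    }

    /// Pop the next item, registering the caller as a potential donor in
    /// the same critical section. Blocks while the queue is empty but some
    /// donor may still push work; `None` means all work is done.
    fn pop_or_wait(&self) -> Option<T> {
        let mut s = self.state.lock().unwrap();
        loop {
            if let Some(item) = s.queue.pop_front() {
                s.active_splitters += 1;
                return Some(item);
            }
            if s.active_splitters == 0 {
                return None;
            }
            s = self.cond.wait(s).unwrap();
        }
    }

    /// Called exactly once per popped item: donate (possibly nothing),
    /// deregister, and wake waiting siblings.
    fn finish_split(&self, donated: Vec<T>) {
        let mut s = self.state.lock().unwrap();
        for item in donated.into_iter().rev() {
            s.queue.push_front(item);
        }
        s.active_splitters -= 1;
        drop(s);
        self.cond.notify_all();
    }
}
```

In the v1 failure mode, a sibling that finds the queue empty stops immediately; here it either waits until the donor calls `finish_split` or sees the donated items directly, and only returns `None` once the queue is empty and no donor is outstanding.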

Are these changes tested?

Yes. Five new unit tests in `datafusion/datasource-parquet/src/opener.rs`:

  • `row_group_split_donates_remaining_row_groups` — donor reads RG 0; three donated chunks each read exactly their row group, in file order.
  • `row_group_split_skips_single_row_group_file` — no donation when the file has one row group.
  • `row_group_split_respects_caller_access_plan` — `ParquetAccessPlan` in extensions suppresses donation; caller plan executes as specified.
  • `row_group_split_within_caller_file_range` — caller byte range covering all RGs is split; donated ranges stay inside the caller range.
  • `row_group_split_skips_when_caller_range_covers_single_row_group` — narrow caller range isolating one RG suppresses donation.

All existing `datafusion-datasource` and `datafusion-datasource-parquet` tests continue to pass. `cargo clippy --all-targets --all-features -- -D warnings` is clean on both crates.

Are there any user-facing changes?

Performance only — faster single-file and tail-file scans under sibling work stealing. No semantic or API changes visible to SQL users. `SharedWorkSource` becomes `pub` (it was `pub(crate)`); `FileSource::create_morselizer` gains one parameter — default implementations ignore it.


🤖 Generated with Claude Code

When a parquet file is scanned inside a shared sibling-stream pool (the
`SharedWorkSource` introduced by apache#21351), the first stream to open the
file now donates its remaining row groups back to the shared queue so
idle sibling partitions can steal them. A single large parquet file no
longer bottlenecks on one worker.

Implementation:

- `ParquetOpenState` gains a `SplitAndDonate` state between
  `LoadMetadata` and `PrepareFilters`. The donor keeps the first
  eligible row group and pushes each remaining one onto the front of
  the shared queue as a `PartitionedFile` clone whose `range` is a
  one-byte `FileRange` at the row group's starting offset. The
  existing `prune_by_range` path matches that offset and scopes the
  stealer to just that row group — no new extension types, no metadata
  carriage, no access-plan donation.
- If the caller pre-narrowed the scan with a `file_range` that still
  covers multiple row groups (byte-range file partitioning), splitting
  stays inside that range: donated ranges remain subsets of the
  caller's.
- Caller-supplied `ParquetAccessPlan` in `extensions` and single-row-
  group scopes suppress donation.
- `SharedWorkSource` is `pub` and gets `push_front` / `pop_front` /
  `Default`. `row_group_start_offset` is extracted so it's shared with
  `prune_by_range`.

Stealers re-load the parquet footer; object stores typically cache the
range so this is cheap. Sharing loaded metadata across siblings is left
for a follow-up.

5 new tests cover: basic donation + stealer round-trip, single-RG
files, caller access-plan suppression, splitting inside a caller
`file_range`, and single-RG caller ranges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the `datasource` label ("Changes to the datasource crate") on Apr 21, 2026
@Dandandan
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4290084954-1687-mnsfh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing row-group-morselization (bfa1a93) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4290084954-1688-5l8cx 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (bfa1a93) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4290084954-1689-s2g4k 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (bfa1a93) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and row-group-morselization
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃               row-group-morselization ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.62 ±6.77 / 18.15 ms │          1.20 / 4.68 ±6.81 / 18.30 ms │    no change │
│ QQuery 1  │        12.40 / 12.67 ±0.16 / 12.87 ms │        18.41 / 19.56 ±0.73 / 20.37 ms │ 1.54x slower │
│ QQuery 2  │        36.96 / 37.11 ±0.11 / 37.28 ms │        40.36 / 40.80 ±0.53 / 41.82 ms │ 1.10x slower │
│ QQuery 3  │        31.45 / 32.17 ±0.61 / 33.00 ms │        35.54 / 35.87 ±0.26 / 36.19 ms │ 1.11x slower │
│ QQuery 4  │     235.40 / 240.43 ±3.63 / 245.16 ms │     250.00 / 253.32 ±2.06 / 255.95 ms │ 1.05x slower │
│ QQuery 5  │     283.38 / 285.23 ±1.51 / 287.36 ms │    288.36 / 295.93 ±12.38 / 320.38 ms │    no change │
│ QQuery 6  │           6.30 / 6.94 ±0.39 / 7.26 ms │           7.58 / 8.02 ±0.37 / 8.51 ms │ 1.16x slower │
│ QQuery 7  │        13.71 / 14.34 ±1.12 / 16.57 ms │        20.88 / 21.07 ±0.14 / 21.21 ms │ 1.47x slower │
│ QQuery 8  │     330.39 / 332.81 ±2.56 / 337.04 ms │     331.65 / 336.82 ±3.03 / 340.55 ms │    no change │
│ QQuery 9  │     496.63 / 505.66 ±8.14 / 516.34 ms │    492.30 / 507.25 ±12.96 / 527.13 ms │    no change │
│ QQuery 10 │        74.15 / 75.20 ±0.83 / 76.49 ms │        82.43 / 83.08 ±0.50 / 83.69 ms │ 1.10x slower │
│ QQuery 11 │        84.15 / 85.15 ±0.56 / 85.88 ms │        94.03 / 94.81 ±0.61 / 95.77 ms │ 1.11x slower │
│ QQuery 12 │     275.33 / 282.52 ±8.00 / 296.95 ms │     281.69 / 286.30 ±3.77 / 292.27 ms │    no change │
│ QQuery 13 │     391.93 / 399.25 ±4.51 / 404.80 ms │     406.88 / 415.09 ±6.80 / 422.76 ms │    no change │
│ QQuery 14 │     287.43 / 292.52 ±6.06 / 301.85 ms │     294.69 / 298.23 ±5.06 / 308.29 ms │    no change │
│ QQuery 15 │     286.32 / 292.29 ±7.23 / 305.70 ms │     286.22 / 295.33 ±6.06 / 302.75 ms │    no change │
│ QQuery 16 │     622.92 / 631.80 ±7.15 / 641.03 ms │     628.10 / 634.01 ±5.33 / 641.99 ms │    no change │
│ QQuery 17 │     626.31 / 630.53 ±5.61 / 641.62 ms │     631.64 / 640.27 ±8.19 / 655.28 ms │    no change │
│ QQuery 18 │ 1257.97 / 1271.44 ±11.62 / 1289.48 ms │ 1276.42 / 1287.28 ±12.63 / 1310.27 ms │    no change │
│ QQuery 19 │        28.80 / 29.25 ±0.33 / 29.80 ms │        35.94 / 39.69 ±6.80 / 53.28 ms │ 1.36x slower │
│ QQuery 20 │     517.32 / 528.10 ±7.52 / 537.94 ms │     499.87 / 502.54 ±2.86 / 506.37 ms │    no change │
│ QQuery 21 │     594.74 / 600.33 ±4.15 / 606.11 ms │     591.78 / 593.66 ±1.40 / 595.20 ms │    no change │
│ QQuery 22 │  1073.71 / 1076.64 ±4.02 / 1084.51 ms │  1038.58 / 1044.40 ±3.87 / 1050.39 ms │    no change │
│ QQuery 23 │ 3333.15 / 3348.66 ±13.72 / 3369.88 ms │ 3231.74 / 3244.06 ±13.72 / 3270.00 ms │    no change │
│ QQuery 24 │        41.89 / 42.79 ±0.74 / 43.88 ms │        43.89 / 48.10 ±4.67 / 54.46 ms │ 1.12x slower │
│ QQuery 25 │     113.15 / 114.99 ±1.28 / 116.98 ms │     121.91 / 123.10 ±0.88 / 124.20 ms │ 1.07x slower │
│ QQuery 26 │        42.47 / 42.61 ±0.16 / 42.92 ms │        43.68 / 44.64 ±0.72 / 45.49 ms │    no change │
│ QQuery 27 │     668.92 / 680.20 ±8.65 / 695.47 ms │     658.73 / 664.01 ±3.88 / 669.60 ms │    no change │
│ QQuery 28 │ 2996.21 / 3018.34 ±16.70 / 3043.73 ms │ 2880.66 / 2902.76 ±14.72 / 2925.28 ms │    no change │
│ QQuery 29 │        42.61 / 46.02 ±5.58 / 57.10 ms │        45.84 / 48.68 ±4.31 / 57.24 ms │ 1.06x slower │
│ QQuery 30 │     311.60 / 314.90 ±2.57 / 319.15 ms │     316.25 / 318.95 ±3.17 / 324.69 ms │    no change │
│ QQuery 31 │     305.37 / 307.29 ±1.12 / 308.52 ms │     315.15 / 322.37 ±4.55 / 328.85 ms │    no change │
│ QQuery 32 │  1003.88 / 1008.46 ±3.32 / 1012.43 ms │  1017.34 / 1027.96 ±9.01 / 1044.12 ms │    no change │
│ QQuery 33 │  1430.13 / 1443.76 ±9.71 / 1458.59 ms │ 1443.41 / 1460.09 ±13.30 / 1479.94 ms │    no change │
│ QQuery 34 │ 1449.61 / 1476.48 ±17.47 / 1500.44 ms │  1456.92 / 1472.37 ±8.93 / 1482.24 ms │    no change │
│ QQuery 35 │     290.68 / 298.53 ±8.95 / 315.67 ms │    291.79 / 308.31 ±15.87 / 332.14 ms │    no change │
│ QQuery 36 │        63.77 / 68.30 ±4.80 / 76.52 ms │        63.67 / 72.08 ±6.84 / 83.24 ms │ 1.06x slower │
│ QQuery 37 │        35.79 / 36.27 ±0.64 / 37.46 ms │        36.25 / 38.76 ±2.34 / 43.16 ms │ 1.07x slower │
│ QQuery 38 │        41.17 / 44.44 ±4.10 / 52.39 ms │        39.68 / 42.98 ±4.25 / 51.08 ms │    no change │
│ QQuery 39 │     125.96 / 135.62 ±5.28 / 141.40 ms │     128.97 / 136.74 ±4.64 / 141.83 ms │    no change │
│ QQuery 40 │        14.00 / 16.35 ±3.02 / 22.29 ms │        15.26 / 16.91 ±3.03 / 22.97 ms │    no change │
│ QQuery 41 │        13.85 / 14.12 ±0.18 / 14.31 ms │        14.68 / 15.29 ±0.83 / 16.94 ms │ 1.08x slower │
│ QQuery 42 │        13.29 / 14.54 ±2.01 / 18.54 ms │        14.23 / 14.63 ±0.31 / 15.08 ms │    no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 20139.69ms │
│ Total Time (row-group-morselization)   │ 20060.82ms │
│ Average Time (HEAD)                    │   468.36ms │
│ Average Time (row-group-morselization) │   466.53ms │
│ Queries Faster                         │          0 │
│ Queries Slower                         │         15 │
│ Queries with No Change                 │         28 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 105.0s |
| Peak memory | 30.9 GiB |
| Avg memory | 23.2 GiB |
| CPU user | 1071.3s |
| CPU sys | 62.6s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 105.0s |
| Peak memory | 30.0 GiB |
| Avg memory | 23.0 GiB |
| CPU user | 1086.2s |
| CPU sys | 63.7s |
| Peak spill | 0 B |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and row-group-morselization
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                   row-group-morselization ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.88 / 7.38 ±0.82 / 9.02 ms │               6.81 / 7.33 ±0.85 / 9.02 ms │    no change │
│ QQuery 2  │        146.18 / 146.83 ±0.54 / 147.78 ms │         145.42 / 146.23 ±0.60 / 147.19 ms │    no change │
│ QQuery 3  │        112.08 / 113.65 ±0.93 / 114.71 ms │         113.61 / 114.39 ±0.60 / 115.11 ms │    no change │
│ QQuery 4  │     1215.64 / 1217.92 ±2.39 / 1222.13 ms │      1213.16 / 1218.38 ±2.95 / 1222.01 ms │    no change │
│ QQuery 5  │        171.31 / 173.09 ±1.47 / 175.68 ms │         172.45 / 173.76 ±0.93 / 174.94 ms │    no change │
│ QQuery 6  │       805.27 / 824.20 ±14.36 / 839.62 ms │        810.76 / 822.93 ±11.38 / 836.60 ms │    no change │
│ QQuery 7  │        332.14 / 335.32 ±2.59 / 338.49 ms │         334.89 / 336.74 ±1.21 / 338.73 ms │    no change │
│ QQuery 8  │        111.33 / 112.45 ±1.33 / 115.04 ms │         112.77 / 113.73 ±0.91 / 115.24 ms │    no change │
│ QQuery 9  │        100.08 / 103.38 ±2.57 / 105.63 ms │         101.21 / 104.95 ±1.89 / 106.24 ms │    no change │
│ QQuery 10 │        101.11 / 101.58 ±0.48 / 102.29 ms │         101.85 / 102.20 ±0.26 / 102.56 ms │    no change │
│ QQuery 11 │        815.83 / 819.15 ±2.67 / 823.35 ms │         807.74 / 820.27 ±6.39 / 824.91 ms │    no change │
│ QQuery 12 │           42.69 / 43.14 ±0.45 / 43.90 ms │            42.54 / 42.93 ±0.35 / 43.54 ms │    no change │
│ QQuery 13 │        386.73 / 388.01 ±1.08 / 389.87 ms │         385.45 / 389.55 ±4.05 / 397.31 ms │    no change │
│ QQuery 14 │        974.11 / 982.89 ±5.39 / 989.63 ms │         971.35 / 981.01 ±9.04 / 995.38 ms │    no change │
│ QQuery 15 │           14.31 / 14.75 ±0.30 / 15.12 ms │            14.28 / 14.60 ±0.31 / 15.16 ms │    no change │
│ QQuery 16 │              7.22 / 7.29 ±0.10 / 7.49 ms │               7.38 / 7.53 ±0.10 / 7.68 ms │    no change │
│ QQuery 17 │        220.49 / 222.78 ±1.79 / 225.02 ms │         220.38 / 221.72 ±0.87 / 222.89 ms │    no change │
│ QQuery 18 │        121.26 / 122.28 ±0.99 / 124.01 ms │         121.90 / 122.85 ±0.86 / 124.36 ms │    no change │
│ QQuery 19 │        152.12 / 153.31 ±1.23 / 155.59 ms │         152.51 / 154.68 ±2.11 / 158.11 ms │    no change │
│ QQuery 20 │           12.86 / 13.26 ±0.43 / 14.08 ms │            13.05 / 13.22 ±0.15 / 13.50 ms │    no change │
│ QQuery 21 │           18.55 / 18.83 ±0.19 / 19.14 ms │            18.78 / 19.05 ±0.30 / 19.51 ms │    no change │
│ QQuery 22 │        474.99 / 478.91 ±2.79 / 483.61 ms │         475.34 / 482.87 ±4.21 / 487.43 ms │    no change │
│ QQuery 23 │        805.39 / 808.53 ±3.32 / 813.95 ms │         810.11 / 813.78 ±2.76 / 818.66 ms │    no change │
│ QQuery 24 │        370.65 / 372.49 ±3.00 / 378.44 ms │         370.94 / 374.24 ±4.12 / 381.94 ms │    no change │
│ QQuery 25 │        329.67 / 332.48 ±1.53 / 334.31 ms │         330.79 / 332.77 ±1.97 / 336.28 ms │    no change │
│ QQuery 26 │           75.94 / 76.49 ±0.34 / 76.92 ms │            77.05 / 77.47 ±0.35 / 78.07 ms │    no change │
│ QQuery 27 │              6.81 / 6.90 ±0.12 / 7.14 ms │               7.01 / 7.65 ±1.14 / 9.93 ms │ 1.11x slower │
│ QQuery 28 │        148.13 / 149.26 ±1.35 / 151.87 ms │         146.93 / 148.47 ±1.67 / 151.57 ms │    no change │
│ QQuery 29 │        271.62 / 274.40 ±2.96 / 279.94 ms │         270.44 / 274.49 ±2.66 / 278.11 ms │    no change │
│ QQuery 30 │           40.67 / 41.14 ±0.43 / 41.70 ms │            41.04 / 41.50 ±0.48 / 42.32 ms │    no change │
│ QQuery 31 │        162.44 / 164.67 ±3.32 / 171.06 ms │         164.27 / 165.52 ±1.00 / 167.14 ms │    no change │
│ QQuery 32 │           13.22 / 13.61 ±0.34 / 14.23 ms │            13.08 / 13.47 ±0.22 / 13.70 ms │    no change │
│ QQuery 33 │        137.47 / 138.50 ±0.59 / 139.21 ms │         138.26 / 139.42 ±0.75 / 140.61 ms │    no change │
│ QQuery 34 │              6.81 / 6.93 ±0.15 / 7.21 ms │               6.89 / 7.10 ±0.20 / 7.47 ms │    no change │
│ QQuery 35 │         99.13 / 100.14 ±0.87 / 101.19 ms │          99.46 / 101.29 ±1.38 / 103.38 ms │    no change │
│ QQuery 36 │              6.48 / 6.78 ±0.17 / 6.94 ms │               6.67 / 7.13 ±0.29 / 7.57 ms │ 1.05x slower │
│ QQuery 37 │              8.09 / 8.21 ±0.16 / 8.51 ms │               8.22 / 8.40 ±0.21 / 8.80 ms │    no change │
│ QQuery 38 │           84.61 / 85.19 ±0.49 / 85.99 ms │            85.21 / 86.91 ±1.50 / 89.11 ms │    no change │
│ QQuery 39 │        116.66 / 118.66 ±2.14 / 122.73 ms │         118.19 / 120.31 ±2.38 / 124.67 ms │    no change │
│ QQuery 40 │        102.63 / 105.13 ±3.33 / 111.57 ms │         101.32 / 104.73 ±4.20 / 112.64 ms │    no change │
│ QQuery 41 │           13.99 / 14.55 ±0.75 / 15.96 ms │            14.11 / 15.14 ±1.67 / 18.46 ms │    no change │
│ QQuery 42 │        105.50 / 107.16 ±1.71 / 110.27 ms │         106.68 / 107.53 ±0.51 / 108.11 ms │    no change │
│ QQuery 43 │              5.70 / 5.80 ±0.14 / 6.05 ms │               5.79 / 5.88 ±0.11 / 6.09 ms │    no change │
│ QQuery 44 │           11.39 / 11.60 ±0.15 / 11.82 ms │            11.55 / 11.73 ±0.11 / 11.89 ms │    no change │
│ QQuery 45 │           47.40 / 47.71 ±0.34 / 48.33 ms │            47.59 / 48.05 ±0.34 / 48.59 ms │    no change │
│ QQuery 46 │              8.25 / 8.38 ±0.11 / 8.57 ms │               8.38 / 8.58 ±0.15 / 8.78 ms │    no change │
│ QQuery 47 │        660.00 / 670.17 ±5.18 / 674.35 ms │         666.15 / 673.72 ±4.39 / 677.54 ms │    no change │
│ QQuery 48 │        271.77 / 273.61 ±1.20 / 275.14 ms │         271.94 / 275.39 ±2.08 / 278.20 ms │    no change │
│ QQuery 49 │        247.14 / 248.89 ±1.10 / 250.49 ms │         248.40 / 250.03 ±1.25 / 251.61 ms │    no change │
│ QQuery 50 │        200.23 / 205.58 ±3.86 / 211.76 ms │         202.42 / 205.75 ±2.57 / 209.98 ms │    no change │
│ QQuery 51 │        173.29 / 177.22 ±3.10 / 182.63 ms │         177.75 / 180.80 ±2.25 / 184.61 ms │    no change │
│ QQuery 52 │        105.32 / 107.38 ±2.40 / 112.02 ms │         105.77 / 107.58 ±1.66 / 110.69 ms │    no change │
│ QQuery 53 │        100.15 / 101.21 ±0.62 / 101.89 ms │         100.98 / 102.17 ±1.15 / 104.07 ms │    no change │
│ QQuery 54 │        141.02 / 142.30 ±0.70 / 143.11 ms │         140.52 / 142.37 ±1.12 / 143.80 ms │    no change │
│ QQuery 55 │        104.42 / 104.75 ±0.21 / 104.99 ms │         104.80 / 106.45 ±1.29 / 108.65 ms │    no change │
│ QQuery 56 │        136.95 / 137.93 ±0.61 / 138.60 ms │         137.76 / 139.66 ±1.20 / 141.14 ms │    no change │
│ QQuery 57 │        163.39 / 164.72 ±0.93 / 166.11 ms │         164.00 / 166.13 ±1.51 / 167.93 ms │    no change │
│ QQuery 58 │        306.50 / 308.15 ±1.36 / 309.92 ms │         310.58 / 312.95 ±1.26 / 314.21 ms │    no change │
│ QQuery 59 │        195.20 / 196.47 ±1.22 / 198.01 ms │         196.35 / 197.34 ±1.33 / 199.95 ms │    no change │
│ QQuery 60 │        137.82 / 140.24 ±1.96 / 143.12 ms │         139.30 / 140.95 ±1.62 / 144.05 ms │    no change │
│ QQuery 61 │           13.16 / 13.38 ±0.25 / 13.86 ms │            13.41 / 13.56 ±0.16 / 13.88 ms │    no change │
│ QQuery 62 │        849.66 / 856.51 ±5.31 / 865.52 ms │         847.98 / 852.12 ±4.67 / 859.08 ms │    no change │
│ QQuery 63 │        100.61 / 101.81 ±1.27 / 104.20 ms │         100.77 / 101.55 ±0.74 / 102.59 ms │    no change │
│ QQuery 64 │        651.92 / 656.94 ±4.29 / 663.28 ms │         651.55 / 658.13 ±5.96 / 665.39 ms │    no change │
│ QQuery 65 │        239.55 / 242.62 ±2.81 / 246.07 ms │         242.19 / 244.42 ±2.96 / 250.18 ms │    no change │
│ QQuery 66 │       213.29 / 222.54 ±11.11 / 237.95 ms │        211.86 / 220.42 ±10.57 / 240.86 ms │    no change │
│ QQuery 67 │        289.31 / 295.45 ±5.37 / 304.11 ms │         292.04 / 295.91 ±6.91 / 309.73 ms │    no change │
│ QQuery 68 │              8.54 / 8.69 ±0.18 / 9.03 ms │               8.50 / 8.70 ±0.27 / 9.22 ms │    no change │
│ QQuery 69 │          95.85 / 99.45 ±4.48 / 108.00 ms │           96.60 / 97.79 ±1.30 / 100.25 ms │    no change │
│ QQuery 70 │        303.63 / 311.47 ±6.01 / 320.98 ms │        308.47 / 318.53 ±10.66 / 338.08 ms │    no change │
│ QQuery 71 │        131.23 / 133.05 ±2.51 / 137.88 ms │         133.29 / 135.61 ±2.31 / 139.65 ms │    no change │
│ QQuery 72 │        583.55 / 594.33 ±6.81 / 604.74 ms │         570.59 / 583.79 ±7.00 / 590.28 ms │    no change │
│ QQuery 73 │              6.48 / 6.64 ±0.24 / 7.12 ms │               6.54 / 6.73 ±0.23 / 7.17 ms │    no change │
│ QQuery 74 │        515.18 / 520.75 ±2.83 / 522.70 ms │         520.82 / 523.62 ±3.81 / 530.64 ms │    no change │
│ QQuery 75 │        264.32 / 267.42 ±2.15 / 270.74 ms │         266.84 / 268.84 ±1.73 / 272.00 ms │    no change │
│ QQuery 76 │        128.87 / 129.75 ±1.04 / 131.36 ms │         128.38 / 129.77 ±1.06 / 131.11 ms │    no change │
│ QQuery 77 │        186.40 / 187.94 ±1.40 / 189.83 ms │         186.53 / 187.12 ±0.56 / 187.83 ms │    no change │
│ QQuery 78 │        331.83 / 332.80 ±0.86 / 334.26 ms │         326.81 / 328.75 ±1.34 / 330.26 ms │    no change │
│ QQuery 79 │        226.69 / 230.67 ±2.73 / 234.58 ms │         225.24 / 228.79 ±3.53 / 235.42 ms │    no change │
│ QQuery 80 │        321.55 / 322.93 ±1.41 / 325.51 ms │         317.55 / 320.54 ±2.35 / 324.64 ms │    no change │
│ QQuery 81 │           25.53 / 26.00 ±0.59 / 27.14 ms │            25.23 / 25.54 ±0.27 / 26.02 ms │    no change │
│ QQuery 82 │           38.52 / 39.49 ±0.71 / 40.61 ms │            38.31 / 38.80 ±0.30 / 39.19 ms │    no change │
│ QQuery 83 │           36.58 / 36.75 ±0.14 / 37.01 ms │            35.98 / 36.33 ±0.24 / 36.61 ms │    no change │
│ QQuery 84 │           45.88 / 46.11 ±0.19 / 46.43 ms │            45.62 / 46.60 ±1.10 / 48.71 ms │    no change │
│ QQuery 85 │        140.33 / 141.36 ±0.96 / 142.72 ms │         139.58 / 139.82 ±0.24 / 140.24 ms │    no change │
│ QQuery 86 │           36.30 / 36.70 ±0.27 / 37.07 ms │            36.45 / 37.14 ±0.45 / 37.69 ms │    no change │
│ QQuery 87 │              3.46 / 3.56 ±0.11 / 3.75 ms │               3.47 / 3.55 ±0.10 / 3.73 ms │    no change │
│ QQuery 88 │         98.77 / 101.37 ±2.48 / 104.79 ms │          99.22 / 100.74 ±1.62 / 102.73 ms │    no change │
│ QQuery 89 │        116.59 / 116.89 ±0.41 / 117.66 ms │         115.25 / 116.57 ±1.79 / 120.11 ms │    no change │
│ QQuery 90 │           22.09 / 22.87 ±1.08 / 24.97 ms │            21.90 / 22.38 ±0.39 / 23.01 ms │    no change │
│ QQuery 91 │           57.58 / 58.57 ±0.88 / 60.16 ms │            57.52 / 58.05 ±0.44 / 58.59 ms │    no change │
│ QQuery 92 │           55.62 / 56.15 ±0.46 / 56.96 ms │            55.77 / 56.00 ±0.20 / 56.36 ms │    no change │
│ QQuery 93 │        179.72 / 181.56 ±1.73 / 183.79 ms │         179.29 / 180.32 ±1.20 / 182.60 ms │    no change │
│ QQuery 94 │           59.38 / 59.84 ±0.35 / 60.44 ms │            59.78 / 60.13 ±0.31 / 60.65 ms │    no change │
│ QQuery 95 │        124.64 / 125.18 ±0.39 / 125.66 ms │         124.38 / 125.01 ±0.72 / 126.09 ms │    no change │
│ QQuery 96 │           67.18 / 68.19 ±0.89 / 69.73 ms │            67.47 / 69.43 ±0.99 / 70.05 ms │    no change │
│ QQuery 97 │        115.26 / 117.91 ±1.80 / 120.11 ms │         116.91 / 117.81 ±0.77 / 119.15 ms │    no change │
│ QQuery 98 │        148.12 / 148.76 ±0.56 / 149.48 ms │         147.22 / 150.06 ±1.76 / 152.05 ms │    no change │
│ QQuery 99 │ 10683.84 / 10723.25 ±39.24 / 10798.41 ms │ 10745.61 / 10858.61 ±104.40 / 11008.80 ms │    no change │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 30029.32ms │
│ Total Time (row-group-morselization)   │ 30201.38ms │
│ Average Time (HEAD)                    │   303.33ms │
│ Average Time (row-group-morselization) │   305.06ms │
│ Queries Faster                         │          0 │
│ Queries Slower                         │          2 │
│ Queries with No Change                 │         97 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
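
As a sanity check on this summary, the totals differ by about 172 ms, roughly a 0.57% overall slowdown. Each average is the total divided by the 99 queries (0 faster + 2 slower + 97 unchanged). The sketch below reproduces that arithmetic in Rust, using only the numbers from the table above:

```rust
/// Relative change of `branch_ms` versus `base_ms`, in percent.
/// A positive result means the branch took longer.
fn relative_change_pct(base_ms: f64, branch_ms: f64) -> f64 {
    (branch_ms / base_ms - 1.0) * 100.0
}

fn main() {
    // Values copied from the tpcds "Benchmark Summary" table.
    let head_total_ms = 30029.32_f64;
    let branch_total_ms = 30201.38_f64;
    // 0 faster + 2 slower + 97 with no change.
    let queries = (0 + 2 + 97) as f64;

    // Prints "+0.57%".
    println!("total change: {:+.2}%", relative_change_pct(head_total_ms, branch_total_ms));
    // Prints 303.33 and 305.06, matching the Average Time rows.
    println!("avg HEAD:   {:.2} ms", head_total_ms / queries);
    println!("avg branch: {:.2} ms", branch_total_ms / queries);
}
```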

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 155.0s |
| Peak memory | 6.4 GiB |
| Avg memory | 5.6 GiB |
| CPU user | 252.0s |
| CPU sys | 7.6s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 155.0s |
| Peak memory | 6.5 GiB |
| Avg memory | 5.6 GiB |
| CPU user | 252.7s |
| CPU sys | 8.2s |
| Peak spill | 0 B |

@adriangbot

Benchmark for this request hit the 7200s job deadline before finishing.

Benchmarks requested: tpch

Kubernetes message
Job was active longer than specified deadline

…m directly

Moves `SplitAndDonate` from between `LoadMetadata` and `PrepareFilters`
to *after* `PruneWithBloomFilters`, and restructures stealer paths so
row-group morselization composes with the full pruning pipeline.

**Donor path**:
- Runs the existing pipeline unchanged: file-level pruning → metadata
  load → prepare filters → page index → stats pruning → bloom pruning.
- `SplitAndDonate` then runs `prune_by_limit` (moved out of
  `build_stream`) as a separate file-level pass, picks the first
  surviving row group, and packages each remaining one into a
  `ParquetOpenChunk` containing the access plan, loaded
  `ArrowReaderMetadata`, prepared `PruningPredicate`,
  `PagePruningAccessPlanFilter`, physical schema, and rewritten
  predicate/projection.
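
The donor's split step can be sketched roughly as follows. This is a simplified model, not the PR's code. `RowGroupAccess`, `Chunk`, and `PreparedState` are hypothetical stand-ins for the access plan, `ParquetOpenChunk`, and the prepared metadata/predicate state, and sharing that state through `Arc` is an assumption. The sketch only shows one surviving row group staying with the donor while each other survivor becomes its own single-row-group chunk:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the donor's prepared state
/// (reader metadata, pruning predicate, page filter, schema, ...).
#[derive(Debug)]
struct PreparedState {
    file: String,
}

/// Hypothetical per-row-group scan decision, loosely modeled on an access plan.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RowGroupAccess {
    Scan,
    Skip,
}

/// Hypothetical stand-in for `ParquetOpenChunk`: a finalized plan that scans
/// exactly one row group, plus the shared prepared state.
#[derive(Debug)]
struct Chunk {
    plan: Vec<RowGroupAccess>,
    state: Arc<PreparedState>,
}

/// Keep the first surviving row group for the donor and package every other
/// surviving row group as its own chunk.
fn split_and_donate(
    plan: &[RowGroupAccess],
    state: Arc<PreparedState>,
) -> (Vec<RowGroupAccess>, Vec<Chunk>) {
    let surviving: Vec<usize> = plan
        .iter()
        .enumerate()
        .filter(|(_, a)| **a == RowGroupAccess::Scan)
        .map(|(i, _)| i)
        .collect();

    // Build a plan that scans only row group `idx`.
    let only = |idx: usize| -> Vec<RowGroupAccess> {
        (0..plan.len())
            .map(|i| if i == idx { RowGroupAccess::Scan } else { RowGroupAccess::Skip })
            .collect()
    };

    match surviving.split_first() {
        // Nothing survived pruning: nothing to read and nothing to donate.
        None => (plan.to_vec(), Vec::new()),
        Some((&first, rest)) => {
            let donated = rest
                .iter()
                .map(|&idx| Chunk { plan: only(idx), state: Arc::clone(&state) })
                .collect();
            (only(first), donated)
        }
    }
}

fn main() {
    use RowGroupAccess::*;
    let plan = vec![Skip, Scan, Scan, Skip, Scan];
    let state = Arc::new(PreparedState { file: "part-0.parquet".into() });
    let (donor, chunks) = split_and_donate(&plan, state);
    println!("donor plan: {:?}", donor);
    for c in &chunks {
        println!("donated: {:?} ({})", c.plan, c.state.file);
    }
}
```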

**Stealer path**:
- `ParquetMorselPlanner::try_new` detects a `ParquetOpenChunk` on the
  incoming `PartitionedFile` and constructs state directly at
  `BuildStream` via `build_stealer_state`. No metadata load, no
  predicate rebuild, no pruning traversal — the stealer just builds
  its reader against the donor's finalized access plan.
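
The sketch below shows roughly how a planner might pick its starting state. It assumes a hypothetical `WorkItem` enum for whatever comes off the shared queue. It also uses a hypothetical `PruneFile` state to stand in for the start of the normal pipeline. Only `BuildStream` is named in the PR:

```rust
use std::sync::Arc;

/// Hypothetical prepared state carried by a donated chunk.
struct ChunkState {
    row_group: usize,
}

/// Hypothetical unit of work popped from the shared queue.
enum WorkItem {
    /// A whole file: the full open pipeline must run.
    File { path: String },
    /// A donated chunk: everything before the stream build is already done.
    Chunk { path: String, state: Arc<ChunkState> },
}

/// Hypothetical subset of the opener's state machine.
#[derive(Debug, PartialEq)]
enum OpenState {
    PruneFile { path: String },
    BuildStream { path: String, row_group: usize },
}

/// Whole files start at the beginning of the pipeline; donated chunks jump
/// straight to building the reader stream.
fn initial_state(item: WorkItem) -> OpenState {
    match item {
        WorkItem::File { path } => OpenState::PruneFile { path },
        WorkItem::Chunk { path, state } => OpenState::BuildStream {
            path,
            row_group: state.row_group,
        },
    }
}

fn main() {
    let whole = WorkItem::File { path: "a.parquet".into() };
    let chunk = WorkItem::Chunk {
        path: "a.parquet".into(),
        state: Arc::new(ChunkState { row_group: 3 }),
    };
    println!("{:?}", initial_state(whole));
    println!("{:?}", initial_state(chunk));
}
```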

**Shared work queue split**:
- `SharedWorkSource` now has two queues: `morsels` (pre-prepared
  chunks with finalized state) and `files` (whole unopened files).
  `pop_front` drains morsels first so their latency stays low. Donor
  calls `push_morsels` instead of the old `push_front` convention.
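
Below is a minimal model of the two-queue source. It assumes a single `Mutex` around two `VecDeque`s, with plain strings as work items. The real `SharedWorkSource` holds `PartitionedFile`s, and its locking and ordering within the morsel queue may differ:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum Work {
    Morsel(String),
    File(String),
}

struct Queues {
    morsels: VecDeque<String>,
    files: VecDeque<String>,
}

struct SharedWorkSource {
    queues: Mutex<Queues>,
}

impl SharedWorkSource {
    fn new(files: Vec<String>) -> Self {
        Self {
            queues: Mutex::new(Queues { morsels: VecDeque::new(), files: files.into() }),
        }
    }

    /// Donor side: enqueue pre-prepared chunks.
    fn push_morsels(&self, morsels: Vec<String>) {
        self.queues.lock().unwrap().morsels.extend(morsels);
    }

    /// Any worker: morsels are always served before unopened files.
    fn pop_front(&self) -> Option<Work> {
        let mut q = self.queues.lock().unwrap();
        if let Some(m) = q.morsels.pop_front() {
            return Some(Work::Morsel(m));
        }
        q.files.pop_front().map(Work::File)
    }
}

fn main() {
    let source = SharedWorkSource::new(vec!["a.parquet".to_string(), "b.parquet".to_string()]);
    // A worker takes the first whole file: Some(File("a.parquet")).
    println!("{:?}", source.pop_front());
    // It donates two row groups (the labels are illustrative only).
    source.push_morsels(vec!["a.parquet#rg1".to_string(), "a.parquet#rg2".to_string()]);
    // Idle siblings now get both morsels before File("b.parquet").
    while let Some(work) = source.pop_front() {
        println!("{:?}", work);
    }
}
```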

**Removed state/guards** (no longer needed with direct-BuildStream
entry):
- `PreparedParquetOpen::is_donated_chunk` and
  `preloaded_reader_metadata` fields.
- The `is_donated_chunk` short-circuits in `prune_file`,
  `prepare_open_file`, `load`, `prune_row_groups`, and
  `split_and_donate`.

Limit-pruning tests (`test_limit_pruning_*` in
`datafusion/core/tests/parquet/row_group_pruning.rs`) pass — the
  donor sees the full row-group picture for `prune_by_limit`, and stealers
  inherit the pruned plan.
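
A simplified model shows why limit pruning has to run on the donor before splitting. `prune_by_limit_sketch` is hypothetical and assumes every surviving row group fully matches the predicate. The real `prune_by_limit` has its own conditions, which are not reproduced here. With the whole file in view, the donor can drop row groups beyond the limit. A piece holding a single row group cannot:

```rust
/// Simplified, hypothetical limit pruning: keep row groups in file order until
/// their row counts cover `limit`, and drop the rest. Assumes every row group
/// fully matches the predicate.
fn prune_by_limit_sketch(row_counts: &[usize], limit: usize) -> Vec<usize> {
    let mut kept = Vec::new();
    let mut covered = 0;
    for (idx, &rows) in row_counts.iter().enumerate() {
        if covered >= limit {
            break;
        }
        kept.push(idx);
        covered += rows;
    }
    kept
}

fn main() {
    // Four surviving row groups of 1_000 rows each, LIMIT 1500.
    let counts = [1_000, 1_000, 1_000, 1_000];

    // Whole-file view: only row groups [0, 1] are needed, so only 1 is donated.
    println!("whole file: {:?}", prune_by_limit_sketch(&counts, 1_500));

    // If the file were split first, each piece sees only itself and keeps it.
    for rg in 0..counts.len() {
        let kept = prune_by_limit_sketch(&counts[rg..rg + 1], 1_500);
        println!("row group {rg} alone keeps {} row group(s)", kept.len());
    }
}
```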

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Dandandan
Contributor Author

run benchmarks

- Inline the `caller_range` construction in
  `row_group_split_within_caller_file_range` test and drop the
  vestigial `let _ = caller_range;` binding left over from the earlier
  file-range-based donation mechanism.
- Update `split_and_donate` docstring: the stale `is_donated_chunk`
  reference predates the direct-to-BuildStream entry path. Stealers
  now never reach this function.
- Drop `rg_metadata.to_vec()` in the LIMIT pruning pass —
  `prune_by_limit` takes `&[RowGroupMetaData]`, so the slice is enough
  and we save one allocation per limit-pruned file.
- Delete two "what-not-why" narrating comments from the donation
  path ("Bundle everything the stealer needs..." and "Narrow the
  donor's access plan...") — the code is self-explanatory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
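
The `to_vec()` cleanup relies on `&Vec<T>` coercing to `&[T]`. Because of that coercion, a function that takes a slice can borrow the existing storage instead of cloning every element into a new `Vec`. The example below illustrates this with a hypothetical `RowGroupMeta` standing in for parquet's `RowGroupMetaData`:

```rust
/// Hypothetical stand-in for parquet's `RowGroupMetaData`.
#[derive(Debug, Clone)]
struct RowGroupMeta {
    num_rows: i64,
}

/// Taking a borrowed slice lets callers pass a `Vec`, an array, or a
/// sub-slice without allocating a copy.
fn total_rows(row_groups: &[RowGroupMeta]) -> i64 {
    row_groups.iter().map(|rg| rg.num_rows).sum()
}

fn main() {
    let rg_metadata = vec![RowGroupMeta { num_rows: 10 }, RowGroupMeta { num_rows: 32 }];

    // Before: clones every element into a fresh Vec, then borrows that.
    let copied = total_rows(&rg_metadata.to_vec());
    // After: borrows the existing storage directly, with no allocation.
    let borrowed = total_rows(&rg_metadata);
    assert_eq!(copied, borrowed);
    println!("{borrowed}"); // prints 42
}
```

Both calls compute the same result. Only the first pays for an extra allocation and clone.
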
@Dandandan
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291134705-1702-2shzr 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291134705-1700-wrcpf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291134705-1701-2gh92 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291026396-1705-7rg6q 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291026396-1703-xbkd8 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291026396-1704-txlv8 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and row-group-morselization
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃               row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.17 / 4.53 ±6.65 / 17.83 ms │          1.19 / 4.53 ±6.62 / 17.78 ms │     no change │
│ QQuery 1  │        12.88 / 13.31 ±0.22 / 13.48 ms │        15.04 / 16.31 ±1.48 / 18.66 ms │  1.22x slower │
│ QQuery 2  │        37.48 / 37.75 ±0.24 / 38.14 ms │        38.62 / 39.29 ±0.54 / 39.83 ms │     no change │
│ QQuery 3  │        31.61 / 32.22 ±0.51 / 33.02 ms │        33.58 / 34.28 ±0.67 / 35.50 ms │  1.06x slower │
│ QQuery 4  │     239.72 / 244.48 ±4.60 / 250.39 ms │     243.70 / 246.58 ±2.18 / 249.48 ms │     no change │
│ QQuery 5  │     283.71 / 285.12 ±1.03 / 286.85 ms │     285.29 / 289.23 ±3.69 / 296.25 ms │     no change │
│ QQuery 6  │           6.70 / 7.32 ±0.49 / 7.93 ms │           6.76 / 7.27 ±0.27 / 7.54 ms │     no change │
│ QQuery 7  │        14.02 / 14.24 ±0.19 / 14.58 ms │        16.21 / 16.45 ±0.14 / 16.66 ms │  1.16x slower │
│ QQuery 8  │     329.12 / 331.26 ±1.83 / 333.85 ms │     330.16 / 332.51 ±2.73 / 337.72 ms │     no change │
│ QQuery 9  │     533.99 / 538.69 ±3.94 / 544.23 ms │    482.72 / 524.52 ±22.38 / 545.84 ms │     no change │
│ QQuery 10 │        74.73 / 75.79 ±1.41 / 78.55 ms │        80.43 / 81.09 ±0.68 / 82.04 ms │  1.07x slower │
│ QQuery 11 │        87.39 / 89.38 ±1.67 / 92.45 ms │        93.78 / 94.46 ±0.95 / 96.33 ms │  1.06x slower │
│ QQuery 12 │     281.35 / 286.51 ±4.12 / 291.16 ms │     296.13 / 301.32 ±4.44 / 306.89 ms │  1.05x slower │
│ QQuery 13 │     397.34 / 407.22 ±9.22 / 422.36 ms │     403.10 / 418.72 ±7.99 / 425.47 ms │     no change │
│ QQuery 14 │     286.77 / 289.85 ±2.10 / 292.26 ms │     289.70 / 297.26 ±5.31 / 305.82 ms │     no change │
│ QQuery 15 │     282.13 / 289.43 ±5.77 / 297.42 ms │     280.66 / 289.13 ±5.64 / 296.76 ms │     no change │
│ QQuery 16 │    619.38 / 642.00 ±19.43 / 666.44 ms │     628.17 / 631.52 ±4.62 / 640.68 ms │     no change │
│ QQuery 17 │    639.73 / 661.37 ±11.07 / 670.98 ms │    623.94 / 654.67 ±22.02 / 679.26 ms │     no change │
│ QQuery 18 │ 1263.19 / 1285.85 ±19.02 / 1316.35 ms │ 1269.63 / 1297.49 ±23.03 / 1335.62 ms │     no change │
│ QQuery 19 │        29.01 / 30.45 ±2.31 / 34.98 ms │        30.79 / 36.63 ±9.60 / 55.76 ms │  1.20x slower │
│ QQuery 20 │     521.20 / 524.62 ±4.33 / 533.05 ms │     493.38 / 499.33 ±4.89 / 507.68 ms │     no change │
│ QQuery 21 │     593.33 / 598.17 ±3.23 / 602.19 ms │     597.85 / 603.50 ±9.21 / 621.83 ms │     no change │
│ QQuery 22 │ 1059.57 / 1092.14 ±16.88 / 1107.82 ms │  1038.69 / 1045.26 ±7.75 / 1059.96 ms │     no change │
│ QQuery 23 │ 3361.32 / 3416.41 ±41.74 / 3489.55 ms │ 3228.38 / 3291.92 ±54.23 / 3371.34 ms │     no change │
│ QQuery 24 │        42.29 / 42.99 ±0.53 / 43.44 ms │        47.37 / 50.37 ±4.29 / 58.87 ms │  1.17x slower │
│ QQuery 25 │     116.78 / 119.86 ±3.39 / 126.32 ms │     117.16 / 118.15 ±1.02 / 120.08 ms │     no change │
│ QQuery 26 │        44.02 / 44.97 ±1.37 / 47.64 ms │        47.65 / 50.26 ±4.56 / 59.35 ms │  1.12x slower │
│ QQuery 27 │     674.30 / 679.97 ±4.55 / 686.64 ms │     661.70 / 668.53 ±4.05 / 672.61 ms │     no change │
│ QQuery 28 │ 3023.35 / 3082.51 ±33.21 / 3109.21 ms │ 2884.78 / 2916.44 ±27.82 / 2967.11 ms │ +1.06x faster │
│ QQuery 29 │        42.82 / 45.48 ±3.98 / 53.35 ms │        44.84 / 52.62 ±9.95 / 71.38 ms │  1.16x slower │
│ QQuery 30 │     332.14 / 336.85 ±3.47 / 341.46 ms │     307.29 / 314.59 ±5.38 / 321.28 ms │ +1.07x faster │
│ QQuery 31 │     324.74 / 332.03 ±8.72 / 348.32 ms │     307.14 / 310.85 ±3.22 / 315.91 ms │ +1.07x faster │
│ QQuery 32 │ 1029.83 / 1053.38 ±19.89 / 1085.12 ms │  1006.45 / 1013.49 ±5.73 / 1021.33 ms │     no change │
│ QQuery 33 │ 1440.14 / 1454.26 ±12.14 / 1475.18 ms │ 1453.04 / 1501.15 ±25.86 / 1526.86 ms │     no change │
│ QQuery 34 │ 1448.28 / 1533.01 ±44.76 / 1573.40 ms │ 1469.70 / 1505.32 ±37.21 / 1560.71 ms │     no change │
│ QQuery 35 │    290.48 / 316.31 ±17.30 / 343.10 ms │     285.88 / 294.59 ±5.84 / 301.17 ms │ +1.07x faster │
│ QQuery 36 │        62.20 / 67.70 ±4.78 / 73.75 ms │        61.83 / 70.72 ±6.25 / 77.46 ms │     no change │
│ QQuery 37 │        36.22 / 43.07 ±5.87 / 50.22 ms │        35.46 / 41.20 ±6.27 / 52.35 ms │     no change │
│ QQuery 38 │        42.32 / 44.87 ±2.64 / 49.98 ms │        40.63 / 42.79 ±1.99 / 46.55 ms │     no change │
│ QQuery 39 │     124.48 / 134.43 ±7.64 / 147.38 ms │     131.20 / 142.08 ±7.61 / 150.31 ms │  1.06x slower │
│ QQuery 40 │        14.62 / 15.39 ±1.14 / 17.61 ms │        15.49 / 16.25 ±1.33 / 18.90 ms │  1.06x slower │
│ QQuery 41 │        13.88 / 15.12 ±1.80 / 18.63 ms │        14.61 / 17.72 ±3.72 / 22.42 ms │  1.17x slower │
│ QQuery 42 │        13.40 / 13.67 ±0.18 / 13.88 ms │        14.20 / 14.56 ±0.27 / 14.99 ms │  1.06x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 20573.92ms │
│ Total Time (row-group-morselization)   │ 20194.93ms │
│ Average Time (HEAD)                    │   478.46ms │
│ Average Time (row-group-morselization) │   469.65ms │
│ Queries Faster                         │          4 │
│ Queries Slower                         │         14 │
│ Queries with No Change                 │         25 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
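This sketch shows how a Change label can be recovered from the two timing cells in a row. Two assumptions are inferred from these tables, not from the runner's source. First, each cell reads `min / mean ±stddev / max`. Second, a row is bucketed by the ratio of means, branch / HEAD: above 1.05 is "slower", below 0.95 is "faster", and anything else is "no change". A symmetric HEAD / branch > 1.05 test would mark ClickBench Q20 as faster, but the table reports no change. `parse_cell`, `classify` and `average_ms` are hypothetical helpers. The table's means are rounded, so a recomputed multiplier can be off by 0.01 in the last digit (for example, TPC-DS Q46).

```python
import re

# Assumed cell layout, inferred from the tables: "min / mean ±stddev / max ms".
CELL = re.compile(
    r"(?P<min>[\d.]+) / (?P<mean>[\d.]+) ±(?P<std>[\d.]+) / (?P<max>[\d.]+) ms"
)


def parse_cell(cell: str) -> dict:
    """Parse one timing cell into floats keyed by min, mean, std and max."""
    m = CELL.fullmatch(cell.strip())
    if m is None:
        raise ValueError(f"unrecognised timing cell: {cell!r}")
    return {key: float(value) for key, value in m.groupdict().items()}


def classify(head: str, branch: str, tolerance: float = 0.05) -> str:
    """Bucket a row by the ratio of mean times, branch / HEAD.

    The 5% tolerance is an assumption that matches the labels in these tables;
    it is not read from the benchmark runner.
    """
    base = parse_cell(head)["mean"]
    new = parse_cell(branch)["mean"]
    ratio = new / base
    if ratio > 1 + tolerance:
        return f"{ratio:.2f}x slower"
    if ratio < 1 - tolerance:
        # Speedups are printed as HEAD / branch, with a leading "+".
        return f"+{base / new:.2f}x faster"
    return "no change"


def average_ms(total_ms: float, n_queries: int) -> float:
    """Average Time in the summary box appears to be Total Time / query count."""
    return round(total_ms / n_queries, 2)


# ClickBench Q20: HEAD / branch is about 1.051, but branch / HEAD (about 0.952)
# stays above the 0.95 lower bound, so the row is not counted as faster.
print(classify("521.20 / 524.62 ±4.33 / 533.05 ms",
               "493.38 / 499.33 ±4.89 / 507.68 ms"))
# 4 faster + 14 slower + 25 unchanged = 43 queries in the ClickBench summary.
print(average_ms(20573.92, 4 + 14 + 25))
```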

Resource Usage

clickbench_partitioned — base (merge-base)

Metric        Value
Wall time     105.0s
Peak memory   30.2 GiB
Avg memory    22.7 GiB
CPU user      1090.3s
CPU sys       66.6s
Peak spill    0 B

clickbench_partitioned — branch

Metric        Value
Wall time     105.0s
Peak memory   30.4 GiB
Avg memory    23.1 GiB
CPU user      1090.3s
CPU sys       66.7s
Peak spill    0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and row-group-morselization
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                      HEAD ┃                   row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │               6.82 / 7.34 ±0.74 / 8.78 ms │               6.92 / 7.35 ±0.83 / 9.02 ms │     no change │
│ QQuery 2  │         148.66 / 149.60 ±0.64 / 150.35 ms │         143.32 / 143.73 ±0.23 / 143.98 ms │     no change │
│ QQuery 3  │         115.44 / 116.23 ±0.70 / 117.07 ms │         113.22 / 113.46 ±0.27 / 113.96 ms │     no change │
│ QQuery 4  │     1329.04 / 1422.20 ±49.48 / 1467.94 ms │      1315.95 / 1323.65 ±8.29 / 1338.75 ms │ +1.07x faster │
│ QQuery 5  │         172.16 / 173.81 ±2.16 / 177.98 ms │         172.87 / 173.82 ±0.56 / 174.47 ms │     no change │
│ QQuery 6  │        813.94 / 859.07 ±23.99 / 876.13 ms │        866.53 / 901.96 ±21.50 / 927.25 ms │     no change │
│ QQuery 7  │         332.86 / 346.53 ±7.57 / 354.62 ms │         339.59 / 344.78 ±4.04 / 351.55 ms │     no change │
│ QQuery 8  │         115.83 / 116.68 ±0.72 / 117.79 ms │         114.22 / 115.96 ±1.95 / 119.66 ms │     no change │
│ QQuery 9  │         102.91 / 107.79 ±2.68 / 110.59 ms │         101.46 / 108.07 ±5.11 / 116.62 ms │     no change │
│ QQuery 10 │         105.04 / 105.54 ±0.45 / 106.26 ms │         102.25 / 102.72 ±0.38 / 103.41 ms │     no change │
│ QQuery 11 │       872.96 / 929.08 ±66.00 / 1056.78 ms │       919.37 / 959.91 ±44.08 / 1037.01 ms │     no change │
│ QQuery 12 │            43.34 / 43.62 ±0.16 / 43.78 ms │            46.30 / 46.58 ±0.26 / 47.08 ms │  1.07x slower │
│ QQuery 13 │         385.94 / 389.14 ±2.83 / 394.27 ms │         407.11 / 411.56 ±2.55 / 414.38 ms │  1.06x slower │
│ QQuery 14 │        977.34 / 989.30 ±8.97 / 1004.28 ms │      992.87 / 1009.09 ±10.97 / 1023.50 ms │     no change │
│ QQuery 15 │            15.76 / 15.94 ±0.14 / 16.10 ms │            14.73 / 15.28 ±0.56 / 16.31 ms │     no change │
│ QQuery 16 │               7.87 / 8.01 ±0.12 / 8.19 ms │               7.29 / 7.45 ±0.17 / 7.77 ms │ +1.08x faster │
│ QQuery 17 │         232.43 / 233.70 ±1.07 / 235.43 ms │         222.41 / 224.63 ±1.43 / 226.67 ms │     no change │
│ QQuery 18 │         127.95 / 129.77 ±2.72 / 135.12 ms │         122.39 / 123.19 ±0.63 / 123.83 ms │ +1.05x faster │
│ QQuery 19 │         163.63 / 167.15 ±2.47 / 170.95 ms │         152.87 / 154.93 ±2.30 / 159.22 ms │ +1.08x faster │
│ QQuery 20 │            13.98 / 14.16 ±0.17 / 14.37 ms │            13.26 / 13.50 ±0.19 / 13.81 ms │     no change │
│ QQuery 21 │            20.28 / 20.42 ±0.18 / 20.77 ms │            19.26 / 19.51 ±0.18 / 19.80 ms │     no change │
│ QQuery 22 │        471.61 / 500.47 ±17.53 / 520.99 ms │        481.95 / 490.43 ±15.52 / 521.43 ms │     no change │
│ QQuery 23 │         813.06 / 824.55 ±8.80 / 838.05 ms │        854.21 / 908.91 ±27.73 / 930.01 ms │  1.10x slower │
│ QQuery 24 │         374.31 / 380.02 ±5.35 / 389.86 ms │         372.77 / 377.99 ±7.26 / 392.34 ms │     no change │
│ QQuery 25 │         332.67 / 336.37 ±2.45 / 340.25 ms │         331.23 / 333.47 ±1.67 / 336.21 ms │     no change │
│ QQuery 26 │            77.56 / 78.78 ±2.01 / 82.78 ms │            76.80 / 78.24 ±1.45 / 80.77 ms │     no change │
│ QQuery 27 │               6.88 / 7.00 ±0.12 / 7.23 ms │               6.93 / 7.08 ±0.16 / 7.39 ms │     no change │
│ QQuery 28 │         148.32 / 149.01 ±0.56 / 149.92 ms │         148.92 / 150.62 ±1.63 / 153.60 ms │     no change │
│ QQuery 29 │         272.18 / 275.92 ±5.97 / 287.82 ms │         269.89 / 274.45 ±4.57 / 282.82 ms │     no change │
│ QQuery 30 │            41.07 / 41.11 ±0.03 / 41.15 ms │            40.71 / 41.07 ±0.19 / 41.27 ms │     no change │
│ QQuery 31 │         162.30 / 165.51 ±2.36 / 168.85 ms │         163.91 / 165.16 ±1.25 / 167.27 ms │     no change │
│ QQuery 32 │            12.94 / 13.54 ±0.37 / 14.08 ms │            12.87 / 13.29 ±0.24 / 13.58 ms │     no change │
│ QQuery 33 │         137.16 / 139.48 ±1.81 / 142.35 ms │         138.24 / 139.49 ±1.10 / 141.18 ms │     no change │
│ QQuery 34 │               6.74 / 6.95 ±0.21 / 7.35 ms │               6.95 / 7.13 ±0.15 / 7.38 ms │     no change │
│ QQuery 35 │          99.28 / 100.45 ±1.20 / 102.56 ms │         101.27 / 102.18 ±0.62 / 103.14 ms │     no change │
│ QQuery 36 │               6.87 / 6.92 ±0.04 / 6.97 ms │               6.62 / 6.89 ±0.15 / 7.02 ms │     no change │
│ QQuery 37 │               8.09 / 8.21 ±0.07 / 8.28 ms │               8.10 / 8.20 ±0.10 / 8.39 ms │     no change │
│ QQuery 38 │            85.10 / 86.42 ±1.16 / 88.33 ms │            85.50 / 87.86 ±1.45 / 89.41 ms │     no change │
│ QQuery 39 │         117.21 / 118.28 ±0.70 / 119.37 ms │         116.27 / 118.54 ±1.46 / 120.27 ms │     no change │
│ QQuery 40 │         102.68 / 105.44 ±3.34 / 111.42 ms │         102.45 / 108.61 ±6.28 / 119.42 ms │     no change │
│ QQuery 41 │            13.96 / 14.26 ±0.44 / 15.13 ms │            15.10 / 15.23 ±0.13 / 15.44 ms │  1.07x slower │
│ QQuery 42 │         105.78 / 107.40 ±2.53 / 112.42 ms │         109.47 / 112.60 ±4.06 / 120.57 ms │     no change │
│ QQuery 43 │               5.57 / 5.73 ±0.22 / 6.17 ms │               6.18 / 6.37 ±0.13 / 6.57 ms │  1.11x slower │
│ QQuery 44 │            11.30 / 11.42 ±0.07 / 11.48 ms │            12.87 / 12.96 ±0.09 / 13.10 ms │  1.13x slower │
│ QQuery 45 │            48.63 / 49.41 ±0.78 / 50.89 ms │            50.74 / 51.31 ±0.42 / 51.91 ms │     no change │
│ QQuery 46 │               8.28 / 8.47 ±0.22 / 8.90 ms │               9.12 / 9.28 ±0.14 / 9.54 ms │  1.09x slower │
│ QQuery 47 │        728.81 / 819.29 ±45.82 / 846.89 ms │        816.36 / 847.00 ±16.34 / 859.58 ms │     no change │
│ QQuery 48 │         284.98 / 290.25 ±4.05 / 294.59 ms │         272.37 / 276.11 ±2.92 / 279.58 ms │     no change │
│ QQuery 49 │         250.11 / 252.79 ±2.47 / 256.48 ms │         249.63 / 251.47 ±1.17 / 253.24 ms │     no change │
│ QQuery 50 │         201.17 / 209.37 ±7.20 / 222.16 ms │         200.25 / 203.24 ±2.75 / 207.97 ms │     no change │
│ QQuery 51 │         175.57 / 179.12 ±3.09 / 184.60 ms │         175.34 / 177.09 ±1.27 / 179.21 ms │     no change │
│ QQuery 52 │         106.64 / 107.10 ±0.37 / 107.66 ms │         105.52 / 105.77 ±0.23 / 106.07 ms │     no change │
│ QQuery 53 │         101.80 / 103.04 ±1.67 / 106.32 ms │         101.17 / 101.92 ±0.92 / 103.40 ms │     no change │
│ QQuery 54 │         141.46 / 142.71 ±0.93 / 143.54 ms │         142.57 / 144.15 ±2.22 / 148.52 ms │     no change │
│ QQuery 55 │         104.84 / 106.06 ±1.34 / 108.63 ms │         105.38 / 106.14 ±0.52 / 106.81 ms │     no change │
│ QQuery 56 │         139.36 / 144.14 ±3.17 / 148.33 ms │         138.17 / 139.05 ±0.58 / 139.82 ms │     no change │
│ QQuery 57 │         170.61 / 172.51 ±1.50 / 174.43 ms │         163.22 / 164.53 ±1.29 / 166.52 ms │     no change │
│ QQuery 58 │         317.08 / 318.08 ±0.82 / 319.26 ms │         307.52 / 308.95 ±1.66 / 311.60 ms │     no change │
│ QQuery 59 │         204.97 / 206.87 ±1.67 / 209.78 ms │         192.82 / 196.08 ±2.46 / 199.59 ms │ +1.06x faster │
│ QQuery 60 │         145.12 / 147.44 ±2.39 / 151.91 ms │         139.90 / 141.32 ±1.24 / 143.13 ms │     no change │
│ QQuery 61 │            14.12 / 14.23 ±0.13 / 14.49 ms │            13.29 / 13.46 ±0.16 / 13.74 ms │ +1.06x faster │
│ QQuery 62 │        862.81 / 892.11 ±27.58 / 939.48 ms │        866.86 / 881.08 ±20.56 / 919.82 ms │     no change │
│ QQuery 63 │         101.60 / 103.05 ±0.98 / 104.57 ms │         103.89 / 105.07 ±1.28 / 107.40 ms │     no change │
│ QQuery 64 │        659.11 / 681.32 ±20.55 / 708.00 ms │         696.28 / 704.81 ±5.69 / 712.04 ms │     no change │
│ QQuery 65 │         272.73 / 276.42 ±2.36 / 279.38 ms │         245.86 / 257.99 ±9.13 / 267.19 ms │ +1.07x faster │
│ QQuery 66 │         227.95 / 235.80 ±7.12 / 247.78 ms │         211.98 / 218.50 ±8.35 / 234.60 ms │ +1.08x faster │
│ QQuery 67 │         299.55 / 303.25 ±4.14 / 310.44 ms │        287.16 / 294.77 ±11.60 / 317.85 ms │     no change │
│ QQuery 68 │               8.48 / 8.65 ±0.24 / 9.12 ms │               8.28 / 8.52 ±0.23 / 8.95 ms │     no change │
│ QQuery 69 │           96.00 / 97.73 ±2.23 / 102.13 ms │           96.45 / 97.84 ±2.22 / 102.20 ms │     no change │
│ QQuery 70 │         312.52 / 318.54 ±4.70 / 324.32 ms │        309.07 / 326.25 ±15.47 / 351.28 ms │     no change │
│ QQuery 71 │         131.87 / 133.31 ±1.27 / 135.66 ms │         135.30 / 137.82 ±2.53 / 142.36 ms │     no change │
│ QQuery 72 │         582.22 / 592.01 ±7.59 / 601.92 ms │         623.05 / 627.82 ±2.72 / 630.50 ms │  1.06x slower │
│ QQuery 73 │               6.62 / 6.78 ±0.16 / 7.06 ms │               7.23 / 7.35 ±0.13 / 7.58 ms │  1.08x slower │
│ QQuery 74 │         530.60 / 542.72 ±9.71 / 553.57 ms │        560.39 / 634.47 ±42.13 / 674.60 ms │  1.17x slower │
│ QQuery 75 │         265.75 / 269.17 ±2.80 / 273.29 ms │         264.22 / 268.90 ±4.39 / 277.10 ms │     no change │
│ QQuery 76 │         129.68 / 130.75 ±1.35 / 133.27 ms │         128.85 / 129.76 ±0.81 / 131.14 ms │     no change │
│ QQuery 77 │         190.80 / 191.79 ±1.28 / 194.30 ms │         186.32 / 187.11 ±1.10 / 189.20 ms │     no change │
│ QQuery 78 │         349.04 / 349.85 ±0.81 / 350.98 ms │         324.34 / 328.43 ±3.25 / 334.04 ms │ +1.07x faster │
│ QQuery 79 │         255.75 / 258.45 ±3.50 / 265.38 ms │         225.69 / 226.21 ±0.45 / 226.95 ms │ +1.14x faster │
│ QQuery 80 │         320.91 / 323.97 ±2.19 / 327.24 ms │         320.66 / 324.25 ±2.85 / 329.41 ms │     no change │
│ QQuery 81 │            26.42 / 26.71 ±0.25 / 27.01 ms │            27.34 / 27.65 ±0.25 / 28.10 ms │     no change │
│ QQuery 82 │            38.54 / 39.07 ±0.50 / 39.77 ms │            40.71 / 41.83 ±0.81 / 43.09 ms │  1.07x slower │
│ QQuery 83 │            36.39 / 36.79 ±0.38 / 37.50 ms │            38.26 / 38.77 ±0.51 / 39.74 ms │  1.05x slower │
│ QQuery 84 │            45.89 / 46.06 ±0.13 / 46.21 ms │            47.59 / 47.75 ±0.22 / 48.18 ms │     no change │
│ QQuery 85 │         138.86 / 139.88 ±0.98 / 141.74 ms │         144.02 / 145.38 ±1.54 / 148.26 ms │     no change │
│ QQuery 86 │            36.86 / 37.29 ±0.33 / 37.73 ms │            38.06 / 38.61 ±0.43 / 39.28 ms │     no change │
│ QQuery 87 │               3.40 / 3.51 ±0.16 / 3.83 ms │               3.76 / 3.88 ±0.14 / 4.13 ms │  1.10x slower │
│ QQuery 88 │           98.65 / 99.97 ±1.07 / 101.89 ms │         102.14 / 104.31 ±1.78 / 106.49 ms │     no change │
│ QQuery 89 │         115.71 / 118.80 ±5.21 / 129.18 ms │         120.04 / 120.42 ±0.36 / 120.95 ms │     no change │
│ QQuery 90 │            21.58 / 22.33 ±0.56 / 23.16 ms │            23.35 / 24.05 ±0.78 / 25.36 ms │  1.08x slower │
│ QQuery 91 │            57.05 / 58.20 ±0.84 / 59.40 ms │            58.05 / 59.73 ±1.50 / 61.76 ms │     no change │
│ QQuery 92 │            55.37 / 55.98 ±0.43 / 56.63 ms │            55.42 / 55.81 ±0.58 / 56.97 ms │     no change │
│ QQuery 93 │         178.28 / 180.31 ±1.30 / 182.14 ms │         178.19 / 182.38 ±3.14 / 186.66 ms │     no change │
│ QQuery 94 │            59.89 / 60.75 ±0.97 / 62.56 ms │            59.34 / 59.84 ±0.33 / 60.22 ms │     no change │
│ QQuery 95 │         123.54 / 124.64 ±0.76 / 125.93 ms │         124.50 / 125.60 ±0.97 / 127.37 ms │     no change │
│ QQuery 96 │            68.15 / 71.28 ±3.39 / 77.85 ms │            66.92 / 67.10 ±0.18 / 67.44 ms │ +1.06x faster │
│ QQuery 97 │         116.23 / 118.10 ±1.24 / 119.58 ms │         115.34 / 115.92 ±0.49 / 116.77 ms │     no change │
│ QQuery 98 │         150.22 / 151.96 ±2.03 / 155.38 ms │         147.86 / 149.79 ±1.10 / 151.10 ms │     no change │
│ QQuery 99 │ 10775.28 / 11017.01 ±136.34 / 11196.90 ms │ 10809.31 / 11003.35 ±115.32 / 11146.77 ms │     no change │
└───────────┴───────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 31223.50ms │
│ Total Time (row-group-morselization)   │ 31303.93ms │
│ Average Time (HEAD)                    │   315.39ms │
│ Average Time (row-group-morselization) │   316.20ms │
│ Queries Faster                         │         11 │
│ Queries Slower                         │         14 │
│ Queries with No Change                 │         74 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric        Value
Wall time     160.0s
Peak memory   6.2 GiB
Avg memory    5.5 GiB
CPU user      259.0s
CPU sys       8.7s
Peak spill    0 B

tpcds — branch

Metric        Value
Wall time     160.0s
Peak memory   6.4 GiB
Avg memory    5.5 GiB
CPU user      264.1s
CPU sys       8.8s
Peak spill    0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and row-group-morselization
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                  row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.72 / 7.25 ±0.87 / 8.98 ms │              7.26 / 7.71 ±0.73 / 9.16 ms │  1.06x slower │
│ QQuery 2  │        147.00 / 148.96 ±2.43 / 153.54 ms │        144.35 / 145.43 ±0.96 / 147.22 ms │     no change │
│ QQuery 3  │        112.44 / 113.35 ±0.83 / 114.76 ms │        112.57 / 114.10 ±1.01 / 115.30 ms │     no change │
│ QQuery 4  │    1301.57 / 1344.81 ±29.14 / 1392.29 ms │    1370.20 / 1428.81 ±32.41 / 1470.13 ms │  1.06x slower │
│ QQuery 5  │        172.26 / 174.32 ±1.68 / 176.15 ms │        175.06 / 177.63 ±2.26 / 181.16 ms │     no change │
│ QQuery 6  │       840.07 / 859.53 ±29.19 / 917.67 ms │       861.35 / 882.13 ±18.62 / 908.28 ms │     no change │
│ QQuery 7  │        332.54 / 336.78 ±3.90 / 343.49 ms │        334.02 / 341.50 ±4.20 / 345.41 ms │     no change │
│ QQuery 8  │        112.46 / 114.92 ±1.74 / 116.75 ms │        112.75 / 114.20 ±1.63 / 117.37 ms │     no change │
│ QQuery 9  │        101.43 / 109.90 ±9.16 / 127.31 ms │        100.65 / 103.99 ±3.82 / 110.08 ms │ +1.06x faster │
│ QQuery 10 │        101.20 / 104.05 ±1.79 / 106.55 ms │        101.93 / 104.87 ±3.89 / 112.44 ms │     no change │
│ QQuery 11 │      890.52 / 940.71 ±47.49 / 1029.79 ms │        912.43 / 923.02 ±6.34 / 931.86 ms │     no change │
│ QQuery 12 │           42.90 / 43.90 ±0.86 / 44.98 ms │           43.46 / 43.92 ±0.33 / 44.35 ms │     no change │
│ QQuery 13 │        386.48 / 394.44 ±6.82 / 405.92 ms │        386.36 / 395.62 ±7.13 / 405.85 ms │     no change │
│ QQuery 14 │        982.37 / 986.61 ±2.89 / 990.83 ms │      978.91 / 993.72 ±12.19 / 1014.43 ms │     no change │
│ QQuery 15 │           14.67 / 14.79 ±0.18 / 15.13 ms │           15.70 / 16.04 ±0.19 / 16.24 ms │  1.09x slower │
│ QQuery 16 │              7.28 / 7.48 ±0.15 / 7.74 ms │              7.99 / 8.09 ±0.09 / 8.23 ms │  1.08x slower │
│ QQuery 17 │        220.97 / 224.67 ±3.96 / 231.47 ms │        224.62 / 230.62 ±4.88 / 236.35 ms │     no change │
│ QQuery 18 │        123.42 / 125.95 ±1.99 / 128.16 ms │        122.15 / 125.07 ±2.53 / 129.00 ms │     no change │
│ QQuery 19 │        152.97 / 157.36 ±4.93 / 163.90 ms │        152.87 / 155.96 ±2.31 / 159.52 ms │     no change │
│ QQuery 20 │           12.79 / 13.47 ±0.68 / 14.60 ms │           14.11 / 14.25 ±0.08 / 14.33 ms │  1.06x slower │
│ QQuery 21 │           18.75 / 18.89 ±0.11 / 19.05 ms │           20.26 / 20.63 ±0.22 / 20.88 ms │  1.09x slower │
│ QQuery 22 │       472.40 / 483.39 ±12.48 / 500.26 ms │        484.16 / 490.97 ±6.87 / 502.51 ms │     no change │
│ QQuery 23 │       819.99 / 856.98 ±21.64 / 884.94 ms │        832.64 / 844.22 ±9.86 / 856.58 ms │     no change │
│ QQuery 24 │        374.84 / 388.09 ±9.57 / 400.63 ms │        379.71 / 387.02 ±6.25 / 395.91 ms │     no change │
│ QQuery 25 │        332.89 / 337.90 ±4.33 / 344.75 ms │        333.43 / 342.76 ±5.69 / 350.54 ms │     no change │
│ QQuery 26 │           78.65 / 79.10 ±0.26 / 79.45 ms │           77.66 / 79.24 ±1.42 / 81.34 ms │     no change │
│ QQuery 27 │              7.21 / 7.40 ±0.11 / 7.51 ms │              7.37 / 7.46 ±0.08 / 7.60 ms │     no change │
│ QQuery 28 │        152.99 / 154.60 ±1.83 / 158.13 ms │        148.58 / 151.17 ±2.38 / 154.30 ms │     no change │
│ QQuery 29 │        281.62 / 287.78 ±5.06 / 296.76 ms │        273.57 / 283.22 ±6.55 / 290.37 ms │     no change │
│ QQuery 30 │           43.21 / 43.88 ±0.79 / 45.39 ms │           41.64 / 42.12 ±0.48 / 42.95 ms │     no change │
│ QQuery 31 │        168.32 / 170.34 ±1.34 / 172.04 ms │        164.25 / 167.12 ±1.91 / 170.20 ms │     no change │
│ QQuery 32 │           14.33 / 14.53 ±0.17 / 14.78 ms │           13.33 / 13.57 ±0.25 / 14.05 ms │ +1.07x faster │
│ QQuery 33 │        142.37 / 143.12 ±0.63 / 144.15 ms │        137.87 / 140.96 ±2.25 / 144.08 ms │     no change │
│ QQuery 34 │              7.42 / 7.52 ±0.07 / 7.60 ms │              7.47 / 7.61 ±0.09 / 7.74 ms │     no change │
│ QQuery 35 │        102.59 / 104.51 ±1.63 / 107.22 ms │         99.84 / 101.76 ±2.31 / 106.03 ms │     no change │
│ QQuery 36 │              7.08 / 7.24 ±0.17 / 7.48 ms │              6.59 / 6.75 ±0.15 / 6.96 ms │ +1.07x faster │
│ QQuery 37 │              8.70 / 8.81 ±0.11 / 9.01 ms │              8.19 / 8.28 ±0.06 / 8.37 ms │ +1.06x faster │
│ QQuery 38 │           88.82 / 90.80 ±1.62 / 93.75 ms │           90.15 / 91.67 ±1.69 / 94.96 ms │     no change │
│ QQuery 39 │        127.23 / 129.42 ±1.19 / 130.60 ms │        116.36 / 120.01 ±3.50 / 124.51 ms │ +1.08x faster │
│ QQuery 40 │        106.69 / 113.99 ±5.11 / 119.75 ms │        102.16 / 112.01 ±5.98 / 118.34 ms │     no change │
│ QQuery 41 │           14.92 / 15.03 ±0.09 / 15.19 ms │           14.00 / 14.17 ±0.26 / 14.70 ms │ +1.06x faster │
│ QQuery 42 │        108.29 / 110.93 ±4.24 / 119.36 ms │        105.26 / 108.78 ±3.79 / 115.88 ms │     no change │
│ QQuery 43 │              6.12 / 6.19 ±0.05 / 6.24 ms │              6.13 / 6.28 ±0.11 / 6.45 ms │     no change │
│ QQuery 44 │           12.29 / 12.36 ±0.04 / 12.39 ms │           12.93 / 12.99 ±0.06 / 13.11 ms │  1.05x slower │
│ QQuery 45 │           49.43 / 50.09 ±0.51 / 50.98 ms │           50.81 / 51.24 ±0.31 / 51.68 ms │     no change │
│ QQuery 46 │              8.27 / 8.44 ±0.20 / 8.83 ms │              8.57 / 8.75 ±0.19 / 9.11 ms │     no change │
│ QQuery 47 │       714.16 / 765.04 ±29.35 / 802.87 ms │       713.81 / 732.01 ±16.87 / 755.31 ms │     no change │
│ QQuery 48 │        271.74 / 280.82 ±8.32 / 294.11 ms │        271.61 / 280.31 ±5.67 / 287.94 ms │     no change │
│ QQuery 49 │        248.31 / 251.25 ±3.28 / 256.59 ms │        249.90 / 251.43 ±0.87 / 252.31 ms │     no change │
│ QQuery 50 │       200.12 / 212.65 ±10.42 / 225.60 ms │        206.63 / 220.53 ±7.48 / 227.78 ms │     no change │
│ QQuery 51 │        176.20 / 179.10 ±3.09 / 184.51 ms │        175.74 / 177.86 ±1.75 / 180.04 ms │     no change │
│ QQuery 52 │        106.24 / 108.56 ±2.49 / 113.23 ms │        106.61 / 107.86 ±1.72 / 111.15 ms │     no change │
│ QQuery 53 │        101.60 / 102.18 ±0.50 / 102.85 ms │        103.02 / 104.50 ±1.23 / 106.35 ms │     no change │
│ QQuery 54 │        143.65 / 147.54 ±3.08 / 152.16 ms │        148.34 / 149.92 ±1.22 / 151.79 ms │     no change │
│ QQuery 55 │        105.65 / 107.27 ±1.33 / 108.77 ms │        106.92 / 108.41 ±1.14 / 109.95 ms │     no change │
│ QQuery 56 │        138.81 / 141.10 ±2.97 / 146.83 ms │        138.86 / 139.87 ±0.99 / 141.26 ms │     no change │
│ QQuery 57 │        164.71 / 168.90 ±4.61 / 177.16 ms │        165.46 / 168.47 ±1.70 / 170.44 ms │     no change │
│ QQuery 58 │        310.26 / 312.20 ±2.04 / 315.77 ms │        315.13 / 317.56 ±1.96 / 320.57 ms │     no change │
│ QQuery 59 │        195.51 / 199.68 ±3.96 / 206.16 ms │        202.05 / 204.04 ±1.71 / 207.03 ms │     no change │
│ QQuery 60 │        139.33 / 141.09 ±1.04 / 142.21 ms │        143.89 / 145.44 ±1.01 / 146.85 ms │     no change │
│ QQuery 61 │           13.74 / 13.96 ±0.14 / 14.12 ms │           14.15 / 14.25 ±0.12 / 14.43 ms │     no change │
│ QQuery 62 │        904.73 / 918.22 ±9.80 / 932.05 ms │        916.60 / 925.80 ±6.20 / 933.02 ms │     no change │
│ QQuery 63 │        101.24 / 103.65 ±1.51 / 105.59 ms │        102.83 / 105.25 ±1.28 / 106.56 ms │     no change │
│ QQuery 64 │       663.33 / 677.82 ±17.04 / 710.23 ms │        679.36 / 690.90 ±8.21 / 704.02 ms │     no change │
│ QQuery 65 │       240.18 / 255.20 ±11.58 / 268.75 ms │        245.66 / 254.84 ±8.87 / 268.56 ms │     no change │
│ QQuery 66 │       212.34 / 224.35 ±11.12 / 241.98 ms │       214.52 / 226.01 ±12.30 / 247.84 ms │     no change │
│ QQuery 67 │       288.49 / 306.64 ±15.23 / 329.33 ms │       290.39 / 308.84 ±15.92 / 332.90 ms │     no change │
│ QQuery 68 │              9.20 / 9.35 ±0.13 / 9.53 ms │              9.03 / 9.26 ±0.21 / 9.53 ms │     no change │
│ QQuery 69 │         97.01 / 100.50 ±4.38 / 109.05 ms │           97.41 / 98.15 ±0.57 / 99.11 ms │     no change │
│ QQuery 70 │       315.05 / 329.60 ±16.41 / 352.89 ms │        311.43 / 325.08 ±8.66 / 336.22 ms │     no change │
│ QQuery 71 │        133.50 / 135.07 ±1.04 / 136.70 ms │        132.02 / 134.09 ±2.00 / 137.26 ms │     no change │
│ QQuery 72 │       600.24 / 622.88 ±15.14 / 643.48 ms │       587.49 / 600.22 ±11.27 / 617.95 ms │     no change │
│ QQuery 73 │              6.61 / 6.80 ±0.24 / 7.27 ms │              7.27 / 7.37 ±0.07 / 7.48 ms │  1.08x slower │
│ QQuery 74 │       556.73 / 593.88 ±21.03 / 611.89 ms │       575.69 / 621.74 ±37.53 / 657.85 ms │     no change │
│ QQuery 75 │        269.70 / 272.30 ±2.17 / 275.15 ms │        267.51 / 272.12 ±4.37 / 277.90 ms │     no change │
│ QQuery 76 │        131.55 / 134.37 ±2.70 / 138.73 ms │        130.03 / 135.21 ±3.02 / 139.08 ms │     no change │
│ QQuery 77 │        187.98 / 190.25 ±1.41 / 191.91 ms │        186.14 / 190.49 ±3.65 / 196.20 ms │     no change │
│ QQuery 78 │        330.21 / 334.97 ±4.67 / 342.92 ms │        329.77 / 338.57 ±9.72 / 352.35 ms │     no change │
│ QQuery 79 │       228.09 / 239.95 ±11.69 / 254.41 ms │       225.47 / 235.52 ±11.52 / 257.32 ms │     no change │
│ QQuery 80 │        318.93 / 321.03 ±1.90 / 323.99 ms │        318.39 / 322.48 ±3.84 / 327.74 ms │     no change │
│ QQuery 81 │           26.03 / 26.83 ±0.87 / 28.37 ms │           27.30 / 27.91 ±1.00 / 29.91 ms │     no change │
│ QQuery 82 │           40.85 / 41.14 ±0.23 / 41.50 ms │           39.45 / 39.94 ±0.64 / 41.17 ms │     no change │
│ QQuery 83 │           37.35 / 38.65 ±0.71 / 39.39 ms │           36.29 / 36.81 ±0.33 / 37.27 ms │     no change │
│ QQuery 84 │           45.97 / 46.11 ±0.16 / 46.42 ms │           45.79 / 46.03 ±0.17 / 46.18 ms │     no change │
│ QQuery 85 │        140.03 / 141.10 ±0.93 / 142.61 ms │        139.13 / 142.40 ±2.55 / 145.90 ms │     no change │
│ QQuery 86 │           37.83 / 38.35 ±0.39 / 38.89 ms │           37.12 / 37.40 ±0.40 / 38.17 ms │     no change │
│ QQuery 87 │              3.70 / 3.79 ±0.10 / 3.96 ms │              3.49 / 3.58 ±0.12 / 3.82 ms │ +1.06x faster │
│ QQuery 88 │        101.34 / 102.02 ±0.82 / 103.62 ms │         99.34 / 102.50 ±2.67 / 106.09 ms │     no change │
│ QQuery 89 │        117.57 / 121.15 ±5.88 / 132.86 ms │        115.99 / 117.03 ±0.73 / 117.87 ms │     no change │
│ QQuery 90 │           22.84 / 23.17 ±0.27 / 23.57 ms │           21.78 / 22.29 ±0.37 / 22.71 ms │     no change │
│ QQuery 91 │           58.71 / 59.36 ±0.50 / 59.94 ms │           59.09 / 61.22 ±1.36 / 63.18 ms │     no change │
│ QQuery 92 │           55.65 / 56.46 ±0.60 / 57.19 ms │           55.69 / 56.22 ±0.58 / 57.32 ms │     no change │
│ QQuery 93 │        179.75 / 182.86 ±3.60 / 189.63 ms │        181.25 / 185.73 ±4.89 / 192.60 ms │     no change │
│ QQuery 94 │           59.76 / 60.25 ±0.29 / 60.57 ms │           59.96 / 60.46 ±0.43 / 61.19 ms │     no change │
│ QQuery 95 │        127.53 / 127.98 ±0.38 / 128.68 ms │        126.81 / 128.55 ±1.20 / 130.26 ms │     no change │
│ QQuery 96 │           68.08 / 69.24 ±0.88 / 70.71 ms │           67.79 / 68.74 ±1.02 / 70.53 ms │     no change │
│ QQuery 97 │        116.56 / 120.49 ±3.35 / 125.32 ms │        117.44 / 124.57 ±3.87 / 128.28 ms │     no change │
│ QQuery 98 │        149.80 / 152.24 ±2.20 / 155.71 ms │        157.39 / 159.23 ±1.35 / 161.52 ms │     no change │
│ QQuery 99 │ 10961.66 / 11025.99 ±43.05 / 11091.35 ms │ 10882.92 / 10985.28 ±90.19 / 11091.59 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 31219.91ms │
│ Total Time (row-group-morselization)   │ 31277.71ms │
│ Average Time (HEAD)                    │   315.35ms │
│ Average Time (row-group-morselization) │   315.94ms │
│ Queries Faster                         │          7 │
│ Queries Slower                         │          8 │
│ Queries with No Change                 │         84 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
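The summary rows can be checked by hand. Below is a minimal sketch, assuming the "Change" multiplier is the ratio of the two mean times. That assumption is inferred from the rows above (for example, QQuery 73: 7.37 / 6.80 ≈ 1.08) and is not taken from the benchmark runner's source. The runner's rule for printing "no change" instead of a multiplier is not visible in this output, so it is not modelled. Both function names are hypothetical.

```rust
/// Average per-query time as printed in the summary: total / query count.
fn average_ms(total_ms: f64, queries: u32) -> f64 {
    total_ms / f64::from(queries)
}

/// Renders the multiplier the way the "Change" column prints it.
/// ASSUMPTION: the multiplier is the ratio of the two mean times; the
/// threshold for reporting "no change" is unknown and not modelled here.
fn format_change(head_mean_ms: f64, branch_mean_ms: f64) -> String {
    if branch_mean_ms >= head_mean_ms {
        format!("{:.2}x slower", branch_mean_ms / head_mean_ms)
    } else {
        format!("+{:.2}x faster", head_mean_ms / branch_mean_ms)
    }
}
```

With 99 TPC-DS queries, `average_ms(31219.91, 99)` rounds to the 315.35 ms shown for HEAD. `format_change(3.79, 3.58)` reproduces the "+1.06x faster" printed for QQuery 87.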

Resource Usage

tpcds — base (merge-base)

Metric        Value
Wall time     160.0s
Peak memory   6.2 GiB
Avg memory    5.5 GiB
CPU user      261.9s
CPU sys       8.7s
Peak spill    0 B

tpcds — branch

Metric        Value
Wall time     160.0s
Peak memory   6.3 GiB
Avg memory    5.6 GiB
CPU user      262.0s
CPU sys       8.5s
Peak spill    0 B

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and row-group-morselization
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃               row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.54 ±6.64 / 17.82 ms │          1.16 / 4.52 ±6.65 / 17.82 ms │     no change │
│ QQuery 1  │        13.23 / 13.53 ±0.29 / 14.05 ms │        14.74 / 15.01 ±0.21 / 15.23 ms │  1.11x slower │
│ QQuery 2  │        38.36 / 38.77 ±0.31 / 39.23 ms │        39.43 / 40.04 ±0.63 / 41.02 ms │     no change │
│ QQuery 3  │        32.82 / 33.51 ±0.70 / 34.73 ms │        35.74 / 36.37 ±0.33 / 36.70 ms │  1.09x slower │
│ QQuery 4  │    239.46 / 257.84 ±11.20 / 273.98 ms │    240.90 / 256.68 ±13.47 / 276.98 ms │     no change │
│ QQuery 5  │     285.70 / 298.07 ±6.73 / 305.99 ms │     286.48 / 298.31 ±9.43 / 309.14 ms │     no change │
│ QQuery 6  │           6.32 / 7.53 ±0.75 / 8.52 ms │           6.86 / 7.90 ±0.68 / 8.86 ms │     no change │
│ QQuery 7  │        14.57 / 14.75 ±0.14 / 15.00 ms │        17.85 / 18.03 ±0.20 / 18.35 ms │  1.22x slower │
│ QQuery 8  │     330.39 / 340.82 ±8.96 / 353.99 ms │    328.72 / 338.04 ±13.19 / 363.63 ms │     no change │
│ QQuery 9  │    495.82 / 523.43 ±24.11 / 562.37 ms │    510.51 / 525.84 ±13.51 / 546.44 ms │     no change │
│ QQuery 10 │        74.59 / 75.67 ±1.17 / 77.82 ms │        76.93 / 77.37 ±0.67 / 78.70 ms │     no change │
│ QQuery 11 │        85.77 / 87.15 ±0.98 / 88.27 ms │        92.63 / 93.63 ±0.60 / 94.17 ms │  1.07x slower │
│ QQuery 12 │     286.81 / 297.30 ±8.39 / 308.05 ms │    275.98 / 288.78 ±10.47 / 304.75 ms │     no change │
│ QQuery 13 │     432.82 / 440.13 ±6.25 / 449.96 ms │     412.01 / 418.79 ±5.29 / 428.08 ms │     no change │
│ QQuery 14 │     307.11 / 311.45 ±4.86 / 319.61 ms │     297.26 / 305.27 ±5.65 / 312.04 ms │     no change │
│ QQuery 15 │     311.21 / 315.12 ±3.96 / 322.38 ms │    289.02 / 302.00 ±17.03 / 334.64 ms │     no change │
│ QQuery 16 │    650.76 / 669.58 ±13.01 / 690.36 ms │     645.45 / 657.77 ±7.75 / 665.55 ms │     no change │
│ QQuery 17 │    638.97 / 657.88 ±11.83 / 671.67 ms │    632.16 / 642.12 ±14.72 / 670.39 ms │     no change │
│ QQuery 18 │ 1295.60 / 1316.40 ±13.43 / 1336.78 ms │ 1302.97 / 1359.37 ±34.50 / 1409.67 ms │     no change │
│ QQuery 19 │        29.10 / 30.75 ±2.68 / 36.04 ms │       32.30 / 47.62 ±18.83 / 78.25 ms │  1.55x slower │
│ QQuery 20 │    523.92 / 534.95 ±12.39 / 558.15 ms │     503.89 / 511.81 ±6.62 / 522.76 ms │     no change │
│ QQuery 21 │     607.47 / 612.95 ±3.98 / 617.50 ms │     596.12 / 603.03 ±3.92 / 607.75 ms │     no change │
│ QQuery 22 │ 1081.86 / 1091.35 ±10.96 / 1110.96 ms │  1042.41 / 1051.65 ±8.56 / 1067.20 ms │     no change │
│ QQuery 23 │ 3381.18 / 3404.63 ±18.00 / 3425.77 ms │ 3278.51 / 3336.84 ±52.82 / 3412.53 ms │     no change │
│ QQuery 24 │       42.06 / 50.86 ±15.29 / 81.41 ms │        46.26 / 47.61 ±1.01 / 49.34 ms │ +1.07x faster │
│ QQuery 25 │     114.08 / 119.58 ±5.16 / 128.63 ms │     116.40 / 124.75 ±7.08 / 136.88 ms │     no change │
│ QQuery 26 │        42.67 / 44.24 ±1.03 / 45.41 ms │        46.87 / 48.92 ±3.06 / 54.99 ms │  1.11x slower │
│ QQuery 27 │     667.82 / 679.71 ±7.17 / 689.22 ms │     662.79 / 675.39 ±7.78 / 682.62 ms │     no change │
│ QQuery 28 │ 3038.66 / 3094.52 ±35.63 / 3136.19 ms │  2897.70 / 2911.16 ±9.19 / 2922.06 ms │ +1.06x faster │
│ QQuery 29 │        44.47 / 48.01 ±4.42 / 56.35 ms │        47.92 / 51.44 ±5.32 / 61.92 ms │  1.07x slower │
│ QQuery 30 │    312.54 / 327.85 ±12.58 / 342.48 ms │     313.26 / 324.56 ±7.38 / 333.77 ms │     no change │
│ QQuery 31 │    306.90 / 323.35 ±10.79 / 340.15 ms │     320.76 / 328.00 ±6.12 / 335.48 ms │     no change │
│ QQuery 32 │ 1036.43 / 1059.40 ±21.58 / 1098.12 ms │ 1020.10 / 1052.81 ±26.36 / 1097.36 ms │     no change │
│ QQuery 33 │ 1450.58 / 1503.71 ±36.86 / 1553.76 ms │ 1568.80 / 1599.31 ±21.82 / 1631.94 ms │  1.06x slower │
│ QQuery 34 │ 1508.04 / 1520.42 ±10.69 / 1539.25 ms │ 1486.04 / 1544.37 ±48.86 / 1630.63 ms │     no change │
│ QQuery 35 │    289.32 / 332.04 ±29.22 / 369.42 ms │    292.36 / 324.77 ±34.69 / 389.93 ms │     no change │
│ QQuery 36 │        65.51 / 70.41 ±5.24 / 79.62 ms │        63.54 / 66.29 ±2.15 / 69.07 ms │ +1.06x faster │
│ QQuery 37 │        35.76 / 38.80 ±3.16 / 43.78 ms │        35.32 / 42.46 ±7.29 / 54.98 ms │  1.09x slower │
│ QQuery 38 │        40.98 / 45.55 ±6.46 / 58.36 ms │        41.42 / 47.29 ±5.62 / 54.15 ms │     no change │
│ QQuery 39 │     135.23 / 142.74 ±5.79 / 152.85 ms │    120.20 / 132.07 ±10.80 / 150.86 ms │ +1.08x faster │
│ QQuery 40 │        14.62 / 14.97 ±0.39 / 15.70 ms │        14.63 / 17.51 ±5.09 / 27.67 ms │  1.17x slower │
│ QQuery 41 │        13.84 / 14.05 ±0.16 / 14.33 ms │        15.17 / 16.77 ±2.71 / 22.17 ms │  1.19x slower │
│ QQuery 42 │        13.46 / 15.11 ±3.01 / 21.12 ms │        14.66 / 17.05 ±2.57 / 21.83 ms │  1.13x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 20823.44ms │
│ Total Time (row-group-morselization)   │ 20609.33ms │
│ Average Time (HEAD)                    │   484.27ms │
│ Average Time (row-group-morselization) │   479.29ms │
│ Queries Faster                         │          4 │
│ Queries Slower                         │         12 │
│ Queries with No Change                 │         27 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric        Value
Wall time     105.0s
Peak memory   29.8 GiB
Avg memory    23.1 GiB
CPU user      1105.3s
CPU sys       65.0s
Peak spill    0 B

clickbench_partitioned — branch

Metric        Value
Wall time     105.0s
Peak memory   31.2 GiB
Avg memory    23.4 GiB
CPU user      1114.3s
CPU sys       67.6s
Peak spill    0 B

@Dandandan
Contributor Author

run benchmark clickbench_1

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291385422-1709-g6qqr 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (16967cd) to 9a1ed57 (merge-base) diff using: clickbench_1
Results will be posted here when complete


…ealers

`build_row_filter` does two things at very different costs:

1. Walk the predicate, split conjuncts, build a `FilterCandidate` per
   conjunct (resolves ProjectionMask + projected schema + required-bytes
   estimate), and optionally reorder by cost. This is schema/metadata
   work that is identical for every open of the same file.

2. Bind each candidate to the current open's metrics counters.

Before this change, both ran per open — so a 226-RG file split into
226 chunks paid the analysis cost 226×. After this change, the donor
(or an un-split file open) builds the `Vec<FilterCandidate>` once in
`prepare_filters`; donated chunks carry it through `ParquetOpenChunk`;
each `build_stream` does only the cheap metric binding via the new
`row_filter_from_candidates`.
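To make the amortization concrete, here is a minimal sketch that counts analysis runs for a file split into 226 chunks. `FilterCandidate` and `analyze` are hypothetical stand-ins: splitting on ` AND ` replaces the real projection, schema, and byte-estimate work. The handoff is modelled as plain `Arc` clones, not DataFusion's actual `ParquetOpenChunk` plumbing.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for one analyzed conjunct. The real analysis also
/// resolves a ProjectionMask, a projected schema, and a byte estimate.
#[derive(Clone, Debug)]
struct FilterCandidate {
    conjunct: String,
}

/// The "expensive" step: split the predicate into conjuncts.
/// `runs` counts how many times this analysis is performed.
fn analyze(predicate: &str, runs: &mut usize) -> Vec<FilterCandidate> {
    *runs += 1;
    predicate
        .split(" AND ")
        .map(|c| FilterCandidate { conjunct: c.trim().to_string() })
        .collect()
}

/// Before: every open of a chunk (donor or stealer) re-runs the analysis.
fn analysis_runs_before(predicate: &str, chunks: usize) -> usize {
    let mut runs = 0;
    for _ in 0..chunks {
        let _candidates = analyze(predicate, &mut runs);
    }
    runs
}

/// After: the donor analyzes once and every chunk carries an `Arc` clone,
/// so a stealer only pays for a reference-count increment.
fn analysis_runs_after(
    predicate: &str,
    chunks: usize,
) -> (usize, Vec<Arc<Vec<FilterCandidate>>>) {
    let mut runs = 0;
    let shared = Arc::new(analyze(predicate, &mut runs));
    let per_chunk: Vec<_> = (0..chunks).map(|_| Arc::clone(&shared)).collect();
    (runs, per_chunk)
}
```

With 226 chunks, the "before" variant reports 226 analysis runs. The "after" variant reports one, and all 226 chunks point at the same allocation.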

Refactor:
- Split `row_filter::build_row_filter` into
  `build_row_filter_candidates` (expensive, metrics-free) and
  `row_filter_from_candidates` (cheap, per-open). `build_row_filter`
  becomes a thin wrapper.
- `FilterCandidate` now `Clone`.
- `FiltersPreparedParquetOpen` gains
  `row_filter_candidates: Option<Arc<Vec<FilterCandidate>>>`, built in
  `prepare_filters` from the donor's rewritten predicate.
- `ParquetOpenChunk` carries the same `Arc` across the handoff so
  stealers reuse it in `build_stream`.
- `build_stream` now calls `row_filter_from_candidates` on the cached
  vec instead of re-running the full builder.
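The shape of the split can be sketched as follows. The function and field names come from the list above, but every signature is hypothetical, as are the simplified `FilterCandidate` fields, `OpenMetrics`, and `BoundPredicate`. The real functions work on a physical predicate, the file schema, and Parquet metadata. Using string length as the reorder cost is a toy assumption that stands in for the required-bytes estimate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Simplified, hypothetical candidate; `Clone` so binding can copy it.
#[derive(Clone, Debug, PartialEq)]
pub struct FilterCandidate {
    pub conjunct: String,
    pub required_bytes: usize, // toy cost proxy (string length)
}

/// Hypothetical per-open metrics: every open owns its own counters.
#[derive(Default, Debug)]
pub struct OpenMetrics {
    pub rows_pruned: Arc<AtomicUsize>,
}

impl OpenMetrics {
    pub fn pruned(&self) -> usize {
        self.rows_pruned.load(Ordering::Relaxed)
    }
}

/// A candidate bound to one open's metrics.
pub struct BoundPredicate {
    pub candidate: FilterCandidate,
    pub rows_pruned: Arc<AtomicUsize>,
}

pub struct RowFilter {
    pub predicates: Vec<BoundPredicate>,
}

/// Expensive and metrics-free: split conjuncts, estimate cost, optionally reorder.
pub fn build_row_filter_candidates(predicate: &str, reorder: bool) -> Option<Vec<FilterCandidate>> {
    let mut candidates: Vec<FilterCandidate> = predicate
        .split(" AND ")
        .map(str::trim)
        .filter(|c| !c.is_empty())
        .map(|c| FilterCandidate { conjunct: c.to_string(), required_bytes: c.len() })
        .collect();
    if candidates.is_empty() {
        return None;
    }
    if reorder {
        candidates.sort_by_key(|c| c.required_bytes); // cheapest first
    }
    Some(candidates)
}

/// Cheap and per-open: bind cached candidates to this open's counters.
pub fn row_filter_from_candidates(candidates: &[FilterCandidate], metrics: &OpenMetrics) -> RowFilter {
    let predicates = candidates
        .iter()
        .map(|c| BoundPredicate {
            candidate: c.clone(),
            rows_pruned: Arc::clone(&metrics.rows_pruned),
        })
        .collect();
    RowFilter { predicates }
}

/// Thin wrapper that keeps the original one-shot entry point.
pub fn build_row_filter(predicate: &str, reorder: bool, metrics: &OpenMetrics) -> Option<RowFilter> {
    build_row_filter_candidates(predicate, reorder)
        .map(|candidates| row_filter_from_candidates(&candidates, metrics))
}

/// What a donated chunk carries across the handoff (hypothetical shape).
pub struct ParquetOpenChunk {
    pub row_group: usize,
    pub row_filter_candidates: Option<Arc<Vec<FilterCandidate>>>,
}
```

In this sketch, a donor and a stealer can bind the same `Arc<Vec<FilterCandidate>>` while each keeps separate counters. That mirrors the correctness property described in the commit message.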

Correctness: each open still gets its own metric bindings — only the
candidate analysis is shared. Existing tests pass (103 lib +
200 integration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Dandandan
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291424691-1711-pg9h8 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (641e6cc) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291424691-1710-pz7tx 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (641e6cc) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291424691-1712-s7zfm 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (641e6cc) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete


@Dandandan
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4292764381-1722-zl8hz 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (26f09e4) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4292764381-1723-xd4zj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (26f09e4) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4292764381-1724-t58bh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (26f09e4) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and row-group-morselization
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                  row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              7.29 / 7.74 ±0.65 / 9.04 ms │              7.30 / 7.80 ±0.71 / 9.20 ms │     no change │
│ QQuery 2  │        149.72 / 150.98 ±1.12 / 153.07 ms │        146.01 / 146.47 ±0.47 / 147.34 ms │     no change │
│ QQuery 3  │        114.35 / 115.27 ±0.60 / 116.02 ms │        114.69 / 115.86 ±0.71 / 116.62 ms │     no change │
│ QQuery 4  │     1431.96 / 1437.05 ±4.45 / 1443.18 ms │     1433.83 / 1443.29 ±9.09 / 1457.92 ms │     no change │
│ QQuery 5  │        175.67 / 177.05 ±1.21 / 179.25 ms │        175.24 / 177.57 ±1.72 / 180.30 ms │     no change │
│ QQuery 6  │       870.09 / 886.77 ±13.14 / 909.50 ms │       863.94 / 894.30 ±22.17 / 921.09 ms │     no change │
│ QQuery 7  │        339.35 / 343.85 ±3.51 / 348.58 ms │        345.37 / 346.85 ±1.31 / 348.88 ms │     no change │
│ QQuery 8  │        115.22 / 115.96 ±0.70 / 117.27 ms │        114.85 / 116.15 ±1.17 / 117.76 ms │     no change │
│ QQuery 9  │        101.81 / 105.65 ±3.01 / 108.71 ms │        101.69 / 105.68 ±3.24 / 109.14 ms │     no change │
│ QQuery 10 │        104.44 / 105.08 ±0.40 / 105.60 ms │        104.29 / 105.29 ±0.72 / 106.15 ms │     no change │
│ QQuery 11 │    1006.37 / 1023.16 ±10.09 / 1035.90 ms │     1022.12 / 1033.20 ±7.32 / 1044.18 ms │     no change │
│ QQuery 12 │           45.47 / 45.92 ±0.35 / 46.41 ms │           46.66 / 46.86 ±0.13 / 47.07 ms │     no change │
│ QQuery 13 │        406.41 / 408.72 ±1.49 / 410.28 ms │        406.77 / 413.17 ±4.68 / 418.53 ms │     no change │
│ QQuery 14 │      998.78 / 1008.01 ±6.61 / 1017.36 ms │      996.09 / 1002.34 ±4.72 / 1009.63 ms │     no change │
│ QQuery 15 │           15.80 / 16.09 ±0.20 / 16.39 ms │           15.70 / 15.84 ±0.17 / 16.13 ms │     no change │
│ QQuery 16 │              8.01 / 8.14 ±0.11 / 8.30 ms │              7.91 / 8.02 ±0.15 / 8.32 ms │     no change │
│ QQuery 17 │        232.66 / 234.42 ±0.98 / 235.37 ms │        231.74 / 234.61 ±2.10 / 237.58 ms │     no change │
│ QQuery 18 │        127.13 / 128.18 ±0.87 / 129.63 ms │        126.86 / 128.13 ±1.38 / 129.83 ms │     no change │
│ QQuery 19 │        163.30 / 164.62 ±0.85 / 165.95 ms │        163.38 / 165.39 ±1.77 / 167.47 ms │     no change │
│ QQuery 20 │           13.66 / 14.03 ±0.31 / 14.59 ms │           13.90 / 14.16 ±0.14 / 14.33 ms │     no change │
│ QQuery 21 │           19.82 / 20.32 ±0.43 / 20.93 ms │           20.04 / 20.47 ±0.30 / 20.86 ms │     no change │
│ QQuery 22 │        510.46 / 522.13 ±8.88 / 537.31 ms │        545.94 / 555.59 ±6.59 / 563.77 ms │  1.06x slower │
│ QQuery 23 │        896.76 / 900.77 ±3.00 / 906.04 ms │       901.18 / 912.74 ±10.80 / 932.29 ms │     no change │
│ QQuery 24 │        391.87 / 395.81 ±3.86 / 402.95 ms │        396.11 / 399.72 ±3.03 / 404.46 ms │     no change │
│ QQuery 25 │        345.88 / 348.90 ±2.24 / 351.38 ms │        346.90 / 351.03 ±3.80 / 356.70 ms │     no change │
│ QQuery 26 │           78.75 / 79.66 ±0.58 / 80.49 ms │           79.72 / 81.40 ±3.07 / 87.53 ms │     no change │
│ QQuery 27 │              7.28 / 7.46 ±0.13 / 7.59 ms │              7.41 / 7.58 ±0.09 / 7.66 ms │     no change │
│ QQuery 28 │        152.69 / 154.33 ±1.33 / 156.65 ms │        152.70 / 153.64 ±1.51 / 156.65 ms │     no change │
│ QQuery 29 │        283.95 / 288.09 ±3.52 / 294.53 ms │        282.72 / 287.35 ±4.86 / 296.33 ms │     no change │
│ QQuery 30 │           42.73 / 44.11 ±1.69 / 47.41 ms │           42.43 / 43.00 ±0.45 / 43.69 ms │     no change │
│ QQuery 31 │        169.70 / 172.77 ±3.31 / 179.07 ms │        170.12 / 173.03 ±1.72 / 175.12 ms │     no change │
│ QQuery 32 │           14.62 / 14.90 ±0.17 / 15.15 ms │           14.47 / 14.80 ±0.30 / 15.25 ms │     no change │
│ QQuery 33 │        143.92 / 146.02 ±1.16 / 147.28 ms │        144.83 / 147.31 ±1.98 / 150.90 ms │     no change │
│ QQuery 34 │              7.55 / 7.69 ±0.11 / 7.83 ms │              7.64 / 7.76 ±0.09 / 7.86 ms │     no change │
│ QQuery 35 │        105.19 / 106.57 ±1.95 / 110.45 ms │        101.91 / 104.58 ±1.83 / 106.94 ms │     no change │
│ QQuery 36 │              6.97 / 7.10 ±0.08 / 7.18 ms │              7.04 / 7.34 ±0.20 / 7.66 ms │     no change │
│ QQuery 37 │              8.82 / 8.94 ±0.10 / 9.09 ms │              8.80 / 8.87 ±0.04 / 8.93 ms │     no change │
│ QQuery 38 │           92.38 / 94.38 ±1.99 / 96.85 ms │          91.72 / 96.45 ±6.07 / 108.39 ms │     no change │
│ QQuery 39 │        130.37 / 135.59 ±2.71 / 138.04 ms │        132.74 / 134.70 ±1.06 / 135.71 ms │     no change │
│ QQuery 40 │        107.14 / 112.61 ±4.03 / 117.58 ms │        108.81 / 111.70 ±2.50 / 115.33 ms │     no change │
│ QQuery 41 │           14.87 / 15.09 ±0.16 / 15.33 ms │           14.71 / 14.92 ±0.21 / 15.31 ms │     no change │
│ QQuery 42 │        108.74 / 110.22 ±2.33 / 114.86 ms │        108.70 / 109.38 ±0.39 / 109.89 ms │     no change │
│ QQuery 43 │              6.10 / 6.15 ±0.04 / 6.22 ms │              6.04 / 6.11 ±0.10 / 6.30 ms │     no change │
│ QQuery 44 │           12.41 / 12.54 ±0.10 / 12.69 ms │           12.43 / 12.59 ±0.17 / 12.88 ms │     no change │
│ QQuery 45 │           50.37 / 51.17 ±0.85 / 52.74 ms │           50.62 / 50.93 ±0.32 / 51.55 ms │     no change │
│ QQuery 46 │              9.08 / 9.36 ±0.21 / 9.71 ms │              8.88 / 9.11 ±0.20 / 9.37 ms │     no change │
│ QQuery 47 │        815.24 / 825.10 ±6.23 / 831.00 ms │        830.10 / 838.33 ±7.29 / 851.83 ms │     no change │
│ QQuery 48 │        282.83 / 286.78 ±2.56 / 290.41 ms │        286.62 / 291.14 ±4.64 / 298.93 ms │     no change │
│ QQuery 49 │        254.36 / 255.52 ±0.92 / 256.86 ms │        253.31 / 255.69 ±2.45 / 259.37 ms │     no change │
│ QQuery 50 │        218.88 / 227.19 ±4.50 / 232.48 ms │        220.15 / 225.40 ±4.04 / 231.69 ms │     no change │
│ QQuery 51 │        181.22 / 183.68 ±2.10 / 186.45 ms │        180.60 / 182.76 ±1.58 / 184.87 ms │     no change │
│ QQuery 52 │        107.39 / 108.40 ±0.87 / 109.86 ms │        109.30 / 109.51 ±0.22 / 109.91 ms │     no change │
│ QQuery 53 │        104.42 / 106.10 ±1.49 / 108.66 ms │        104.15 / 105.66 ±1.50 / 108.51 ms │     no change │
│ QQuery 54 │        149.30 / 151.26 ±1.40 / 153.65 ms │        150.62 / 153.27 ±2.80 / 158.62 ms │     no change │
│ QQuery 55 │        106.76 / 108.09 ±0.95 / 109.61 ms │        106.97 / 107.99 ±0.91 / 109.19 ms │     no change │
│ QQuery 56 │        143.88 / 147.24 ±2.22 / 150.88 ms │        144.80 / 145.76 ±0.88 / 147.20 ms │     no change │
│ QQuery 57 │        172.54 / 174.47 ±1.25 / 176.46 ms │        174.83 / 175.43 ±0.46 / 176.12 ms │     no change │
│ QQuery 58 │        315.75 / 317.51 ±1.50 / 319.74 ms │        316.15 / 317.39 ±1.30 / 319.61 ms │     no change │
│ QQuery 59 │        209.41 / 211.30 ±1.65 / 213.21 ms │        205.54 / 207.84 ±2.75 / 213.21 ms │     no change │
│ QQuery 60 │        145.79 / 147.56 ±1.28 / 149.21 ms │        145.89 / 147.66 ±1.12 / 149.18 ms │     no change │
│ QQuery 61 │           14.04 / 14.22 ±0.10 / 14.33 ms │           14.48 / 14.56 ±0.09 / 14.73 ms │     no change │
│ QQuery 62 │        915.71 / 925.94 ±9.58 / 943.25 ms │        927.69 / 931.75 ±4.09 / 939.33 ms │     no change │
│ QQuery 63 │        104.02 / 105.58 ±2.03 / 109.54 ms │        106.07 / 106.85 ±0.50 / 107.64 ms │     no change │
│ QQuery 64 │        699.32 / 703.06 ±2.69 / 706.37 ms │        707.00 / 713.09 ±9.21 / 730.88 ms │     no change │
│ QQuery 65 │        270.85 / 274.77 ±4.38 / 282.22 ms │        274.84 / 277.63 ±3.28 / 283.84 ms │     no change │
│ QQuery 66 │       221.84 / 232.29 ±10.24 / 246.22 ms │       222.47 / 236.80 ±13.23 / 253.67 ms │     no change │
│ QQuery 67 │        313.74 / 325.03 ±9.17 / 338.02 ms │        320.11 / 329.93 ±7.55 / 339.05 ms │     no change │
│ QQuery 68 │              9.23 / 9.47 ±0.14 / 9.63 ms │              9.36 / 9.60 ±0.15 / 9.74 ms │     no change │
│ QQuery 69 │        101.05 / 102.04 ±1.05 / 103.66 ms │        100.35 / 101.96 ±1.34 / 103.80 ms │     no change │
│ QQuery 70 │        319.39 / 336.90 ±9.27 / 346.89 ms │       323.69 / 336.80 ±13.53 / 360.29 ms │     no change │
│ QQuery 71 │        137.58 / 138.91 ±1.84 / 142.38 ms │        138.22 / 140.29 ±1.90 / 143.59 ms │     no change │
│ QQuery 72 │        625.03 / 633.67 ±8.48 / 645.30 ms │        622.13 / 635.11 ±7.59 / 643.52 ms │     no change │
│ QQuery 73 │              7.14 / 7.25 ±0.08 / 7.37 ms │              7.33 / 7.51 ±0.11 / 7.64 ms │     no change │
│ QQuery 74 │        655.44 / 659.96 ±4.36 / 668.01 ms │        667.26 / 673.12 ±4.47 / 680.40 ms │     no change │
│ QQuery 75 │        274.25 / 276.77 ±1.60 / 279.26 ms │        274.83 / 275.71 ±0.62 / 276.67 ms │     no change │
│ QQuery 76 │        135.59 / 138.02 ±2.55 / 142.69 ms │        133.93 / 136.67 ±1.83 / 138.77 ms │     no change │
│ QQuery 77 │        189.24 / 191.35 ±1.42 / 193.13 ms │        190.35 / 191.40 ±0.87 / 192.69 ms │     no change │
│ QQuery 78 │        342.53 / 346.83 ±3.60 / 350.86 ms │        343.98 / 349.11 ±3.17 / 353.25 ms │     no change │
│ QQuery 79 │        252.06 / 254.53 ±2.64 / 258.60 ms │        254.03 / 259.57 ±3.25 / 263.00 ms │     no change │
│ QQuery 80 │        322.50 / 324.35 ±1.95 / 327.22 ms │        325.82 / 327.66 ±2.02 / 331.47 ms │     no change │
│ QQuery 81 │           26.54 / 27.02 ±0.26 / 27.27 ms │           27.12 / 27.75 ±0.53 / 28.65 ms │     no change │
│ QQuery 82 │           40.01 / 40.35 ±0.25 / 40.73 ms │           41.00 / 41.51 ±0.43 / 42.20 ms │     no change │
│ QQuery 83 │           37.85 / 38.42 ±0.46 / 39.13 ms │           39.06 / 39.56 ±0.26 / 39.78 ms │     no change │
│ QQuery 84 │           47.03 / 47.27 ±0.21 / 47.65 ms │           47.91 / 48.31 ±0.29 / 48.70 ms │     no change │
│ QQuery 85 │        143.33 / 144.90 ±1.76 / 148.14 ms │        145.07 / 146.27 ±1.37 / 148.84 ms │     no change │
│ QQuery 86 │           37.91 / 38.01 ±0.09 / 38.12 ms │           38.80 / 39.36 ±0.47 / 40.07 ms │     no change │
│ QQuery 87 │              3.71 / 3.84 ±0.15 / 4.07 ms │              3.71 / 3.83 ±0.10 / 4.01 ms │     no change │
│ QQuery 88 │        101.76 / 104.10 ±1.60 / 106.33 ms │        103.39 / 104.77 ±1.69 / 108.01 ms │     no change │
│ QQuery 89 │        117.61 / 118.90 ±1.02 / 120.53 ms │        120.71 / 122.72 ±2.32 / 127.23 ms │     no change │
│ QQuery 90 │           23.25 / 25.03 ±2.88 / 30.76 ms │           23.30 / 23.37 ±0.08 / 23.52 ms │ +1.07x faster │
│ QQuery 91 │           59.30 / 60.01 ±0.59 / 61.11 ms │           60.53 / 61.06 ±0.45 / 61.62 ms │     no change │
│ QQuery 92 │           57.40 / 58.01 ±0.40 / 58.67 ms │           58.36 / 59.16 ±1.00 / 61.11 ms │     no change │
│ QQuery 93 │        190.26 / 191.33 ±0.92 / 192.61 ms │        190.27 / 192.63 ±1.83 / 195.00 ms │     no change │
│ QQuery 94 │           61.88 / 62.26 ±0.25 / 62.56 ms │           62.63 / 63.42 ±0.45 / 63.89 ms │     no change │
│ QQuery 95 │        128.09 / 128.95 ±0.44 / 129.28 ms │        129.54 / 130.58 ±1.21 / 132.91 ms │     no change │
│ QQuery 96 │           69.91 / 71.37 ±1.70 / 74.26 ms │           70.61 / 73.18 ±1.37 / 74.61 ms │     no change │
│ QQuery 97 │        123.62 / 125.99 ±2.31 / 130.32 ms │        127.09 / 129.34 ±1.55 / 131.86 ms │     no change │
│ QQuery 98 │        158.61 / 161.00 ±1.59 / 163.41 ms │        162.26 / 163.33 ±0.63 / 164.05 ms │     no change │
│ QQuery 99 │ 10915.36 / 10939.16 ±40.08 / 11019.13 ms │ 10956.65 / 11012.98 ±40.82 / 11080.90 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 31906.36ms │
│ Total Time (row-group-morselization)   │ 32145.17ms │
│ Average Time (HEAD)                    │   322.29ms │
│ Average Time (row-group-morselization) │   324.70ms │
│ Queries Faster                         │          1 │
│ Queries Slower                         │          1 │
│ Queries with No Change                 │         97 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
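Each cell above appears to read `min / mean ±stddev / max`, and the Change column looks consistent with comparing the two means against a threshold of about 5%. The Rust sketch below reproduces the labels for a few TPC-DS rows. The 5% threshold and the `classify` / `label` helpers are assumptions inferred from the table values, not taken from the benchmark runner's source.

```rust
/// Assumed threshold: ratios at or above 1.05 are reported as a change.
/// Inferred from the tables, not from the runner's code.
const THRESHOLD: f64 = 1.05;

#[derive(Debug, PartialEq)]
enum Change {
    Faster(f64),
    Slower(f64),
    NoChange,
}

/// Hypothetical helper: compare mean times (ms) of HEAD and the branch.
fn classify(head_mean_ms: f64, branch_mean_ms: f64) -> Change {
    if branch_mean_ms > head_mean_ms {
        // Branch took longer: ratio > 1 means "slower".
        let ratio = branch_mean_ms / head_mean_ms;
        if ratio >= THRESHOLD { Change::Slower(ratio) } else { Change::NoChange }
    } else {
        // Branch took less time (or equal): ratio >= 1 means "faster".
        let ratio = head_mean_ms / branch_mean_ms;
        if ratio >= THRESHOLD { Change::Faster(ratio) } else { Change::NoChange }
    }
}

/// Format a classification the way the Change column prints it.
fn label(change: &Change) -> String {
    match change {
        Change::Faster(r) => format!("+{:.2}x faster", r),
        Change::Slower(r) => format!("{:.2}x slower", r),
        Change::NoChange => "no change".to_string(),
    }
}

fn main() {
    // (query, HEAD mean ms, branch mean ms) copied from the TPC-DS SF1 table.
    let rows = [
        ("QQuery 22", 522.13, 555.59),
        ("QQuery 90", 25.03, 23.37),
        ("QQuery 1", 7.74, 7.80),
    ];
    for (name, head, branch) in rows {
        println!("{name}: {}", label(&classify(head, branch)));
    }
}
```

With these inputs the sketch prints `1.06x slower`, `+1.07x faster`, and `no change`, matching the rows above. A plain ratio threshold ignores the reported ±stddev, so rows with large variance (e.g. QQuery 90 on HEAD, ±2.88 ms) can be flagged from a single noisy run.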

Resource Usage

tpcds — base (merge-base)

Metric        Value
Wall time     160.0s
Peak memory   6.2 GiB
Avg memory    5.5 GiB
CPU user      268.4s
CPU sys       8.3s
Peak spill    0 B

tpcds — branch

Metric        Value
Wall time     165.0s
Peak memory   6.3 GiB
Avg memory    5.5 GiB
CPU user      270.1s
CPU sys       8.6s
Peak spill    0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and row-group-morselization
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃               row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.18 / 4.65 ±6.79 / 18.22 ms │          1.16 / 4.66 ±6.85 / 18.35 ms │     no change │
│ QQuery 1  │        12.77 / 13.25 ±0.35 / 13.83 ms │        14.51 / 14.87 ±0.19 / 15.09 ms │  1.12x slower │
│ QQuery 2  │        36.83 / 37.51 ±0.51 / 38.14 ms │        39.17 / 39.70 ±0.43 / 40.21 ms │  1.06x slower │
│ QQuery 3  │        31.23 / 31.82 ±0.73 / 33.23 ms │        33.44 / 34.17 ±0.55 / 35.05 ms │  1.07x slower │
│ QQuery 4  │     242.36 / 247.90 ±3.24 / 251.44 ms │     248.88 / 254.72 ±4.30 / 261.58 ms │     no change │
│ QQuery 5  │     287.55 / 290.17 ±2.67 / 294.81 ms │     290.25 / 293.71 ±2.01 / 295.52 ms │     no change │
│ QQuery 6  │          7.02 / 7.92 ±1.32 / 10.50 ms │           6.63 / 7.44 ±0.41 / 7.77 ms │ +1.06x faster │
│ QQuery 7  │        14.63 / 14.73 ±0.06 / 14.80 ms │        16.08 / 16.35 ±0.18 / 16.59 ms │  1.11x slower │
│ QQuery 8  │     335.15 / 338.14 ±2.27 / 340.86 ms │     338.95 / 343.87 ±5.70 / 354.98 ms │     no change │
│ QQuery 9  │    497.32 / 513.21 ±13.79 / 530.54 ms │    497.60 / 517.81 ±13.88 / 535.07 ms │     no change │
│ QQuery 10 │        74.92 / 76.55 ±0.94 / 77.70 ms │        76.99 / 77.86 ±0.87 / 78.93 ms │     no change │
│ QQuery 11 │        86.70 / 88.85 ±3.32 / 95.47 ms │        87.56 / 88.93 ±0.84 / 89.93 ms │     no change │
│ QQuery 12 │     278.13 / 284.04 ±4.21 / 289.11 ms │     279.90 / 285.32 ±3.37 / 289.80 ms │     no change │
│ QQuery 13 │     403.87 / 411.90 ±6.56 / 418.31 ms │     401.95 / 410.48 ±6.28 / 421.24 ms │     no change │
│ QQuery 14 │     292.43 / 295.85 ±2.61 / 299.00 ms │     291.49 / 293.51 ±1.86 / 296.37 ms │     no change │
│ QQuery 15 │     288.81 / 294.60 ±3.72 / 298.97 ms │     291.95 / 299.84 ±6.90 / 308.70 ms │     no change │
│ QQuery 16 │     634.30 / 639.32 ±3.69 / 644.11 ms │     635.42 / 641.41 ±8.56 / 658.32 ms │     no change │
│ QQuery 17 │     636.57 / 641.98 ±4.39 / 646.32 ms │     638.79 / 645.01 ±5.40 / 653.85 ms │     no change │
│ QQuery 18 │ 1276.43 / 1302.08 ±19.30 / 1330.48 ms │ 1286.49 / 1307.57 ±15.82 / 1331.98 ms │     no change │
│ QQuery 19 │        28.93 / 31.86 ±3.76 / 38.66 ms │        30.79 / 31.77 ±0.53 / 32.24 ms │     no change │
│ QQuery 20 │     515.99 / 522.23 ±6.63 / 534.29 ms │    493.76 / 506.95 ±19.03 / 544.69 ms │     no change │
│ QQuery 21 │     599.25 / 604.86 ±2.95 / 607.94 ms │     579.12 / 584.15 ±3.33 / 588.65 ms │     no change │
│ QQuery 22 │  1065.65 / 1075.95 ±6.19 / 1084.98 ms │  1029.58 / 1041.44 ±9.85 / 1059.11 ms │     no change │
│ QQuery 23 │ 3338.39 / 3367.36 ±18.06 / 3389.94 ms │ 3236.06 / 3256.54 ±18.75 / 3285.22 ms │     no change │
│ QQuery 24 │        42.18 / 42.63 ±0.63 / 43.83 ms │        46.39 / 48.36 ±3.11 / 54.54 ms │  1.13x slower │
│ QQuery 25 │     116.31 / 118.14 ±1.73 / 121.11 ms │     115.17 / 118.97 ±5.13 / 128.88 ms │     no change │
│ QQuery 26 │        42.54 / 43.95 ±1.65 / 46.91 ms │        46.84 / 47.09 ±0.19 / 47.31 ms │  1.07x slower │
│ QQuery 27 │     672.93 / 676.35 ±3.10 / 681.15 ms │     651.99 / 662.67 ±6.66 / 670.22 ms │     no change │
│ QQuery 28 │ 3021.91 / 3041.47 ±10.57 / 3050.69 ms │  2876.60 / 2884.81 ±4.88 / 2891.15 ms │ +1.05x faster │
│ QQuery 29 │       43.05 / 50.47 ±14.38 / 79.22 ms │        45.79 / 48.16 ±2.86 / 53.38 ms │     no change │
│ QQuery 30 │     315.48 / 318.95 ±2.25 / 321.52 ms │     314.30 / 320.49 ±3.60 / 324.73 ms │     no change │
│ QQuery 31 │     303.05 / 311.29 ±4.62 / 314.99 ms │     309.71 / 313.60 ±3.15 / 317.70 ms │     no change │
│ QQuery 32 │ 1020.34 / 1043.73 ±20.99 / 1083.13 ms │ 1028.85 / 1039.59 ±12.02 / 1061.60 ms │     no change │
│ QQuery 33 │ 1458.59 / 1484.82 ±14.21 / 1496.64 ms │ 1472.76 / 1490.99 ±10.11 / 1502.19 ms │     no change │
│ QQuery 34 │ 1483.77 / 1503.37 ±21.64 / 1535.16 ms │ 1472.77 / 1504.44 ±23.26 / 1541.91 ms │     no change │
│ QQuery 35 │    295.52 / 307.49 ±12.98 / 331.17 ms │    299.40 / 324.37 ±22.75 / 353.06 ms │  1.05x slower │
│ QQuery 36 │        63.37 / 67.12 ±3.09 / 71.41 ms │        64.07 / 70.20 ±6.47 / 80.15 ms │     no change │
│ QQuery 37 │        36.18 / 38.28 ±3.69 / 45.67 ms │        39.09 / 41.73 ±1.84 / 44.12 ms │  1.09x slower │
│ QQuery 38 │        41.54 / 46.17 ±3.75 / 51.87 ms │        39.23 / 42.74 ±4.80 / 52.04 ms │ +1.08x faster │
│ QQuery 39 │     124.15 / 132.74 ±6.58 / 143.82 ms │     132.69 / 139.85 ±5.99 / 147.96 ms │  1.05x slower │
│ QQuery 40 │        14.75 / 15.07 ±0.21 / 15.39 ms │        15.32 / 18.08 ±3.17 / 24.25 ms │  1.20x slower │
│ QQuery 41 │        14.28 / 16.70 ±3.14 / 22.73 ms │        14.63 / 14.86 ±0.17 / 15.03 ms │ +1.12x faster │
│ QQuery 42 │        13.81 / 16.90 ±3.28 / 21.50 ms │        13.97 / 14.17 ±0.18 / 14.47 ms │ +1.19x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 20412.37ms │
│ Total Time (row-group-morselization)   │ 20143.24ms │
│ Average Time (HEAD)                    │   474.71ms │
│ Average Time (row-group-morselization) │   468.45ms │
│ Queries Faster                         │          5 │
│ Queries Slower                         │         10 │
│ Queries with No Change                 │         28 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
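The Average Time rows are consistent with Total Time divided by the number of queries (43 for ClickBench, QQuery 0 through 42; 99 for TPC-DS). The short Rust sketch below shows that arithmetic and the relative difference between the two totals. The helper names are hypothetical, and the formula is inferred from the numbers rather than from the runner's code.

```rust
/// Hypothetical helper: average per-query time from a summary total.
fn average_ms(total_ms: f64, queries: usize) -> f64 {
    total_ms / queries as f64
}

/// Hypothetical helper: how much lower the branch total is than HEAD, in percent.
fn percent_lower(head_total_ms: f64, branch_total_ms: f64) -> f64 {
    (head_total_ms - branch_total_ms) / head_total_ms * 100.0
}

fn main() {
    // Totals copied from the ClickBench summary table; 43 queries in total.
    let (head, branch, queries) = (20412.37, 20143.24, 43);
    println!("Average (HEAD):   {:.2} ms", average_ms(head, queries)); // 474.71
    println!("Average (branch): {:.2} ms", average_ms(branch, queries)); // 468.45
    println!("Branch total is {:.2}% lower", percent_lower(head, branch));
}
```

The totals are dominated by the long-running queries (e.g. QQuery 23 and 28), which is why the branch total comes out lower even though more individual queries are flagged as slower than faster.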

Resource Usage

clickbench_partitioned — base (merge-base)

Metric        Value
Wall time     105.0s
Peak memory   30.4 GiB
Avg memory    23.2 GiB
CPU user      1085.0s
CPU sys       62.8s
Peak spill    0 B

clickbench_partitioned — branch

Metric        Value
Wall time     105.0s
Peak memory   30.4 GiB
Avg memory    23.2 GiB
CPU user      1088.7s
CPU sys       64.7s
Peak spill    0 B


@adriangbot

Benchmark for this request hit the 7200s job deadline before finishing.

Benchmarks requested: tpch

Kubernetes message
Job was active longer than specified deadline


@adriangb
Contributor

FYI tpch will continue to fail until #21625 is resolved

@Dandandan
Contributor Author

Dandandan commented Apr 22, 2026

run benchmark tpch10

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4294299393-1741-8bn99 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (26f09e4) to 9a1ed57 (merge-base) diff
BENCH_NAME=tpch-10
BENCH_COMMAND=cargo bench --features=parquet --bench tpch-10
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

    struct_query_sql
    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    upper
    uuid
    window_query_sql
    with_hashes


@Dandandan
Contributor Author

run benchmark tpch10

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4294343524-1742-8kg55 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (26f09e4) to 9a1ed57 (merge-base) diff using: tpch10
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and row-group-morselization
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃            row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │  328.05 / 330.49 ±1.60 / 332.91 ms │  329.76 / 330.88 ±0.97 / 332.26 ms │     no change │
│ QQuery 2  │  136.25 / 139.09 ±2.73 / 144.06 ms │  135.04 / 136.62 ±1.16 / 138.02 ms │     no change │
│ QQuery 3  │  290.71 / 296.70 ±3.26 / 300.44 ms │  287.84 / 291.20 ±2.70 / 295.80 ms │     no change │
│ QQuery 4  │  154.24 / 155.02 ±1.12 / 157.16 ms │  156.09 / 157.48 ±0.88 / 158.70 ms │     no change │
│ QQuery 5  │  427.45 / 432.77 ±4.06 / 439.01 ms │  423.54 / 424.22 ±0.71 / 425.58 ms │     no change │
│ QQuery 6  │  132.60 / 134.39 ±1.47 / 136.88 ms │  136.90 / 137.82 ±0.81 / 139.06 ms │     no change │
│ QQuery 7  │  543.34 / 547.12 ±4.67 / 555.78 ms │  536.39 / 540.55 ±2.99 / 545.61 ms │     no change │
│ QQuery 8  │  462.77 / 467.13 ±2.33 / 469.70 ms │  458.93 / 461.59 ±2.95 / 467.29 ms │     no change │
│ QQuery 9  │  652.12 / 657.60 ±4.58 / 665.84 ms │  643.96 / 652.89 ±7.10 / 665.05 ms │     no change │
│ QQuery 10 │  330.76 / 342.07 ±6.07 / 348.28 ms │  330.80 / 333.03 ±1.59 / 335.09 ms │     no change │
│ QQuery 11 │  105.07 / 110.54 ±7.90 / 126.15 ms │  102.92 / 105.61 ±3.74 / 113.04 ms │     no change │
│ QQuery 12 │  200.14 / 205.38 ±5.73 / 213.13 ms │  203.95 / 205.86 ±2.06 / 209.78 ms │     no change │
│ QQuery 13 │  319.44 / 322.39 ±4.76 / 331.79 ms │  297.38 / 298.85 ±1.86 / 302.24 ms │ +1.08x faster │
│ QQuery 14 │  183.59 / 187.12 ±3.27 / 191.15 ms │  187.09 / 188.06 ±0.94 / 189.81 ms │     no change │
│ QQuery 15 │  332.26 / 336.53 ±4.04 / 344.21 ms │  336.08 / 337.90 ±1.58 / 340.75 ms │     no change │
│ QQuery 16 │     79.16 / 83.63 ±4.12 / 91.34 ms │     79.71 / 83.38 ±6.90 / 97.17 ms │     no change │
│ QQuery 17 │  754.73 / 766.18 ±9.77 / 783.96 ms │  742.90 / 746.00 ±2.28 / 749.25 ms │     no change │
│ QQuery 18 │ 770.05 / 792.19 ±18.30 / 819.67 ms │ 765.83 / 782.54 ±19.75 / 820.12 ms │     no change │
│ QQuery 19 │ 270.85 / 293.73 ±34.50 / 361.56 ms │  270.80 / 273.10 ±2.58 / 278.13 ms │ +1.08x faster │
│ QQuery 20 │  307.33 / 318.76 ±7.05 / 326.97 ms │  298.38 / 303.27 ±4.01 / 309.41 ms │     no change │
│ QQuery 21 │  820.13 / 824.46 ±3.54 / 830.61 ms │  799.32 / 804.21 ±3.66 / 810.28 ms │     no change │
│ QQuery 22 │     82.54 / 84.31 ±1.96 / 87.71 ms │     79.38 / 80.30 ±0.86 / 81.56 ms │     no change │
└───────────┴────────────────────────────────────┴────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 7827.62ms │
│ Total Time (row-group-morselization)   │ 7675.37ms │
│ Average Time (HEAD)                    │  355.80ms │
│ Average Time (row-group-morselization) │  348.88ms │
│ Queries Faster                         │         2 │
│ Queries Slower                         │         0 │
│ Queries with No Change                 │        20 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

Resource Usage

tpch10 — base (merge-base)

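┏━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric      ┃    Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Wall time   │    40.0s │
│ Peak memory │ 12.2 GiB │
│ Avg memory  │  8.5 GiB │
│ CPU user    │   419.6s │
│ CPU sys     │    20.0s │
│ Peak spill  │      0 B │
└─────────────┴──────────┘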

tpch10 — branch

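┏━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric      ┃    Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Wall time   │    40.0s │
│ Peak memory │ 11.9 GiB │
│ Avg memory  │  8.4 GiB │
│ CPU user    │   417.3s │
│ CPU sys     │    19.8s │
│ Peak spill  │      0 B │
└─────────────┴──────────┘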

…scan

Before: when every shared file had been popped but no donor had yet
reached its split point, an idle sibling saw empty queues and returned
Done. Any row groups the donor pushed to the morsel queue afterwards
were never picked up by that sibling.

Now SharedWorkSource tracks the number of in-flight donors via a
FileLease RAII guard. An idle sibling that finds both queues empty
checks this count and, if it is non-zero, calls wake_by_ref() and
returns Poll::Pending so that it is polled again. The lease is dropped
at the morsel-to-reader transition: once a file is streaming, its
donation window is closed, so siblings are not kept waiting on the row
groups the donor has kept for itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
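A minimal sketch of the idle-sibling check described in this commit, under a simplified model. `SharedWorkSource` and `FileLease` borrow their names from the commit message, but everything else here (`Morsel`, `Work`, `push_file`, `donate`, `poll_next`, and all fields) is hypothetical and is not the PR's actual code. Donated row groups go to the front of a morsel queue; popping a whole file hands out a lease that counts as one in-flight donor until it is dropped.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};

/// Hypothetical unit of work: a whole file (`row_group: None`) or one donated row group.
#[derive(Debug, Clone, PartialEq)]
pub struct Morsel {
    pub path: String,
    pub row_group: Option<usize>,
}

/// Hypothetical result of polling the shared source.
pub enum Work {
    /// A donated row group; it is never split again, so no lease is attached.
    Morsel(Morsel),
    /// A whole file plus the lease that keeps its donation window open.
    File(Morsel, FileLease),
}

/// RAII guard: counts as one in-flight donor until it is dropped.
pub struct FileLease {
    in_flight_donors: Arc<AtomicUsize>,
}

impl Drop for FileLease {
    fn drop(&mut self) {
        // Release pairs with the Acquire load in `poll_next`, so donations
        // pushed before the lease drops are visible to an idle sibling.
        self.in_flight_donors.fetch_sub(1, Ordering::Release);
    }
}

#[derive(Default)]
pub struct SharedWorkSource {
    files: Mutex<VecDeque<Morsel>>,
    morsels: Mutex<VecDeque<Morsel>>,
    in_flight_donors: Arc<AtomicUsize>,
}

impl SharedWorkSource {
    pub fn push_file(&self, file: Morsel) {
        self.files.lock().unwrap().push_back(file);
    }

    /// Called by a donor while it still holds its `FileLease`.
    pub fn donate(&self, row_group: Morsel) {
        self.morsels.lock().unwrap().push_front(row_group);
    }

    fn pop_morsel(&self) -> Option<Morsel> {
        self.morsels.lock().unwrap().pop_front()
    }

    pub fn poll_next(&self, cx: &mut Context<'_>) -> Poll<Option<Work>> {
        // Donated row groups take priority over unopened files.
        if let Some(m) = self.pop_morsel() {
            return Poll::Ready(Some(Work::Morsel(m)));
        }
        let mut files = self.files.lock().unwrap();
        if let Some(file) = files.pop_front() {
            // Count the new donor while the file lock is held, so any sibling
            // that later sees an empty file queue also sees this donor.
            self.in_flight_donors.fetch_add(1, Ordering::Relaxed);
            let lease = FileLease { in_flight_donors: Arc::clone(&self.in_flight_donors) };
            return Poll::Ready(Some(Work::File(file, lease)));
        }
        drop(files);

        // Both queues looked empty. Read the donor count, then re-check the
        // morsel queue so a donation made just before the last lease dropped
        // is not missed.
        let donors = self.in_flight_donors.load(Ordering::Acquire);
        if let Some(m) = self.pop_morsel() {
            return Poll::Ready(Some(Work::Morsel(m)));
        }
        if donors > 0 {
            // A donor may still split its file: ask to be polled again.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        Poll::Ready(None)
    }
}
```

In this sketch a sibling that polls while a donor still holds its lease gets `Poll::Pending` (and self-wakes), picks up a donated row group once one is pushed, and only sees `Poll::Ready(None)` after the last lease is dropped and both queues are empty. The re-check of the morsel queue after reading the count is a choice made for the sketch to close the window between "queue looked empty" and "last donor finished"; the PR may handle that ordering differently.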
@Dandandan
Contributor Author

run benchmarks

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4296699660-1748-hqgfb 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (52d6bce) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4296699660-1747-s7f9g 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (52d6bce) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4296699660-1749-wdn6q 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (52d6bce) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and row-group-morselization
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                      HEAD ┃                  row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │               7.22 / 7.59 ±0.66 / 8.91 ms │              7.02 / 7.45 ±0.78 / 9.01 ms │     no change │
│ QQuery 2  │         147.99 / 148.41 ±0.31 / 148.95 ms │        145.03 / 146.01 ±0.66 / 146.80 ms │     no change │
│ QQuery 3  │         114.09 / 115.43 ±0.91 / 116.57 ms │        114.22 / 115.40 ±1.27 / 117.57 ms │     no change │
│ QQuery 4  │      1365.15 / 1374.92 ±8.90 / 1387.89 ms │    1323.76 / 1339.25 ±10.92 / 1352.13 ms │     no change │
│ QQuery 5  │         173.33 / 175.11 ±1.49 / 177.86 ms │        172.35 / 173.52 ±0.70 / 174.27 ms │     no change │
│ QQuery 6  │        849.63 / 859.99 ±12.96 / 885.52 ms │       847.13 / 867.09 ±21.99 / 902.13 ms │     no change │
│ QQuery 7  │         334.99 / 337.96 ±2.81 / 343.22 ms │        338.09 / 339.27 ±1.25 / 341.62 ms │     no change │
│ QQuery 8  │         114.47 / 115.35 ±0.82 / 116.50 ms │        112.46 / 114.79 ±1.73 / 117.02 ms │     no change │
│ QQuery 9  │         105.50 / 106.55 ±0.63 / 107.37 ms │        100.89 / 102.67 ±1.81 / 105.90 ms │     no change │
│ QQuery 10 │         102.77 / 103.41 ±0.46 / 104.10 ms │        103.29 / 104.42 ±1.83 / 108.05 ms │     no change │
│ QQuery 11 │         940.56 / 952.80 ±9.15 / 968.92 ms │        925.76 / 938.14 ±9.45 / 952.45 ms │     no change │
│ QQuery 12 │            44.03 / 44.20 ±0.13 / 44.39 ms │           43.86 / 44.24 ±0.41 / 44.77 ms │     no change │
│ QQuery 13 │         393.05 / 396.84 ±2.85 / 401.62 ms │        392.29 / 393.54 ±1.26 / 395.57 ms │     no change │
│ QQuery 14 │        977.85 / 989.17 ±8.87 / 1004.15 ms │        974.36 / 985.43 ±7.47 / 996.14 ms │     no change │
│ QQuery 15 │            15.05 / 15.33 ±0.24 / 15.78 ms │           14.92 / 15.34 ±0.39 / 16.06 ms │     no change │
│ QQuery 16 │               7.31 / 7.49 ±0.19 / 7.82 ms │              7.39 / 7.50 ±0.13 / 7.75 ms │     no change │
│ QQuery 17 │         222.27 / 223.46 ±1.48 / 226.31 ms │        220.43 / 224.07 ±1.84 / 225.37 ms │     no change │
│ QQuery 18 │         121.77 / 122.54 ±0.83 / 124.03 ms │        122.61 / 123.71 ±1.29 / 126.24 ms │     no change │
│ QQuery 19 │         153.26 / 154.16 ±0.59 / 154.87 ms │        153.95 / 155.79 ±1.21 / 157.25 ms │     no change │
│ QQuery 20 │            13.26 / 13.48 ±0.18 / 13.77 ms │           13.25 / 13.49 ±0.20 / 13.81 ms │     no change │
│ QQuery 21 │            19.43 / 19.65 ±0.19 / 19.90 ms │           19.58 / 19.83 ±0.47 / 20.77 ms │     no change │
│ QQuery 22 │         476.76 / 478.17 ±1.79 / 481.61 ms │        485.21 / 488.71 ±3.15 / 493.89 ms │     no change │
│ QQuery 23 │         830.16 / 835.55 ±3.47 / 840.09 ms │        834.47 / 840.12 ±5.26 / 848.10 ms │     no change │
│ QQuery 24 │         377.13 / 379.77 ±2.30 / 384.02 ms │        378.20 / 381.52 ±2.71 / 384.79 ms │     no change │
│ QQuery 25 │         335.19 / 338.21 ±2.71 / 342.43 ms │        334.49 / 336.25 ±1.30 / 338.19 ms │     no change │
│ QQuery 26 │            77.33 / 77.82 ±0.32 / 78.29 ms │           77.14 / 77.80 ±0.39 / 78.33 ms │     no change │
│ QQuery 27 │               6.80 / 6.99 ±0.16 / 7.28 ms │              6.98 / 7.40 ±0.59 / 8.54 ms │  1.06x slower │
│ QQuery 28 │         148.60 / 149.43 ±0.57 / 150.34 ms │        149.99 / 150.41 ±0.50 / 151.35 ms │     no change │
│ QQuery 29 │         271.25 / 276.45 ±5.31 / 286.46 ms │        273.81 / 275.14 ±0.83 / 276.15 ms │     no change │
│ QQuery 30 │            41.57 / 42.40 ±0.74 / 43.53 ms │           41.24 / 42.61 ±1.38 / 44.84 ms │     no change │
│ QQuery 31 │         164.04 / 167.06 ±2.17 / 170.46 ms │        165.56 / 167.17 ±1.15 / 168.55 ms │     no change │
│ QQuery 32 │            13.72 / 13.99 ±0.19 / 14.22 ms │           13.79 / 14.00 ±0.20 / 14.33 ms │     no change │
│ QQuery 33 │         137.95 / 139.60 ±0.87 / 140.38 ms │        138.92 / 140.61 ±1.16 / 141.87 ms │     no change │
│ QQuery 34 │               6.97 / 7.10 ±0.11 / 7.29 ms │              6.96 / 7.07 ±0.15 / 7.35 ms │     no change │
│ QQuery 35 │         101.09 / 102.75 ±2.22 / 107.07 ms │        100.71 / 101.70 ±0.72 / 102.85 ms │     no change │
│ QQuery 36 │               6.55 / 6.95 ±0.23 / 7.23 ms │              6.76 / 6.97 ±0.14 / 7.20 ms │     no change │
│ QQuery 37 │               8.43 / 8.49 ±0.05 / 8.57 ms │              8.26 / 8.35 ±0.07 / 8.44 ms │     no change │
│ QQuery 38 │            86.09 / 86.99 ±0.89 / 88.56 ms │           85.54 / 86.63 ±1.17 / 88.64 ms │     no change │
│ QQuery 39 │         120.05 / 122.42 ±4.07 / 130.55 ms │        118.00 / 120.89 ±3.38 / 127.38 ms │     no change │
│ QQuery 40 │         104.74 / 109.35 ±2.68 / 112.29 ms │        102.19 / 106.82 ±2.99 / 111.53 ms │     no change │
│ QQuery 41 │            14.44 / 14.55 ±0.12 / 14.79 ms │           14.21 / 15.65 ±2.52 / 20.69 ms │  1.08x slower │
│ QQuery 42 │         106.92 / 107.40 ±0.34 / 107.87 ms │        107.01 / 107.55 ±0.43 / 108.10 ms │     no change │
│ QQuery 43 │               5.78 / 5.93 ±0.12 / 6.14 ms │              5.53 / 5.74 ±0.17 / 6.04 ms │     no change │
│ QQuery 44 │            11.81 / 13.33 ±2.68 / 18.68 ms │           11.58 / 11.74 ±0.09 / 11.81 ms │ +1.14x faster │
│ QQuery 45 │            48.99 / 49.24 ±0.17 / 49.40 ms │           48.21 / 49.03 ±0.47 / 49.57 ms │     no change │
│ QQuery 46 │              8.70 / 9.37 ±1.07 / 11.50 ms │              8.50 / 8.63 ±0.17 / 8.95 ms │ +1.09x faster │
│ QQuery 47 │         725.84 / 735.91 ±7.71 / 745.94 ms │        732.36 / 740.38 ±5.35 / 749.11 ms │     no change │
│ QQuery 48 │         270.45 / 276.88 ±5.43 / 286.66 ms │        275.99 / 279.53 ±3.55 / 284.59 ms │     no change │
│ QQuery 49 │         247.81 / 249.47 ±1.25 / 251.36 ms │        250.88 / 252.05 ±0.92 / 253.56 ms │     no change │
│ QQuery 50 │         203.25 / 210.97 ±5.34 / 219.54 ms │        204.85 / 210.44 ±4.32 / 217.31 ms │     no change │
│ QQuery 51 │         176.21 / 180.23 ±3.33 / 185.58 ms │        179.11 / 180.05 ±0.48 / 180.42 ms │     no change │
│ QQuery 52 │         106.66 / 107.23 ±0.31 / 107.50 ms │        107.29 / 108.67 ±1.64 / 111.85 ms │     no change │
│ QQuery 53 │         101.76 / 103.50 ±1.98 / 107.31 ms │        102.43 / 103.52 ±1.17 / 105.79 ms │     no change │
│ QQuery 54 │         143.38 / 144.05 ±0.63 / 144.97 ms │        144.41 / 145.78 ±0.78 / 146.81 ms │     no change │
│ QQuery 55 │         105.58 / 106.44 ±0.53 / 107.23 ms │        106.27 / 107.25 ±0.85 / 108.26 ms │     no change │
│ QQuery 56 │         137.43 / 139.15 ±1.40 / 140.78 ms │        139.92 / 141.13 ±1.06 / 142.56 ms │     no change │
│ QQuery 57 │         164.90 / 167.23 ±2.41 / 171.87 ms │        163.96 / 165.44 ±1.10 / 167.07 ms │     no change │
│ QQuery 58 │         307.49 / 310.61 ±1.87 / 312.50 ms │        310.72 / 311.88 ±0.97 / 313.17 ms │     no change │
│ QQuery 59 │         196.32 / 198.12 ±1.86 / 201.54 ms │        195.25 / 196.11 ±0.75 / 197.49 ms │     no change │
│ QQuery 60 │         139.54 / 141.57 ±1.11 / 142.55 ms │        140.96 / 142.25 ±1.31 / 144.49 ms │     no change │
│ QQuery 61 │            13.47 / 13.59 ±0.10 / 13.72 ms │           13.35 / 13.53 ±0.15 / 13.73 ms │     no change │
│ QQuery 62 │         871.74 / 882.32 ±8.00 / 893.49 ms │        875.37 / 891.50 ±9.18 / 902.61 ms │     no change │
│ QQuery 63 │         101.73 / 102.92 ±1.58 / 106.05 ms │        102.63 / 105.77 ±3.85 / 113.31 ms │     no change │
│ QQuery 64 │         664.56 / 671.25 ±3.90 / 676.26 ms │        667.23 / 673.00 ±4.63 / 680.60 ms │     no change │
│ QQuery 65 │         246.69 / 248.78 ±1.93 / 251.14 ms │        252.43 / 256.47 ±3.25 / 261.76 ms │     no change │
│ QQuery 66 │        211.27 / 224.49 ±11.73 / 240.12 ms │       221.29 / 231.23 ±12.42 / 252.32 ms │     no change │
│ QQuery 67 │         301.30 / 307.40 ±9.07 / 325.37 ms │        302.35 / 311.02 ±6.11 / 320.48 ms │     no change │
│ QQuery 68 │               8.65 / 8.83 ±0.14 / 9.08 ms │              8.63 / 8.82 ±0.20 / 9.20 ms │     no change │
│ QQuery 69 │          97.77 / 100.58 ±3.31 / 106.65 ms │          97.18 / 98.77 ±1.89 / 102.28 ms │     no change │
│ QQuery 70 │         316.55 / 325.69 ±7.59 / 335.45 ms │        319.01 / 327.92 ±8.74 / 343.39 ms │     no change │
│ QQuery 71 │         131.99 / 136.68 ±4.42 / 142.40 ms │        132.74 / 133.64 ±0.66 / 134.42 ms │     no change │
│ QQuery 72 │         587.18 / 595.59 ±5.12 / 600.79 ms │        598.11 / 602.99 ±3.30 / 607.18 ms │     no change │
│ QQuery 73 │               6.62 / 6.76 ±0.17 / 7.09 ms │              6.76 / 6.91 ±0.15 / 7.20 ms │     no change │
│ QQuery 74 │         602.32 / 608.39 ±5.85 / 619.24 ms │        609.78 / 616.03 ±4.24 / 621.76 ms │     no change │
│ QQuery 75 │         267.38 / 269.38 ±2.53 / 274.32 ms │        267.56 / 269.36 ±1.35 / 271.57 ms │     no change │
│ QQuery 76 │         129.82 / 131.68 ±2.37 / 136.30 ms │        131.42 / 133.15 ±0.98 / 134.45 ms │     no change │
│ QQuery 77 │         186.06 / 187.84 ±2.02 / 191.77 ms │        186.74 / 188.92 ±1.54 / 191.05 ms │     no change │
│ QQuery 78 │         331.06 / 332.84 ±1.24 / 334.32 ms │        333.28 / 336.50 ±1.85 / 338.97 ms │     no change │
│ QQuery 79 │         235.24 / 238.48 ±3.24 / 244.43 ms │        234.19 / 236.68 ±1.30 / 238.02 ms │     no change │
│ QQuery 80 │         318.68 / 322.38 ±2.66 / 326.90 ms │        319.97 / 320.48 ±0.49 / 321.29 ms │     no change │
│ QQuery 81 │            25.68 / 26.30 ±0.42 / 26.94 ms │           26.09 / 27.28 ±1.79 / 30.83 ms │     no change │
│ QQuery 82 │            39.66 / 39.98 ±0.25 / 40.43 ms │           39.79 / 40.29 ±0.43 / 40.86 ms │     no change │
│ QQuery 83 │            37.36 / 37.84 ±0.53 / 38.79 ms │           37.15 / 37.44 ±0.45 / 38.31 ms │     no change │
│ QQuery 84 │            46.21 / 46.83 ±0.65 / 47.97 ms │           46.61 / 46.77 ±0.16 / 47.00 ms │     no change │
│ QQuery 85 │         139.95 / 141.42 ±2.11 / 145.61 ms │        141.13 / 142.25 ±0.63 / 142.84 ms │     no change │
│ QQuery 86 │            37.67 / 38.03 ±0.36 / 38.61 ms │           37.63 / 37.89 ±0.20 / 38.13 ms │     no change │
│ QQuery 87 │               3.43 / 3.54 ±0.17 / 3.87 ms │              3.50 / 3.62 ±0.18 / 3.98 ms │     no change │
│ QQuery 88 │            99.12 / 99.46 ±0.37 / 99.99 ms │         99.92 / 102.23 ±2.52 / 106.96 ms │     no change │
│ QQuery 89 │         116.41 / 118.83 ±3.27 / 125.29 ms │        117.22 / 118.02 ±0.56 / 118.81 ms │     no change │
│ QQuery 90 │            22.35 / 23.07 ±0.41 / 23.49 ms │           22.49 / 22.84 ±0.26 / 23.10 ms │     no change │
│ QQuery 91 │            57.84 / 58.92 ±0.68 / 59.93 ms │           58.80 / 59.93 ±0.94 / 61.54 ms │     no change │
│ QQuery 92 │            56.91 / 57.65 ±1.10 / 59.80 ms │           57.08 / 57.72 ±0.35 / 58.10 ms │     no change │
│ QQuery 93 │         180.95 / 182.86 ±1.54 / 185.48 ms │        181.86 / 185.13 ±1.84 / 187.16 ms │     no change │
│ QQuery 94 │            60.58 / 61.52 ±0.55 / 62.23 ms │           60.70 / 61.25 ±0.38 / 61.90 ms │     no change │
│ QQuery 95 │         125.68 / 127.41 ±1.42 / 130.00 ms │        127.10 / 127.98 ±0.90 / 129.69 ms │     no change │
│ QQuery 96 │            68.32 / 69.44 ±0.92 / 70.63 ms │           67.46 / 69.36 ±1.09 / 70.43 ms │     no change │
│ QQuery 97 │         117.24 / 118.69 ±1.18 / 119.96 ms │        117.71 / 119.17 ±1.09 / 120.77 ms │     no change │
│ QQuery 98 │         151.77 / 153.97 ±1.57 / 156.25 ms │        150.95 / 152.14 ±1.39 / 154.77 ms │     no change │
│ QQuery 99 │ 10766.52 / 10937.58 ±134.68 / 11143.05 ms │ 10848.51 / 10878.73 ±25.54 / 10909.14 ms │     no change │
└───────────┴───────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 30963.75ms │
│ Total Time (row-group-morselization)   │ 30926.69ms │
│ Average Time (HEAD)                    │   312.77ms │
│ Average Time (row-group-morselization) │   312.39ms │
│ Queries Faster                         │          2 │
│ Queries Slower                         │          2 │
│ Queries with No Change                 │         95 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

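┏━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric      ┃    Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Wall time   │   160.0s │
│ Peak memory │  6.3 GiB │
│ Avg memory  │  5.5 GiB │
│ CPU user    │   258.2s │
│ CPU sys     │     8.6s │
│ Peak spill  │      0 B │
└─────────────┴──────────┘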

tpcds — branch

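┏━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric      ┃    Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Wall time   │   155.0s │
│ Peak memory │  6.8 GiB │
│ Avg memory  │  5.7 GiB │
│ CPU user    │   260.0s │
│ CPU sys     │     8.3s │
│ Peak spill  │      0 B │
└─────────────┴──────────┘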

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and row-group-morselization
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃               row-group-morselization ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.75 ±6.95 / 18.66 ms │          1.17 / 4.68 ±6.85 / 18.38 ms │     no change │
│ QQuery 1  │        12.95 / 13.33 ±0.21 / 13.57 ms │        14.22 / 14.87 ±0.40 / 15.33 ms │  1.12x slower │
│ QQuery 2  │        38.19 / 38.48 ±0.28 / 38.96 ms │        38.84 / 39.26 ±0.35 / 39.89 ms │     no change │
│ QQuery 3  │        32.09 / 32.68 ±0.57 / 33.44 ms │        33.74 / 34.43 ±0.70 / 35.28 ms │  1.05x slower │
│ QQuery 4  │     253.47 / 256.22 ±3.29 / 262.22 ms │     250.66 / 256.17 ±3.23 / 260.06 ms │     no change │
│ QQuery 5  │     292.18 / 294.25 ±1.40 / 296.05 ms │     291.33 / 293.67 ±1.63 / 296.20 ms │     no change │
│ QQuery 6  │           6.40 / 7.06 ±0.48 / 7.71 ms │           7.08 / 7.61 ±0.32 / 8.00 ms │  1.08x slower │
│ QQuery 7  │        14.56 / 14.70 ±0.12 / 14.87 ms │        16.34 / 16.49 ±0.14 / 16.69 ms │  1.12x slower │
│ QQuery 8  │     340.99 / 342.98 ±1.89 / 345.76 ms │     333.99 / 337.92 ±3.28 / 342.59 ms │     no change │
│ QQuery 9  │     522.05 / 529.69 ±6.49 / 538.62 ms │     515.71 / 523.63 ±7.05 / 535.73 ms │     no change │
│ QQuery 10 │        76.67 / 78.66 ±1.39 / 80.74 ms │        76.91 / 79.89 ±3.68 / 87.10 ms │     no change │
│ QQuery 11 │        87.38 / 88.88 ±0.99 / 89.85 ms │       88.97 / 91.69 ±4.22 / 100.10 ms │     no change │
│ QQuery 12 │     283.95 / 289.01 ±4.74 / 297.05 ms │     283.14 / 286.53 ±4.16 / 294.27 ms │     no change │
│ QQuery 13 │    408.96 / 425.43 ±17.81 / 456.28 ms │     403.10 / 408.49 ±4.66 / 414.94 ms │     no change │
│ QQuery 14 │     290.80 / 293.37 ±1.90 / 296.00 ms │     294.98 / 299.03 ±3.18 / 302.89 ms │     no change │
│ QQuery 15 │     286.88 / 292.34 ±3.43 / 296.61 ms │     296.14 / 302.16 ±6.27 / 312.84 ms │     no change │
│ QQuery 16 │     628.48 / 636.01 ±5.08 / 642.64 ms │     640.70 / 643.16 ±2.05 / 646.52 ms │     no change │
│ QQuery 17 │     631.81 / 636.53 ±5.08 / 645.61 ms │     639.67 / 645.62 ±4.87 / 652.93 ms │     no change │
│ QQuery 18 │ 1269.23 / 1283.28 ±14.05 / 1309.47 ms │ 1292.37 / 1315.59 ±17.82 / 1337.16 ms │     no change │
│ QQuery 19 │        29.19 / 30.92 ±1.34 / 32.76 ms │        30.96 / 31.61 ±0.81 / 33.17 ms │     no change │
│ QQuery 20 │     522.08 / 528.42 ±7.92 / 543.37 ms │     503.21 / 507.69 ±4.82 / 516.56 ms │     no change │
│ QQuery 21 │     601.36 / 602.92 ±1.23 / 604.31 ms │     586.64 / 591.13 ±3.52 / 595.34 ms │     no change │
│ QQuery 22 │  1068.17 / 1073.47 ±3.44 / 1078.34 ms │  1044.75 / 1051.97 ±5.85 / 1061.18 ms │     no change │
│ QQuery 23 │ 3335.06 / 3363.60 ±20.42 / 3392.84 ms │ 3266.02 / 3288.56 ±12.34 / 3302.42 ms │     no change │
│ QQuery 24 │        41.92 / 42.22 ±0.31 / 42.78 ms │        46.21 / 48.56 ±2.82 / 54.01 ms │  1.15x slower │
│ QQuery 25 │     115.78 / 119.35 ±5.01 / 129.25 ms │     114.76 / 118.24 ±4.27 / 126.61 ms │     no change │
│ QQuery 26 │        42.64 / 44.63 ±1.49 / 46.19 ms │        47.33 / 48.13 ±0.88 / 49.66 ms │  1.08x slower │
│ QQuery 27 │     667.24 / 672.99 ±3.22 / 676.71 ms │     654.62 / 662.03 ±6.12 / 673.16 ms │     no change │
│ QQuery 28 │ 3028.30 / 3044.27 ±12.19 / 3064.17 ms │  2864.62 / 2877.38 ±8.98 / 2890.09 ms │ +1.06x faster │
│ QQuery 29 │        42.86 / 48.12 ±9.33 / 66.76 ms │        45.10 / 46.30 ±1.40 / 49.02 ms │     no change │
│ QQuery 30 │     312.42 / 316.94 ±4.43 / 325.14 ms │     313.68 / 318.79 ±5.84 / 329.80 ms │     no change │
│ QQuery 31 │     306.71 / 315.85 ±6.48 / 324.38 ms │     305.62 / 315.62 ±6.95 / 323.49 ms │     no change │
│ QQuery 32 │ 1020.32 / 1036.81 ±19.98 / 1074.38 ms │ 1032.22 / 1061.03 ±35.67 / 1123.85 ms │     no change │
│ QQuery 33 │ 1469.18 / 1487.10 ±17.94 / 1514.02 ms │ 1466.22 / 1482.31 ±17.87 / 1511.25 ms │     no change │
│ QQuery 34 │ 1468.18 / 1487.12 ±13.51 / 1503.51 ms │ 1468.93 / 1546.67 ±78.33 / 1687.76 ms │     no change │
│ QQuery 35 │     294.45 / 299.20 ±3.76 / 304.23 ms │    299.49 / 327.04 ±35.71 / 395.97 ms │  1.09x slower │
│ QQuery 36 │        61.68 / 65.22 ±2.17 / 67.54 ms │        62.18 / 65.50 ±2.98 / 70.32 ms │     no change │
│ QQuery 37 │        35.68 / 37.76 ±3.47 / 44.69 ms │        35.70 / 39.15 ±3.11 / 43.11 ms │     no change │
│ QQuery 38 │        39.42 / 43.79 ±4.00 / 49.31 ms │        39.17 / 42.63 ±2.90 / 47.81 ms │     no change │
│ QQuery 39 │     126.01 / 136.73 ±5.41 / 140.68 ms │     134.34 / 142.38 ±5.05 / 148.03 ms │     no change │
│ QQuery 40 │        14.69 / 14.98 ±0.18 / 15.24 ms │        14.36 / 20.30 ±8.29 / 36.16 ms │  1.35x slower │
│ QQuery 41 │        14.40 / 16.19 ±3.20 / 22.58 ms │        13.71 / 15.85 ±3.62 / 23.08 ms │     no change │
│ QQuery 42 │        13.79 / 16.09 ±2.45 / 19.58 ms │        13.56 / 15.05 ±2.56 / 20.15 ms │ +1.07x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 20402.34ms │
│ Total Time (row-group-morselization)   │ 20264.78ms │
│ Average Time (HEAD)                    │   474.47ms │
│ Average Time (row-group-morselization) │   471.27ms │
│ Queries Faster                         │          2 │
│ Queries Slower                         │          8 │
│ Queries with No Change                 │         33 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
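To make the "Change" column easier to audit, the following minimal Rust sketch derives such a label from the middle (average) value in each cell. It assumes a 5% noise threshold (`NOISE_THRESHOLD`, a hypothetical name). That threshold fits the rows above: Q3 at about 1.054x is flagged while Q15 at about 1.034x is not. It was not taken from the benchmark runner's source, though. `classify` and `Change` are likewise illustrative names, not part of any DataFusion tool. The cell values are already rounded, so a recomputed ratio can differ from the table in the last digit.

```rust
/// Assumed noise threshold: ratios within 5% are reported as "no change".
/// Inferred from the table above, not read from the benchmark runner.
const NOISE_THRESHOLD: f64 = 0.05;

#[derive(Debug)]
enum Change {
    Faster(f64), // base / branch
    Slower(f64), // branch / base
    NoChange,
}

/// Compare the average time of the base (HEAD) run with the branch run.
fn classify(base_ms: f64, branch_ms: f64) -> Change {
    if branch_ms > base_ms * (1.0 + NOISE_THRESHOLD) {
        Change::Slower(branch_ms / base_ms)
    } else if base_ms > branch_ms * (1.0 + NOISE_THRESHOLD) {
        Change::Faster(base_ms / branch_ms)
    } else {
        Change::NoChange
    }
}

fn main() {
    // (query, HEAD avg ms, branch avg ms), copied from the ClickBench table above.
    let rows = [
        (1, 13.33, 14.87),
        (2, 38.48, 39.26),
        (3, 32.68, 34.43),
        (28, 3044.27, 2877.38),
        (42, 16.09, 15.05),
    ];
    for (q, base, branch) in rows {
        match classify(base, branch) {
            Change::Slower(r) => println!("Q{q}: {r:.2}x slower"),
            Change::Faster(r) => println!("Q{q}: +{r:.2}x faster"),
            Change::NoChange => println!("Q{q}: no change"),
        }
    }
    // The summary's average is the total divided by the 43 ClickBench queries.
    println!("avg HEAD   = {:.2} ms", 20402.34 / 43.0);
    println!("avg branch = {:.2} ms", 20264.78 / 43.0);
}
```

On these sample rows the sketch yields the same labels as the table (1.12x slower, no change, 1.05x slower, +1.06x faster, +1.07x faster). It also reproduces the two averages in the summary (474.47 ms and 471.27 ms).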

Resource Usage

clickbench_partitioned — base (merge-base)

Metric        Value
Wall time     105.0s
Peak memory   30.2 GiB
Avg memory    23.0 GiB
CPU user      1082.4s
CPU sys       65.1s
Peak spill    0 B

clickbench_partitioned — branch

Metric        Value
Wall time     105.0s
Peak memory   30.3 GiB
Avg memory    23.1 GiB
CPU user      1090.1s
CPU sys       68.7s
Peak spill    0 B
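To put the two ClickBench resource tables in relative terms, this small Rust sketch computes the percentage change from the merge-base run to the branch run. `Usage` and `pct_change` are illustrative names, and the inputs are the values reported above. With these numbers, wall time is unchanged, CPU user time rises by about 0.7%, CPU sys time by about 5.5%, and total CPU time by about 1.0%. The sketch is only arithmetic on the reported figures and says nothing about the cause of the difference.

```rust
/// Resource figures from one benchmark run, in seconds (as reported above).
struct Usage {
    wall_s: f64,
    user_s: f64,
    sys_s: f64,
}

impl Usage {
    /// Total CPU time = user + sys.
    fn cpu_total(&self) -> f64 {
        self.user_s + self.sys_s
    }
}

/// Relative change from `base` to `branch`, in percent.
fn pct_change(base: f64, branch: f64) -> f64 {
    (branch - base) / base * 100.0
}

fn main() {
    let base = Usage { wall_s: 105.0, user_s: 1082.4, sys_s: 65.1 };
    let branch = Usage { wall_s: 105.0, user_s: 1090.1, sys_s: 68.7 };

    println!("wall:      {:+.1}%", pct_change(base.wall_s, branch.wall_s));
    println!("cpu user:  {:+.1}%", pct_change(base.user_s, branch.user_s));
    println!("cpu sys:   {:+.1}%", pct_change(base.sys_s, branch.sys_s));
    println!(
        "cpu total: {:.1}s -> {:.1}s ({:+.1}%)",
        base.cpu_total(),
        branch.cpu_total(),
        pct_change(base.cpu_total(), branch.cpu_total())
    );
}
```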

File an issue against this benchmark runner

@adriangbot

Benchmark for this request hit the 7200s job deadline before finishing.

Benchmarks requested: tpch

Kubernetes message
Job was active longer than specified deadline

@adriangb
Contributor

@Dandandan can you rebase / merge main to get #21625 so we can see if that fixes tpch timing out?

@Dandandan
Contributor Author

run benchmark tpch

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4297752534-1755-2lpf9 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing row-group-morselization (a5c02d8) to 4bff17e (merge-base) diff using: tpch
Results will be posted here when complete


@alamb
Contributor

alamb commented Apr 22, 2026

This looks really cool @Dandandan -- please let me know when it is ready for review (or if you would like help, etc)

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and row-group-morselization
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃        row-group-morselization ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 40.77 / 41.86 ±1.09 / 43.41 ms │ 40.55 / 41.53 ±0.90 / 42.83 ms │ no change │
│ QQuery 2  │ 21.03 / 21.61 ±0.61 / 22.60 ms │ 21.05 / 21.35 ±0.21 / 21.61 ms │ no change │
│ QQuery 3  │ 38.27 / 40.28 ±2.15 / 43.87 ms │ 38.41 / 39.22 ±0.48 / 39.72 ms │ no change │
│ QQuery 4  │ 18.56 / 18.78 ±0.38 / 19.54 ms │ 18.65 / 19.14 ±0.25 / 19.40 ms │ no change │
│ QQuery 5  │ 48.15 / 50.33 ±1.65 / 52.43 ms │ 48.44 / 49.37 ±1.18 / 51.71 ms │ no change │
│ QQuery 6  │ 17.31 / 17.52 ±0.20 / 17.86 ms │ 18.02 / 18.22 ±0.17 / 18.43 ms │ no change │
│ QQuery 7  │ 54.22 / 55.36 ±1.11 / 57.28 ms │ 54.52 / 55.60 ±0.76 / 56.68 ms │ no change │
│ QQuery 8  │ 47.91 / 48.23 ±0.21 / 48.54 ms │ 48.70 / 48.95 ±0.14 / 49.12 ms │ no change │
│ QQuery 9  │ 53.58 / 55.19 ±1.88 / 58.85 ms │ 53.48 / 53.78 ±0.25 / 54.10 ms │ no change │
│ QQuery 10 │ 66.07 / 66.29 ±0.18 / 66.50 ms │ 66.88 / 67.40 ±0.27 / 67.66 ms │ no change │
│ QQuery 11 │ 14.36 / 14.90 ±0.84 / 16.57 ms │ 14.18 / 14.36 ±0.16 / 14.65 ms │ no change │
│ QQuery 12 │ 27.86 / 28.27 ±0.53 / 29.25 ms │ 27.82 / 28.19 ±0.45 / 29.07 ms │ no change │
│ QQuery 13 │ 37.29 / 38.28 ±0.73 / 39.31 ms │ 37.03 / 37.29 ±0.22 / 37.60 ms │ no change │
│ QQuery 14 │ 28.08 / 28.44 ±0.35 / 29.10 ms │ 28.85 / 29.03 ±0.15 / 29.25 ms │ no change │
│ QQuery 15 │ 33.51 / 34.05 ±0.75 / 35.52 ms │ 34.90 / 35.04 ±0.12 / 35.23 ms │ no change │
│ QQuery 16 │ 15.30 / 15.91 ±0.78 / 17.46 ms │ 15.46 / 15.50 ±0.03 / 15.53 ms │ no change │
│ QQuery 17 │ 79.85 / 81.18 ±1.65 / 84.32 ms │ 79.13 / 79.84 ±0.39 / 80.30 ms │ no change │
│ QQuery 18 │ 76.34 / 76.85 ±0.54 / 77.79 ms │ 74.84 / 75.67 ±0.44 / 76.14 ms │ no change │
│ QQuery 19 │ 37.82 / 38.12 ±0.29 / 38.63 ms │ 37.98 / 38.38 ±0.40 / 39.05 ms │ no change │
│ QQuery 20 │ 39.57 / 40.60 ±1.23 / 43.00 ms │ 40.38 / 40.86 ±0.40 / 41.59 ms │ no change │
│ QQuery 21 │ 62.19 / 65.21 ±1.75 / 66.78 ms │ 63.04 / 63.60 ±0.62 / 64.75 ms │ no change │
│ QQuery 22 │ 17.48 / 17.63 ±0.19 / 18.00 ms │ 17.30 / 17.43 ±0.10 / 17.54 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                      ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 894.89ms │
│ Total Time (row-group-morselization)   │ 889.74ms │
│ Average Time (HEAD)                    │  40.68ms │
│ Average Time (row-group-morselization) │  40.44ms │
│ Queries Faster                         │        0 │
│ Queries Slower                         │        0 │
│ Queries with No Change                 │       22 │
│ Queries with Failure                   │        0 │
└────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric        Value
Wall time     5.0s
Peak memory   5.4 GiB
Avg memory    4.8 GiB
CPU user      33.8s
CPU sys       2.5s
Peak spill    0 B

tpch — branch

Metric        Value
Wall time     5.0s
Peak memory   5.5 GiB
Avg memory    4.8 GiB
CPU user      34.0s
CPU sys       2.4s
Peak spill    0 B

@Dandandan
Contributor Author

> This looks really cool @Dandandan -- please let me know when it is ready for review (or if you would like help, etc)

I think the perf results look good. What we mostly need now is to find the right API, one that also extends to further splitting/merging and other optimizations.

Labels

datasource Changes to the datasource crate

Development

Successfully merging this pull request may close these issues.

Introduce morsel-driven Parquet scan
