[arrow-ord]: add REE slice offset regression test#5

Draft
alamb wants to merge 3 commits into polarsignals:asubiotto/reecmp from alamb:alamb/ree-slice-offset-regression

alamb commented Apr 14, 2026

asubiotto and others added 2 commits April 3, 2026 23:42
This commit implements native comparisons on run-end encoded (REE) arrays, which
are treated similarly to dictionary indirection.

This commit implements REE-to-scalar comparisons by operating on the physical
values only, then bulk-expanding the boolean result.
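
To illustrate the idea, here is a minimal std-only sketch. It does not use the arrow-rs API; `eq_ree_scalar` and the plain-slice representation of run ends and values are assumptions made for illustration, not the code in this commit.

```rust
/// Std-only sketch of a run-end encoded array (not the arrow-rs types):
/// `run_ends[i]` is the exclusive logical end of run `i`, and `values[i]`
/// is the value repeated across that run.
fn eq_ree_scalar(run_ends: &[usize], values: &[i32], scalar: i32) -> Vec<bool> {
    assert_eq!(run_ends.len(), values.len());
    let logical_len = run_ends.last().copied().unwrap_or(0);
    let mut out = Vec::with_capacity(logical_len);
    let mut start = 0;
    for (&end, &value) in run_ends.iter().zip(values) {
        // One comparison per physical run...
        let hit = value == scalar;
        // ...then expand that single result across the run's logical length.
        out.extend(std::iter::repeat(hit).take(end - start));
        start = end;
    }
    out
}
```

For example, `eq_ree_scalar(&[2, 5, 6], &[7, 3, 7], 7)` performs three comparisons and yields six logical booleans.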

REE-to-REE comparisons are also optimized by computing aligned physical value
runs to minimize comparisons.
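
A rough sketch of what aligning runs could look like, again on plain slices rather than arrow-rs types. The function name `eq_ree_ree` and its signature are hypothetical, and the sketch assumes unsliced inputs of equal logical length.

```rust
/// Compare two run-end encoded inputs by walking both run lists in lockstep.
/// Each aligned segment (where neither side changes value) costs one comparison.
fn eq_ree_ree(
    l_ends: &[usize],
    l_vals: &[i32],
    r_ends: &[usize],
    r_vals: &[i32],
) -> Vec<bool> {
    let len = l_ends.last().copied().unwrap_or(0);
    assert_eq!(
        len,
        r_ends.last().copied().unwrap_or(0),
        "logical lengths must match"
    );
    let mut out = Vec::with_capacity(len);
    let (mut i, mut j, mut pos) = (0, 0, 0);
    while pos < len {
        // The aligned segment ends where either side's current run ends.
        let end = l_ends[i].min(r_ends[j]);
        let hit = l_vals[i] == r_vals[j];
        out.extend(std::iter::repeat(hit).take(end - pos));
        pos = end;
        // Advance whichever side(s) finished a run at this boundary.
        if l_ends[i] == end {
            i += 1;
        }
        if r_ends[j] == end {
            j += 1;
        }
    }
    out
}
```

The number of value comparisons is bounded by the sum of the two physical run counts, independent of the logical length.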

Mixed cases (REE vs flat) materialize a logical index mapping similar to
dictionaries.
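
One way to picture the logical index mapping is the std-only sketch below. The helper names `logical_to_physical` and `eq_ree_flat` are hypothetical and are not taken from the commit; the real implementation works on arrow-rs arrays.

```rust
/// Build a vector mapping every logical row to the index of its physical run,
/// much like a dictionary's keys point into its values.
fn logical_to_physical(run_ends: &[usize]) -> Vec<usize> {
    let mut map = Vec::with_capacity(run_ends.last().copied().unwrap_or(0));
    let mut start = 0;
    for (phys, &end) in run_ends.iter().enumerate() {
        map.extend(std::iter::repeat(phys).take(end - start));
        start = end;
    }
    map
}

/// Compare a run-end encoded input against a flat slice row by row,
/// looking up the REE value through the materialized mapping.
fn eq_ree_flat(run_ends: &[usize], values: &[i32], flat: &[i32]) -> Vec<bool> {
    let map = logical_to_physical(run_ends);
    assert_eq!(map.len(), flat.len(), "logical lengths must match");
    map.iter().zip(flat).map(|(&p, &f)| values[p] == f).collect()
}
```

Unlike the scalar and REE-to-REE paths, this does one comparison per logical row, since the flat side has no runs to exploit.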

This commit also supports REE<Dict>.

For comparison, here are the benchmark results with flat arrays as a reference
on my local machine:
```
eq Int32                time:   [14.955 µs 15.162 µs 15.396 µs]
eq scalar Int32         time:   [11.379 µs 11.418 µs 11.459 µs]

ree_comparison/eq_ree_scalar(phys=64,log=65536)     time:   [453.31 ns 454.88 ns 456.43 ns]
ree_comparison/eq_ree_scalar(phys=1024,log=65536)   time:   [4.1224 µs 4.1298 µs 4.1368 µs]
ree_comparison/eq_ree_scalar(phys=32768,log=65536)  time:   [93.506 µs 94.085 µs 94.993 µs]
ree_comparison/eq_ree_ree(phys=64,log=65536)        time:   [413.96 ns 414.82 ns 415.87 ns]
ree_comparison/eq_ree_ree(phys=1024,log=65536)      time:   [4.1597 µs 4.1660 µs 4.1749 µs]
ree_comparison/eq_ree_ree(phys=32768,log=65536)     time:   [128.74 µs 144.40 µs 161.53 µs]
```

As expected, the fewer physical runs there are for a given logical length (that
is, the more the data benefits from REE encoding), the faster the comparisons are.

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>

Comment thread arrow-ord/src/cmp.rs
}

#[test]
fn test_ree_sliced_different_offsets() {

This test should pass, but it currently fails with:


  cargo test -p arrow-ord test_ree_sliced_different_offsets -- --nocapture

     Compiling arrow-ord v58.1.0 (/private/tmp/arrow-pr9621-review/arrow-ord)
      Finished `test` profile [unoptimized + debuginfo] target(s) in 0.75s
       Running unittests src/lib.rs (target/debug/deps/arrow_ord-144a95337e3b3020)

  running 1 test

  thread 'cmp::tests::test_ree_sliced_different_offsets' (36129756) panicked at arrow-ord/src/cmp.rs:1396:9:
  assertion `left == right` failed
    left: BooleanArray
  [
    true,
    false,
    true,
    true,
  ]
   right: BooleanArray
  [
    true,
    true,
    true,
    true,
  ]
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  test cmp::tests::test_ree_sliced_different_offsets ... FAILED

  failures:

  failures:
      cmp::tests::test_ree_sliced_different_offsets

  test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 308 filtered out; finished in 0.00s

  error: test failed, to rerun pass `-p arrow-ord --lib`
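
For context on why offsets matter, here is a small std-only reference model of comparing two sliced REE inputs. It is not the test body from this PR, which is not shown above, and `ReeSlice` and `eq_slices` are hypothetical names. The point it illustrates is that logical row `i` of a slice must be resolved as row `offset + i` of the unsliced run ends, so two slices with different offsets cannot share run boundaries blindly.

```rust
/// Hypothetical sliced REE view (not an arrow-rs type): logical row `i` of the
/// slice corresponds to row `offset + i` of the unsliced run ends.
struct ReeSlice<'a> {
    run_ends: &'a [usize],
    values: &'a [i32],
    offset: usize,
    len: usize,
}

impl ReeSlice<'_> {
    /// Resolve one logical row of the slice to its value.
    fn get(&self, i: usize) -> i32 {
        assert!(i < self.len);
        let logical = self.offset + i;
        // Index of the first run whose exclusive end lies past `logical`.
        let phys = self.run_ends.partition_point(|&end| end <= logical);
        self.values[phys]
    }
}

/// Row-by-row equality, usable as an oracle for an optimized implementation.
fn eq_slices(l: &ReeSlice, r: &ReeSlice) -> Vec<bool> {
    assert_eq!(l.len, r.len, "logical lengths must match");
    (0..l.len).map(|i| l.get(i) == r.get(i)).collect()
}
```

In this model, a slice at offset 1 of the logical rows `9,1,1,1,1,2,2,2` and a slice at offset 3 of `0,0,0,1,1,1,1,5`, both of length 4, compare equal in every row even though their run boundaries differ.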

asubiotto force-pushed the asubiotto/reecmp branch 3 times, most recently from cfc2a0a to a0a7521 on April 17, 2026 10:54