[Json] Use partition and take in RunEndEncoded decoder#9658

Merged
Jefffrey merged 5 commits into apache:main from liamzwbao:issue-9645-ree-optimization
Apr 19, 2026

Conversation

@liamzwbao
Contributor

@liamzwbao commented Apr 3, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Optimize RunEndEncoded decoder to use partition and take, substantially improving performance (over 2x speedup).
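For readers unfamiliar with the technique, the partition-and-take idea can be illustrated with a small, self-contained sketch. This is an illustrative helper in plain std Rust, not the PR's actual code (which uses the `partition` and `take` kernels from the arrow-ord and arrow-select crates): first partition the flat values into runs of adjacent equal elements, then take one representative value per run while recording the cumulative run ends.

```rust
/// Illustrative sketch only (not the PR's implementation): run-end encode a
/// flat slice by partitioning it into runs of equal adjacent values, then
/// taking one representative value per run.
fn run_end_encode<T: PartialEq + Clone>(values: &[T]) -> (Vec<T>, Vec<u32>) {
    let mut run_values = Vec::new();
    let mut run_ends = Vec::new();
    let mut i = 0;
    while i < values.len() {
        // Partition step: advance to the end of the current run.
        let mut end = i + 1;
        while end < values.len() && values[end] == values[i] {
            end += 1;
        }
        // Take step: keep one representative value per run, plus the
        // (exclusive) end offset of the run. u32 suffices here, matching
        // the discussion below about indices being bounded by the tape.
        run_values.push(values[i].clone());
        run_ends.push(end as u32);
        i = end;
    }
    (run_values, run_ends)
}

fn main() {
    let input = ["a", "a", "b", "b", "b", "c"];
    let (vals, ends) = run_end_encode(&input);
    assert_eq!(vals, ["a", "b", "c"]);
    assert_eq!(ends, [2, 5, 6]);
    println!("{vals:?} {ends:?}");
}
```

The speedup in the PR comes from replacing per-row dispatch with these two vectorized passes: one scan to find run boundaries and one batched `take` to materialize the child values array.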

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

@github-actions bot added the arrow (Changes to the arrow crate) label on Apr 3, 2026
@liamzwbao force-pushed the issue-9645-ree-optimization branch from 568cceb to ffd5849 on April 3, 2026 21:46
@liamzwbao marked this pull request as ready for review on April 3, 2026 21:49
@liamzwbao
Contributor Author

Local benchmark results

Benchmarking decode_short_ree_runs_json/131072: Collecting 50 samples in estimated 5.1127 s (650 ite
decode_short_ree_runs_json/131072
        time:   [7.7892 ms 7.8779 ms 7.9584 ms]
        thrpt:  [217.23 MiB/s 219.45 MiB/s 221.95 MiB/s]
 change:
        time:   [−49.865% −49.220% −48.647%] (p = 0.00 < 0.05)
        thrpt:  [+94.730% +96.929% +99.463%]
        Performance has improved.

Benchmarking decode_long_ree_runs_json/131072: Collecting 50 samples in estimated 5.1862 s (700 iter
decode_long_ree_runs_json/131072
        time:   [6.5677 ms 6.6715 ms 6.7831 ms]
        thrpt:  [223.96 MiB/s 227.71 MiB/s 231.31 MiB/s]
 change:
        time:   [−56.851% −56.093% −55.362%] (p = 0.00 < 0.05)
        thrpt:  [+124.03% +127.75% +131.75%]
        Performance has improved.

Benchmarking decode_short_ree_runs_serialize: Collecting 100 samples in estimated 5.8899 s (600 iter
decode_short_ree_runs_serialize
        time:   [8.9996 ms 9.0555 ms 9.1159 ms]
        change: [−49.254% −48.848% −48.441%] (p = 0.00 < 0.05)
        Performance has improved.

Benchmarking decode_long_ree_runs_serialize: Collecting 100 samples in estimated 5.2928 s (600 itera
decode_long_ree_runs_serialize
        time:   [8.4019 ms 8.4652 ms 8.5338 ms]
        change: [−52.747% −52.330% −51.944%] (p = 0.00 < 0.05)
        Performance has improved.

Contributor

@alamb left a comment


Looks good to me -- thank you @liamzwbao

Comment thread: arrow-json/Cargo.toml
arrow-buffer = { workspace = true }
arrow-cast = { workspace = true }
arrow-data = { workspace = true }
arrow-ord = { workspace = true }
Contributor


These are fairly non-trivial crates (ord and select), so it is sad to see the dependencies being added here

That being said, I think it is becoming clear that anything involving REE benefits from those two

Maybe we could split them up or something into new crates with the "core" parts (specifically partition and take) 🤔

arrow-take and arrow-partition maybe 🤔

I'll file a ticket to consider this

Contributor


Worth noting that arrow-cast already brings in ord & select in its dependencies, so I don't think this actually affects the build too much (compared to the previous state)

Contributor


Worth noting that arrow-cast already brings in ord & select in its dependencies, so I don't think this actually affects the build too much (compared to the previous state)

Yeah, adding REE support to cast required those new dependencies. We have some ideas for how to avoid it, but they aren't great.

let values_data = mutable.freeze();
let run_ends_data =
PrimitiveArray::<R>::new(ScalarBuffer::from(run_ends), None).into_data();
let indices = UInt32Array::from_iter_values(indices.into_iter().map(|i| i as u32));
Contributor


In theory the old code could also handle usize indices, not just u32, but I think in practice it won't matter

Contributor Author


Yes, also indices are bounded by the tape pos (which is &[u32]), so they should never exceed u32::MAX IIUC

Contributor

@Jefffrey left a comment


This behaviour of efficiently generating a REE from a flat array seems generic enough to be worth splitting into a separate function (that maybe other users might need), but could be a followup 🤔

@Jefffrey merged commit fa1cc58 into apache:main on Apr 19, 2026
24 checks passed
@alamb
Contributor

alamb commented Apr 20, 2026

This behaviour of efficiently generating a REE from a flat array seems generic enough to be worth splitting into a separate function (that maybe other users might need), but could be a followup 🤔

It also exists in the cast kernels -- so refactoring that out (or reusing the cast) would be good

Labels

arrow (Changes to the arrow crate), performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Json] RunEndEncoded decoder optimization

3 participants