[Json] Use partition and take in RunEndEncoded decoder#9658

Merged
Jefffrey merged 5 commits into apache:main from liamzwbao:issue-9645-ree-optimization
Apr 19, 2026

Conversation

@liamzwbao
Contributor

@liamzwbao commented Apr 3, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Optimize RunEndEncoded decoder to use partition and take, substantially improving performance (over 2x speedup).
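For readers unfamiliar with the technique, the partition-and-take idea can be illustrated with a small, self-contained sketch. This is an illustrative helper in plain std Rust, not the PR's actual code (which uses the `partition` and `take` kernels from the arrow-ord and arrow-select crates): first partition the flat values into runs of adjacent equal elements, then take one representative value per run while recording the cumulative run ends.

```rust
/// Illustrative sketch only (not the PR's implementation): run-end encode a
/// flat slice by partitioning it into runs of equal adjacent values, then
/// taking one representative value per run.
fn run_end_encode<T: PartialEq + Clone>(values: &[T]) -> (Vec<T>, Vec<u32>) {
    let mut run_values = Vec::new();
    let mut run_ends = Vec::new();
    let mut i = 0;
    while i < values.len() {
        // Partition step: advance to the end of the current run.
        let mut end = i + 1;
        while end < values.len() && values[end] == values[i] {
            end += 1;
        }
        // Take step: keep one representative value per run, plus the
        // (exclusive) end offset of the run. u32 suffices here, matching
        // the discussion below about indices being bounded by the tape.
        run_values.push(values[i].clone());
        run_ends.push(end as u32);
        i = end;
    }
    (run_values, run_ends)
}

fn main() {
    let input = ["a", "a", "b", "b", "b", "c"];
    let (vals, ends) = run_end_encode(&input);
    assert_eq!(vals, ["a", "b", "c"]);
    assert_eq!(ends, [2, 5, 6]);
    println!("{vals:?} {ends:?}");
}
```

The speedup in the PR comes from replacing per-row dispatch with these two vectorized passes: one scan to find run boundaries and one batched `take` to materialize the child values array.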

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

@github-actions bot added the arrow (Changes to the arrow crate) label on Apr 3, 2026
@liamzwbao force-pushed the issue-9645-ree-optimization branch from 568cceb to ffd5849 on April 3, 2026 21:46
@liamzwbao marked this pull request as ready for review on April 3, 2026 21:49
@liamzwbao
Contributor Author

Local benchmark results

Benchmarking decode_short_ree_runs_json/131072: Collecting 50 samples in estimated 5.1127 s (650 ite
decode_short_ree_runs_json/131072
        time:   [7.7892 ms 7.8779 ms 7.9584 ms]
        thrpt:  [217.23 MiB/s 219.45 MiB/s 221.95 MiB/s]
 change:
        time:   [−49.865% −49.220% −48.647%] (p = 0.00 < 0.05)
        thrpt:  [+94.730% +96.929% +99.463%]
        Performance has improved.

Benchmarking decode_long_ree_runs_json/131072: Collecting 50 samples in estimated 5.1862 s (700 iter
decode_long_ree_runs_json/131072
        time:   [6.5677 ms 6.6715 ms 6.7831 ms]
        thrpt:  [223.96 MiB/s 227.71 MiB/s 231.31 MiB/s]
 change:
        time:   [−56.851% −56.093% −55.362%] (p = 0.00 < 0.05)
        thrpt:  [+124.03% +127.75% +131.75%]
        Performance has improved.

Benchmarking decode_short_ree_runs_serialize: Collecting 100 samples in estimated 5.8899 s (600 iter
decode_short_ree_runs_serialize
        time:   [8.9996 ms 9.0555 ms 9.1159 ms]
        change: [−49.254% −48.848% −48.441%] (p = 0.00 < 0.05)
        Performance has improved.

Benchmarking decode_long_ree_runs_serialize: Collecting 100 samples in estimated 5.2928 s (600 itera
decode_long_ree_runs_serialize
        time:   [8.4019 ms 8.4652 ms 8.5338 ms]
        change: [−52.747% −52.330% −51.944%] (p = 0.00 < 0.05)
        Performance has improved.

Contributor

@alamb left a comment


Looks good to me -- thank you @liamzwbao

Comment thread: arrow-json/Cargo.toml
arrow-buffer = { workspace = true }
arrow-cast = { workspace = true }
arrow-data = { workspace = true }
arrow-ord = { workspace = true }
Contributor


These are fairly non-trivial crates (ord and select), so it is sad to see the dependencies being added here

That being said, I think it is becoming clear that anything involving REE benefits from those two

Maybe we could split them up or something into new crates with the "core" parts (specifically partition and take) 🤔

arrow-take and arrow-partition maybe 🤔

I'll file a ticket to consider this

Contributor


Worth noting that arrow-cast already brings in ord & select in its dependencies, so I don't think this actually affects the build too much (compared to the previous state)

Contributor


Worth noting that arrow-cast already brings in ord & select in its dependencies, so I don't think this actually affects the build too much (compared to the previous state)

Yeah, adding REE support to cast required those new dependencies. We have some ideas for how to avoid it, but they aren't great.

let values_data = mutable.freeze();
let run_ends_data =
PrimitiveArray::<R>::new(ScalarBuffer::from(run_ends), None).into_data();
let indices = UInt32Array::from_iter_values(indices.into_iter().map(|i| i as u32));
Contributor


In theory the old code could also handle usize indices, not just u32, but I think in practice it won't matter

Contributor Author


Yes, also indices are bounded by the tape pos (which is &[u32]), so they should never exceed u32::MAX IIUC

Contributor

@Jefffrey left a comment


This behaviour of efficiently generating a REE from a flat array seems generic enough to be worth splitting into a separate function (that maybe other users might need), but could be a followup 🤔

@Jefffrey merged commit fa1cc58 into apache:main on Apr 19, 2026
24 checks passed
@alamb
Contributor

alamb commented Apr 20, 2026

This behaviour of efficiently generating a REE from a flat array seems generic enough to be worth splitting into a separate function (that maybe other users might need), but could be a followup 🤔

It also exists in the cast kernels -- so refactoring that out (or reusing the cast) would be good

Labels

arrow (Changes to the arrow crate), performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Json] RunEndEncoded decoder optimization

3 participants