perf: Optimize split_part for scalar args#21238
Merged
alamb merged 6 commits intoapache:mainfrom Apr 6, 2026
Merged
Conversation
Contributor
Author
Member
|
I'll review it later! Thanks for the ping! |
Contributor
Author
Amazing, thank you! |
martin-g
reviewed
Apr 2, 2026
martin-g
approved these changes
Apr 5, 2026
alamb
reviewed
Apr 6, 2026
| string_array.as_string_view(), | ||
| delimiter, | ||
| position, | ||
| StringViewBuilder::with_capacity(string_array.len()), |
Contributor
There was a problem hiding this comment.
I think this implementation still copies strings for StringView -- however, you can probably just adjust the view portions if you want to avoid a copy
As another PR perhaps
Contributor
Author
There was a problem hiding this comment.
Yep! I wanted to land this first, I'll take a look at avoiding copies for StringView shortly. I filed #21410 for this.
Contributor
|
Thanks @martin-g and @neilconway |
Dandandan
pushed a commit
to Dandandan/arrow-datafusion
that referenced
this pull request
Apr 8, 2026
## Which issue does this PR close? - Closes apache#21204. ## Rationale for this change In practice, `split_part(string, delimiter, position)` is often invoked with constant values for `delimiter` and `position`. We can take advantage of that to hoist some conditional branches out of the per-row hot loop; more importantly, we can switch from using `str::split` to building a `memchr::memmem::Finder` and using it for each row. Building a `Finder` is relatively expensive but it's a clear win when we can amortize that one-time cost over an entire input batch. Benchmarks (M4 Max): - `scalar_utf8_single_char/pos_first`: 105 µs → 41 µs, -61% - `scalar_utf8_single_char/pos_middle`: 358 µs → 97 µs, -73% - `scalar_utf8_single_char/pos_negative`: 110 µs → 46 µs, -58% - `scalar_utf8_multi_char/pos_middle`: 355 µs → 132 µs, -63% - `scalar_utf8_long_strings/pos_middle`: 1.97 ms → 1.11 ms, -43% - `scalar_utf8view_long_parts/pos_middle`: 467 µs → 169 µs, -63% - `array_utf8_single_char/pos_middle`: 351 µs → 357 µs, no change - `array_utf8_multi_char/pos_middle`: 366 µs → 357 µs, -2.6% ## What changes are included in this PR? * Add benchmarks for `split_part` with scalar delimiter and position * Add new fast-path for `split_part` with scalar delimiter and position * Add SLT tests for `split_part` with scalar delimiter and position ## Are these changes tested? Yes. ## Are there any user-facing changes? No.
Rich-T-kid
pushed a commit
to Rich-T-kid/datafusion
that referenced
this pull request
Apr 21, 2026
## Which issue does this PR close? - Closes apache#21204. ## Rationale for this change In practice, `split_part(string, delimiter, position)` is often invoked with constant values for `delimiter` and `position`. We can take advantage of that to hoist some conditional branches out of the per-row hot loop; more importantly, we can switch from using `str::split` to building a `memchr::memmem::Finder` and using it for each row. Building a `Finder` is relatively expensive but it's a clear win when we can amortize that one-time cost over an entire input batch. Benchmarks (M4 Max): - `scalar_utf8_single_char/pos_first`: 105 µs → 41 µs, -61% - `scalar_utf8_single_char/pos_middle`: 358 µs → 97 µs, -73% - `scalar_utf8_single_char/pos_negative`: 110 µs → 46 µs, -58% - `scalar_utf8_multi_char/pos_middle`: 355 µs → 132 µs, -63% - `scalar_utf8_long_strings/pos_middle`: 1.97 ms → 1.11 ms, -43% - `scalar_utf8view_long_parts/pos_middle`: 467 µs → 169 µs, -63% - `array_utf8_single_char/pos_middle`: 351 µs → 357 µs, no change - `array_utf8_multi_char/pos_middle`: 366 µs → 357 µs, -2.6% ## What changes are included in this PR? * Add benchmarks for `split_part` with scalar delimiter and position * Add new fast-path for `split_part` with scalar delimiter and position * Add SLT tests for `split_part` with scalar delimiter and position ## Are these changes tested? Yes. ## Are there any user-facing changes? No.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
split_partfor scalar args #21204.Rationale for this change
In practice,
split_part(string, delimiter, position)is often invoked with constant values fordelimiterandposition. We can take advantage of that to hoist some conditional branches out of the per-row hot loop; more importantly, we can switch from usingstr::splitto building amemchr::memmem::Finderand using it for each row. Building aFinderis relatively expensive but it's a clear win when we can amortize that one-time cost over an entire input batch.Benchmarks (M4 Max):
scalar_utf8_single_char/pos_first: 105 µs → 41 µs, -61%scalar_utf8_single_char/pos_middle: 358 µs → 97 µs, -73%scalar_utf8_single_char/pos_negative: 110 µs → 46 µs, -58%scalar_utf8_multi_char/pos_middle: 355 µs → 132 µs, -63%scalar_utf8_long_strings/pos_middle: 1.97 ms → 1.11 ms, -43%scalar_utf8view_long_parts/pos_middle: 467 µs → 169 µs, -63%array_utf8_single_char/pos_middle: 351 µs → 357 µs, no changearray_utf8_multi_char/pos_middle: 366 µs → 357 µs, -2.6%What changes are included in this PR?
split_partwith scalar delimiter and positionsplit_partwith scalar delimiter and positionsplit_partwith scalar delimiter and positionAre these changes tested?
Yes.
Are there any user-facing changes?
No.