perf: Optimize split_part for scalar args #21238

Merged
alamb merged 6 commits into apache:main from neilconway:neilc/optimize-split-part-scalar on Apr 6, 2026
Conversation

@neilconway (Contributor) commented Mar 29, 2026

Which issue does this PR close?

  • Closes #21204.

Rationale for this change

In practice, split_part(string, delimiter, position) is often invoked with constant values for delimiter and position. We can take advantage of that to hoist some conditional branches out of the per-row hot loop; more importantly, we can switch from using str::split to building a memchr::memmem::Finder and using it for each row. Building a Finder is relatively expensive but it's a clear win when we can amortize that one-time cost over an entire input batch.
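
As a rough illustration of the hoisting idea only, here is a minimal sketch that is not the code in this PR. It assumes a hypothetical helper `split_part_scalar` working on plain string slices rather than Arrow arrays, and it deliberately swaps in the standard library's `str::split` / `str::rsplit` for `memchr::memmem::Finder` so that it has no dependencies. In the real fast path, the `Finder` would be built once at the point where the scalar checks run here. Treating position zero as an error is an assumption borrowed from PostgreSQL's `split_part`, and rejecting an empty delimiter is a simplification of this sketch.

```rust
/// Hypothetical helper (not DataFusion's implementation): evaluates
/// split_part(row, delimiter, position) for every row of a batch when
/// `delimiter` and `position` are the same for all rows.
fn split_part_scalar<'a>(
    rows: &[&'a str],
    delimiter: &str,
    position: i64,
) -> Result<Vec<&'a str>, String> {
    // Checks that depend only on the scalar arguments run once per batch,
    // not once per row.
    if position == 0 {
        // Assumption: position 0 is an error, as in PostgreSQL.
        return Err("field position must not be zero".to_string());
    }
    if delimiter.is_empty() {
        // Simplification: this sketch does not model empty delimiters.
        return Err("empty delimiter is not handled in this sketch".to_string());
    }

    // The sign of `position` is resolved once; each branch then runs a
    // loop specialized for that direction.
    let out: Vec<&'a str> = if position > 0 {
        // 1-based index counted from the front of the string.
        let idx = (position - 1) as usize;
        rows.iter()
            .map(|s| s.split(delimiter).nth(idx).unwrap_or(""))
            .collect()
    } else {
        // Negative positions count parts from the end of the string.
        let idx = (position.unsigned_abs() - 1) as usize;
        rows.iter()
            .map(|s| s.rsplit(delimiter).nth(idx).unwrap_or(""))
            .collect()
    };
    Ok(out)
}
```

For example, calling `split_part_scalar(&["a,b,c", "x,y"], ",", 2)` yields the parts `b` and `y`, and a position past the last part yields an empty string for that row.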

Benchmarks (M4 Max):

  • scalar_utf8_single_char/pos_first: 105 µs → 41 µs, -61%
  • scalar_utf8_single_char/pos_middle: 358 µs → 97 µs, -73%
  • scalar_utf8_single_char/pos_negative: 110 µs → 46 µs, -58%
  • scalar_utf8_multi_char/pos_middle: 355 µs → 132 µs, -63%
  • scalar_utf8_long_strings/pos_middle: 1.97 ms → 1.11 ms, -43%
  • scalar_utf8view_long_parts/pos_middle: 467 µs → 169 µs, -63%
  • array_utf8_single_char/pos_middle: 351 µs → 357 µs, no change
  • array_utf8_multi_char/pos_middle: 366 µs → 357 µs, -2.6%

What changes are included in this PR?

  • Add benchmarks for split_part with scalar delimiter and position
  • Add new fast-path for split_part with scalar delimiter and position
  • Add SLT tests for split_part with scalar delimiter and position

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Mar 29, 2026
@neilconway (Contributor, Author) commented:

@martin-g Any interest in reviewing this PR? It's a follow-on to the initial split_part work that was done in #21119.

@martin-g (Member) commented Apr 2, 2026

I'll review it later! Thanks for the ping!

@neilconway (Contributor, Author) commented:

> I'll review it later! Thanks for the ping!

Amazing, thank you!

Comment thread datafusion/functions/src/string/split_part.rs
string_array.as_string_view(),
delimiter,
position,
StringViewBuilder::with_capacity(string_array.len()),

Review comment (Contributor):

I think this implementation still copies strings for StringView -- however, you can probably just adjust the view portions if you want to avoid a copy

As another PR perhaps

Reply (Contributor, Author):

Yep! I wanted to land this first, I'll take a look at avoiding copies for StringView shortly. I filed #21410 for this.
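
To make the "adjust the view" suggestion concrete, here is a minimal sketch. It is not taken from arrow-rs or from this PR. It models the 16-byte string view from the Arrow columnar format with a hypothetical `View` enum: values of 12 bytes or fewer are stored inline, while longer values store a length, a 4-byte prefix, a data buffer index, and an offset into that buffer. The hypothetical `sub_view` function describes a part of a string already held in a data buffer, so a long part needs only new view fields and no new copy of its bytes.

```rust
/// Simplified, hypothetical model of an Arrow string view.
#[derive(Debug, PartialEq)]
enum View {
    /// Values of up to 12 bytes live inside the view itself (zero padded).
    Inline { len: u32, data: [u8; 12] },
    /// Longer values point into a shared data buffer.
    Ref { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

/// Builds a view for `buffer[start..start + len]` without copying a long
/// value: only the length, prefix, and offset fields are computed.
/// Panics if the range is out of bounds for `buffer`.
fn sub_view(buffer: &[u8], buffer_index: u32, start: usize, len: usize) -> View {
    let bytes = &buffer[start..start + len];
    if len <= 12 {
        // Short parts must be inlined, so their few bytes are copied
        // into the view itself.
        let mut data = [0u8; 12];
        data[..len].copy_from_slice(bytes);
        View::Inline { len: len as u32, data }
    } else {
        // Long parts reuse the existing buffer: same buffer index,
        // new offset and length.
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[..4]);
        View::Ref { len: len as u32, prefix, buffer_index, offset: start as u32 }
    }
}
```

With a buffer holding `alpha,a-much-longer-field,omega`, the middle part starts at byte 6 and is 19 bytes long, so `sub_view(buf, 0, 6, 19)` returns a `Ref` with offset 6 and prefix `a-mu`, while the 5-byte part `alpha` is returned inline.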

@alamb added the performance (Make DataFusion faster) label Apr 6, 2026
@alamb (Contributor) commented Apr 6, 2026

Thanks @martin-g and @neilconway

@alamb added this pull request to the merge queue Apr 6, 2026
Merged via the queue into apache:main with commit 7fa7fe0 Apr 6, 2026
31 checks passed
@neilconway deleted the neilc/optimize-split-part-scalar branch April 6, 2026 16:19
Dandandan pushed a commit to Dandandan/arrow-datafusion that referenced this pull request Apr 8, 2026
- Closes apache#21204.
Rich-T-kid pushed a commit to Rich-T-kid/datafusion that referenced this pull request Apr 21, 2026
Labels

  • functions: Changes to functions implementation
  • performance: Make DataFusion faster
  • sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Optimize split_part for scalar args

3 participants