Skip to content

Skip per-row convertTypes() in FunctionOperand when types already match#17730

Draft
yashmayya wants to merge 2 commits intoapache:masterfrom
yashmayya:optimize-function-operand-convert-types
Draft

Skip per-row convertTypes() in FunctionOperand when types already match#17730
yashmayya wants to merge 2 commits intoapache:masterfrom
yashmayya:optimize-function-operand-convert-types

Conversation

@yashmayya
Copy link
Contributor

@yashmayya yashmayya commented Feb 19, 2026


Summary

  • Skip per-row convertTypes() in FunctionOperand.apply() when operand types already match function parameter types
  • At construction time, compare each operand's ColumnDataType against the function parameter's ColumnDataType via FunctionUtils.getColumnDataType(). When all match (the common case), set _needsConversion = false and bypass convertTypes() in the hot path
  • Add JMH benchmark (BenchmarkFunctionOperand) to measure the impact

Motivation

  • The TODO on the original code noted: "Optimize per record conversion." Every call to convertTypes() performs isAssignableFrom checks, HashMap lookups (FunctionUtils.getArgumentType), and no-op PinotDataType.convert() calls per argument per row, even when the Calcite planner has already ensured type alignment.

Benchmark Results

  Benchmark                                            Mode  Cnt  Score   Error   Units                                                                                                                                                                                                                                                       
  BenchmarkFunctionOperand.invokeWithoutConvertTypes  thrpt    5  0.166 ± 0.003  ops/us                                                                                                                                                                                                                                                       
  BenchmarkFunctionOperand.invokeWithConvertTypes     thrpt    5  0.041 ± 0.001  ops/us                                                                                                                                                                                                                                                       
  BenchmarkFunctionOperand.applyTypesMatch            thrpt    5  0.022 ± 0.001  ops/us                                                                                                                                                                                                                                                       
  BenchmarkFunctionOperand.applyTypesNeedConversion   thrpt    5  0.019 ± 0.009  ops/us

Isolated, skipping convertTypes() on matching types yields ~4x throughput (0.166 vs 0.041 ops/μs). End-to-end in apply(), the improvement is ~15% as the cost is diluted by toExternal(), reflection-based Method.invoke(), and toInternal().


Edit: This optimization is skipped for array types. In the multi-stage query engine, when data flows between stages via DataBlock serialization, array types can be deserialized into a different Java class than expected (e.g., Double[] instead of double[] for DOUBLE_ARRAY). Our ColumnDataType-level comparison can't detect this because both map to DOUBLE_ARRAY and this is an issue because arrays don't have autoboxing in Java.

@yashmayya yashmayya added enhancement performance multi-stage Related to the multi-stage query engine labels Feb 19, 2026
@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.19%. Comparing base (7d103b6) to head (14bd6ab).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
...ery/runtime/operator/operands/FunctionOperand.java 81.81% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #17730   +/-   ##
=========================================
  Coverage     63.18%   63.19%           
- Complexity     1499     1502    +3     
=========================================
  Files          3181     3181           
  Lines        190816   190832   +16     
  Branches      29164    29171    +7     
=========================================
+ Hits         120566   120595   +29     
+ Misses        60864    60847   -17     
- Partials       9386     9390    +4     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.16% <81.81%> (-0.02%) ⬇️
java-21 63.15% <81.81%> (+29.12%) ⬆️
temurin 63.19% <81.81%> (+<0.01%) ⬆️
unittests 63.19% <81.81%> (+<0.01%) ⬆️
unittests1 55.59% <81.81%> (+0.01%) ⬆️
unittests2 34.04% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement multi-stage Related to the multi-stage query engine performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants