Support Dictionary Arrays in MIN/MAX Aggregates #21315
Open
kosiew wants to merge 22 commits into apache:main from kosiew:dictionary-coercion-21150
+380
−44
Commits
fe226dd Update min_max.rs to support dictionary scalars (kosiew)
caafe1c Refactor dictionary min/max logic and tests (kosiew)
0bbc56e Simplify min/max flow in dictionary handling (kosiew)
9240400 Fix dictionary min/max behavior in DataFusion (kosiew)
ed2d3fd Refactor min/max logic for shared row-wise handling (kosiew)
dad6e02 Refactor dictionary handling and simplify batch logic (kosiew)
b92aeef fix(min_max): rename helper to scalar_row_extreme and update document… (kosiew)
a80fc77 feat(min_max): rename predicate to requires_logical_row_scan (kosiew)
7ea7cb4 feat(min_max): enhance documentation and clarify error messages (kosiew)
377fb5d feat(min_max): add dictionary key-type validation and improve error h… (kosiew)
150bc6f feat(min_max): rename row-scan helper and update match arms (kosiew)
7bd29e1 feat: enhance dictionary comparison logic and add unit tests (kosiew)
47f75b2 fix: extract scalar comparison logic into min_max_scalar function (kosiew)
bd8f1ad feat(aggregate): simplify min/max helper and enhance testing for Dict… (kosiew)
bca94be chore: rename variables in min_max.rs for clarity (kosiew)
ccbff59 feat: refactor min_max to utilize choose_min_max for improved interna… (kosiew)
77a518e feat: reintroduce min_max_batch_generic function for dictionary array… (kosiew)
0b8592d feat: reorder imports in min_max.rs for improved clarity (kosiew)
a34ddf1 docs: update helper documentation in min_max.rs for dictionary routin… (kosiew)
ba96f77 feat(min_max): update min_max_batch_generic to handle raw values from… (kosiew)
2669a30 Revert "feat(min_max): update min_max_batch_generic to handle raw val… (kosiew)
e716c92 feat(tests): enhance dictionary array tests with raw values extraction (kosiew)
I still do not understand why this function is being changed. The provided reasoning makes no sense, considering the original version already works as intended. It is not the responsibility of this function that `dictionary.values()` isn't semantically correct; that is the responsibility of the caller, and refactoring this function does not fix the caller. Can we please revert the changes to this function?
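For readers following the thread, here is a minimal sketch of why scanning a dictionary's values buffer can disagree with the logical rows. It uses plain Rust types rather than Arrow or DataFusion APIs; the `Dict` struct and both method names are hypothetical and exist only for illustration.

```rust
/// Hypothetical stand-in for a dictionary-encoded column: each row stores an
/// optional key (None models a null row) that indexes into `values`.
struct Dict<'a> {
    keys: Vec<Option<usize>>,
    values: Vec<&'a str>,
}

impl<'a> Dict<'a> {
    /// Min over the raw values buffer, ignoring which entries rows reference.
    fn min_of_values(&self) -> Option<&'a str> {
        self.values.iter().copied().min()
    }

    /// Min over the logical rows: resolve each non-null key, then compare.
    fn min_of_rows(&self) -> Option<&'a str> {
        self.keys.iter().flatten().map(|&k| self.values[k]).min()
    }
}

fn main() {
    // "a" sits in the values buffer, but no row references it.
    let dict = Dict {
        keys: vec![Some(1), None, Some(2)],
        values: vec!["a", "m", "z"],
    };
    println!("min over values buffer: {:?}", dict.min_of_values());
    println!("min over logical rows:  {:?}", dict.min_of_rows());
}
```

In this toy model the two results differ whenever the values buffer holds an entry that no row references, which is the situation the comment above attributes to the caller.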
I amended the comment and locked this down with a new test.
The changed `min_max_batch_generic` shape is a secondary improvement, independent of the bug fix: it first finds the first non-null row and only then enters comparison.
The two-phase version is easier to follow because it separates setup from comparison.
After phase 1, `extreme` is always non-null, so the loop only needs to handle `current` values.
The old single loop mixed these concerns in every iteration (`current` null, `extreme` null, or compare), which made the logic more branchy and harder to read.
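The two-phase shape described above can be sketched as follows. This is not the PR's `min_max_batch_generic`; it is a simplified model that assumes rows are `Option<T>` values (with `None` standing in for a null row), and the function name `two_phase_extreme` and its `replace_when` parameter are hypothetical.

```rust
use std::cmp::Ordering;

/// Two-phase extreme over nullable rows.
/// `replace_when` is the ordering of `current` relative to `extreme` that
/// makes `current` the new extreme: `Less` for MIN, `Greater` for MAX.
fn two_phase_extreme<T: Ord + Copy>(rows: &[Option<T>], replace_when: Ordering) -> Option<T> {
    // Phase 1 (setup): locate the first non-null row; all-null input yields None.
    let start = rows.iter().position(|row| row.is_some())?;
    let mut extreme = rows[start]?;

    // Phase 2 (comparison): `extreme` is known to be non-null here, so each
    // iteration only has to skip a null `current` or compare it.
    for current in &rows[start + 1..] {
        let Some(current) = *current else { continue };
        if current.cmp(&extreme) == replace_when {
            extreme = current;
        }
    }
    Some(extreme)
}

fn main() {
    let rows = [None, Some(3), None, Some(1), Some(7)];
    println!("min = {:?}", two_phase_extreme(&rows, Ordering::Less));
    println!("max = {:?}", two_phase_extreme(&rows, Ordering::Greater));
}
```

A single-loop version would instead carry `extreme` as an `Option` and check both `current` and `extreme` for null on every iteration, which is the extra branching the comment above refers to.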