feat: implement IsNotNull expression in vortex expression library #6969

Merged
robert3005 merged 2 commits into vortex-data:develop from xiaoxuandev:fix-6040
Apr 13, 2026

@xiaoxuandev
Contributor

Summary

Closes: #6040

Add a first-class IsNotNull scalar function, replacing the previous Not(IsNull(...)) composition pattern. This simplifies the expression tree and enables direct stat_falsification for zone map pruning.
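
To illustrate why a dedicated node simplifies the tree, here is a minimal, self-contained sketch. The `Expr` enum, `eval`, and `node_count` below are hypothetical stand-ins written for this example and are not the Vortex API; they only model a single nullable column. The sketch shows that `Not(IsNull)` and `IsNotNull` produce the same mask, while the dedicated node needs one tree node instead of two.

```rust
// Toy model only: these types are hypothetical and are not the Vortex API.

/// A tiny boolean expression tree over one nullable column.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// True where the column value is null.
    IsNull,
    /// True where the column value is present.
    IsNotNull,
    /// Logical negation of a boolean expression.
    Not(Box<Expr>),
}

/// Evaluate the expression row by row against the column.
fn eval(expr: &Expr, column: &[Option<i32>]) -> Vec<bool> {
    match expr {
        Expr::IsNull => column.iter().map(|v| v.is_none()).collect(),
        Expr::IsNotNull => column.iter().map(|v| v.is_some()).collect(),
        Expr::Not(inner) => eval(inner, column).into_iter().map(|b| !b).collect(),
    }
}

/// Count the nodes in the expression tree.
fn node_count(expr: &Expr) -> usize {
    match expr {
        Expr::IsNull | Expr::IsNotNull => 1,
        Expr::Not(inner) => 1 + node_count(inner),
    }
}
```

In this toy model, `Expr::Not(Box::new(Expr::IsNull))` and `Expr::IsNotNull` evaluate to the same result on any column, but a pruning rule only has to match a single node in the second form.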

Changes:

New is_not_null.rs with ScalarFnVTable implementation, including stat_falsification using is_constant && null_count > 0 (with TODO for future RowCount stat)
Updated all integration points: DataFusion, DuckDB, Python/Substrait to use is_not_null(...) directly
Replaced the Not(IsNull(...)) fallback in erased.rs validity with IsNotNull
Registered IsNotNull in ScalarFnSession and ExprBuiltins/ArrayBuiltins
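
The falsification rule mentioned in the first item can be sketched as follows. This is a simplified model under stated assumptions, not the code in is_not_null.rs: `ZoneStats` and `can_prune_is_not_null` are hypothetical names, and the sketch assumes that a zone flagged constant with at least one null contains only nulls. Under that assumption, `is_not_null(col)` is false for every row in the zone, so the zone can be skipped.

```rust
// Hypothetical stand-in for per-zone statistics; not the Vortex stats types.
struct ZoneStats {
    /// Whether every value in the zone is the same, if the stat is known.
    is_constant: Option<bool>,
    /// Number of null values in the zone, if the stat is known.
    null_count: Option<u64>,
}

/// Returns true when `is_not_null(col)` is provably false for the whole zone.
///
/// Assumption: a constant zone with any null is entirely null. Missing stats
/// never justify pruning, so the check is conservative.
fn can_prune_is_not_null(stats: &ZoneStats) -> bool {
    matches!(
        (stats.is_constant, stats.null_count),
        (Some(true), Some(n)) if n > 0
    )
}
```

The check is an approximation: an all-null zone that is not flagged constant is kept rather than pruned, which is safe but may miss pruning opportunities. A row-count statistic would allow the more direct comparison of null count against row count, which is what the TODO in the PR refers to.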

AI Assistance Disclosure

This PR was developed with AI assistance (Kiro). AI was used for code review, implementing stat_falsification, writing tests, and drafting the PR description. All output was reviewed and validated by the author.

API Changes
New public APIs:

vortex_array::expr::is_not_null(child) — creates an IsNotNull expression
Expression::is_not_null() / ArrayRef::is_not_null() via ExprBuiltins/ArrayBuiltins traits
Python: vortex._lib.expr.is_not_null(child)

Testing

9 unit tests covering: return dtype, child replacement, mixed/all-valid/all-invalid evaluation, struct field access, display formatting, null sensitivity, and stat falsification pruning expression generation.

@codspeed-hq

codspeed-hq Bot commented Apr 1, 2026

Merging this PR will degrade performance by 11.42%

❌ 1 regressed benchmark
✅ 1122 untouched benchmarks
⏩ 1455 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | take_map[(0.1, 0.5)] | 966.3 µs | 1,090.8 µs | -11.42% |

Comparing xiaoxuandev:fix-6040 (427e02c) with develop (44c511d)

Footnotes

  1. 1455 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

@a10y
Contributor

a10y commented Apr 1, 2026

I think rebasing should fix the wasm-integration check.

xiaoxuandev and others added 2 commits April 13, 2026 16:29
Add a first-class IsNotNull scalar function instead of composing
Not(IsNull(...)). This simplifies the expression tree, enables direct
stat_falsification for zone map pruning, and updates all integration
points (DataFusion, DuckDB, Python/Substrait).

The stat_falsification uses is_constant && null_count > 0 as an
approximation since there is no RowCount stat yet.

Closes: vortex-data#6040
Signed-off-by: Xiaoxuan Li <xioxuan@amazon.com>
Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005
Contributor

@xiaoxuandev I have rebased your PR and added Java bindings to it. I will do one more pass on the PR so we can hopefully merge it.

@robert3005 robert3005 added the changelog/feature A new feature label Apr 13, 2026
@robert3005 robert3005 merged commit 71089dd into vortex-data:develop Apr 13, 2026
111 of 114 checks passed
@xiaoxuandev
Contributor Author

@robert3005 Thanks for merging the PR! @joseph-isaacs @a10y Thanks for the review!

Development

Successfully merging this pull request may close these issues.

Add an IsNotNull expression
