
⚡ Bolt: optimize NestedLoopJoin by reducing redundant materialization #287

Open
Dandandan wants to merge 1 commit into main from bolt/nlj-optimizations-6234975214165188266


@Dandandan
Owner

💡 What: Optimized NestedLoopJoinExec by filtering indices before take and performing lazy column-wise filtering for projected outputs.
🎯 Why: The previous implementation performed redundant data materialization by filtering the entire Cartesian product or the whole probe batch, even for discarded rows or columns.
📊 Impact: Improves join performance and reduces memory usage for joins with selective filters and projections.
🔬 Measurement: Verified using cargo test -p datafusion-physical-plan and cargo clippy. Added detailed comments to the optimized code sections.


PR created automatically by Jules for task 6234975214165188266 started by @Dandandan

This patch implements two performance optimizations in `NestedLoopJoinExec`:

1.  **Index-First Filtering in `process_left_range_join`**: Instead of materializing all potential join result rows (the full Cartesian product) and then filtering them, we now filter the join indices first. This ensures that the expensive `take` operation is only performed for rows that actually satisfy the join condition.
2.  **Projection-Aware Filtering in `build_row_join_batch`**: We now apply probe-side filters lazily within the column materialization loop. This avoids filtering the entire `RecordBatch` when only a subset of columns is required for the output projection.
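The index-first idea in item 1 can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the DataFusion code: `take`, `materialize_then_filter`, and `filter_then_take` are hypothetical names, a single `i64` slice per side stands in for a `RecordBatch`, and the join condition is an ordinary closure.

```rust
/// Gather values from `col` at the given positions.
/// Hypothetical stand-in for a columnar `take` kernel.
fn take(col: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| col[i]).collect()
}

/// Old shape (assumed): materialize the full Cartesian product, then filter.
fn materialize_then_filter(
    left: &[i64],
    right: &[i64],
    pred: impl Fn(i64, i64) -> bool,
) -> (Vec<i64>, Vec<i64>) {
    // Build index pairs for every (left, right) combination.
    let mut l_idx = Vec::with_capacity(left.len() * right.len());
    let mut r_idx = Vec::with_capacity(left.len() * right.len());
    for l in 0..left.len() {
        for r in 0..right.len() {
            l_idx.push(l);
            r_idx.push(r);
        }
    }
    // `take` copies left.len() * right.len() values per side.
    let l_all = take(left, &l_idx);
    let r_all = take(right, &r_idx);
    // Only now are non-matching rows discarded.
    let mut out_l = Vec::new();
    let mut out_r = Vec::new();
    for (a, b) in l_all.into_iter().zip(r_all) {
        if pred(a, b) {
            out_l.push(a);
            out_r.push(b);
        }
    }
    (out_l, out_r)
}

/// New shape (assumed): filter the index pairs first, then `take` once.
fn filter_then_take(
    left: &[i64],
    right: &[i64],
    pred: impl Fn(i64, i64) -> bool,
) -> (Vec<i64>, Vec<i64>) {
    let mut l_idx = Vec::new();
    let mut r_idx = Vec::new();
    for (l, &a) in left.iter().enumerate() {
        for (r, &b) in right.iter().enumerate() {
            // Keep only index pairs that satisfy the join condition.
            if pred(a, b) {
                l_idx.push(l);
                r_idx.push(r);
            }
        }
    }
    // `take` now copies only the surviving rows.
    (take(left, &l_idx), take(right, &r_idx))
}
```

Both functions return the same rows in the same order; the second one simply never copies a value for a pair that the predicate rejects, which is where the saving comes from when the filter is selective.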

Impact: Reduces CPU cycles and memory allocations for Nested Loop Joins, especially when filters are selective or when many columns are projected out.
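The projection-aware variant can be sketched the same way. Again this is an assumption-laden illustration rather than the patch itself: `filter_column`, `filter_then_project`, and `project_then_filter` are hypothetical names, a batch is modeled as a `Vec` of `i64` columns, the filter result is a plain `bool` mask, and the projection is a list of column positions.

```rust
/// Keep the values of `col` whose mask entry is true.
/// Hypothetical stand-in for a columnar `filter` kernel.
fn filter_column(col: &[i64], mask: &[bool]) -> Vec<i64> {
    let mut out = Vec::new();
    for (v, keep) in col.iter().zip(mask) {
        if *keep {
            out.push(*v);
        }
    }
    out
}

/// Old shape (assumed): filter every column of the batch, then project.
fn filter_then_project(
    batch: &[Vec<i64>],
    mask: &[bool],
    projection: &[usize],
) -> Vec<Vec<i64>> {
    // Every column is filtered, including ones the output never uses.
    let filtered: Vec<Vec<i64>> = batch.iter().map(|c| filter_column(c, mask)).collect();
    projection.iter().map(|&i| filtered[i].clone()).collect()
}

/// New shape (assumed): filter lazily, only for columns in the projection.
fn project_then_filter(
    batch: &[Vec<i64>],
    mask: &[bool],
    projection: &[usize],
) -> Vec<Vec<i64>> {
    projection
        .iter()
        .map(|&i| filter_column(&batch[i], mask))
        .collect()
}
```

With three input columns and a projection of two, the lazy version runs the filter twice instead of three times and skips the extra clone; the gap grows with the number of columns that are projected out.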
Measurement: Verified with existing unit tests and clippy. Cardinality of intermediate batches is reduced.

Co-authored-by: Dandandan <163737+Dandandan@users.noreply.github.com>
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

