⚡ Bolt: optimize NestedLoopJoin by reducing redundant materialization #287
This patch implements two performance optimizations in `NestedLoopJoinExec`:

1. **Index-First Filtering in `process_left_range_join`**: Instead of materializing all potential join result rows (the full Cartesian product) and then filtering them, we now filter the join indices first. This ensures that the expensive `take` operation is performed only for rows that actually satisfy the join condition.
2. **Projection-Aware Filtering in `build_row_join_batch`**: We now apply probe-side filters lazily within the column materialization loop. This avoids filtering the entire `RecordBatch` when only a subset of columns is required for the output projection.

Impact: Reduces CPU cycles and memory allocations for Nested Loop Joins, especially when filters are selective or when many columns are projected out.

Measurement: Verified with existing unit tests and clippy. The cardinality of intermediate batches is reduced.

Co-authored-by: Dandandan <163737+Dandandan@users.noreply.github.com>
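The index-first idea can be illustrated with a minimal sketch. This is not the code from the patch: it uses plain `Vec<i64>` columns instead of Arrow arrays, and the names `take` and `index_first_join` are hypothetical helpers written for this example (the local `take` only mimics the gather behavior of a take kernel).

```rust
/// Gather the values at `indices` from `column`.
/// Hypothetical stand-in for a take kernel, for illustration only.
fn take(column: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| column[i]).collect()
}

/// Nested loop join over two single-column inputs.
/// The predicate is checked while building the index pairs, so `take`
/// only gathers rows that survive the filter instead of the full
/// Cartesian product.
fn index_first_join(
    left: &[i64],
    right: &[i64],
    pred: impl Fn(i64, i64) -> bool,
) -> (Vec<i64>, Vec<i64>) {
    let mut left_idx = Vec::new();
    let mut right_idx = Vec::new();
    for (l, &lv) in left.iter().enumerate() {
        for (r, &rv) in right.iter().enumerate() {
            // Keep only the index pairs that satisfy the join condition.
            if pred(lv, rv) {
                left_idx.push(l);
                right_idx.push(r);
            }
        }
    }
    // Materialize output rows once, for matching pairs only.
    (take(left, &left_idx), take(right, &right_idx))
}
```

Calling `index_first_join(&[1, 5, 9], &[4, 6], |l, r| l < r)` gathers three output rows rather than the six rows of the full product. The saving grows as the filter becomes more selective.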
💡 What: Optimized `NestedLoopJoinExec` by filtering indices before `take` and performing lazy column-wise filtering for projected outputs.

🎯 Why: The previous implementation performed redundant data materialization by filtering the entire Cartesian product or the whole probe batch, even for discarded rows or columns.

📊 Impact: Improves join performance and reduces memory usage for joins with selective filters and projections.

🔬 Measurement: Verified using `cargo test -p datafusion-physical-plan` and `cargo clippy`. Added detailed comments to the optimized code sections.

PR created automatically by Jules for task 6234975214165188266 started by @Dandandan
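For readers unfamiliar with the projection-aware part, here is a rough sketch of the idea under simplifying assumptions. It models a batch as a slice of `Vec<i64>` columns and the filter result as a `bool` mask. The names `filter_column` and `project_then_filter` are hypothetical and do not appear in the patch.

```rust
/// Keep the values of `column` whose mask entry is `true`.
fn filter_column(column: &[i64], mask: &[bool]) -> Vec<i64> {
    column
        .iter()
        .zip(mask)
        .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
        .collect()
}

/// Apply the mask only to the columns named in `projection`.
/// Columns that are projected out are never filtered or copied.
fn project_then_filter(
    columns: &[Vec<i64>],
    mask: &[bool],
    projection: &[usize],
) -> Vec<Vec<i64>> {
    projection
        .iter()
        .map(|&c| filter_column(&columns[c], mask))
        .collect()
}
```

With three input columns and a projection of `[2, 0]`, only two columns are filtered. The middle column is skipped entirely, which is where the saving comes from when many columns are projected out.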