perf(fp-history): batch false positive history processing#14449
Open
valentijnscholten wants to merge 4 commits intoDefectDojo:devfrom
Open
perf(fp-history): batch false positive history processing#14449valentijnscholten wants to merge 4 commits intoDefectDojo:devfrom
valentijnscholten wants to merge 4 commits intoDefectDojo:devfrom
Conversation
Replaces the N+1 query pattern in false positive history with a single product-scoped DB query per batch, and switches per-finding save() calls to QuerySet.update() to eliminate redundant signal overhead. Changes: - Extract _fp_candidates_qs() as the single algorithm-dispatch helper shared by both single-finding and batch lookup paths - Add do_false_positive_history_batch() which fetches all FP candidates in one query and marks findings with a single UPDATE - do_false_positive_history() now delegates to the batch function - post_process_findings_batch (import/reimport) calls the batch function instead of a per-finding loop - _bulk_update_finding_status_and_severity (bulk edit) groups findings by (product, dedup_alg) and calls the batch function once per group; retroactive reactivation also batched the same way - Fix dead-code bug in process_false_positive_history: the condition finding.false_p and not finding.false_p was always False because form.save(commit=False) mutates the finding in place; fixed by capturing old_false_p before the form save - Replace all per-finding save()/save_no_options() in FP history paths with QuerySet.update() (bypasses signals identically to the old calls) - Move all FP history helpers from dojo/utils.py to dojo/finding/deduplication.py alongside the matching dedupe helpers All update() calls carry a comment explaining the signal-bypass equivalence with the previous save(skip_validation=True) calls. Adds 4 unit tests covering: batch single-query behaviour, retroactive batch FP marking, retroactive reactivation (previously dead code), and the no-reactivation guard.
Limit _fetch_fp_candidates_for_batch to only the fields actually read from candidate objects (id, false_p, active, hash_code, unique_id_from_tool, title, severity), avoiding loading unused columns. Correct update() comments to clarify that .only() does not constrain QuerySet.update() — Django generates UPDATE SQL independently — so the sync requirement is only for fields *read* from candidate objects.
assertNumQueries(7) on both batch tests covers: System_Settings, 4 lazy-load chain (test/engagement/product/test_type from findings[0]), candidates SELECT with .only(), and the bulk UPDATE — fixed regardless of batch size or number of retroactively marked findings.
New test creates 5 pre-existing findings and asserts the batch still uses exactly 7 queries regardless — proving the old O(N) per-finding save loop is gone and a single bulk UPDATE covers all affected rows.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
match_finding_to_existing_findings()calls with a single product-scoped batch query (_fetch_fp_candidates_for_batch) shared across all findings in the batchsave()/save_no_options()calls in false positive history paths withQuerySet.update(), which bypasses Django signals identically to the previous calls (all update sites carry a comment explaining this)post_process_findings_batch(import/reimport): now callsdo_false_positive_history_batch()instead of a per-finding loop — one DB query instead of N_bulk_update_finding_status_and_severity(bulk edit): findings grouped by(product, dedup_alg)and processed with a single batch call per group; retroactive reactivation also batchedprocess_false_positive_historyin the single-finding edit view had the conditionfinding.false_p and not finding.false_p(alwaysFalse) becauseform.save(commit=False)withinstance=findingmutates the object in place. Fixed by capturingold_false_p = finding.false_pbefore the form save and passing it as a keyword argument_fp_candidates_qs()as the single source of truth for hash_code / unique_id_from_tool / unique_id_or_hash / legacy query building, shared by bothmatch_finding_to_existing_findings(returns lazy QS for chaining) and_fetch_fp_candidates_for_batch(evaluates into a keyed dict)deduplication.py: all FP history helpers relocated fromdojo/utils.pytodojo/finding/deduplication.pyalongside the equivalent dedupe helpers; import sites inhelper.py,views.py, and tests updated accordinglyNeeds a Pro PR to cater for the moved/renamed methods.