perf(fp-history): batch false positive history processing #14449

Open
valentijnscholten wants to merge 4 commits into DefectDojo:dev from valentijnscholten:fp-history-batching

valentijnscholten commented Mar 5, 2026

Summary

  • N+1 query fix: replaced per-finding match_finding_to_existing_findings() calls with a single product-scoped batch query (_fetch_fp_candidates_for_batch) shared across all findings in the batch
  • Bulk update: replaced all per-finding save() / save_no_options() calls in false positive history paths with QuerySet.update(), which bypasses Django signals identically to the previous calls (all update sites carry a comment explaining this)
  • post_process_findings_batch (import/reimport): now calls do_false_positive_history_batch() instead of a per-finding loop — one DB query instead of N
  • _bulk_update_finding_status_and_severity (bulk edit): findings grouped by (product, dedup_alg) and processed with a single batch call per group; retroactive reactivation also batched
  • Dead-code fix: process_false_positive_history in the single-finding edit view had the condition finding.false_p and not finding.false_p (always False) because form.save(commit=False) with instance=finding mutates the object in place. Fixed by capturing old_false_p = finding.false_p before the form save and passing it as a keyword argument
  • Algorithm dispatch unified: extracted _fp_candidates_qs() as the single source of truth for hash_code / unique_id_from_tool / unique_id_or_hash / legacy query building, shared by both match_finding_to_existing_findings (returns lazy QS for chaining) and _fetch_fp_candidates_for_batch (evaluates into a keyed dict)
  • Moved to deduplication.py: all FP history helpers relocated from dojo/utils.py to dojo/finding/deduplication.py alongside the equivalent dedupe helpers; import sites in helper.py, views.py, and tests updated accordingly
  • 4 new unit tests: batch single-query behaviour, retroactive batch FP marking, retroactive reactivation (previously unreachable), and the no-reactivation guard
  • Query counts: added asserts on query counts so the code cannot silently regress to N+1. These are less exhaustive than the import/reimport performance tests, since FP history is used much less.
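
The dead-code bullet is easiest to see in isolation. The sketch below is plain Python, not DefectDojo or Django code: `Finding`, `FakeForm`, `edit_buggy`, and `edit_fixed` are hypothetical stand-ins, and `FakeForm.save(commit=False)` only imitates how a form bound with instance=finding writes the submitted values onto that same object.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for the real model; only the flag matters here."""
    false_p: bool = False


class FakeForm:
    """Hypothetical stand-in for a form bound with instance=finding."""

    def __init__(self, data, instance):
        self.data = data
        self.instance = instance

    def save(self, commit=True):
        # Like a bound model form, write the submitted values onto the SAME object.
        self.instance.false_p = self.data["false_p"]
        return self.instance


def edit_buggy(finding, data):
    new_finding = FakeForm(data, instance=finding).save(commit=False)
    # new_finding IS finding, so the flag is compared with itself: never True.
    return finding.false_p and not new_finding.false_p


def edit_fixed(finding, data):
    old_false_p = finding.false_p  # capture before the form mutates the object
    new_finding = FakeForm(data, instance=finding).save(commit=False)
    return old_false_p and not new_finding.false_p


# A finding that was a false positive and is being un-marked in the edit form.
reactivate_buggy = edit_buggy(Finding(false_p=True), {"false_p": False})
reactivate_fixed = edit_fixed(Finding(false_p=True), {"false_p": False})
```

With the old value captured first, the "was false positive, no longer is" transition becomes detectable, which is what makes the retroactive reactivation path reachable.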

Needs a Pro PR to cater for the moved/renamed methods.

Replaces the N+1 query pattern in false positive history with a single
product-scoped DB query per batch, and switches per-finding save() calls
to QuerySet.update() to eliminate redundant signal overhead.

Changes:
- Extract _fp_candidates_qs() as the single algorithm-dispatch helper
  shared by both single-finding and batch lookup paths
- Add do_false_positive_history_batch() which fetches all FP candidates
  in one query and marks findings with a single UPDATE
- do_false_positive_history() now delegates to the batch function
- post_process_findings_batch (import/reimport) calls the batch function
  instead of a per-finding loop
- _bulk_update_finding_status_and_severity (bulk edit) groups findings
  by (product, dedup_alg) and calls the batch function once per group;
  retroactive reactivation also batched the same way
- Fix dead-code bug in process_false_positive_history: the condition
  finding.false_p and not finding.false_p was always False because
  form.save(commit=False) mutates the finding in place; fixed by
  capturing old_false_p before the form save
- Replace all per-finding save()/save_no_options() in FP history paths
  with QuerySet.update() (bypasses signals identically to the old calls)
- Move all FP history helpers from dojo/utils.py to
  dojo/finding/deduplication.py alongside the matching dedupe helpers
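
A rough illustration of the grouping and keyed-lookup idea from the list above, in plain Python rather than the ORM. Everything here is an assumption made for the sketch: the `Finding` dataclass, `match_key`, `fetch_fp_candidates` (an in-memory filter standing in for the single product-scoped SELECT), and `mark_false_positives`. Only two of the algorithms are dispatched, and the real helpers may differ in shape.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical stand-in; the real model has many more fields."""
    id: int
    product: str
    hash_code: Optional[str] = None
    unique_id_from_tool: Optional[str] = None
    false_p: bool = False


def match_key(finding, dedup_alg):
    # Assumed dispatch; only two of the algorithms named in the PR are sketched.
    if dedup_alg == "unique_id_from_tool":
        return finding.unique_id_from_tool
    return finding.hash_code


fetch_calls = 0  # counts simulated queries


def fetch_fp_candidates(existing, product, dedup_alg, keys):
    """Stands in for one product-scoped SELECT; returns {key: [candidates]}."""
    global fetch_calls
    fetch_calls += 1
    by_key = defaultdict(list)
    for cand in existing:
        key = match_key(cand, dedup_alg)
        if cand.product == product and cand.false_p and key in keys:
            by_key[key].append(cand)
    return by_key


def mark_false_positives(new_findings, existing):
    """new_findings is a list of (finding, dedup_alg) pairs."""
    groups = defaultdict(list)
    for finding, dedup_alg in new_findings:
        groups[(finding.product, dedup_alg)].append(finding)

    marked_ids = []
    for (product, dedup_alg), group in groups.items():
        keys = {match_key(f, dedup_alg) for f in group} - {None}
        # One lookup per (product, dedup_alg) group, not one per finding.
        candidates = fetch_fp_candidates(existing, product, dedup_alg, keys)
        for f in group:
            if candidates.get(match_key(f, dedup_alg)):
                f.false_p = True  # real code would collect ids for one UPDATE
                marked_ids.append(f.id)
    return marked_ids


existing = [
    Finding(1, "shop", hash_code="abc", false_p=True),
    Finding(2, "shop", hash_code="zzz"),
]
new = [
    (Finding(10, "shop", hash_code="abc"), "hash_code"),
    (Finding(11, "shop", hash_code="def"), "hash_code"),
    (Finding(12, "shop", hash_code="zzz"), "hash_code"),
    (Finding(13, "blog", unique_id_from_tool="u-1"), "unique_id_from_tool"),
]
marked = mark_false_positives(new, existing)
```

Four incoming findings fall into two groups, so the simulated fetch runs twice instead of four times; the per-finding work is then a dictionary lookup.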

All update() calls carry a comment explaining the signal-bypass
equivalence with the previous save(skip_validation=True) calls.

Adds 4 unit tests covering: batch single-query behaviour, retroactive
batch FP marking, retroactive reactivation (previously dead code), and
the no-reactivation guard.

Limit _fetch_fp_candidates_for_batch to only the fields actually read
from candidate objects (id, false_p, active, hash_code,
unique_id_from_tool, title, severity), avoiding loading unused columns.

Correct update() comments to clarify that .only() does not constrain
QuerySet.update(): Django generates UPDATE SQL independently, so the
sync requirement is only for fields *read* from candidate objects.

assertNumQueries(7) on both batch tests covers: System_Settings, the
4-query lazy-load chain (test/engagement/product/test_type from
findings[0]), the candidates SELECT with .only(), and the bulk UPDATE.
The count is fixed regardless of batch size or number of retroactively
marked findings.

New test creates 5 pre-existing findings and asserts the batch still
uses exactly 7 queries, showing that the old O(N) per-finding save loop
is gone and a single bulk UPDATE covers all affected rows.
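
The constant-query-count claim can be reproduced outside Django with the standard library alone. This is a loose analogue of assertNumQueries, not the test added in this PR: the `finding` table, `count_statements`, `mark_per_row`, and `mark_bulk` are all made up for the sketch, and statements are counted through sqlite3's trace callback.

```python
import sqlite3


def make_db(n):
    """In-memory table with n pre-existing findings sharing one hash_code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE finding (id INTEGER PRIMARY KEY, hash_code TEXT, false_p INTEGER)"
    )
    conn.executemany(
        "INSERT INTO finding (hash_code, false_p) VALUES (?, 0)", [("abc",)] * n
    )
    return conn


def count_statements(conn, fn):
    """Run fn(conn) and count the SELECT/UPDATE statements sqlite3 reports."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        fn(conn)
    finally:
        conn.set_trace_callback(None)
    # Ignore transaction control such as the implicit BEGIN.
    return sum(1 for s in seen if s.lstrip().upper().startswith(("SELECT", "UPDATE")))


def matching_ids(conn):
    rows = conn.execute("SELECT id FROM finding WHERE hash_code = ?", ("abc",))
    return [row[0] for row in rows]


def mark_per_row(conn):
    # Old shape: one UPDATE per affected finding, so O(N) statements.
    for finding_id in matching_ids(conn):
        conn.execute("UPDATE finding SET false_p = 1 WHERE id = ?", (finding_id,))


def mark_bulk(conn):
    # New shape: one UPDATE covering every affected row.
    ids = matching_ids(conn)
    placeholders = ",".join("?" * len(ids))
    conn.execute(f"UPDATE finding SET false_p = 1 WHERE id IN ({placeholders})", ids)


per_row_small = count_statements(make_db(5), mark_per_row)
bulk_small = count_statements(make_db(5), mark_bulk)
bulk_large = count_statements(make_db(50), mark_bulk)
```

The per-row variant grows with the number of affected findings, while the bulk variant stays at one SELECT plus one UPDATE whether 5 or 50 rows are touched, which is the property the fixed query count is meant to pin down.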
valentijnscholten added this to the 2.57.0 milestone Mar 5, 2026
valentijnscholten added the affects_pro label (PRs that affect Pro and need a coordinated release/merge moment.) Mar 5, 2026
valentijnscholten marked this pull request as ready for review March 6, 2026 06:58
Labels

affects_pro (PRs that affect Pro and need a coordinated release/merge moment.), unittests
