Skip to content

Refactor search post-processing to default-processor traits#817

Open
narendatha wants to merge 17 commits intomainfrom
u/narendatha/post_process_refactor_3
Open

Refactor search post-processing to default-processor traits#817
narendatha wants to merge 17 commits intomainfrom
u/narendatha/post_process_refactor_3

Conversation

@narendatha
Copy link
Contributor

@narendatha narendatha commented Mar 9, 2026

Summary

This PR refactors search post-processing to an explicit processor model, adds determinant-diversity reranking support in benchmark paths, and extends disk strategy support with end-to-end benchmark wiring and inputs.

What

1) Search post-processing architecture refactor

  • Decoupled post-processing from SearchStrategy into explicit processor contracts.
  • Introduced processor traits/types including PostProcess, HasDefaultProcessor, and default delegation mechanisms.
  • Updated graph search flows (knn, multihop, range, and related paths) to use explicit post-process dispatch.

2) Search API updates

  • Added explicit processor entrypoint search_with(...).
  • Kept default search(...) behavior via default processor delegation for compatibility.

3) Determinant-diversity support (core + benchmark)

  • Added determinant-diversity post-process in graph search.
  • Integrated determinant-diversity into benchmark-core graph search paths.
  • Added benchmark input/config/result plumbing for determinant-diversity execution.

4) Disk strategy integration

  • Added disk provider post-process support for DeterminantDiversitySearchParams.
  • Added disk searcher API for determinant-diversity path.
  • Wired disk benchmark runtime to conditionally run determinant-diversity when parameters are present.
  • Extended disk benchmark input schema with:
    • determinant_diversity_eta
    • determinant_diversity_power
    • determinant_diversity_results_k
  • Updated disk benchmark build path metadata access to use accessor methods (ndims(), npoints()).
  • Added disk benchmark example inputs for build/load and baseline-vs-detdiv comparisons.

Credits

All credit goes to Mark for brainstorming and proposing best design that solves many tradeoffs. Thanks to his detailed design template that enabled copilot to do most heavy lifting.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors DiskANN graph search post-processing by removing the post-processor associated type from SearchStrategy and replacing it with a default-processor delegation model (HasDefaultProcessor) plus an explicit, call-site selectable bridge trait (PostProcess).

Changes:

  • Introduce PostProcess, HasDefaultProcessor, and DefaultPostProcess in diskann/src/graph/glue.rs, including a helper macro delegate_default_post_process!.
  • Update KNN/multihop/range/diverse search flows to use post_process_with (and add KnnWith<PP> for caller-supplied post-processing).
  • Migrate providers/decorators and async wrappers/benchmarks to the new HasDefaultProcessor bounds; update inplace delete to construct a delete-search processor explicitly.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
diskann/src/graph/test/provider.rs Updates test strategy to implement HasDefaultProcessor and wires delete-search post-processor.
diskann/src/graph/search/range_search.rs Switches range search to post_process_with(&DefaultPostProcess, ...).
diskann/src/graph/search/multihop_search.rs Switches multihop search to post_process_with(&DefaultPostProcess, ...).
diskann/src/graph/search/mod.rs Re-exports KnnWith for explicit post-processing usage.
diskann/src/graph/search/knn_search.rs Adds KnnWith<PP> and centralizes shared KNN logic in search_core.
diskann/src/graph/search/diverse_search.rs Updates diverse search to PostProcess-based post-processing.
diskann/src/graph/index.rs Adapts flat search and delete-search flows to HasDefaultProcessor / PostProcess.
diskann/src/graph/glue.rs Introduces the new post-processing traits, macro, and DefaultPostProcess marker.
diskann-providers/src/model/graph/provider/layers/betafilter.rs Delegates default post-processor construction via HasDefaultProcessor + Pipeline.
diskann-providers/src/model/graph/provider/async_/inmem/test.rs Updates test strategies to HasDefaultProcessor.
diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs Migrates in-mem spherical strategies to HasDefaultProcessor.
diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs Migrates in-mem scalar strategies to HasDefaultProcessor.
diskann-providers/src/model/graph/provider/async_/inmem/product.rs Migrates product-quantized strategies and delete-search processor typing.
diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs Migrates full-precision strategies and delete-search processor typing.
diskann-providers/src/model/graph/provider/async_/debug_provider.rs Migrates debug provider strategies and delete-search processor typing.
diskann-providers/src/model/graph/provider/async_/caching/provider.rs Delegates default processor through caching layer and propagates delete-search processor.
diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs Migrates BF-tree strategies and delete-search processor typing.
diskann-providers/src/index/wrapped_async.rs Propagates HasDefaultProcessor bound through async wrapper APIs.
diskann-providers/src/index/diskann_async.rs Updates async index tests/constraints to require HasDefaultProcessor.
diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs Delegates default processor via HasDefaultProcessor for inline beta strategy.
diskann-disk/src/search/provider/disk_provider.rs Implements HasDefaultProcessor for disk search strategy (e.g., RerankAndFilter).
diskann-benchmark/src/backend/index/benchmarks.rs Updates benchmark generic bounds to include HasDefaultProcessor.
diskann-benchmark-core/src/search/graph/range.rs Updates benchmark-core range search bounds to include HasDefaultProcessor.
diskann-benchmark-core/src/search/graph/multihop.rs Updates benchmark-core multihop bounds to include HasDefaultProcessor.
diskann-benchmark-core/src/search/graph/knn.rs Updates benchmark-core KNN bounds to include HasDefaultProcessor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 50.68376% with 577 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.60%. Comparing base (965d31a) to head (68e4bfd).

Files with missing lines Patch % Lines
diskann-disk/src/search/provider/disk_provider.rs 0.75% 132 Missing ⚠️
...ark-core/src/search/graph/determinant_diversity.rs 0.00% 102 Missing ⚠️
diskann-benchmark/src/backend/index/search/knn.rs 15.21% 78 Missing ⚠️
.../src/model/graph/provider/async_/debug_provider.rs 0.00% 50 Missing ⚠️
diskann-garnet/src/provider.rs 4.65% 41 Missing ⚠️
...odel/graph/provider/async_/inmem/full_precision.rs 59.13% 38 Missing ⚠️
diskann-benchmark/src/backend/index/result.rs 0.00% 31 Missing ⚠️
diskann-benchmark/src/backend/index/benchmarks.rs 50.90% 27 Missing ⚠️
diskann-benchmark/src/inputs/disk.rs 0.00% 22 Missing ⚠️
diskann-benchmark/src/inputs/async_.rs 27.58% 21 Missing ⚠️
... and 4 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #817      +/-   ##
==========================================
- Coverage   90.23%   88.60%   -1.64%     
==========================================
  Files         443      445       +2     
  Lines       83313    84286     +973     
==========================================
- Hits        75181    74678     -503     
- Misses       8132     9608    +1476     
Flag Coverage Δ
miri 88.60% <50.68%> (-1.64%) ⬇️
unittests 88.44% <50.68%> (-1.76%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark-core/src/search/graph/knn.rs 94.91% <ø> (ø)
diskann-benchmark-core/src/search/graph/mod.rs 100.00% <ø> (ø)
...iskann-benchmark-core/src/search/graph/multihop.rs 98.69% <ø> (ø)
diskann-benchmark-core/src/search/graph/range.rs 93.70% <ø> (ø)
diskann-providers/src/index/diskann_async.rs 96.37% <100.00%> (+<0.01%) ⬆️
diskann-providers/src/index/wrapped_async.rs 32.75% <100.00%> (ø)
...roviders/src/model/graph/provider/async_/common.rs 86.54% <ø> (ø)
...s/src/model/graph/provider/async_/inmem/product.rs 100.00% <100.00%> (ø)
...rs/src/model/graph/provider/async_/inmem/scalar.rs 95.95% <ø> (-0.07%) ⬇️
...src/model/graph/provider/async_/inmem/spherical.rs 90.56% <ø> (-0.14%) ⬇️
... and 23 more

... and 37 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @narendatha - this is headed in the right direction, but I found some pretty large structural issues. Basically, you're helping to untangle a pretty large knot here (thank you so much!) and we should lean into the simplification that this approach can allow.

- Simplify Search trait: move processor/output buffer to method-level generics
- Remove Internal<FullPrecision> strategy split; use RemoveDeletedIdsAndCopy for delete ops
- Add DefaultSearchStrategy aggregate trait combining SearchStrategy + HasDefaultProcessor
- Update benchmark-core helpers to use aggregate trait (reduce recurring bounds)
- Wire range search output buffer through to caller (support dynamic output handling)
- Add no-op SearchOutputBuffer impl for () type to preserve compatibility
…roviders

This commit moves the determinant_diversity_post_process module from diskann
to diskann-providers, as it does not depend on diskann internals and logically
belongs with other post-processing logic in the providers layer.

Changes:
- Move determinant_diversity_post_process.rs from diskann/src/graph/search/
  to diskann-providers/src/model/graph/provider/async_/
- Update all imports across workspace to use diskann_providers location
- Add diskann-providers dependency to diskann-benchmark-core (required for
  DeterminantDiversitySearchParams access)
- Remove old module reference from diskann/src/graph/search/mod.rs
- Update diskann-benchmark, diskann-disk imports to use new location

Validated with:
- cargo clippy --workspace --all-targets -- -D warnings
- cargo fmt --all

This results in cleaner architectural separation where determinant-diversity
search parameters stay with the provider infrastructure that implements them.
…uce clones

- Add DeterminantDiversityError enum for parameter validation
- Convert DeterminantDiversitySearchParams::new() to return Result<Self, Error>
- Validate top_k > 0, eta >= 0.0 and finite, power > 0.0 and finite
- Optimize post_process_with_eta_f32: precompute projections to eliminate vector clones
- Optimize post_process_greedy_orthogonalization_f32: single r_star_copy before projection loop
- Expand test suite from 3 to 11 tests (7 validation + 4 algorithm tests)
- Update callsites in disk_index/search.rs and index/search/knn.rs for error handling
- Add early validation checks in main router function
- Extract shared run-loop logic into reusable helpers
- Route both knn and determinant-diversity through closure-based parameter builders
- Preserve determinant-diversity parameter validation/error propagation
- Reduce duplicated benchmark orchestration code
- Promote DelegateDefaultPostProcessor as the canonical trait in glue
- Remove compatibility alias layer for HasDefaultProcessor
- Rename all trait bounds/impls/usages across diskann, providers, disk, benchmark, and label-filter
- Keep delegate_default_post_process! macro usage aligned with trait naming
- Add runtime filter_start_points flag to RemoveDeletedIdsAndCopy and Rerank
- Route default search through runtime-configurable processors (no FilterStartPoints pipeline)
- Set inplace-delete search processors to filter_start_points=false
- Remove Internal<T> strategy indirection and update async providers accordingly
- Preserve inner search_post_processor for Cached<S> inplace-delete path
- Add CachedPostProcess wrapper to avoid PostProcess impl overlap
- Keep default post-processing delegation unchanged for normal search
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants