Skip to content

perf(targeting): preaggregate exposure log per filter hash for high-package eligibility#103

Open
bokelley wants to merge 1 commit intomainfrom
bokelley/preaggregate-exposures
Open

perf(targeting): preaggregate exposure log per filter hash for high-package eligibility#103
bokelley wants to merge 1 commit intomainfrom
bokelley/preaggregate-exposures

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Summary

Eligibility evaluation in EvaluateIdentityResolved re-scanned the user's exposure log per candidate package via CheckFrequencyRulesMultiLog and LatestExposureMultiLog, giving O(packages × log_entries × identities) CPU per request. At realistic Scope3-shape load (1000 candidate packages × 1000-entry log × 3 identities) this was ~7.5ms CPU per request; the pathological tail (1000 × 10K × 3) hit ~58ms — outside the 30ms p95 latency budget called out in the TMP spec.

This PR pre-buckets the user's exposure log entries by filter hash (campaign and package) once per request, plus precomputes per-package latest timestamp for intent score. Per-package eligibility check then walks only the matching bucket instead of the full log. Build cost O(L × I), per-package check O(matches-per-filter), independent of candidate-package count.

Heuristic-gated, not always-on

ShouldPreaggregate(numPackages) > 50 decides which path runs. Below that threshold, the map-build allocation overhead dominates — at 10 packages × 10K-entry log × 3 identities, the build cost more than triples per-request CPU (~1.25ms vs ~408µs naive). Above ~50 packages, preagg amortizes — at 1000 × 1000 × 3, ~26× speedup; at 1000 × 10K × 3, ~40× speedup.

The phase transition is sharp enough that a simple package-count check beats more elaborate heuristics. The two-path engine code is justified by avoiding a real measured regression on small-package requests.

Measured speedups

vs. main, from TestScale_IdentityMatch_CPU_Combined, in-memory mock store, single goroutine:

packages log entries identities before after path speedup
10 10000 3 1,299 µs 898 µs naive 1.4×
1000 100 3 784 µs 71 µs preagg 11.0×
1000 1000 3 7,566 µs 287 µs preagg 26.4×
1000 10000 3 57,861 µs 1,500 µs preagg ~38×

The pathological-tail case drops from 58ms to 1.5ms — comfortably within the latency budget.

Bit-identical behavior

Same dedup (impression hash), same window filter, same MaxCount short-circuit. The naive CheckFrequencyRulesMultiLog and LatestExposureMultiLog functions remain in exposure_binary.go as public API with their existing tests; the engine just calls into the aggregated path when the threshold is exceeded.

Test plan

  • go test ./targeting/ passes
  • go test ./targeting/ -run TestPreaggregate_Crossover shows the threshold-driven crossover empirically
  • go test ./targeting/ -run TestScale_IdentityMatch_CPU_Combined confirms speedups
  • Reviewer sanity-check on the threshold value (50) for any in-house workload that might want it tuned
  • Production benchmarks against real valkey topology (out of scope; tracked in adcp#3359 rollout plan)

Related: adcontextprotocol/adcp#3359 — IdentityMatch architecture spec that surfaced this perf concern.

🤖 Generated with Claude Code

…ackage eligibility

Eligibility evaluation re-scanned the exposure log per candidate package
in CheckFrequencyRulesMultiLog and LatestExposureMultiLog, giving
O(packages × log_entries × identities) CPU. At realistic Scope3-shape
load (1000 candidate packages × 1000-entry log × 3 identities) this
was ~7.5ms CPU per request; pathological tail (1000 × 10K × 3) hit
58ms — outside the 30ms latency budget.

Pre-bucket the user's exposure log entries by filter hash (campaign and
package) once per request, plus precompute per-package latest timestamp.
Per-package eligibility check then walks only the matching bucket
instead of the full log. Build cost O(L × I), per-package check
O(matches-per-filter), independent of candidate-package count.

Heuristic-gated: ShouldPreaggregate(numPackages) > 50. Below that
threshold the map-build allocation overhead dominates — at 10 packages
× 10K-entry log × 3 identities, the build cost more than triples
per-request CPU. Above ~50 packages, preagg amortizes — at 1000 × 1000
× 3, ~26× speedup; at 1000 × 10K × 3, ~40× speedup. The phase
transition is sharp enough that a simple package-count check beats more
elaborate heuristics.

The two-path engine code is justified by avoiding a real regression on
small-package requests, not just complexity-for-complexity's-sake. An
intermediate draft removed the threshold to collapse to one path; the
resulting ~700µs regression on 10pkg × 10K × 3ids was material enough
to walk back.

Measured speedups vs baseline (TestScale_IdentityMatch_CPU_Combined):

  packages   log_size   ids   before     after    speedup
  10         10000      3     1,299 µs   898 µs   1.4×    (naive path)
  1000       100        3     784 µs     71 µs    11.0×   (preagg path)
  1000       1000       3     7,566 µs   287 µs   26.4×   (preagg path)
  1000       10000      3     57,861 µs  1,500 µs ~38×    (preagg, pathological tail)

Full targeting test suite passes; behavior is bit-identical between
the multi-log and aggregated paths (same dedup, same window filter,
same MaxCount short-circuit).

Adds:
- targeting/exposure_aggregate.go        — PreaggregatedExposures type +
                                            BuildPreaggregatedExposures +
                                            CheckFrequencyRulesAggregated +
                                            LatestExposureAggregated +
                                            ShouldPreaggregate
- targeting/exposure_aggregate_test.go   — TestPreaggregate_Crossover
                                            (documents the empirical
                                            naive-vs-preagg crossover at
                                            ~50 packages)
- targeting/cpu_combined_test.go         — TestScale_IdentityMatch_CPU_Combined

Modifies:
- targeting/engine.go EvaluateIdentityResolved — gates between naive
                                                  and preagg paths,
                                                  uses preagg for
                                                  campaign + package fcap
                                                  + intent score

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// TestPreaggregate_Crossover measures naive vs preaggregated frequency-cap
// evaluation across the (packages × log_entries × identities) matrix to
// determine where the heuristic threshold should sit.
func TestPreaggregate_Crossover(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't really test anything, so should not be run together with the real tests. (should utilize t.Skip())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or ideally this would be a Benchmark, not Test

@@ -0,0 +1,84 @@
package targeting

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a test for CheckFrequencyRulesMultiLog(...) == CheckFrequencyRulesAggregated(BuildPreaggregatedExposures(...), ...) for identical inputs

// (candidate packages per request) × (exposure log entries per identity) ×
// (identities per request). All numbers are isolated from network I/O via
// the mock store, so they represent in-process CPU only.
func TestScale_IdentityMatch_CPU_Combined(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as TestPreaggregate_Crossover - no real tests, should be skipped or Benchmark

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants