Skip to content

Proj/curate dag cutover#578

Open
ngduyanhece wants to merge 5 commits intomainfrom
proj/curate-dag-cutover
Open

Proj/curate dag cutover#578
ngduyanhece wants to merge 5 commits intomainfrom
proj/curate-dag-cutover

Conversation

@ngduyanhece
Copy link
Copy Markdown
Contributor

Summary

  • Problem: The brv curate loop was a monolithic 50-iteration generator with no isolation, no parallelism, and no extensibility surface. Separately, the brv-query MCP tool tightly coupled deterministic tier 0/1/2 lookup with LLM synthesis, forcing every agent through ByteRover's model even when the agent had its own.
  • Why it matters: Foundation for the agent-driven-graph roadmap. Without (a) a typed-slot DAG with per-slot sandboxing and (b) an LLM-free MCP query surface, downstream phases (agent code contribution, learning loop, harness) cannot ship safely. Also unlocks external agents driving the search→synthesize→cache loop with their own model — significant cost/UX win.
  • What changed:
    • (5d54c4f7) — Replaced the curate-executor monolith with a typed-slot DAG (src/agent/core/curation/flow/): 7 default nodes, TopologicalCurationRunner, slot-contract Zod schemas, snapshot parity fixtures.
    • (101afce8) — Per-slot vm sandboxing with tool allowlists, parallel ExtractNode fan-out (concurrency 4), per-slot timeout escalation, and review-cycle hardening (R-1..R-6 + NEW-1).
    • (ed5ee2e8) — Split brv-query into three LLM-free MCP tools: brv-search (tier 0/1/2 cache+BM25), brv-gather (prefetched bundle), brv-record-answer (close the cache loop). Added matching brv gather / brv record-answer oclif CLI commands. brv-query kept with [deprecated] marker + mcp.query.legacy_invocations telemetry counter.
  • What did NOT change (scope boundary):
    • brv-query MCP tool is not removed — deprecation marker only.
    • Provider auth / token-refresh UX (see CF-1) — pre-existing behavior.
    • Cache backend / fingerprint algorithm — reuses existing QueryResultCache.
    • Curate --format json envelope shape — unchanged.

Type of change

  • Bug fix
  • New feature
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

  • Closes #
  • Related #

Root cause (bug fixes only, otherwise write N/A)

  • Root cause: N/A (feature/refactor PR). One incidental fix included: brv search "" was exiting 0 on empty input (UAT-14 blocker B3); fixed by setting process.exitCode = 1 in the validation path of search.ts, gather.ts, and record-answer.ts.
  • Why this was not caught earlier: N/A

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s):
    • test/unit/server/infra/dispatcher/query-dispatcher.test.ts
    • test/unit/server/infra/executor/gather-executor.test.ts
    • test/unit/server/infra/executor/record-answer-executor.test.ts
    • test/unit/server/infra/mcp/tools/brv-search-tool.test.ts
    • test/unit/server/infra/mcp/tools/brv-gather-tool.test.ts
    • test/unit/server/infra/mcp/tools/brv-record-answer-tool.test.ts
    • test/unit/server/infra/mcp/tools/deprecation-telemetry.test.ts
    • test/integration/mcp/query-split-mcp-harness.test.ts — drives all 5 tools through InMemoryTransport
    • test/unit/agent/curate-flow/runner.test.ts, flag.test.ts, sandbox/*.test.ts (Phases 1–2)
  • Key scenario(s) covered:
    • Tier 0 (exact cache) → Tier 1 (Jaccard fuzzy) → Tier 2 (BM25) dispatch with discriminated-union results
    • End-to-end loop via MCP harness: brv-search (needs_synthesis) → brv-gather → simulate-synthesize → brv-record-answer → re-brv-search (returns tier:0, cached_answer)
    • brv-query deprecation: [deprecated] description prefix, _meta.deprecated:true, _meta.replacedBy:[…], telemetry JSONL written to <dataDir>/telemetry/mcp-deprecation.jsonl
    • Provider-less LLM-free invariant: LLM_FREE_TASK_TYPES allowlist permits gather / mcp-search / record-answer / search to bypass provider validation
    • Curate snapshot parity: Phase 1 DAG output ≥ 0.9 Jaccard against pre-cutover baseline fixtures
    • Sandboxed slot execution with allowlist enforcement and timeout escalation
    • UAT round 2 (2026-04-28): 17/20 PASS, 3 carry-forwards (CF-1/CF-2/CF-3) — none blocking

User-visible changes

  • New MCP tools (default-registered):
    • brv-search — returns {tier, status, passages?, cached_answer?, fingerprint, total_found, timing_ms} on the _meta channel
    • brv-gather — returns {prefetched_context, manifest_context?, total_tokens_estimated, search_metadata, follow_up_hints?}
    • brv-record-answer — returns {recorded, fingerprint}
  • New CLI commands: brv gather "<query>" and brv record-answer "<query>" "<answer>" --fingerprint <fp>. Both support --format text|json.
  • brv-query MCP tool — title now suffixed (deprecated); description prefixed [deprecated]; functional behavior unchanged.
  • brv search "" (and gather/record-answer) now exits non-zero on empty input (was exit 0).
  • SKILL.md template — new ## MCP Workflow section before ## Commands; CLI sections renumbered (4–5 added, 6–13 shifted).
  • Curate behavior: identical to pre-cutover within snapshot tolerance (no observable user-facing change, only internal architecture).

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording

Checklist

  • Tests added or updated and passing (npm test)
  • Lint passes (npm run lint)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (if applicable)
  • No breaking changes (or clearly documented above)
  • Branch is up to date with main

Risks and mitigations

  • Risk: Phase 1 curate DAG cutover changes the internal execution path for every brv curate call.
    • Mitigation: Snapshot parity fixtures captured pre-cutover; PR asserts new path produces facts within Jaccard ≥ 0.9 of the baseline. Behavior is observably identical.
  • Risk: Sandboxed slot execution could regress nodes that relied on ambient state.
    • Mitigation: Per-slot tool allowlists explicitly enumerated; default nodes ported and snapshot-tested; review passes R-1..R-6 + NEW-1 hardened the sandbox boundary.
  • Risk: Three new MCP task types (mcp-search, gather, record-answer) bypass provider validation via LLM_FREE_TASK_TYPES allowlist — if the allowlist drifts, an LLM-bound task could silently skip auth.
    • Mitigation: Allowlist is a Set constant in agent-process.ts with co-located unit tests; new task types must be deliberately added.
  • Risk: brv-query deprecation may surprise existing MCP clients reading the tool description.
    • Mitigation: Functional behavior is fully preserved — only metadata (description, _meta.deprecated, _meta.replacedBy) changed. Telemetry counter mcp.query.legacy_invocations lets us track adoption before any future removal.
  • Risk: CF-2 — Tier-0 vs Tier-2 latency gap not observable on the 6-leaf UAT corpus (IPC dominates at ~25ms).
    • Mitigation: Absolute latency targets all met (<50ms / <100ms / <500ms). Corpus-size retest scheduled for Phase 6 perf docket; tracked in CARRY-FORWARDS.md.

Appendix A — Architecture: Curation pipeline

A.1 Node DAG

flowchart LR
    INPUT([source text + existing memory]) --> RECON
    RECON[recon<br/>10s · tools.curation.recon] --> CHUNK
    CHUNK[chunk<br/>5s · pure] --> EXTRACT
    EXTRACT[extract<br/>60s · tools.curation.mapExtract<br/><i>fan-out, concurrency=4</i>] --> GROUP
    GROUP[group<br/>5s · pure] --> DEDUP
    DEDUP[dedup<br/>5s · pure] --> CONFLICT
    CONFLICT[conflict<br/>30s · tools.curation.conflict] --> WRITE
    WRITE[write<br/>30s · tools.curate] --> OUTPUT([context-tree mutations])

    classDef llm fill:#fff4e0,stroke:#d97706,color:#7c2d12
    classDef pure fill:#e0f2fe,stroke:#0369a1,color:#0c4a6e
    classDef io fill:#f3e8ff,stroke:#7c3aed,color:#4c1d95
    class RECON,EXTRACT,CONFLICT llm
    class CHUNK,GROUP,DEDUP pure
    class WRITE io
Loading

NODE_SLOT_ORDER is enforced as ['recon', 'chunk', 'extract', 'group', 'dedup', 'conflict', 'write'] in flow/types.ts.

A.2 Slot contracts

Every node executes inside a per-slot vm sandbox with a tool allowlist and timeout. If the allowlist is [] the node is pure — no tool access, no I/O.

Slot Timeout Tool allowlist Schema gate Concurrency
recon 10 s tools.curation.recon reconInputSchemareconOutputSchema 1
chunk 5 s (pure) chunkInputSchemachunkOutputSchema 1
extract 60 s tools.curation.mapExtract extractInputSchemaextractOutputSchema 4 (p-map fan-out)
group 5 s (pure) groupInputSchemagroupOutputSchema 1
dedup 5 s (pure) dedupInputSchemadedupOutputSchema 1
conflict 30 s tools.curation.conflict conflictInputSchemaconflictOutputSchema 1
write 30 s tools.curate writeInputSchemawriteOutputSchema 1

Source: flow/slots/contracts.ts.

A.3 TopologicalCurationRunner algorithm

Kahn's topological sort with pMap-bounded concurrency at each level. Per-node fail-open: if a node throws, the failure is recorded and downstream branches fed by its output are skipped, while parallel branches continue.

1. Compute in-degree for every node N in DAG
2. Q ← { nodes with in-degree 0 }
3. while Q non-empty:
     level   ← drain Q                          // all currently-runnable
     results ← pMap(level, runOne, { concurrency: 4 })
     for each (node, result) in results:
       if result is failure:
         mark node failed; mark all transitive successors as skipped
       else:
         store result in ctx.outputs[node.id]
         for each successor S of node:
           inDegree[S] -= 1
           if inDegree[S] == 0 and S not skipped: Q.push(S)
4. if any node has inDegree > 0: throw CycleDetectedError
5. return { outputs, failures, metrics }

runOne(node):
  contract  ← slotContracts[node.slot]
  input     ← assembleInput(node, ctx)         // 0/1/N predecessors
  validate input against contract.inputSchema  // soft-mode: records issues
  sandboxed ← buildSlotSandbox({
    allowlist: contract.toolAllowlist,
    timeoutMs: contract.timeoutMs,
    parentSignal: ctx.signal,
  })
  output    ← await sandboxed(node.execute, input, ctx)
  validate output against contract.outputSchema
  return output

Input plumbing convention (runner.ts):

  • 0 predecessors → input is ctx.initialInput
  • 1 predecessor → input is the predecessor's output verbatim
  • N predecessors → input is Record<predecessorNodeId, output> (forward-compat for Phase 8 positional insertions)

A.4 Sandbox boundary

sequenceDiagram
    participant R as Runner
    participant SB as SlotSandbox
    participant SG as SchemaGate
    participant N as NodeFn
    participant T as ToolsProxy

    R->>SG: validate input vs inputSchema
    SG-->>R: ok (or soft-record issue)
    R->>SB: run(NodeFn, input, ctx)
    SB->>SB: AbortController(timeoutMs) ⊕ parent signal
    SB->>T: build Proxy gated by toolAllowlist
    SB->>N: execute({signal, tools})
    N->>T: tools.curation.<x>(...)
    alt path in allowlist
        T-->>N: real implementation result
    else path NOT in allowlist
        T-->>N: throws SandboxToolDeniedError
    end
    N-->>SB: output
    SB->>SG: validate output vs outputSchema
    SG-->>R: output (+ any soft validation issues)
Loading

The tools parameter is a Proxy whose get trap rejects any dotted path not present in the slot's toolAllowlist. Source: flow/sandbox/slot-sandbox-builder.ts.


Appendix B — Architecture: Query pipeline

B.1 Tiered dispatch

flowchart TD
    Q(["brv-search query"]) --> FP["compute context-tree<br/>fingerprint mtime hash"]
    FP --> T0{"Tier 0<br/>exact cache hit?"}
    T0 -- yes --> R0[/"return cached_answer<br/>tier=0"/]
    T0 -- no --> T1{"Tier 1<br/>Jaccard ≥ 0.6?"}
    T1 -- yes --> R1[/"return cached_answer<br/>tier=1"/]
    T1 -- no --> T2["BM25 search<br/>limit=10"]
    T2 --> T2D{"Tier 2<br/>direct-answer<br/>thresholds met?"}
    T2D -- yes --> R2A[/"return passages<br/>status=direct_passages<br/>cache.set"/]
    T2D -- "no, has results" --> R2B[/"return passages<br/>status=needs_synthesis"/]
    T2D -- empty --> R2C[/"return empty passages<br/>status=no_results"/]

    R2B -. LLM-free path .-> G["brv-gather"]
    G --> EXT["external agent LLM"]
    EXT --> RA["brv-record-answer<br/>cache.set"]
    RA -. next equivalent query .-> T0

    R2B -. legacy path .-> T3["brv-query: Tier 3 LLM synthesis"]
    T3 --> T4["Tier 4 agentic loop"]

    classDef cache fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef bm25 fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    classDef llm fill:#fef3c7,stroke:#d97706,color:#78350f
    class T0,T1,R0,R1,RA cache
    class T2,T2D,R2A,R2B,R2C bm25
    class EXT,T3,T4 llm
Loading

B.2 Latency targets

Tier Target Round-2 measured
0 (exact cache) < 50 ms 26 ms ✅
1 (fuzzy cache) < 100 ms 25 ms ✅
2 (BM25) < 500 ms 25 ms ✅

The 5× tier-0/tier-2 gap sub-criterion is not observable on the 6-leaf test corpus (IPC dominates). Tracked as CF-2.

B.3 Algorithm constants

Constant Value Source
DEFAULT_LIMIT (BM25 result count) 10 query-dispatcher.ts
DIRECT_RESPONSE_MAX_DOCS 5 query-dispatcher.ts
DIRECT_RESPONSE_SCORE_FLOOR 0.7 query-dispatcher.ts
DIRECT_RESPONSE_SCORE_THRESHOLD (top result min) 0.85 direct-search-responder.ts
DIRECT_RESPONSE_HIGH_CONFIDENCE_THRESHOLD (auto-answer) 0.93 direct-search-responder.ts
DIRECT_RESPONSE_MIN_GAP (top vs runner-up) 0.08 direct-search-responder.ts
FUZZY_SIMILARITY_THRESHOLD (Jaccard) 0.6 query-similarity.ts
QueryResultCache TTL 60 000 ms query-result-cache.ts

B.4 QueryDispatcher.dispatch() algorithm

dispatch({query, fingerprint, scope?, limit = 10}):
  start ← now()

  # ── Tier 0: exact cache ────────────────────────────────
  hit ← cache.getExact(query, fingerprint)
  if hit: return { tier: 0, status: 'cached_answer', cached_answer: hit, fingerprint, timing_ms }

  # ── Tier 1: fuzzy cache (Jaccard) ──────────────────────
  fuzzy ← cache.getFuzzy(query, fingerprint)        # max similarity ≥ 0.6
  if fuzzy: return { tier: 1, status: 'cached_answer', cached_answer: fuzzy, fingerprint, timing_ms }

  # ── Tier 2: BM25 search ────────────────────────────────
  results ← searchService.search(query, { scope, limit })
  if results.length == 0:
    return { tier: 2, status: 'no_results', passages: [], fingerprint, total_found: 0, timing_ms }

  candidates ← results
                 .filter(r => r.score ≥ DIRECT_RESPONSE_SCORE_FLOOR)   # 0.7
                 .slice(0, DIRECT_RESPONSE_MAX_DOCS)                   # 5
  direct ← tryDirectSearchResponse(query, candidates)                  # 0.85 / 0.93 / 0.08 gap rules
  if direct:
    cache.set(query, fingerprint, direct)                              # close the loop on tier 2
    return { tier: 2, status: 'direct_passages', passages, fingerprint, total_found, timing_ms }

  return { tier: 2, status: 'needs_synthesis', passages, fingerprint, total_found, timing_ms }

B.5 Context-tree fingerprint

Cache entries are keyed by (query, fingerprint). The fingerprint is a deterministic hash of path:mtime pairs over every leaf in the context tree, sorted by path, plus the sources-file mtime. Any add/remove/edit changes the fingerprint → all prior cache entries become stale (no TTL race needed for correctness; TTL is a safety net for clock skew).

computeFingerprint(files):
  sorted ← files sorted by path
  return sha256(sorted.map(f => `${f.path}:${f.mtime}`).join('|'))

Source: query-result-cache.ts.

B.6 LLM-free MCP loop

sequenceDiagram
    participant Agent as External agent
    participant S as brv-search
    participant G as brv-gather
    participant L as Agent's own LLM
    participant R as brv-record-answer
    participant C as QueryResultCache

    Agent->>S: query
    S->>C: lookup (fingerprint, query)
    alt Tier 0/1 hit
        C-->>S: cached_answer
        S-->>Agent: { tier, cached_answer, fingerprint }
    else Tier 2 needs_synthesis
        S-->>Agent: { tier:2, status:needs_synthesis, passages, fingerprint }
        Agent->>G: query (+ scope?, token-budget?)
        G-->>Agent: { prefetched_context, total_tokens_estimated, follow_up_hints }
        Agent->>L: prompt = prefetched_context + query
        L-->>Agent: synthesized answer
        Agent->>R: { query, answer, fingerprint }
        R->>C: cache.set
        R-->>Agent: { recorded:true, fingerprint }
    end

    Note over Agent,C: Next equivalent query → tier 0 hit
Loading

The LLM_FREE_TASK_TYPES allowlist ({ 'gather', 'mcp-search', 'record-answer', 'search' }) in agent-process.ts bypasses provider validation for these task types — no API key required.

B.7 Surface boundary: CLI vs MCP

Surface Tool / command Tier coverage Returns
MCP brv-search 0 / 1 / 2 _meta channel with BrvSearchResult
MCP brv-gather 2 prep prefetched_context, manifest_context?, follow_up_hints?
MCP brv-record-answer writes cache {recorded, fingerprint}
MCP brv-query (deprecated) 0–4 (full) legacy synthesized answer
CLI brv search 2 only (BM25, no cache plumbing) ranked results
CLI brv gather 2 prep bundle (text or --format json)
CLI brv record-answer writes cache {recorded, fingerprint}
CLI brv query 0–4 (legacy synthesis path) answer (no tier field on --format json, see CF-3)

Source: brv-search-tool.ts, brv-gather-tool.ts, brv-record-answer-tool.ts, gather.ts, record-answer.ts.

ngduyanhece and others added 4 commits April 27, 2026 20:15
Replaces the 50-iteration agent loop in `brv curate` with a deterministic
7-slot DAG executed via Kahn's algorithm + bounded pMap concurrency.
Slot order: recon → chunk → extract → group → dedup → conflict → write.

Pure-JS slots (recon, chunk, group, dedup) wrap existing curation
helpers without adding LLM cost. Service-bound slots delegate via
NodeServices, which the live adapter wires to:
  - extract       → agent.generate per chunk (no agent loop)
  - detectConflicts → SearchKnowledgeService subject lookup, with
                     existing-memory loader returning matched paths so
                     conflict-node can emit `update` decisions
  - write          → executeCurate against the curate-tool

Per-curate internal LLM calls drop from ~50 (loop iterations) to ~5–12
(one per LLM-bound slot). Headline "external agent contributes node
code" capability lands in Phase 3; this commit ships the substrate.

Architecture boundary: abstractions (NodeSlot, runner, MetricsCollector,
slot contracts) live in `src/agent/core/curation/flow/`; concrete node
implementations + dag-builder + services-adapter + existing-memory-loader
live in `src/agent/infra/curation/flow/` (preserves the core→infra
one-way dependency rule).

Includes:
  - Pre-compaction parallel hoist with orphan-session guard
    (~5–10s latency win on large contexts).
  - Snapshot-parity test for DAG plumbing consistency across runs.
  - Live integration tests against real `executeCurate` writing to a
    tempdir (catches schema mismatches like path-format and UPDATE
    existingId handling that stub-only tests would miss).
  - UUID-variable-naming regression tests moved to LocalSandbox suite
    where they actually belong (post-cutover the executor no longer
    injects sandbox variables).

Verification: 135 tests passing across all curate-flow + adjacent +
sandbox + workspace + live-write suites; npm run typecheck clean;
npm run lint shows 0 errors (227 pre-existing warnings unchanged).

Plan + design + code-review notes: see plan/agent-driven-graph/.
The documented scope gap on old-vs-new behavioral parity is mitigated
by manual smoke testing during the dogfood window before Phase 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-6, NEW-1)

Builds on the Phase 1 cutover (5d54c4f) with three layered phases bundled
into a single PR. Real-LLM UAT (Phase 4) passed 7/7 scenarios.

Phase 2 — slot sandboxing + parallel extract fan-out
- Per-slot sandbox in src/agent/core/curation/flow/sandbox/: tool allowlist
  via Proxy guard + cooperative timeout via AbortController + parent-signal
  propagation. Plain JS wrappers, NOT node:vm — same interface so Phase 3
  can swap when untrusted agent JS first runs in-slot.
- Schema gate (validateAndRun) with strict + soft modes. Soft is the runner
  default: input-fail strands downstream with one clear error; output-fail
  warns and forwards rawOutput so a too-strict schema doesn't strand the graph.
- Extract-node parallelized via pMap inside the node (not N parallel DAG
  instances). Stub bench: xlarge (22 chunks) 1130ms→311ms at c=4.
- BENCH=1 opt-in harness with committed baselines under test/benchmarks/.
- xlarge.txt fixture (~155KB) + delay() helper extracted to test/helpers.

Phase 2.5 — quality polish (R-1, R-2, R-3, R-4, R-6)
- R-1 hotfix: conflict-detector + conflict-resolver track facts/keywords/tags.
  executeUpdate's structural-loss machinery merges them back; UPDATE no
  longer silently overwrites existing facts.
- R-2 deterministic metadata derivation in metadata-derivers.ts (deriveTags,
  deriveKeywords, deriveRelated). Wired into services-adapter content fields.
  Uses toSnakeCase for slug parity with executeCurate's writer.
- R-3 provenance envelope in Reason: cur-<logId> + ISO timestamp + source
  channel + subject + category + statement preview. Six-file plumbing chain
  through TaskExecuteSchema → task-router → agent-process → curate-executor
  → buildLiveServices → buildReason. Envelope ALWAYS emitted; UPDATE's
  d.reason appended as `Decision: ...` INSIDE the envelope.
- R-4 subject-as-title + ADD→UPSERT switch: filenames are clean <subject>.md
  slugs. UPSERT routes in-batch same-subject collisions through executeUpdate's
  safe merge path instead of executeAdd's blind overwrite — closes a latent
  data-loss bug both source plans missed.
- R-6 effectiveSummary in executeUpdate: when structural-loss merge happened
  and >1 fact landed in the resolved set, regenerate frontmatter `summary`
  as `;`-joined statements (200-char cap) instead of leaving it stale.

Phase 2.6 — Phase 4 UAT follow-ups
- R-5 ROLLBACK extractConcurrency 8→4 per Phase 2.5 §3.5 go/no-go gate:
  Phase 4 UAT showed c=8 regressed Scenario 3 from 150s→169s (rate-limit
  retries on gpt-5.4-mini). Real perf work moves to a Phase 6 spike with
  profiler-driven measurements.
- NEW-1 fix: deriveRelated emitted phantom paths when multiple decisions in
  one batch routed to the same target file (UPSERT collision OR cross-batch
  UPDATE merge via lookupSubject). New deriveRelatedFromResolved operates
  over RESOLVED targets and filters same-target decisions. services-adapter.write
  refactored to two-pass (resolve → build) so cross-links never reference
  non-materialized files. Phase 4 UAT data: 3508 cross-links, 0 dangling.
- NEW-2 doc: UPSERT consolidation (Phase 4+ behavior, sibling-split removed)
  recorded as accepted design in services-adapter inline comment + plan §8.

Tests: 7072 passing (up ~50 from regression coverage across the three phases).
Lint clean, typecheck clean.

Open carry-forward: R-5 perf — 182s vs ≤120s spec on Scenario 3. Functional,
no timeouts. Phase 6 ticket: profile and budget chunked-curate perf to ≤120s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…answer)

Splits the legacy brv-query MCP tool into three LLM-free tools so external
agents can drive the search → synthesize → cache loop with their own model
instead of the daemon's internal LLM. Curate-flow path is untouched.

Tasks 5.1-5.6 + acceptance #8 round-trip integration test all landed. UAT
round 2 passed 17/20 (2 partials and 1 skip all out-of-scope per script §5);
see plan/agent-driven-graph/PHASE-5-CODE-REVIEW.md for the full pass-by-pass
lineage (Codex × 4, Kimi × 1, UAT × 2).

Task 5.1 — QueryDispatcher (tier 0/1/2 reusable core)
- New src/server/infra/dispatcher/query-dispatcher.ts: stateless dispatcher
  with discriminated-union DispatchResult. Tier 0 (exact cache) → tier 1
  (Jaccard fuzzy) → tier 2 (BM25 + direct-response).
- Public BrvSearchResult DTO + toBrvSearchResult() mapper: snake_case
  per DESIGN §6.1 (cached_answer / total_found / timing_ms); internal
  searchResult and directAnswer dropped from MCP wire (Codex F4 fix).
- DEFAULT_LIMIT = 10 to match DESIGN §6.1; DIRECT_RESPONSE_MAX_DOCS = 5
  for the internal direct-response slice (Pass 6 decoupling).
- Refactored query-executor.ts to delegate tiers 0/1/2 to dispatcher;
  tier 3/4 LLM path unchanged. Removed dead tryDirectSearchResponse +
  unused direct-response imports (W1 fix).
- Extracted buildPrefetchedContext() to prefetch-context-builder.ts so
  GatherExecutor reuses the helper without duplicating logic.

Task 5.2 — brv-search MCP tool
- src/server/infra/mcp/tools/brv-search-tool.ts: registers brv-search,
  sends task type 'mcp-search' (not 'search' — that's CLI BM25 only).
- agent-process.ts case 'mcp-search': computes fingerprint via new
  QueryExecutor.resolveFingerprint() so tier 0/1 cache hits work
  (F2 fix), maps via toBrvSearchResult before serialization (F4),
  attaches typed payload to MCP _meta channel for tool-aware clients
  (W2 fix).
- LLM_FREE_TASK_TYPES allowlist exempts mcp-search/gather/record-answer
  from provider validation (F3 fix).

Task 5.3 — brv-gather MCP + CLI
- New gather-executor.ts: pure data assembly (BM25 + prefetch bundle +
  token estimate via shared estimateTokens + rule-based follow_up_hints).
  Never invokes the LLM; constructor accepts only searchService — pinned
  by structural test.
- New brv-gather-tool.ts (MCP) + brv gather CLI command with
  --limit/--scope/--token-budget/--format flags.
- shared transport encoder gather-content.ts.

Task 5.4 — brv-record-answer MCP + CLI
- New record-answer-executor.ts: cache.set graceful (returns
  recorded:false when no cache or set throws — never errors).
- New brv-record-answer-tool.ts (MCP) + brv record-answer CLI command
  (positional query + answer args, required --fingerprint flag).
- QueryDispatcher.cache made public so agent-process shares one
  instance between dispatcher reads and record-answer writes.

Task 5.5 — brv-query deprecation marker + telemetry
- brv-query-tool.ts: [deprecated] description prefix, (deprecated)
  title suffix, _meta.deprecated:true with replacedBy hints (MCP SDK
  doesn't allow custom annotation fields).
- New deprecation-telemetry.ts: appends one JSONL line per legacy
  invocation to <getGlobalDataDir>/telemetry/mcp-deprecation.jsonl.
  Honors BRV_DATA_DIR. Best-effort (swallows I/O errors).
- CLI 'brv query' is unaffected — only MCP brv-query is deprecated.

Task 5.6 — SKILL.md + mcp-workflow.md updates
- SKILL.md: new ## MCP Workflow section, 3rd ## Workflow bullet for
  the LLM-free synthesis loop, two new CLI sections (### 4. Gather
  Context Bundle, ### 5. Record Synthesized Answer), downstream
  sections renumbered to 6-13.
- All references use HYPHENATED MCP names (brv-search, brv-gather,
  brv-record-answer) matching actual registrations — Pass 7 caught
  the prior mismatch where SKILL.md taught underscored names that
  didn't exist (DESIGN.md keeps underscored prose for readability).
- sections/mcp-workflow.md (loaded by RuleTemplateService.generateMcpContent
  for connector-installed agents): rewrote from legacy two-tool flow
  to full Phase 5 pipeline with deprecation callout + worked example.

UAT B3 fix — CLI exit codes
- src/oclif/commands/{search,gather,record-answer}.ts: validateInput
  now sets process.exitCode = 1 before returning false. CI scripts
  (e.g., 'brv search "" || echo failed') can now detect failure.

Round-trip acceptance (#8)
- test/integration/mcp/query-split-roundtrip.test.ts: executor-level
  cache-loop test (4 cases) — proves shared-cache wiring works
  search → gather → record → cached-search.
- test/integration/mcp/query-split-mcp-harness.test.ts: real MCP
  protocol test using @modelcontextprotocol/sdk InMemoryTransport
  + Client + McpServer — drives client.callTool('brv-search')
  through the wire. Catches what executor-level can't: tool name
  registration mismatches, schema validation, _meta correctness,
  encoding/decoding wire issues.

Tests: 229 passing across the Phase 5 surface (dispatcher, executors,
MCP tools, CLI commands, templates, integration).
Lint clean. Typecheck clean.

Out-of-scope carry-forwards documented in plan/agent-driven-graph/CARRY-FORWARDS.md:
- ByteRover provider 401 token-expiry UX (separate provider-auth ticket)
- Tier 0 vs 2 latency gap retest on 500+ leaf corpus (Phase 6 perf spike)
- brv query CLI --format json missing tier field (optional polish)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @ngduyanhece's task in 4m 42s —— View job


PR Review: Curate DAG Cutover + Query Split

Tasks

  • Gather context and understand request
  • Review curate DAG runner and sandbox implementation
  • Review query dispatcher and MCP tools (brv-search, brv-gather, brv-record-answer)
  • Review new CLI commands (gather, record-answer)
  • Review agent-process LLM_FREE_TASK_TYPES and deprecation telemetry
  • Review test coverage
  • Post comprehensive review with inline comments and score

Overall Assessment — Score: 4 / 5

This is a well-engineered, thoroughly-documented architectural PR. The Kahn-based topological runner, slot contracts, sandbox proxy, and tiered query dispatcher are all correct. The phase-gate pattern (Phase 2 plain-JS wrapper → Phase 3 node:vm) is clearly reasoned and the known limits are documented in-code. Test coverage across unit, integration, and benchmark layers is strong.

The issues below are limited in scope. None are blockers. Two are worth fixing before merge (correctness + project standards).


Issues

Blocking / Must Fix

None.

Important — Fix Before Merge

1. as cast on unvalidated JSON parsesrc/server/infra/mcp/tools/brv-search-tool.ts:120
JSON.parse(result) as BrvSearchResult violates the project TypeScript rule against as assertions (CLAUDE.md). A shape-divergent daemon response is silently accepted and _meta downstream consumers receive wrong types. Inline comment posted with a structural-check suggestion. Fix this →

2. Empty bundle from applyTokenBudgetsrc/server/infra/executor/gather-executor.ts:107
When every section individually exceeds tokenBudget, kept is empty and the function returns ''. An agent with a tight budget receives prefetched_context: ''—indistinguishable from a zero-results search. Should guarantee at least the first section. Inline comment posted with fix. Fix this →

3. Silent onError in text modesrc/oclif/commands/gather.ts:233 and src/oclif/commands/record-answer.ts:233
The onError callback only emits output in json format. In text mode a daemon-side task error produces no visible output. Add an else { this.log(...) } branch for text mode. Inline comments posted. Fix this →


Minor / Nit

4. parseExistingIdForUpdate depth ceilingsrc/agent/infra/curation/flow/services-adapter.ts:94
Paths deeper than 4 segments silently fall back to ADD. On a context tree that has grown past that depth, re-curating the same subject always adds a new leaf instead of updating. The limit is documented but a fallback telemetry signal or simply relaxing the upper bound (>= 3 segments, title = last part, path = everything else) would be safer. Inline comment posted.

5. Missing _meta on gather / record-answer MCP tools
brv-search returns both text content and a typed _meta struct. brv-gather and brv-record-answer return text only. Tool-aware agents that parse search results structurally must fall back to JSON.parse for the other two. Not wrong, but the asymmetry adds friction for agent developers. Inline suggestion posted on brv-gather-tool.ts:103.

6. NodeContext.extractConcurrency leaks node-specific tuningrunner.ts:98
A single slot's concurrency knob lives on the shared runner context interface. As Phase 3+ adds more slot-specific options this will grow into a per-slot grab-bag. A nodeOptions?: Partial<Record<NodeSlot, unknown>> bag keeps slot concerns slot-scoped. Inline comment posted.

7. static strict = false undocumentedgather.ts:72, record-answer.ts:65
Unknown flags are silently swallowed (e.g., --limt typo won't error). If this is intentional forward-compat, a comment would prevent it from being "cleaned up" in a future pass.

8. Duplicate cache guard in QueryDispatcherquery-dispatcher.ts:188–214
this.cache && fingerprint is evaluated twice (tier 0 then tier 1). A single merged block reduces duplication. Inline nit posted.


What's well done

  • Kahn's runner (runner.ts): correct fail-open propagation, cycle detection, and pMap bounded-concurrency at level boundaries. The input-plumbing 0/1/N convention is clean and forward-compatible.
  • Sandbox proxy (slot-sandbox-builder.ts): recursive Proxy correctly guards nested dotted paths; the parentSignal chain correctly links external abort to per-slot AbortController without an unhandled rejection leak.
  • Soft-mode schema gate: failing on input (strand downstream) vs. forwarding raw output on output-schema drift (F5 finding) is the right tradeoff for Phase 2.
  • LLM_FREE_TASK_TYPES allowlist: the Set<string> constant with co-located unit tests is a clean enforcement pattern. The bypass is opt-in and explicit.
  • Deprecation telemetry: best-effort, never throws, JSONL append is append-only safe. recordLegacyQueryInvocation() fires before routing so failed calls still count.
  • QueryDispatcher design: stateless dispatcher shared between the MCP brv-search and legacy brv-query paths eliminates code forking. toBrvSearchResult correctly drops the internal searchResult at the MCP boundary (F4 finding).
  • Test coverage: runner, sandbox, schema gate, all new MCP tools, full MCP harness integration test, and UAT snapshot parity fixtures — coverage is thorough for a PR of this scale.

// so legacy callers still receive the raw daemon response.
let parsedMeta: BrvSearchResult | undefined
try {
parsedMeta = JSON.parse(result) as BrvSearchResult
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (correctness/standards): JSON.parse(result) as BrvSearchResult violates the project's TypeScript rule against as assertions (CLAUDE.md). If the daemon returns a non-JSON string or a shape that diverges from BrvSearchResult, the cast silently accepts invalid data and the _meta downstream consumers will see wrong types at runtime.

Use the existing BrvSearchResult Zod shape (or a lean inline parse) instead:

Suggested change
parsedMeta = JSON.parse(result) as BrvSearchResult
let parsedMeta: BrvSearchResult | undefined
try {
const raw: unknown = JSON.parse(result)
// Light structural check — at minimum tier + status must be present
if (
raw !== null &&
typeof raw === 'object' &&
'tier' in raw &&
'status' in raw
) {
parsedMeta = raw as BrvSearchResult
}
} catch {
// raw text fallback below
}

Alternatively, export a Zod schema for BrvSearchResult alongside toBrvSearchResult and use schema.safeParse(JSON.parse(result)).

* If the bundle exceeds the token budget, truncate to the last full section
* (sections are joined by `\n\n---\n\n` per `buildPrefetchedContext`).
*/
function applyTokenBudget(bundle: string, tokenBudget: number): string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (correctness): When no section individually fits under tokenBudget, kept stays empty and applyTokenBudget returns ''. An agent that calls brv-gather with a very tight token_budget (e.g., 150 tokens) but a large first section silently receives prefetched_context: ''—indistinguishable from a zero-results search. The invariant should guarantee at least the first section to avoid the "phantom empty" case.

Suggested change
function applyTokenBudget(bundle: string, tokenBudget: number): string {
function applyTokenBudget(bundle: string, tokenBudget: number): string {
if (!bundle) return bundle
const tokens = estimateTokens(bundle)
if (tokens <= tokenBudget) return bundle
const sections = bundle.split('\n\n---\n\n')
const kept: string[] = []
let accumulated = 0
for (const section of sections) {
const sectionTokens = estimateTokens(section)
if (accumulated + sectionTokens > tokenBudget) break
kept.push(section)
accumulated += sectionTokens
}
// Always include at least the first section so callers receive some context
// even when token_budget is smaller than that section.
if (kept.length === 0 && sections.length > 0) {
return sections[0]
}
return kept.join('\n\n---\n\n')
}

Comment thread src/oclif/commands/gather.ts Outdated
}
}
},
onError({error}) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (UX): onError only emits output in json format; in text mode the callback body is empty, so a task-level error (e.g., daemon emits task:error after the task is routed) produces no visible output to the user. The outer withDaemonRetry catch handles transport failures, but in-stream errors go silent in text mode.

Suggested change
onError({error}) {
onError({error}) {
if (format === 'json') {
writeJsonResponse({
command: 'gather',
data: {event: 'error', message: error.message, status: 'error'},
success: false,
})
} else {
this.log(`\nError: ${error.message}\n`)
}
},

// PHASE-5-UAT.md UAT-14: empty input must exit non-zero so CI scripts
// can detect failure.
process.exitCode = 1
return false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (same pattern as gather.ts): onError is also silent in text mode here. Same fix applies — add an else { this.log(...) } branch so users see the error message in both formats.

Comment on lines +188 to +214
if (this.cache && fingerprint) {
const cached = this.cache.get(query, fingerprint)
if (cached) {
return {
cachedAnswer: cached,
fingerprint,
status: 'cached_answer',
tier: 0,
timingMs: Date.now() - startTime,
totalFound: 0,
}
}
}

if (this.cache && fingerprint) {
const fuzzyHit = this.cache.findSimilar(query, fingerprint)
if (fuzzyHit) {
return {
cachedAnswer: fuzzyHit,
fingerprint,
status: 'cached_answer',
tier: 1,
timingMs: Date.now() - startTime,
totalFound: 0,
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (readability): The this.cache && fingerprint guard is evaluated twice — once for tier 0 and once for tier 1. These are the only cache paths, so a single early-exit after both checks would make the invariant clearer and eliminate the duplication:

if (this.cache && fingerprint) {
  const exact = this.cache.get(query, fingerprint)
  if (exact) return { cachedAnswer: exact, fingerprint, status: 'cached_answer', tier: 0, timingMs: Date.now() - startTime, totalFound: 0 }

  const fuzzy = this.cache.findSimilar(query, fingerprint)
  if (fuzzy) return { cachedAnswer: fuzzy, fingerprint, status: 'cached_answer', tier: 1, timingMs: Date.now() - startTime, totalFound: 0 }
}

Not a correctness issue — current logic is equivalent.


const result = await resultPromise

return {content: [{text: result, type: 'text' as const}]}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: brv-search returns both content (text) and _meta (typed struct) for tool-aware MCP clients. brv-gather only returns content. Tool-aware agents that parse brv-search results structurally would need to JSON.parse the gather text to access total_tokens_estimated or follow_up_hints. Consider adding the same dual-format pattern for consistency:

let parsedMeta: unknown
try { parsedMeta = JSON.parse(result) } catch { /* fallback */ }
return {
  _meta: parsedMeta as Record<string, unknown> | undefined,
  content: [{text: result, type: 'text' as const}],
}

Not required for correctness; the text payload is still usable as-is.

* Concurrency for ExtractNode's per-chunk fan-out (Phase 2 Task 2.4).
* Defaults to 4 inside `extract-node.ts`. Tunable via curate-executor.
*/
readonly extractConcurrency?: number
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (coupling): extractConcurrency is an ExtractNode-specific tuning knob sitting on the shared NodeContext. As more node-specific options arrive in later phases, NodeContext will accumulate per-node fields. Consider a nodeOptions?: Partial<Record<NodeSlot, unknown>> bag here so slot-specific tuning stays slot-scoped (e.g., nodeOptions?.extract?.concurrency). Low-priority given the single current user, but worth flagging before Phase 3 adds more.

const cleaned = existingId.replaceAll(/^\/+|\/+$/g, '').replace(/\.md$/, '')
const parts = cleaned.split('/').filter(Boolean)
// Need at least domain/topic/title (3) and at most domain/topic/subtopic/title (4).
if (parts.length < 3 || parts.length > 4) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (silent data loss): Paths with depth > 4 (e.g., project/auth/oauth/providers/google.md) silently fall back to ADD instead of UPDATE. For a context tree that has ever grown beyond the 4-segment ceiling, this means repeat curations of the same subject always ADD new leaves instead of updating the existing one, causing duplicates.

The comment documents the limit, but it's worth either widening the range (to e.g., >= 3 segments, taking title = last, path = everything before last) or logging a fallback telemetry signal so the pattern can be detected in production.

min: 100,
}),
}
public static strict = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static strict = false silently ignores unknown flags (e.g., brv gather "q" --limt 5 won't error). The comment on the peer record-answer.ts gives no reason either. If this is intentional forward-compat, a one-line comment would help reviewers (and keep it from being removed as "dead config" in a future cleanup).

- brv-search-tool.ts: replace unsafe 'as' cast with structural check
- gather-executor.ts: guarantee at least first section under tight token budget
- gather.ts & record-answer.ts: fix silent onError in text mode
- query-dispatcher.ts: merge duplicate cache guard blocks
- Add comments explaining static strict = false forward-compat
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @ngduyanhece's task in 1m 25s —— View job


Re-review: PR #578 — Fix commit 0398d89 verified

Tasks

  • Gather context and understand request
  • Check latest commit for changes addressing previous review comments
  • Verify fix for issue Feat/init #1: as cast in brv-search-tool.ts
  • Verify fix for issue Feat/refactor structure #2: empty bundle in gather-executor.ts
  • Verify fix for issue add the app cli #3: silent onError in gather.ts / record-answer.ts
  • Check other minor issues
  • Post updated review

Fixes Verified — Score: 5 / 5

Commit 0398d89 ("fix: address PR #578 review comments") cleanly resolves all three "Fix Before Merge" issues and two of the minor nits. Each fix matches the suggestion from the earlier review exactly.


Issue-by-issue status

# Original severity Status
1 Important ✅ Fixed
2 Important ✅ Fixed
3 Important ✅ Fixed
4 Minor/nit Carry-forward (acceptable)
5 Minor/suggestion Carry-forward (acceptable)
6 Minor/nit Carry-forward (acceptable)
7 Nit ✅ Fixed
8 Nit ✅ Fixed

Fix details

Issue #1as cast in brv-search-tool.ts:120

The JSON.parse(result) as BrvSearchResult unsafe cast is replaced with the structural check suggested in the review:

const raw: unknown = JSON.parse(result)
if (raw !== null && typeof raw === 'object' && 'tier' in raw && 'status' in raw) {
  parsedMeta = raw as BrvSearchResult
}

The residual as BrvSearchResult after the guard is the standard narrowing pattern; the invariant is now verified before the cast. The double-cast _meta: parsedMeta as unknown as Record<string, unknown> on line 136 is an acceptable structural upcast to satisfy the MCP SDK's return type — BrvSearchResult is a plain object.

Issue #2 — Empty bundle from applyTokenBudget

The fallback guarantees at least one section:

if (kept.length === 0 && sections.length > 0) {
  return sections[0]
}

Correct placement (after the loop, before the join) and matches the review suggestion exactly.

Issue #3 — Silent onError in text mode

Both gather.ts and record-answer.ts now have an else branch. Note the additional fix that makes this work: onError({error}) was changed to onError: ({error}) => — the arrow function correctly closes over this so this.log(...) binds to the oclif command instance. The original method shorthand would have silently shadowed this at runtime.

Issue #7static strict = false undocumented

Both commands now have a comment explaining the forward-compat rationale:

// Allow unknown flags for forward-compatibility (e.g., new daemon flags
// passed through by wrapper scripts without requiring a CLI upgrade).
public static strict = false

Issue #8 — Duplicate cache guard in QueryDispatcher

The two if (this.cache && fingerprint) blocks are merged into one. Variable names are also improved (cached → exact, fuzzyHit → fuzzy).


Remaining carry-forwards (all non-blocking)


Verdict

All blocking and "fix before merge" items are resolved. The PR is ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant