feat: [ENG-2485] defer summary cascade to dream #579

RyanNg1403 wants to merge 3 commits into proj/curation-enhancement from
Conversation
Move the LLM-bound parent-summary cascade off curate's hot path. Curate
now enqueues changed paths to a stale-summary queue persisted in
.brv/dream-state.json; the next dream cycle drains the queue, merges
with its own snapshot diff (A ∪ B), and runs propagateStaleness once
per dirty directory.
- DreamState schema gains staleSummaryPaths[]; new service methods
enqueueStaleSummaryPaths (RMW with path-level dedup, oldest enqueuedAt
preserved) and drainStaleSummaryPaths (atomic snapshot-and-clear).
- curate-executor swaps the inline propagateStaleness call for an
enqueueStaleSummaryPaths call. Manifest rebuild stays inline (pure
filesystem, no LLM).
- dream-executor's step 5 drains the queue, unions with the diff-set,
and runs propagateStaleness once per unique path. On propagation
failure the catch re-enqueues the drained snapshot so atomic drain
doesn't lose work.
- runStaleSummaryPropagation extracted as a protected seam; preserves
the ENG-2100 parentTaskId threading so summary regenerations still
share one billing session with the parent dream task.
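A minimal sketch of the queue shape and the two service methods described in the bullets above, assuming zod for the schema; the `DreamStateStore` seam and the class name are placeholders for the real `.brv/dream-state.json` persistence, not the actual implementation:

```ts
import { z } from "zod";

// Entry shape persisted in .brv/dream-state.json (path tightened to min(1)).
const StaleSummaryEntrySchema = z.object({
  path: z.string().min(1),
  enqueuedAt: z.number(),
});
type StaleSummaryEntry = z.infer<typeof StaleSummaryEntrySchema>;

// Hypothetical persistence seam standing in for the real dream-state file I/O.
interface DreamStateStore {
  read(): Promise<{ staleSummaryPaths: StaleSummaryEntry[] }>;
  write(state: { staleSummaryPaths: StaleSummaryEntry[] }): Promise<void>;
}

class DreamStateServiceSketch {
  constructor(private store: DreamStateStore) {}

  // RMW enqueue: dedup by path (within the batch and against the queue),
  // keeping the oldest enqueuedAt for paths that are already queued.
  async enqueueStaleSummaryPaths(paths: string[]): Promise<void> {
    const state = await this.store.read();
    const byPath = new Map<string, StaleSummaryEntry>(
      state.staleSummaryPaths.map((e): [string, StaleSummaryEntry] => [e.path, e]),
    );
    const now = Date.now();
    for (const path of new Set(paths)) {
      if (!byPath.has(path)) byPath.set(path, { path, enqueuedAt: now });
    }
    await this.store.write({ staleSummaryPaths: [...byPath.values()] });
  }

  // Atomic snapshot-and-clear: the caller receives each entry exactly once.
  async drainStaleSummaryPaths(): Promise<StaleSummaryEntry[]> {
    const state = await this.store.read();
    await this.store.write({ staleSummaryPaths: [] });
    return state.staleSummaryPaths;
  }
}
```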
Validation:
- validate.mjs: 15/15 BM25 invariants hold under stale _index.md
- Real-LLM E2E: 8 curates / 2 dreams on byterover paid-tier
- dedup, multi-curate accumulation, drain, re-enqueue on failure
- Full unit suite: 6981 passing
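A minimal sketch of the dream-side flow described above (drain, A ∪ B union, propagate once per unique path, re-enqueue on failure). The dependency shapes and the `propagateStaleness` signature are assumptions, not the real executor code:

```ts
interface StaleEntry { path: string; enqueuedAt: number }

interface Deps {
  dreamStateService: {
    drainStaleSummaryPaths(): Promise<StaleEntry[]>;
    enqueueStaleSummaryPaths(paths: string[]): Promise<void>;
  };
  // Signature assumed: the real call threads an ENG-2100 parentTaskId so
  // summary regenerations share one billing session with the dream task.
  propagateStaleness(path: string, opts: { parentTaskId: string }): Promise<void>;
}

async function runStaleSummaryPropagationSketch(
  deps: Deps,
  diffPaths: string[],
  parentTaskId: string,
): Promise<void> {
  const drainedSnapshot = await deps.dreamStateService.drainStaleSummaryPaths();
  const drainedPaths = drainedSnapshot.map((e) => e.path);

  // A ∪ B: paths queued by earlier curates plus this dream's own snapshot diff.
  const dirty = [...new Set([...drainedPaths, ...diffPaths])];

  try {
    for (const path of dirty) {
      await deps.propagateStaleness(path, { parentTaskId });
    }
  } catch {
    // The atomic drain already cleared the queue, so put the drained snapshot
    // back rather than losing it; the next dream cycle will retry.
    if (drainedPaths.length > 0) {
      await deps.dreamStateService.enqueueStaleSummaryPaths(drainedPaths).catch(() => {
        // If the re-enqueue itself fails, there is nothing more to do here.
      });
    }
  }
}
```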
Claude finished @RyanNg1403's task in 2m 33s — PR Review: feat: [ENG-2485] defer summary cascade to dream
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0cf9ae52b6
- Bypass the dream activity gate when the stale-summary queue has work (codex P1): low-activity projects with 1–2 curates would otherwise strand queued paths indefinitely. `dream-trigger.ts` now treats a populated queue as enough reason to drain (sketched after this list).
- Dedup within-batch input in `enqueueStaleSummaryPaths`: a caller passing a non-unique array no longer produces duplicate queue entries.
- Split the curate-executor post-processing try block: snapshot diff, enqueue, and manifest rebuild now have independent fail-open scopes, so a transient queue write error cannot skip the (no-LLM) manifest scan that keeps newly curated leaves immediately discoverable.
- Restore the `runOperations` JSDoc that the cherry-pick orphaned in front of `runStaleSummaryPropagation`.
- Tighten `StaleSummaryEntrySchema` to `z.string().min(1)` so empty paths cannot persist into dream-state.json.

Tests:
- dream-trigger: bypass-when-queue-populated + still-block-when-empty
- dream-state-service: within-batch duplicate dedup
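The Gate 2 bypass from the first item, as a hedged sketch; the gate-input shape and function name are assumptions rather than the actual dream-trigger.ts structure:

```ts
// Activity gate sketch: a populated stale-summary queue bypasses the usual
// "enough curates since the last dream" requirement.
interface ActivityGateInput {
  curatesSinceLastDream: number;
  minCuratesForDream: number;
  staleSummaryQueueSize: number;
}

function passesActivityGate(input: ActivityGateInput): boolean {
  if (input.staleSummaryQueueSize > 0) {
    // Low-activity projects must not strand queued paths indefinitely.
    return true;
  }
  return input.curatesSinceLastDream >= input.minCuratesForDream;
}
```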
Claude finished @RyanNg1403's task in 3m 59s — PR Review: feat: [ENG-2485] defer summary cascade to dream
Overall Assessment: ⭐⭐⭐⭐⭐ 5/5

What the Refactor Commit Fixed
Remaining Observations (all non-blocking)

1. Misleading comment in Step 1 snapshot-failure catch
Pushed 0e5e1b3 addressing the Claude review-agent feedback, plus the codex P1 (dream-trigger Gate 2 bypass when queue has work), which was addressed in the inline thread. The test coverage gap noted in the review (no test for within-batch duplicates) is now closed by the new dedup test.
```ts
// this dream cycle and accumulates until the next successful run.
// Drain-skip is preferable to drain-and-lose because atomic drain
// already removed entries before any propagation could re-enqueue them.
}
```
nitpick: The phrase "atomic drain already removed entries before any propagation could re-enqueue them" reads as if drain has already run in this code path — but drain is intentionally skipped when preState is undefined. The intended meaning is closer to:
Leaving the queue intact is safer than draining-then-failing, because the atomic drain design clears entries immediately on drain; there would be no catch to re-enqueue them if we drained up here and then threw.
No behavior change needed, just a future-reader confusion risk.
```ts
// already removed them, so without this they would be lost.
if (drainedSnapshot.length > 0) {
  await this.deps.dreamStateService.enqueueStaleSummaryPaths(drainedSnapshot).catch(() => {
    // If the re-enqueue itself fails, there is nothing more to do here.
```
nitpick: enqueueStaleSummaryPaths stamps each entry with Date.now(), so original enqueuedAt timestamps from the drained snapshot are silently reset on retry. No consumer reads this field today, but a future latency dashboard would see re-tried paths as "freshly enqueued." Worth a comment here so a future telemetry author isn't surprised by the reset semantics.
```ts
// already removed them, so without this they would be lost.
if (drainedSnapshot.length > 0) {
  await this.deps.dreamStateService.enqueueStaleSummaryPaths(drainedSnapshot).catch(() => {
    // If the re-enqueue itself fails, there is nothing more to do here.
```
nitpick: enqueueStaleSummaryPaths stamps re-enqueued entries with Date.now(), so the original enqueuedAt timestamps from the drained snapshot are lost on retry. The telemetry property ("oldest waiting path") is reset as though the retry is a fresh enqueue.
No consumer reads enqueuedAt today, so this is purely cosmetic. But if you ever add latency dashboards, the first propagation failure silently resets the clock. Consider passing the original timestamps through a lower-level API — or just a comment that explains this trade-off so a future telemetry author isn't surprised.
Suggested change:

```ts
// If the re-enqueue itself fails, there is nothing more to do here.
// Note: original enqueuedAt timestamps are not preserved here; re-enqueued
// paths receive Date.now() so latency telemetry restarts from this point.
await this.deps.dreamStateService.enqueueStaleSummaryPaths(drainedSnapshot).catch(() => {
```
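If the timestamp-preserving variant the comment above alludes to were ever wanted, one hypothetical shape for the lower-level merge (not part of this PR; names and signature are assumptions) could be:

```ts
interface StaleSummaryEntry { path: string; enqueuedAt: number }

// Hypothetical lower-level merge that accepts full entries instead of bare
// paths, so a retried path keeps its original enqueuedAt ("oldest wins").
function mergePreservingTimestamps(
  queue: StaleSummaryEntry[],
  retried: StaleSummaryEntry[],
): StaleSummaryEntry[] {
  const byPath = new Map<string, StaleSummaryEntry>(
    queue.map((e): [string, StaleSummaryEntry] => [e.path, e]),
  );
  for (const entry of retried) {
    const existing = byPath.get(entry.path);
    if (!existing || entry.enqueuedAt < existing.enqueuedAt) {
      byPath.set(entry.path, entry);
    }
  }
  return [...byPath.values()];
}
```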
- Reword the Step 1 snapshot-failure catch in dream-executor: the old comment claimed "atomic drain already removed entries", but drain doesn't run in this code path. Clarify that skipping drain here is safer than draining-and-failing because the atomic-drain RMW clears entries synchronously.
- Add a `checkEligibility` test for the stale-summary queue bypass so both `tryStartDream` and `checkEligibility` pin the invariant. They delegate to the same `checkGates1to3` path; without this, a future refactor of the shared path could regress only one call site.
Claude finished @RyanNg1403's task in 1m 33s — PR Review: feat: [ENG-2485] defer summary cascade to dream
Overall Assessment: ⭐⭐⭐⭐⭐ 5/5

What the Third Commit Fixed
Review of the Fixes

Fix 1: Step 1 snapshot-failure catch comment. The rewrite is correct and meaningfully clearer. The old comment ("atomic drain already removed entries before any propagation could re-enqueue them") falsely implied drain had run. The new comment explains the actual invariant. This is the correct reasoning — a future reader landing on this catch can now understand why the code is written this way, not just what happens.

Fix 2: checkEligibility bypass test. The symmetry comment is a good call:

```ts
// Symmetry with the tryStartDream bypass test — both methods delegate
// to checkGates1to3, so a future refactor of the shared path must keep
// this invariant on both call sites.
```

Pinning the contract on both call sites means a refactor that breaks the gate bypass would fail in two distinct test contexts rather than one, making the regression harder to miss.

Remaining Observation (non-blocking, no change needed): re-enqueue resets `enqueuedAt`. The inline suggestion from the second pass (adding a comment about timestamp loss on retry) was not applied. This is a reasonable call — no consumer reads `enqueuedAt` today.

What's Done Well (cumulative across all three passes)
This PR is ready to merge.

Summary
- Problem: `brv curate` runs `propagateStaleness` inline at the end of the hot path, making 1–3 LLM calls per curate to regenerate parent `_index.md` summaries up the directory tree.
- Change: curate now enqueues changed paths to a stale-summary queue persisted in `.brv/dream-state.json`. The next `brv dream` cycle drains the queue, unions with its own snapshot diff (A ∪ B), and runs `propagateStaleness` once per unique path. On propagation failure, the catch re-enqueues the drained snapshot so atomic-drain doesn't lose work.
- Unchanged: `_index.md` files still exist (frontmatter + body still BM25-indexed); `propagateStaleness` semantics unchanged; manifest rebuild stays inline (pure filesystem, no LLM); dream's 12h cadence unchanged; in-curate writes to leaf topic files unchanged.

Type of change
Scope (select all touched areas)
Linked issues
ENG-2100 (`runStaleSummaryPropagation` seam)

Root cause (bug fixes only, otherwise write N/A)

Test plan
- `test/unit/infra/dream/dream-state-schema.test.ts` (+54)
- `test/unit/infra/dream/dream-state-service.test.ts` (+159, new tests for `enqueueStaleSummaryPaths`, `drainStaleSummaryPaths`, dedup, concurrency)
- `test/unit/infra/executor/dream-executor.test.ts` (+126, drain + re-enqueue + A ∪ B merge)
- `test/unit/infra/executor/curate-executor.test.ts` (replaced the obsolete inline-propagateStaleness ENG-2100 test with an ENG-2485 deferral test)
- Dedup: oldest `enqueuedAt` preserved
- `Promise.all` over 3 enqueues → 3 unique entries
- (`FileContextTreeSnapshotService`)
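A hedged sketch of the dedup and concurrency cases listed above, using a vitest-style suite and an in-memory stand-in for the service (the factory and its behavior are assumptions, not the real test file):

```ts
import { describe, expect, it } from "vitest";

// Tiny in-memory stand-in for the queue, just enough for the sketch.
function makeInMemoryQueue() {
  let entries: { path: string; enqueuedAt: number }[] = [];
  return {
    async enqueueStaleSummaryPaths(paths: string[]) {
      const known = new Set(entries.map((e) => e.path));
      for (const path of new Set(paths)) {
        if (!known.has(path)) {
          entries.push({ path, enqueuedAt: Date.now() });
          known.add(path);
        }
      }
    },
    async drainStaleSummaryPaths() {
      const drained = entries;
      entries = [];
      return drained;
    },
  };
}

describe("stale-summary queue (sketch)", () => {
  it("dedups within a batch and across calls", async () => {
    const queue = makeInMemoryQueue();
    await queue.enqueueStaleSummaryPaths(["a/", "a/", "b/"]);
    await queue.enqueueStaleSummaryPaths(["a/"]);
    const drained = await queue.drainStaleSummaryPaths();
    expect(drained.map((e) => e.path).sort()).toEqual(["a/", "b/"]);
  });

  it("keeps 3 unique entries under Promise.all over 3 enqueues", async () => {
    const queue = makeInMemoryQueue();
    await Promise.all(["a/", "b/", "c/"].map((p) => queue.enqueueStaleSummaryPaths([p])));
    expect(await queue.drainStaleSummaryPaths()).toHaveLength(3);
  });
});
```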
User-visible changes

None directly. Side effect: parent `_index.md` summaries may lag by up to 12h after a curate (until the next dream). BM25 search and leaf-file lookups are unaffected.

Evidence
- `node validate.mjs` (BM25 invariants under stale `_index.md`): 15/15 passed across 4 staleness scenarios
- `enqueuedAt`
- `brv dream --force` cleared the 6-path queue, regenerated 5 `_index.md` files at all depths
- `chmod 000 dream-state.json` → curate exit 0
- `_manifest.json` as a directory → `buildManifest` threw EISDIR → drained snapshot re-enqueued with new `enqueuedAt`
- `_index.md` LLM regenerations → 0