fix(firo): make Spark anon-set sync resumable and crash-safe #1298
reubenyap wants to merge 2 commits into cypherstack:staging from …
Conversation
Adds the storage substrate that the resumable-sync fix will use in the
next commit. This commit on its own is purely additive — existing
reader queries and the existing writer are untouched, so behavior for
anyone running only this commit is unchanged.
Schema additions (both in the fresh-DB CREATE TABLE and in the
idempotent `_migrateSparkSetCacheDb` migration that runs on every
open):
1. `SparkSet.complete INTEGER NOT NULL`: will gate reader visibility
once the next commit wires it up. 0 = in-progress, 1 = finalized.
Pre-migration rows default to 1 (legacy writer was all-or-nothing,
so any existing row represents a finalized sync); fresh-DB new
rows default to 0 and the writer will explicitly set complete=0
until finalize.
2. `SparkSetCoins.orderKey INTEGER NOT NULL DEFAULT 0`: will hold the
server-side delta index of each coin so the reader can reconstruct
server newest-first ordering end-to-end. Pre-migration rows
default to 0; the reader's `ssc.id ASC` tiebreaker (added next
commit) then sorts them in PK order, which is exactly the layout
the pre-fix writer produced — preserving ordering byte-for-byte.
3. `UNIQUE INDEX idx_sparksetcoins_set_coin ON SparkSetCoins(setId,
coinId)`: required for INSERT OR IGNORE on the link table during
resumable per-sector writes (added next commit). Before creating
the index, any pre-existing duplicate (setId, coinId) rows are
removed — keeping MIN(id) per group — so a legacy DB with
unexpected duplicates can still upgrade. The pre-fix writer
shouldn't have produced any (its INSERT was not OR IGNORE and
would have thrown on collision), but scrubbing once is cheaper
than failing to open the DB.
Migration uses explicit `PRAGMA table_info` / `sqlite_master` presence
checks rather than `try/catch` around the ALTER statements, so unrelated
SQLite errors don't get silently swallowed.
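As a concrete illustration, the presence-check-then-alter flow can be sketched in Python against sqlite3 (table, column, and index names are taken from the description above; the function name and the surrounding Python scaffolding are hypothetical stand-ins for the Dart `_migrateSparkSetCacheDb`):

```python
import sqlite3

def migrate_spark_set_cache(db: sqlite3.Connection) -> None:
    """Idempotent migration sketch: explicit presence checks, defensive
    dedup, then the UNIQUE index. Safe to run on every open."""
    cols = {row[1] for row in db.execute("PRAGMA table_info(SparkSet)")}
    if "complete" not in cols:
        # Pre-migration rows represent finalized syncs, hence DEFAULT 1.
        db.execute(
            "ALTER TABLE SparkSet"
            " ADD COLUMN complete INTEGER NOT NULL DEFAULT 1"
        )
    cols = {row[1] for row in db.execute("PRAGMA table_info(SparkSetCoins)")}
    if "orderKey" not in cols:
        db.execute(
            "ALTER TABLE SparkSetCoins"
            " ADD COLUMN orderKey INTEGER NOT NULL DEFAULT 0"
        )
    have_index = db.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index'"
        " AND name = 'idx_sparksetcoins_set_coin'"
    ).fetchone()
    if have_index is None:
        # Scrub any legacy duplicates, keeping MIN(id) per (setId, coinId),
        # so CREATE UNIQUE INDEX cannot fail on an unexpected legacy DB.
        db.execute(
            "DELETE FROM SparkSetCoins WHERE id NOT IN ("
            " SELECT MIN(id) FROM SparkSetCoins GROUP BY setId, coinId)"
        )
        db.execute(
            "CREATE UNIQUE INDEX idx_sparksetcoins_set_coin"
            " ON SparkSetCoins(setId, coinId)"
        )
    db.commit()
```

Running it a second time is a no-op: every step is gated on a presence check rather than a `try/catch`, so an unrelated SQLite error surfaces instead of being swallowed.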
New reader helpers used by the resume logic in the next commit:
* `_getIncompleteSetForGroupId(groupId)`: returns the newest
complete=0 row for a group, if any.
* `_countSetCoins(setId)`: count of links attached to a SparkSet.
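The SQL behind the two helpers can be sketched as Python/sqlite3 mirrors (the Dart signatures and exact column names such as `groupId` are assumptions based on the description above):

```python
import sqlite3

def get_incomplete_set_for_group(db: sqlite3.Connection, group_id: int):
    """Newest (highest-id) complete=0 SparkSet row for a group, or None.
    Hypothetical mirror of `_getIncompleteSetForGroupId`."""
    return db.execute(
        "SELECT id FROM SparkSet"
        " WHERE groupId = ? AND complete = 0"
        " ORDER BY id DESC LIMIT 1",
        (group_id,),
    ).fetchone()

def count_set_coins(db: sqlite3.Connection, set_id: int) -> int:
    """Number of SparkSetCoins links attached to one SparkSet row.
    Hypothetical mirror of `_countSetCoins`."""
    return db.execute(
        "SELECT COUNT(*) FROM SparkSetCoins WHERE setId = ?", (set_id,)
    ).fetchone()[0]
```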
Verified via sqlite3 CLI simulation: a realistic pre-migration DB (two
finalized syncs, 5 link rows) is byte-for-byte round-tripped through
the migration with no change in reader output. Dedup correctly
collapses a constructed 5-row table with duplicates to 3 unique rows
keeping MIN(id).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Problem

`runFetchAndUpdateSparkAnonSetCacheForGroupId` previously accumulated every sector of a group's anonymity set in an in-memory Dart list and only wrote to SQLite after the full delta had been fetched. A network drop, app kill, or power loss mid-download discarded the in-memory buffer and left no persisted progress, so the next sync attempt recomputed the same `meta.size - prevSize` delta and re-downloaded everything. For a large first-sync this burned the whole set repeatedly.

Fix

The previous commit added the storage substrate: `SparkSet.complete`, `SparkSetCoins.orderKey`, a UNIQUE index on `SparkSetCoins(setId, coinId)`, and the helper reader queries. This commit wires all of them together.

Writer (firo_cache_writer.dart, firo_cache_worker.dart):
* Removes `_updateSparkAnonSetCoinsWith`.
* Adds `_insertSparkAnonSetCoinsIncremental` — per-sector write. Each call is an atomic SQLite transaction: INSERT OR IGNORE SparkSet with complete=0, INSERT OR IGNORE each coin and SparkSetCoins link with `orderKey = startIndex + i`. Readers don't see the row until finalize flips complete to 1.
* Adds `_markSparkAnonSetComplete` — gated on a strict integrity check (`COUNT(SparkSetCoins WHERE setId=?) == expectedLinkedCount`). If the count disagrees, it rolls back and leaves complete=0 so the next sync observes and resets the state. A partial or over-full cache never becomes the current set.
* Adds `_deleteIncompleteSparkSetsForGroup` — discards in-progress rows (and their SparkSetCoins links) for a group. Used on blockHash shift or corruption detection. Finalized rows are not touched.
* Defensive: `_insertSparkAnonSetCoinsIncremental` refuses to append to a row with complete=1 (catches a pathological case where the server reports the same blockHash/setHash as an already-finalized set).

Reader (firo_cache_reader.dart):
* All anon-set-visible reads gain `WHERE ss.complete = 1`, so in-progress rows never leak partial coins to callers.
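The per-sector write and integrity-gated finalize described above can be sketched as Python/sqlite3 mirrors (function and column names are hypothetical stand-ins for the Dart `_insertSparkAnonSetCoinsIncremental` / `_markSparkAnonSetComplete`; the separate SparkCoin table is elided and coins are represented by bare ids):

```python
import sqlite3

def insert_sector(db, set_id, group_id, block_hash, set_hash,
                  start_index, coins):
    """One atomic transaction per sector. INSERT OR IGNORE makes a
    crash-recovery replay of an already-committed sector a no-op."""
    with db:  # commits on success, rolls back on exception
        row = db.execute(
            "SELECT complete FROM SparkSet WHERE id = ?", (set_id,)
        ).fetchone()
        if row is not None and row[0] == 1:
            raise ValueError("refusing to append to a finalized set")
        db.execute(
            "INSERT OR IGNORE INTO SparkSet"
            " (id, groupId, blockHash, setHash, complete)"
            " VALUES (?, ?, ?, ?, 0)",
            (set_id, group_id, block_hash, set_hash),
        )
        for i, coin_id in enumerate(coins):
            # orderKey records the server-side delta index of each coin.
            db.execute(
                "INSERT OR IGNORE INTO SparkSetCoins"
                " (setId, coinId, orderKey) VALUES (?, ?, ?)",
                (set_id, coin_id, start_index + i),
            )

def mark_complete(db, set_id, expected_linked_count):
    """Integrity-gated finalize: flip complete=1 only when the link
    count matches the expected delta exactly."""
    with db:
        n = db.execute(
            "SELECT COUNT(*) FROM SparkSetCoins WHERE setId = ?", (set_id,)
        ).fetchone()[0]
        if n != expected_linked_count:
            return False  # leave complete=0; next sync observes and resets
        db.execute("UPDATE SparkSet SET complete = 1 WHERE id = ?", (set_id,))
        return True
```

Replaying a sector after a simulated crash leaves the link count unchanged, a wrong expected count refuses to finalize, and appending to a finalized row raises.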
This is load-bearing for libspark membership-proof construction: a spend initiated mid-sync must not see half-populated SparkSetCoins.
* Coin ordering is preserved end-to-end via `ORDER BY ss.id ASC, ssc.orderKey DESC, ssc.id ASC`. The orderKey DESC sort yields server newest-first after the coordinator's existing Dart `.reversed`, matching the pre-fix behavior. The `ssc.id ASC` tiebreaker covers pre-migration rows whose orderKey defaults to 0: their PK order is the exact layout the old writer produced (it inserted coins in globally-reversed RPC order).
* `_getLatestSetInfoForGroupId` gains a deterministic `ss.id DESC` tiebreaker for the same-size edge case.

Coordinator (firo_cache_coordinator.dart):
* Resumable sync loop: reads the in-progress row if any, verifies its blockHash and setHash match the current meta, then resumes at the count of already-linked coins. If blockHash/setHash disagree, discards the in-progress row and starts fresh. If the linked count exceeds the expected delta, treats it as corrupt and resets.
* Per-sector server response size is cross-checked against the request range; a mismatch aborts before persisting, leaving the cursor (and thus the resume point) unchanged.
* Reorg defense: skip sync if the server reports a smaller size than the last finalized state.
* Same-blockHash-different-size anomaly: skip sync rather than let INSERT OR IGNORE append unverified coins to the already-finalized row and leak them past its committed setHash.
* Empty-delta case (blockHash advanced without new coins): do not write a new SparkSet row. A same-size row would create a tiebreaker ambiguity in `_getLatestSetInfoForGroupId`; any stray in-progress row from a prior attempt is cleared.

No changes required to external callers — `FiroCacheCoordinator`'s public signatures are preserved.
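The reader's visibility filter and ordering clause can be exercised on a toy dataset: a legacy set whose orderKey is all 0 (so PK order is the old writer's layout), a post-fix set with meaningful orderKeys, and an in-progress set that must stay invisible. The schema here is a minimal stand-in for the real tables:

```python
import sqlite3

def read_visible_coins(db: sqlite3.Connection) -> list:
    """Visible coins in the reader's order: finalized sets only,
    orderKey DESC within a set, PK tiebreaker for legacy rows."""
    return [
        r[0]
        for r in db.execute(
            "SELECT ssc.coinId FROM SparkSet ss"
            " JOIN SparkSetCoins ssc ON ssc.setId = ss.id"
            " WHERE ss.complete = 1"
            " ORDER BY ss.id ASC, ssc.orderKey DESC, ssc.id ASC"
        )
    ]
```

On the hybrid layout, the legacy set comes back in insertion (PK) order while the post-fix set comes back with its delta indices reversed; coins of the in-progress set never appear.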
Verification

* sqlite3 CLI simulation of 14 scenarios: first sync, incremental sync, integrity-gate rejection, resume from committed cursor, blockhash-shift reset, idempotent sector replay, cross-set no duplicates, end-to-end ordering equivalence with pre-fix, pre-migration data read unchanged, and five focused safety-check tests. All pass.
* Upgrade-path simulation against a pre-migration DB (two finalized syncs, five coin link-rows): reader output is byte-for-byte identical before and after migration; a subsequent post-migration incremental sync produces the correct hybrid layout.
* External-caller audit (firo_wallet.dart, spark_interface.dart, settings/UI views): all callers use unchanged public signatures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review — findings after cold adversarial read

Ran a second pass with a skeptical Dart/SQLite reviewer agent to catch things I missed. Below are all findings from both passes, triaged by actionability. TL;DR: no bugs found that this PR needs to fix before merge. Details:

False positives (analyzed and dismissed)

* Claim: the scenario requires the server to return the same coin at two different indices for a given … For cross-sync consistency: the …
* Claim: redundant — spark_interface.dart:608 and :614 both set …
* Claim: reader ordering is only newest-first within a set, not across sets. Traced end-to-end with the delta-storage model. For sets 1 (size 100 at H1) and 2 (delta 50 at H2=150):
End-to-end order is correct across sets. The comment in the reader accurately describes the behavior.

Pre-existing issues not introduced by this PR

* Concurrency: main-isolate reads vs worker-isolate writes on the same SQLite file. The sqlite3 package opens in rollback-journal mode by default; two handles on the same file can contend on the database-level lock. In my sync flow, main-isolate reads happen before the first worker task is dispatched, so the sync itself doesn't contend. But UI-initiated reads …
* TOCTOU between spark_interface.dart:589 and :597 — :589 reads coins, then :597 reads meta. Between the two …
* The FK declarations on …

Notes worth documenting but not changing

* Progress-bar callback semantic inconsistency. Initial call passes …
* Migrated DBs get …
* Orphan SparkCoin rows on blockHash shift.
* In pathological scenarios (multiple aborted syncs at different blockHashes) several incomplete rows could coexist. My query returns the newest one.

Testing asks for the reviewer

Three things this environment couldn't run:
Happy to address any of the above findings (e.g., enabling WAL, tightening the …).

🤖 Self-review generated with Claude Code
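For the concurrency note above, the WAL suggestion amounts to one pragma at connection-open time: in WAL mode a reader no longer blocks on (or is blocked by) a concurrent writer on the same file. A hedged Python/sqlite3 sketch (the real change would go through Dart's sqlite3 package; `open_cache` and the busy-timeout value are illustrative):

```python
import sqlite3

def open_cache(path: str) -> sqlite3.Connection:
    """Open the cache DB with WAL enabled. journal_mode=WAL is
    persistent in the file, so it survives reopen; busy_timeout makes
    lock collisions retry instead of failing immediately."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")
    db.execute("PRAGMA busy_timeout=5000")
    return db
```

Note that WAL does not work on `:memory:` databases; it needs a real file.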
Problem
`runFetchAndUpdateSparkAnonSetCacheForGroupId` accumulates every sector of a group's anonymity set in an in-memory Dart `List` and only writes to SQLite after the full delta has been fetched. A network drop, app kill, or power loss mid-download discards the in-memory buffer and leaves no persisted progress, so the next sync attempt recomputes the same `meta.size - prevSize` delta and re-downloads everything. For a large first-sync this burns the whole set repeatedly.

Additionally, a related design — persisting partial sectors directly to `SparkSet` with a `complete` flag but not filtering reader queries on it — has a second failure mode: a spend initiated while a sync is running can observe a half-populated anonymity set and hand libspark a `setHash`-mismatched set, failing the membership proof.

Fix
Per-sector commits against an in-progress `SparkSet` row that stays invisible to readers (`complete = 0`) until a strict integrity check passes, at which point a single atomic `UPDATE` flips it to `complete = 1`. Partial data is never observable by readers, and the next sync resumes from the count of already-linked coins.

Design
* Schema: `SparkSet.complete` (integrity gate) and `SparkSetCoins.orderKey` (server-side delta index, for ordering preservation), plus a `UNIQUE INDEX` on `SparkSetCoins(setId, coinId)` so per-sector `INSERT OR IGNORE` dedupes correctly under crash-recovery replay.
* Migration: `PRAGMA table_info` / `sqlite_master` presence checks rather than `try/catch` around `ALTER` (so unrelated SQLite errors don't get silently swallowed). Defensive dedup of any legacy duplicate `(setId, coinId)` rows before creating the `UNIQUE` index. Existing `SparkSet` rows default to `complete = 1` (the pre-fix writer was all-or-nothing, so they represent finalized syncs); existing `SparkSetCoins.orderKey` defaults to 0 and the reader's `ssc.id ASC` tiebreaker reproduces the pre-fix layout byte-for-byte.
* Writer primitives: incremental insert; mark-complete, gated on `COUNT(SparkSetCoins) == expectedDelta`; and delete-incomplete (for blockHash shift / corruption reset). Each runs in its own SQLite transaction.
* Reader: anon-set-visible reads gain `WHERE ss.complete = 1`. Coin ordering reconstructed via `ORDER BY ss.id ASC, ssc.orderKey DESC, ssc.id ASC` — matches pre-fix end-to-end behavior for both pre-migration rows (orderKey tied at 0, PK tiebreaker) and post-migration rows (orderKey meaningful).
* Coordinator: verifies that any in-progress row's `blockHash` and `setHash` match the current server meta, resumes at `COUNT(SparkSetCoins)` if they do, discards and restarts if they don't. Sector-size cross-check refuses to persist a partial/over-full server response. Reorg-shrink and blockHash-advanced-without-new-coins cases are handled explicitly to avoid corrupting finalized state.

Integrity guarantees
* Crash mid-sector: the per-sector transaction is atomic, so a partial sector rolls back; the resume point is simply `COUNT(SparkSetCoins)`.
* Crash during finalize: the single `UPDATE` either flipped `complete=1` or not — no intermediate. Either way, next sync resumes correctly.
* Integrity-check failure: the row stays `complete=0`. Next sync observes the over-linked row and resets.
* Spend mid-sync: the `complete=1` filter keeps in-progress data invisible. Spend sees the last finalized state.

Upgrade path
Fully automatic — no manual cache clear, no wallet rescan, no user action. The migration runs on first open of the new version:
* `ALTER TABLE SparkSet ADD COLUMN complete INTEGER NOT NULL DEFAULT 1` (legacy rows are finalized).
* `ALTER TABLE SparkSetCoins ADD COLUMN orderKey INTEGER NOT NULL DEFAULT 0`.
* `CREATE UNIQUE INDEX` on `SparkSetCoins(setId, coinId)`, after the defensive dedup.

Verified against a realistic pre-migration DB (two finalized syncs, five link rows): reader output is byte-for-byte identical before and after migration; a subsequent post-migration incremental sync produces the expected hybrid layout.
Review strategy
The change is split into two commits with clear separation of concerns, so a reviewer can verify them independently:
1. `1d543d8` — schema + migration (+126 −5, 2 files). Purely additive. Existing reader/writer/coordinator are untouched, so behavior for anyone running only this commit is unchanged. Verify the schema design and migration correctness in isolation before looking at any logic.
2. `90cd308` — behavioral fix (+424 −77, 4 files). Writer, worker, reader filter/ordering, and coordinator rewrite. Public `FiroCacheCoordinator` signatures are preserved, so no callers in `firo_wallet.dart`, `spark_interface.dart`, or UI views need changes.

Test plan
Environment didn't have the Flutter/Dart toolchain available during development, so I verified by replaying the writer's exact SQL against an in-memory `sqlite3` CLI instance across 18 scenarios:

* `(setId, coinId)` dedup keeps MIN(id) per group before `CREATE UNIQUE INDEX`
* External callers (`firo_wallet.dart`, `spark_interface.dart`, UI) use unchanged public signatures
* `flutter analyze` / `dart analyze` pass

🤖 Generated with Claude Code