fix: prevent seed state cache poisoning in loadState clone#9246

Open
lodekeeper wants to merge 3 commits into ChainSafe:unstable from lodekeeper:fix/loadstate-clone-cache-aliasing
Conversation

@lodekeeper
Contributor

Summary

Fixes the v1.42.0 "Withdrawal mismatch at index=0" regression by changing the two clone() calls in loadState.ts to clone(true) (i.e. dontTransferCache=true).

The default clone() in @chainsafe/ssz transfers the source sub-view's cache to the new instance, which means the migratedState.validators sub-view and the seed container's cached child-view snapshot share the same internal nodes[] and caches[] arrays. A subsequent migratedState.commit() writes modified validator / inactivity-score nodes into those shared arrays, silently corrupting the seed state's cache snapshot at the modified indices.

The corruption stays latent until the seed state is later cloned with transfer-cache enabled — the path verifyBlock takes via preState.clone({dontTransferCache: false}). On that next-block clone, reads at the modified index return the migrated state's validator instead of the seed's, which surfaces as Withdrawal mismatch at index=0 divergences between Lodestar and EL.

Timeline of the corruption

// Inside loadState() with seedState = head state:
migratedState.validators = seedState.validators.clone();
// ^ migratedState.validators.nodes === seedState.caches[validatorsIndex].nodes (shared)

for (const i of modifiedValidators) {
  migratedState.validators.set(i, loadValidator(...));  // staged in viewsChanged
}

migratedState.commit();
// ^ arrayComposite.js commit(): `this.nodes[index] = node`
//   writes newValidator into the SHARED nodes[] array.
//   seedState.caches[validatorsIndex].nodes[modifiedIndex] is now poisoned.

On the next block:

const preState = headState.clone();  // default dontTransferCache=false
// ^ transfers the poisoned caches[] snapshot to preState
preState.validators.getReadonly(i);  // returns migrated validator, not seed's
// -> getExpectedWithdrawals reads wrong validator at index 0
// -> "Withdrawal mismatch at index=0"
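
The aliasing in the timeline above can be reproduced with a small toy model. This is only a sketch: `ToyListView` and its fields are hypothetical stand-ins, not the `@chainsafe/ssz` implementation, and the model collapses the container / sub-view distinction into a single view whose `nodes` array is consulted before committed data.

```typescript
// Toy model only: ToyListView is hypothetical, NOT the @chainsafe/ssz code.
class ToyListView {
  // Writes staged by set() and applied by commit().
  private readonly staged: Array<[number, string]> = [];

  constructor(
    private committed: readonly string[], // stand-in for the immutable tree
    private readonly nodes: string[] = [] // stand-in for the nodes[] cache
  ) {}

  get(index: number): string {
    // Reads trust the cache first and fall back to committed data.
    return this.nodes[index] ?? this.committed[index];
  }

  set(index: number, value: string): void {
    this.staged.push([index, value]);
  }

  commit(): void {
    const next = this.committed.slice();
    for (const [index, value] of this.staged) {
      next[index] = value;
      this.nodes[index] = value; // in-place write into a possibly shared array
    }
    this.committed = next;
    this.staged.length = 0;
  }

  clone(dontTransferCache = false): ToyListView {
    // Default: the clone receives the very same cache array (aliasing).
    return new ToyListView(this.committed, dontTransferCache ? [] : this.nodes);
  }
}

const seedValidators = new ToyListView(["seed-0", "seed-1"]);
const migratedValidators = seedValidators.clone(); // default: cache array is shared
migratedValidators.set(0, "migrated-0");
migratedValidators.commit(); // the write also lands in the seed's cache array

const nextBlockValidators = seedValidators.clone();
console.log(nextBlockValidators.get(0)); // "migrated-0" (poisoned read)
console.log(nextBlockValidators.get(1)); // "seed-1" (untouched index)
```

The seed's committed data never changes in this model; only the shared cache array does, which is why the wrong value appears on a later cached read rather than at commit time.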

Root cause

Introduced by #8857 (chore: consume BeaconStateView) which added the loadOtherState / shared-head seed path that exercises this clone in production.

Test plan

  • New regression test loadState does not poison seed state's cache in packages/state-transition/test/unit/util/loadState.test.ts
  • Verified the test FAILS without the fix (reads 0xaa-filled validator on postState.clone()) and PASSES with the fix
  • All 176 existing state-transition util tests pass
  • check-types and lint clean

Relation to #9245

PR #9245 (fix: gate loadOtherState validators/balances preload behind opt-in) addresses a different regression from the same #8857-era changes: the eager getAllReadonlyValues() preload causing memory spikes on the API path. The two fixes are independent, and both are needed.

🤖 Generated with AI assistance

The default `clone()` in `@chainsafe/ssz` transfers the source sub-view's
cache to the new instance, which means `migrated.validators` and the seed
container's cached child-view snapshot share the SAME internal `nodes[]`
and `caches[]` arrays. A subsequent `migratedState.commit()` then writes
modified validator / inactivity-score nodes into those shared arrays,
silently corrupting the seed state's cache snapshot at the modified
indices.

The corruption stays latent until the seed is later cloned with
transfer-cache enabled - the path `verifyBlock` takes via
`preState.clone({dontTransferCache: false})`. On that next-block clone,
reads at the modified index return the migrated state's validator instead
of the seed's, which surfaces in production as
"Withdrawal mismatch at index=0" divergences between Lodestar and EL.

Use `clone(true)` so the migrated sub-view starts with a fresh empty
cache and its commit cannot reach into the seed's cache arrays. A
regression test exercises the `loadState -> seedState.clone() ->
validators.getReadonly(modifiedIndex)` sequence.
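
As an illustration of why the flag matters, the check below runs the same clone, commit, re-clone, read sequence once with the default and once with the cache left behind. It is a sketch over a hypothetical `ToyView` record (a committed array plus a cache array that reads consult first); none of these helpers exist in `@chainsafe/ssz` or Lodestar.

```typescript
// Hypothetical toy view, for illustration only.
interface ToyView {
  committed: readonly string[];
  nodes: string[]; // cache consulted before committed data
}

function cloneView(view: ToyView, dontTransferCache = false): ToyView {
  // With the flag set, the clone starts from a fresh empty cache.
  return {committed: view.committed, nodes: dontTransferCache ? [] : view.nodes};
}

function commitWrite(view: ToyView, index: number, value: string): void {
  const committed = view.committed.slice();
  committed[index] = value;
  view.committed = committed;
  view.nodes[index] = value; // in-place write into the view's cache array
}

function read(view: ToyView, index: number): string {
  return view.nodes[index] ?? view.committed[index];
}

// Returns what a later default clone of the seed reads at index 0.
function seedReadAfterMigration(dontTransferCache: boolean): string {
  const seed: ToyView = {committed: ["seed-0"], nodes: []};
  const migrated = cloneView(seed, dontTransferCache);
  commitWrite(migrated, 0, "migrated-0");
  return read(cloneView(seed), 0);
}

console.log(seedReadAfterMigration(false)); // "migrated-0": seed cache was poisoned
console.log(seedReadAfterMigration(true)); // "seed-0": seed cache stayed isolated
```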

Root cause was introduced by ChainSafe#8857 which added the `loadOtherState` /
shared-head seed path that exercises this clone in production.

🤖 Generated with AI assistance
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request addresses a cache aliasing bug in the loadState utility where cloning subviews with default settings could lead to corruption of the seed state's cache during a commit. The fix involves using clone(true) for inactivityScores and validators to ensure a fresh cache. A regression test has been added to prevent future occurrences. Feedback suggests using the more explicit object syntax {dontTransferCache: true} in the clone calls to improve code readability and maintain consistency with the inline documentation.

// silently corrupting the seed container's cache snapshot. That corruption only surfaces
// on the next `seedState.clone({dontTransferCache: false})` read, producing a
// "Withdrawal mismatch at index=0"-style divergence downstream.
migratedState.inactivityScores = seedState.inactivityScores.clone(true);
Contributor

medium

For better readability and to avoid "magic boolean" parameters, it is recommended to use the object syntax for the clone options. This also ensures consistency with the documentation provided in the comments above (line 118) and the PR description, which both reference the {dontTransferCache: ...} syntax.

Suggested change
- migratedState.inactivityScores = seedState.inactivityScores.clone(true);
+ migratedState.inactivityScores = seedState.inactivityScores.clone({dontTransferCache: true});

Contributor Author

Thanks — but this suggestion doesn't match the actual SSZ API. TreeViewDU.clone() is typed as clone(dontTransferCache?: boolean) (see @chainsafe/ssz packages/ssz/src/viewDU/abstract.ts:111 and partialListComposite.ts:156), so {dontTransferCache: true} wouldn't type-check.

The inconsistency you flagged is real, but the right fix is to update my comments to match the boolean API rather than the code to match the (incorrect) comments. Pushed dde4635 correcting the three places that referenced the object-syntax form.
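
One way to address the "magic boolean" readability concern without changing the API would be a named constant or a thin helper at the call site. The sketch below assumes only the boolean signature quoted above, `clone(dontTransferCache?: boolean)`; `CacheCloneable`, `DONT_TRANSFER_CACHE`, and `cloneWithFreshCache` are hypothetical names that are not part of this PR or of `@chainsafe/ssz`.

```typescript
// Hypothetical minimal shape: anything whose clone() takes the boolean flag.
interface CacheCloneable<T> {
  clone(dontTransferCache?: boolean): T;
}

// Named constant so the call site states intent instead of a bare `true`.
const DONT_TRANSFER_CACHE = true;

// Hypothetical helper: clone a view so that it starts with a fresh cache.
function cloneWithFreshCache<T>(view: CacheCloneable<T>): T {
  return view.clone(DONT_TRANSFER_CACHE);
}

// Usage with a stub that records the flag it received.
const receivedFlags: Array<boolean | undefined> = [];
const stubView: CacheCloneable<string> = {
  clone(dontTransferCache?: boolean): string {
    receivedFlags.push(dontTransferCache);
    return "cloned";
  },
};

const cloned = cloneWithFreshCache(stubView);
console.log(cloned, receivedFlags[0]); // cloned true
```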

// arrays, silently corrupting the seed container's cache snapshot. That corruption only
// surfaces on the next `seedState.clone({dontTransferCache: false})` read, producing a
// "Withdrawal mismatch at index=0"-style divergence downstream.
migratedState.validators = seedState.validators.clone(true);
Contributor

medium

For better readability and to avoid "magic boolean" parameters, it is recommended to use the object syntax for the clone options. This also ensures consistency with the documentation provided in the comments above (line 201) and the PR description, which both reference the {dontTransferCache: ...} syntax.

Suggested change
- migratedState.validators = seedState.validators.clone(true);
+ migratedState.validators = seedState.validators.clone({dontTransferCache: true});

Contributor Author

Same as above: TreeViewDU.clone() only accepts a boolean, so the object syntax wouldn't compile. I updated the inline comments here and on the inactivity-scores path in dde4635 so they no longer reference a {dontTransferCache: false} form that doesn't exist.

The inline comments referenced a `seedState.clone({dontTransferCache: false})`
object-syntax call that does not exist — the SSZ `TreeViewDU.clone()` signature
is `clone(dontTransferCache?: boolean)`. Clarify that the corruption surfaces
on the next default `seedState.clone()` read, and that `clone(true)` is the
dontTransferCache flag.

🤖 Generated with AI assistance
Member

@nflaig nflaig left a comment

lgtm, leaving up to @twoeths for final approval/merge
