feat(host-reth): replace ExEx backfill with direct DB reads #136

Merged
prestwich merged 9 commits into main from prestwich/db-backfill
Apr 20, 2026

Member

@prestwich prestwich commented Apr 10, 2026

Summary

  • Replace ExEx-driven backfill (which re-executes blocks) with direct DB reads from the reth provider, fixing slow startup and memory issues
  • New DbBackfill<P> reads blocks+receipts in batches up to finalized, then hands off to the ExEx stream for live blocks
  • HostChain enum unifies backfill and live chain segments behind Extractable
  • set_head documented and enforced as once-only across both notifier implementations

Details

The ExEx backfill mechanism re-executes historical blocks on startup, which is slow and has caused memory issues. The reth DB already contains executed results, making re-execution unnecessary.

New types (signet-host-reth):

  • DbBlock / DbChainSegment — owned block+receipts from DB, implements Extractable
  • DbBackfill<P> — batch reader using spawn_blocking for MDBX reads
  • HostChain — enum wrapping DbChainSegment (backfill) and RethChain (live)
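
As a rough illustration of how an enum can unify the two chain sources behind one trait, here is a minimal sketch. It assumes a drastically reduced `Extractable` with a single hypothetical method (`block_numbers`); the real trait and the real `DbChainSegment` and `RethChain` types carry full blocks and receipts, and their fields are not shown in this PR description.

```rust
/// Hypothetical stand-in for the real `Extractable` trait, reduced to one method.
trait Extractable {
    /// Block numbers covered by this segment, in ascending order.
    fn block_numbers(&self) -> Vec<u64>;
}

/// Stand-in for `DbChainSegment`: owned data read from the DB during backfill.
struct DbChainSegment(Vec<u64>);

/// Stand-in for `RethChain`: a live segment delivered by the ExEx stream.
struct RethChain {
    first: u64,
    len: u64,
}

impl Extractable for DbChainSegment {
    fn block_numbers(&self) -> Vec<u64> {
        self.0.clone()
    }
}

impl Extractable for RethChain {
    fn block_numbers(&self) -> Vec<u64> {
        (self.first..self.first + self.len).collect()
    }
}

/// One type for consumers, regardless of where the blocks came from.
enum HostChain {
    Backfill(DbChainSegment),
    Live(RethChain),
}

impl Extractable for HostChain {
    fn block_numbers(&self) -> Vec<u64> {
        // Each variant simply delegates to the inner implementation.
        match self {
            HostChain::Backfill(segment) => segment.block_numbers(),
            HostChain::Live(chain) => chain.block_numbers(),
        }
    }
}
```

Downstream code can then accept `HostChain` (or any `Extractable`) without caring whether a segment was read from the DB or streamed live.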

Notifier changes:

  • RethHostNotifier::next_notification is now two-phase: drain DB backfill, then switch to ExEx
  • set_head creates a DbBackfill instead of calling ExEx set_with_head
  • ExEx stream is initiated after backfill completes, pointed at the last backfilled block
  • reth-stages-types dependency removed
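
The two-phase flow described above can be sketched as a small synchronous state machine. This is only a sketch under stated assumptions: the real `next_notification` is async and reads blocks and receipts from the reth provider, while here `Backfill`, `Notifier`, `live_head`, and the `VecDeque` standing in for the ExEx stream are all hypothetical simplifications that track block numbers only.

```rust
use std::collections::VecDeque;

/// What the notifier hands to its consumer.
#[derive(Debug, PartialEq)]
enum Notification {
    Backfill(Vec<u64>),
    Live(u64),
}

/// Hypothetical stand-in for `DbBackfill`: yields batches up to `finalized`.
struct Backfill {
    cursor: u64,
    finalized: u64,
    batch_size: u64,
}

impl Backfill {
    /// Returns the next batch, or `None` once the cursor has passed `finalized`.
    fn next_batch(&mut self) -> Option<Vec<u64>> {
        if self.cursor > self.finalized {
            return None;
        }
        // Treat a batch size of 0 as 1 so the reader always makes progress.
        let span = self.batch_size.max(1) - 1;
        let end = self.cursor.saturating_add(span).min(self.finalized);
        let batch: Vec<u64> = (self.cursor..=end).collect();
        self.cursor = end + 1;
        Some(batch)
    }
}

struct Notifier {
    /// `Some` while phase 1 (DB backfill) is still running.
    backfill: Option<Backfill>,
    /// Block the live stream was pointed at when phase 2 began.
    live_head: Option<u64>,
    /// Stand-in for the ExEx notification stream.
    live: VecDeque<u64>,
}

impl Notifier {
    fn next_notification(&mut self) -> Option<Notification> {
        // Phase 1: drain DB batches while any remain.
        if let Some(backfill) = self.backfill.as_mut() {
            if let Some(batch) = backfill.next_batch() {
                return Some(Notification::Backfill(batch));
            }
            // Backfill is exhausted: point the live stream at the last backfilled block.
            let done = self.backfill.take().expect("backfill was Some");
            self.live_head = Some(done.cursor.saturating_sub(1));
        }
        // Phase 2: forward live notifications.
        self.live.pop_front().map(Notification::Live)
    }
}
```

With `cursor = 1`, `finalized = 5`, and `batch_size = 2`, the sketch yields batches `[1, 2]`, `[3, 4]`, `[5]`, sets `live_head` to 5, and then starts forwarding live blocks.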

Cross-crate:

  • HostNotifier::set_head trait doc clarified as once-only
  • RpcHostNotifier::set_head guards against repeated calls
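
One way a once-only `set_head` could be enforced is shown below. This is a hypothetical sketch: the PR description does not say whether the real guard returns an error, panics, or ignores the second call, and `OnceNotifier` and `SetHeadError` are names invented for illustration.

```rust
/// Hypothetical error type for a repeated `set_head` call.
#[derive(Debug, PartialEq)]
enum SetHeadError {
    AlreadySet { previous: u64 },
}

/// Hypothetical notifier that remembers whether its head was already set.
#[derive(Default)]
struct OnceNotifier {
    head: Option<u64>,
}

impl OnceNotifier {
    /// Sets the starting block. Only the first call succeeds.
    fn set_head(&mut self, block_number: u64) -> Result<(), SetHeadError> {
        if let Some(previous) = self.head {
            // A second call is rejected and the original head is left untouched.
            return Err(SetHeadError::AlreadySet { previous });
        }
        self.head = Some(block_number);
        Ok(())
    }
}
```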

Review follow-ups

  • Fraser: set_backfill_thresholds(None) now resets to DEFAULT_BATCH_SIZE via a new DbBackfill::reset_batch_size, matching the trait contract and RpcHostNotifier.
  • Evalir: Removed the genesis fallback after backfill completion. The ExEx startup race it was defending against was fixed upstream in reth (#19665 / #22168, merged Feb 2026), and at our call site DbBackfill has just successfully read last_backfilled from the same provider — so a missing header there now indicates DB-level failure and returns RethHostError::MissingHeader.
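
Both follow-ups can be illustrated with a small sketch. The names `DEFAULT_BATCH_SIZE`, `DbBackfill::reset_batch_size`, and `RethHostError::MissingHeader` come from the PR; everything else is assumed for illustration, including the default value, the `set_batch_size` helper, the function signatures, the `u64` payload on the error, and the slice used in place of the reth provider.

```rust
/// Hypothetical default; the PR does not state the real value.
const DEFAULT_BATCH_SIZE: u64 = 1_000;

#[derive(Debug, PartialEq)]
enum RethHostError {
    /// Payload (the missing block number) is an assumption.
    MissingHeader(u64),
}

struct DbBackfill {
    batch_size: u64,
}

impl DbBackfill {
    /// Hypothetical setter; clamps to at least one block per batch.
    fn set_batch_size(&mut self, size: u64) {
        self.batch_size = size.max(1);
    }

    fn reset_batch_size(&mut self) {
        self.batch_size = DEFAULT_BATCH_SIZE;
    }
}

/// `None` means "use the backend's default", per the trait contract quoted in the PR.
fn set_backfill_thresholds(backfill: &mut DbBackfill, max_blocks: Option<u64>) {
    match max_blocks {
        Some(size) => backfill.set_batch_size(size),
        None => backfill.reset_batch_size(),
    }
}

/// After backfill, a missing header is reported as an error instead of
/// silently falling back to genesis. `known_headers` stands in for the provider.
fn header_after_backfill(
    known_headers: &[u64],
    last_backfilled: u64,
) -> Result<u64, RethHostError> {
    known_headers
        .iter()
        .copied()
        .find(|&number| number == last_backfilled)
        .ok_or(RethHostError::MissingHeader(last_backfilled))
}
```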

Closes ENG-1784

Test plan

  • cargo clippy passes (both --all-features and --no-default-features)
  • RUSTDOCFLAGS="-D warnings" cargo doc passes
  • All existing tests pass (signet-host-reth, signet-node-types, signet-host-rpc)
  • signet-node compiles cleanly with new HostChain type
  • Integration test on a reth node with historical data (manual)

🤖 Generated with Claude Code

prestwich and others added 8 commits April 10, 2026 12:17
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces three types for reading host-chain blocks from the reth DB:
- `DbBlock`: owned block+receipts pair from the provider
- `DbChainSegment`: newtype over `Vec<DbBlock>` implementing `Extractable`
  using the same `RecoveredBlockShim` transmute pattern as `RethChain`
- `DbBackfill<P>`: batch reader that walks from a cursor to the finalized
  block, recording metrics per batch via `crate::metrics`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `HostChain` enum with `Backfill(DbChainSegment)` and `Live(RethChain)`
variants, both delegating to the inner `Extractable` impl. Promotes
`DbChainSegment` to `pub` and re-exports both new types from the crate root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace reth's built-in ExEx backfill with DbBackfill for startup
catch-up. The notifier now runs a two-phase loop: phase 1 drains DB
batches via DbBackfill, then phase 2 switches to live ExEx
notifications. set_head initializes backfill instead of resolving a
header directly, and set_backfill_thresholds configures DbBackfill
batch size. Chain type changes from RethChain to HostChain enum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@prestwich prestwich requested a review from a team as a code owner April 10, 2026 16:34
@prestwich prestwich requested review from Evalir and Fraser999 April 10, 2026 16:36
Comment thread crates/host-reth/src/notifier.rs Outdated
Comment on lines +43 to +52
const {
    assert!(
        size_of::<RecoveredBlockShim>() == size_of::<RethRecovered>(),
        "RecoveredBlockShim layout diverged from RethRecovered"
    );
    assert!(
        align_of::<RecoveredBlockShim>() == align_of::<RethRecovered>(),
        "RecoveredBlockShim alignment diverged from RethRecovered"
    );
}
Member

this is very nice
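
For readers unfamiliar with the pattern in the snippet above, here is a self-contained sketch of a compile-time layout check between a shim type and the type it mirrors. `Upstream` and `Shim` are hypothetical stand-ins for `RethRecovered` and `RecoveredBlockShim`, and `#[repr(C)]` is used here only so the example's layout is fully determined; the real types may not use it. Matching size and alignment is a necessary guard for a transmute, not a proof that field layouts agree.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical upstream type whose private fields we cannot reach directly.
#[allow(dead_code)]
#[repr(C)]
struct Upstream {
    number: u64,
    hash: [u8; 32],
}

/// Hypothetical local mirror with the same fields in the same order.
#[allow(dead_code)]
#[repr(C)]
struct Shim {
    number: u64,
    hash: [u8; 32],
}

// Evaluated at compile time: the crate fails to build if the layouts diverge.
const _: () = {
    assert!(
        size_of::<Shim>() == size_of::<Upstream>(),
        "Shim size diverged from Upstream"
    );
    assert!(
        align_of::<Shim>() == align_of::<Upstream>(),
        "Shim alignment diverged from Upstream"
    );
};

/// Runtime view of the same check, useful in tests.
fn layouts_match() -> bool {
    size_of::<Shim>() == size_of::<Upstream>() && align_of::<Shim>() == align_of::<Upstream>()
}
```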

Comment thread crates/host-reth/src/notifier.rs Outdated
let backfill = self.backfill.take().expect("backfill was Some");
let last_backfilled = backfill.cursor().saturating_sub(1);

let head = self
Member

so, this behavior of fetching the last seen block, and if that fails, falling back to genesis, stems from a probably still unresolved bug where Reth starts the exex before it's connected to its own db provider (that's roughly what I remember). This ofc complicates the logic quite a bit by adding the genesis fallback path.

Considering that most of the time we might be falling into phase 1 when restarting the node to catch up with a few blocks, maybe we can simplify this?

Member Author

[Claude Code]

Good call — ripped it out in ca2b6c2. A bit of digging in reth turned up #19665 / #22168 (merged Feb 2026), which fixed the backfill notification-channel stall that I believe was producing the symptom you remembered. And at this specific call site, DbBackfill has already successfully read last_backfilled from the same provider moments earlier, so if sealed_header returns None here it's DB corruption, not a startup race. Now returns RethHostError::MissingHeader instead of quietly pretending we're at genesis.

Member

💆💆💆💆💆 feels good to remove this piece of code, always made me feel bad having it

- `set_backfill_thresholds(None)` now resets the batch size to
  `DEFAULT_BATCH_SIZE` via a new `DbBackfill::reset_batch_size`,
  matching the trait contract ("`None` means use the backend's
  default") and the `RpcHostNotifier` implementation.
- Remove the genesis fallback after backfill completion. The
  documented ExEx startup race it was defending against
  (reth #19665 / #22168) was fixed upstream, and in any case
  `DbBackfill` just read `last_backfilled` from the same provider,
  so a missing header at this point indicates DB-level failure.
  Now returns `RethHostError::MissingHeader` instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich prestwich merged commit 05b6814 into main Apr 20, 2026
7 checks passed
3 participants