Skip to content

fix: handle nil block result in EmptyBlocksSanitizer#19

Merged
rswanson merged 2 commits intomasterfrom
fix/empty-blocks-sanitizer-nil-result
Apr 16, 2026
Merged

fix: handle nil block result in EmptyBlocksSanitizer#19
rswanson merged 2 commits intomasterfrom
fix/empty-blocks-sanitizer-nil-result

Conversation

@rswanson
Copy link
Copy Markdown
Member

Summary

  • Indexer.Fetcher.EmptyBlocksSanitizer.classify_blocks_from_result/1 assumed every eth_getBlockByNumber batch entry carried a block map, so a spec-compliant result: null (returned for pruned/reorged blocks) crashed the GenServer with BadMapError at Map.get(nil, "transactions", nil).
  • Because the fetcher uses restart: :permanent, each bad block kept crashing the process until the supervisor exhausted its restart intensity, shut down the indexer application, and took the whole BEAM VM with it — producing a K8s crashloop.
  • Added a match clause for %{id: _, result: nil} that logs a warning and skips the block, plus a regression test that mocks a nil batch response.

Context

Observed in prod against an upstream RPC (Signet sidecar) that correctly returns null for blocks that have been truncated out of cold storage after a host-chain reorg. The previous logs showed:

** (BadMapError) expected a map, got: nil
    (elixir 1.19.4) lib/map.ex:541: Map.get(nil, \"transactions\", nil)
    (indexer 10.0.0) lib/indexer/fetcher/empty_blocks_sanitizer.ex:119: anonymous fn/2 in Indexer.Fetcher.EmptyBlocksSanitizer.classify_blocks_from_result/1
    ...
Application indexer exited: shutdown
Kernel pid terminated (application_controller)

Test plan

  • CI: new test skips blocks for which JSON-RPC returns nil result passes
  • CI: existing empty_blocks_sanitizer_test.exs cases still pass
  • Deploy `:latest` to the signet prod cluster and confirm the pod stops crashlooping

Local validation

  • mix format --check-formatted couldn't be run in isolation (project .formatter.exs requires a full mix deps.get). Files were checked against the default Elixir formatter with line_length: 120 (matching .formatter.exs) and reported as already formatted.
  • Full test suite was not runnable locally; relying on CI.

🤖 Generated with Claude Code

rswanson and others added 2 commits April 16, 2026 17:07
A spec-compliant JSON-RPC server returns `result: null` for blocks it
cannot find (e.g. pruned or reorged out). The previous implementation of
`classify_blocks_from_result/1` assumed every batch response carried a
block map, so a `nil` result crashed the GenServer with `BadMapError` at
`Map.get(nil, "transactions", nil)`. Because the fetcher uses
`restart: :permanent`, repeated crashes eventually exhausted the
supervisor's restart intensity and brought down the `indexer`
application, taking the whole BEAM VM with it.

Add a match clause for `%{id: _, result: nil}` that logs a warning and
skips the block. Include a regression test that mocks a nil batch
response and asserts the sanitizer does not mutate the DB record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The initial nil-handling commit silently skipped blocks for which the
JSON-RPC returned `result: null`, which left those blocks with
`is_empty: nil, refetch_needed: false` — the exact predicate that
`consensus_blocks_with_nil_is_empty_query/1` selects for. The sanitizer
would then re-query the same blocks every cycle (every 10s), producing
perpetual RPC traffic and log spam.

Reconcile requested vs returned block numbers in the caller and call
`Block.set_refetch_needed/1` on the difference. The refetch_needed
predicate then excludes them from subsequent sanitizer cycles and hands
them off to the regular refetch path — consistent with how the fetcher
already handles blocks whose RPC view disagrees with the DB.

Also strengthen the regression test so it actually discriminates the
fixed code from the buggy code:
- Replace `Process.sleep/1` with `wait_for_results/1`, which polls for
  `refetch_needed == true` and fails the test if the sanitizer never
  updates the row (as happens on the pre-fix `BadMapError` crash path).
- Assert both `is_empty == nil` (unchanged) and `refetch_needed == true`
  (positive signal only the fix can produce).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rswanson rswanson self-assigned this Apr 16, 2026
@rswanson rswanson merged commit 034ea1a into master Apr 16, 2026
60 of 63 checks passed
rswanson added a commit that referenced this pull request Apr 17, 2026
When the JSON-RPC returns \`result: null\` for a block (pruned, reorged
out, or genuinely missing), \`classify_blocks_from_result/1\` crashed
with \`BadMapError\` on \`Map.get(nil, "transactions", nil)\`. Because the
fetcher is \`restart: :permanent\`, each crash exhausted the supervisor's
restart intensity and brought down the \`indexer\` application + BEAM VM,
producing a K8s crashloop against Signet's sidecar RPC.

- Add a \`%{id: _, result: nil}\` match clause that skips without
  crashing.
- Reconcile requested vs. returned block numbers in the caller and
  flag the missing ones with \`Block.set_refetch_needed/1\` so
  \`consensus_blocks_with_nil_is_empty_query\` stops re-selecting them
  every cycle.
- Regression test stubs a nil batch response and uses
  \`wait_for_results\` to assert \`refetch_needed == true\`; on the
  un-fixed code the GenServer crashes and the assertion times out.

Merged equivalent of master PR #19 onto signet-main. Preserves the
existing defensive \`Map.get(block, "transactions") || []\` guard from
this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant