codegen improvements [AI slop]#131

Draft
pk910 wants to merge 10 commits into master from perf/codegen-improvements
Conversation


@pk910 pk910 commented Mar 16, 2026

let claude iterate on codegen improvements for several hours.
not for merge, more for cherry-picking ideas


perf: improve hasher/merkleization performance and reduce allocations

This PR optimizes the shared hasher and SSZ utility code used by the code generation path. All changes are in hasher/hasher.go and sszutils/, affecting hash tree root computation speed and allocation behavior across all codegen operations.

Changes

Hasher PutX optimization (hasher/hasher.go)
Rewrite PutUint64/PutUint32/PutUint16/PutUint8/PutBool to append 32 zero bytes then write the value directly, instead of encoding into a tmp buffer and calling AppendBytes32.
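The idea can be sketched as follows. This is an illustrative stand-in, not the actual `hasher.Hasher` type: a single append of a pre-zeroed 32-byte chunk, followed by an in-place write, replaces the old encode-to-tmp + `AppendBytes32` two-step.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hasher is a minimal stand-in holding just the hash buffer.
type Hasher struct {
	buf []byte
}

var zeroBytes [32]byte

// PutUint64 appends one zero-padded 32-byte chunk and writes the value
// directly into it: one append + one direct write instead of two appends.
func (h *Hasher) PutUint64(v uint64) {
	off := len(h.buf)
	h.buf = append(h.buf, zeroBytes[:]...)        // one append, padding included
	binary.LittleEndian.PutUint64(h.buf[off:], v) // direct write, no tmp buffer
}

func main() {
	h := &Hasher{}
	h.PutUint64(1)
	fmt.Printf("%d bytes, first=%d\n", len(h.buf), h.buf[0]) // 32 bytes, first=1
}
```

The narrower `PutUint32`/`PutUint16`/`PutUint8`/`PutBool` variants follow the same pattern with the corresponding `binary.LittleEndian` writer (or a single byte store).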

Merkleize fast paths (hasher/hasher.go)
Add early returns in Merkleize() for the most common input sizes, avoiding the full merkleizeImpl function call with its getDepth computation and loop:

  • 1 chunk (32 bytes): return immediately — data already in place
  • 2 chunks (64 bytes): single hash call — handles Checkpoint, BLSPubKey
  • 3–4 chunks (96–128 bytes): two hash calls — handles Fork and similar
  • 8 chunks (256 bytes): three hash calls — handles Validator (100K per state, biggest single win)
  • 16 chunks (512 bytes): four hash calls
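The ladder above can be sketched like this. `hashLayer` here is a stand-in for the hasher's batched hash call (one "hash call" in the list corresponds to one layer reduction); `merkleizeImpl` and the real buffer handling are assumed, not reproduced.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashLayer compresses each 64-byte pair of chunks into one 32-byte chunk,
// standing in for the hasher's batched hash call over a whole tree layer.
func hashLayer(buf []byte) []byte {
	out := make([]byte, 0, len(buf)/2)
	for i := 0; i < len(buf); i += 64 {
		h := sha256.Sum256(buf[i : i+64])
		out = append(out, h[:]...)
	}
	return out
}

// merkleize mirrors the early-return ladder: hot sizes skip the generic
// getDepth computation and loop entirely.
func merkleize(buf []byte) []byte {
	switch len(buf) {
	case 32: // 1 chunk: data already in place
		return buf
	case 64: // 2 chunks: one hash call (Checkpoint, BLSPubKey)
		return hashLayer(buf)
	case 128: // 4 chunks: two hash calls (Fork and similar)
		return hashLayer(hashLayer(buf))
	case 256: // 8 chunks: three hash calls (Validator)
		return hashLayer(hashLayer(hashLayer(buf)))
	}
	// generic fallback (merkleizeImpl in the real code)
	for len(buf) > 32 {
		buf = hashLayer(buf)
	}
	return buf
}

func main() {
	root := merkleize(make([]byte, 256))
	fmt.Println(hex.EncodeToString(root[:4]))
}
```

Note the real implementation works in place on the hasher buffer rather than allocating per layer; the allocation here is purely for readability.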

MerkleizeWithMixin optimization (hasher/hasher.go)
Replace 3-step mixin size encoding (MarshalUint64 → append → pad) with a single 32-byte zero append + direct PutUint64 write.

PutBytes fast path (hasher/hasher.go)
Skip the AppendBytes32 call and its modulo-32 check for exact 32-byte inputs (Hash32, Root, WithdrawalCredentials).
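The fast path amounts to one length check. The method names `PutBytes`/`AppendBytes32` are from the PR, but the bodies below are sketches on a stand-in type:

```go
package main

import "fmt"

type Hasher struct {
	buf []byte
}

var zeroBytes [32]byte

// AppendBytes32 appends b and pads to the next 32-byte boundary.
func (h *Hasher) AppendBytes32(b []byte) {
	h.buf = append(h.buf, b...)
	if rem := len(b) % 32; rem != 0 {
		h.buf = append(h.buf, zeroBytes[rem:]...)
	}
}

// PutBytes skips the modulo-32 padding check entirely for exact one-chunk
// inputs (Hash32, Root, WithdrawalCredentials are all 32 bytes).
func (h *Hasher) PutBytes(b []byte) {
	if len(b) == 32 {
		h.buf = append(h.buf, b...)
		return
	}
	h.AppendBytes32(b)
}

func main() {
	h := &Hasher{}
	h.PutBytes(make([]byte, 32)) // fast path
	h.PutBytes(make([]byte, 20)) // padded to 32
	fmt.Println(len(h.buf))      // 64
}
```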

Hasher buffer pre-allocation (hasher/hasher.go)
Pre-allocate 4MB buffer for new hashers, sized for 100K validators × 32 bytes = 3.2MB peak. Eliminates buffer regrowth allocations during HTR. Codegen HTR achieves 0 allocs/op consistently.

Inline BufferDecoder limits stack (sszutils/decoder_buffer.go)
Embed [16]int array in BufferDecoder, point the limits slice at it. Eliminates a separate make([]int, 0, 16) allocation per unmarshal.
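The inline-array trick looks roughly like this (field names are illustrative, not the exact ones in decoder_buffer.go):

```go
package main

import "fmt"

// BufferDecoder keeps its limits slice backed by an embedded [16]int, so
// constructing a decoder performs no separate slice allocation for the
// common case of nesting depth <= 16.
type BufferDecoder struct {
	limitsArr [16]int
	limits    []int
}

func NewBufferDecoder() *BufferDecoder {
	d := &BufferDecoder{}
	d.limits = d.limitsArr[:0] // point the slice at the embedded array
	return d
}

func (d *BufferDecoder) pushLimit(n int) {
	d.limits = append(d.limits, n) // stays in limitsArr while len <= 16
}

func main() {
	d := NewBufferDecoder()
	d.pushLimit(42)
	fmt.Println(d.limits[0] == d.limitsArr[0]) // true
}
```

One caveat of this pattern: the struct must not be copied by value after construction, since the copy's slice would still point at the original's array. Returning a pointer from the constructor, as above, avoids that. The StreamEncoder scratch buffer below uses the same trick with a `[32]byte`.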

Inline StreamEncoder scratch buffer (sszutils/encoder_stream.go)
Embed [32]byte array in StreamEncoder, point the scratch slice at it. Eliminates make([]byte, 0, 32) per MarshalWriter.

ExpandSlice capacity reuse (sszutils/unmarshal.go)
When cap(src) >= size, use src[:size] instead of make([]T, size). Helps repeated unmarshal on the same target.
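A minimal generic sketch of the reuse path (the exact signature in sszutils/unmarshal.go may differ):

```go
package main

import "fmt"

// expandSlice returns a slice of length size, reusing src's backing array
// when its capacity allows instead of always allocating.
func expandSlice[T any](src []T, size int) []T {
	if cap(src) >= size {
		// Reslice instead of allocating. Elements beyond the old length may
		// hold stale values, which is fine here: unmarshal overwrites them.
		return src[:size]
	}
	return make([]T, size)
}

func main() {
	buf := make([]uint64, 2, 8)
	out := expandSlice(buf, 5)
	fmt.Println(len(out), cap(out), &out[0] == &buf[0]) // 5 8 true
}
```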

Benchmark Results — StateMainnet

Throughput (ns/op, average of 3 runs)

| Operation | Before (ns/op) | After (ns/op) | Δ |
| --- | --- | --- | --- |
| HashTreeRoot | 59,817K | 50,585K | −15.4% |
| Unmarshal | 6,261K | 6,191K | −1.1% |
| Marshal | 3,996K | 3,997K | +0.0% |
| MarshalWriter | 4,339K | 4,901K | +13.0% |

Allocations (per op)

| Operation | Before allocs | After allocs | Before B/op | After B/op |
| --- | --- | --- | --- | --- |
| HashTreeRoot | 0–2 (unstable) | 0 (stable) | 0–840K | ~175K (amortized) |
| MarshalWriter | 6 | 3 | 2,480 | 2,336 |
| UnmarshalReader | 1,522 | 1,520 | 164,698 | 164,186 |

Benchmark Results — BlockMainnet

| Operation | Before (ns/op) | After (ns/op) | Δ |
| --- | --- | --- | --- |
| HashTreeRoot | 516,924 | 453,560 | −12.3% |

| Operation | Before allocs | After allocs |
| --- | --- | --- |
| HashTreeRoot | 0 | 0 |

Diff

4 files changed, 120 insertions(+), 45 deletions(-)

hasher/hasher.go, sszutils/decoder_buffer.go, sszutils/encoder_stream.go, sszutils/unmarshal.go

No public API changes. All existing tests pass.

pk910-agent and others added 10 commits March 16, 2026 17:58
… copy

Write directly to the hash buffer instead of going through the tmp
buffer and AppendBytes32. This reduces from 2 appends to 1 append +
direct write for each Put operation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add early returns in Merkleize for the common cases of single-chunk
(32 bytes) and two-chunk (64 bytes) inputs, avoiding the full
merkleizeImpl call with its capacity pre-check and loop overhead.

Single-chunk: just return (data is already in place)
Two-chunk: single hash call directly, skip merkleizeImpl entirely

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip the AppendBytes32 call and its modulo check for the common case
of exactly 32-byte inputs (Hash32, Root, WithdrawalCredentials).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 3-step append (MarshalUint64 + output + zeroBytes[:24]) with a
single append of 32 zero bytes + direct binary.LittleEndian.PutUint64
write. Reduces from 3 appends to 1 append + 1 direct write.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add early return for 3-4 chunk (96-128 bytes) inputs in Merkleize,
avoiding the full merkleizeImpl call. Uses two direct hash operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add direct 3-hash-operation path for exactly 256 bytes (8 chunks).
Common for containers with 8 fields like Validator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Inline BufferDecoder limits stack as [16]int array to avoid separate
  slice allocation (saves 128 bytes per unmarshal)
- ExpandSlice: reuse existing capacity instead of always allocating new

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Embed a [32]byte array in StreamEncoder and use it as the backing
for the scratch slice, avoiding a separate make([]byte, 0, 32) heap
allocation per MarshalWriter call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set defaultHasherBufSize to 4MB for new hashers. When the hasher is
pooled and reused, the capacity is retained. Eliminates buffer growth
allocations during HTR.

HTR: consistently 0 allocs/op for codegen path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.44%. Comparing base (168669d) to head (9044fb7).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
- Coverage   92.59%   92.44%   -0.15%     
==========================================
  Files          44       44              
  Lines        8826     8883      +57     
==========================================
+ Hits         8172     8212      +40     
- Misses        397      408      +11     
- Partials      257      263       +6     
| Components | Coverage Δ |
| --- | --- |
| dynssz | 98.19% <91.66%> (-0.40%) ⬇️ |
| dynsszgen | 87.54% <ø> (ø) |
