… copy Write directly to the hash buffer instead of going through the tmp buffer and AppendBytes32. This reduces from 2 appends to 1 append + direct write for each Put operation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add early returns in Merkleize for the common cases of single-chunk (32 bytes) and two-chunk (64 bytes) inputs, avoiding the full merkleizeImpl call with its capacity pre-check and loop overhead. Single-chunk: just return (the data is already in place). Two-chunk: a single hash call directly, skipping merkleizeImpl entirely.
Skip the AppendBytes32 call and its modulo check for the common case of exactly 32-byte inputs (Hash32, Root, WithdrawalCredentials).
Replace the 3-step append (MarshalUint64 + output + zeroBytes[:24]) with a single append of 32 zero bytes + a direct binary.LittleEndian.PutUint64 write. Reduces from 3 appends to 1 append + 1 direct write.
Add early return for 3-4 chunk (96-128 bytes) inputs in Merkleize, avoiding the full merkleizeImpl call. Uses two direct hash operations.
Add direct 3-hash-operation path for exactly 256 bytes (8 chunks). Common for containers with 8 fields like Validator.
- Inline BufferDecoder limits stack as a [16]int array to avoid a separate slice allocation (saves 128 bytes per unmarshal)
- ExpandSlice: reuse existing capacity instead of always allocating new
Embed a [32]byte array in StreamEncoder and use it as the backing for the scratch slice, avoiding a separate make([]byte, 0, 32) heap allocation per MarshalWriter call.
Set defaultHasherBufSize to 4MB for new hashers. When the hasher is pooled and reused, the capacity is retained. Eliminates buffer growth allocations during HTR. HTR: consistently 0 allocs/op for the codegen path.
**Codecov Report**

❌ Patch coverage is

```
@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
- Coverage   92.59%   92.44%   -0.15%
==========================================
  Files          44       44
  Lines        8826     8883      +57
==========================================
+ Hits         8172     8212      +40
- Misses        397      408      +11
- Partials      257      263       +6
```
Let Claude iterate on codegen improvements for several hours.
Not for merge; more for cherry-picking ideas.
perf: improve hasher/merkleization performance and reduce allocations
This PR optimizes the shared hasher and SSZ utility code used by the code generation path. All changes are in `hasher/hasher.go` and `sszutils/`, affecting hash tree root computation speed and allocation behavior across all codegen operations.

**Changes**

**Hasher PutX optimization** (`hasher/hasher.go`)

Rewrite `PutUint64`/`PutUint32`/`PutUint16`/`PutUint8`/`PutBool` to append 32 zero bytes and then write the value directly, instead of encoding into a `tmp` buffer and calling `AppendBytes32`.
**Merkleize fast paths** (`hasher/hasher.go`)

Add early returns in `Merkleize()` for the most common input sizes, avoiding the full `merkleizeImpl` function call with its `getDepth` computation and loop.
**MerkleizeWithMixin optimization** (`hasher/hasher.go`)

Replace the 3-step mixin size encoding (`MarshalUint64` → append → pad) with a single 32-byte zero append plus a direct `PutUint64` write.
**PutBytes fast path** (`hasher/hasher.go`)

Skip the `AppendBytes32` call and its modulo-32 check for exact 32-byte inputs (Hash32, Root, WithdrawalCredentials).
**Hasher buffer pre-allocation** (`hasher/hasher.go`)

Pre-allocate a 4MB buffer for new hashers, sized for the 100K-validator peak of 100,000 × 32 bytes = 3.2MB. Eliminates buffer regrowth allocations during HTR; codegen HTR achieves 0 allocs/op consistently.
**Inline BufferDecoder limits stack** (`sszutils/decoder_buffer.go`)

Embed a `[16]int` array in `BufferDecoder` and point the `limits` slice at it. Eliminates a separate `make([]int, 0, 16)` allocation per unmarshal.
**Inline StreamEncoder scratch buffer** (`sszutils/encoder_stream.go`)

Embed a `[32]byte` array in `StreamEncoder` and point the `scratch` slice at it. Eliminates a `make([]byte, 0, 32)` allocation per `MarshalWriter` call.
**ExpandSlice capacity reuse** (`sszutils/unmarshal.go`)

When `cap(src) >= size`, use `src[:size]` instead of `make([]T, size)`. Helps repeated unmarshal into the same target.
**Benchmark Results — StateMainnet**

Throughput (ns/op, average of 3 runs)
Allocations (per op)
**Benchmark Results — BlockMainnet**

**Diff**

`hasher/hasher.go`, `sszutils/decoder_buffer.go`, `sszutils/encoder_stream.go`, `sszutils/unmarshal.go`

No public API changes. All existing tests pass.