For lists and vectors of uint64 elements (like Balances, InactivityScores, Slashings), use bulk memory operations instead of per-element reflection dispatch. This avoids the overhead of reflect.Value.Index() + marshalType/unmarshalType/buildRootFromType calls for each element. Benchmarks show ~12-16% improvement on StateMainnet marshal/unmarshal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… copy Write directly to the hash buffer instead of going through the tmp buffer and AppendBytes32. This reduces from 2 appends to 1 append + direct write for each Put operation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the existing sszutils.HashUint64Slice() function for bulk memory copy when hashing uint64 lists and vectors, instead of a per-element AppendUint64 loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For addressable fixed-size byte arrays (like [48]byte BLSPubKey, [32]byte Hash32), use unsafe.Slice to get bytes directly instead of going through reflect.Value.Bytes() which takes a slow path for arrays. This avoids the reflect bytesSlow path overhead (~4% of marshal time). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return the pre-computed Size directly for primitive types (bool, uintN, intN, floatN) and byte-array vectors without entering the full switch dispatch. This avoids unnecessary function call overhead when sizing basic types recursively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add early returns in Merkleize for the common cases of single-chunk (32 bytes) and two-chunk (64 bytes) inputs, avoiding the full merkleizeImpl call with its capacity pre-check and loop overhead.

Single-chunk: just return (the data is already in place)
Two-chunk: a single hash call directly, skipping merkleizeImpl entirely

Benchmarks show ~9% improvement on StateMainnet HashTreeRoot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
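The shape of those early returns can be sketched as follows. This is an illustration, not the library's code: it assumes SHA-256 as the two-to-one hash (standard for SSZ merkleization), and `merkleizeSmall` is a hypothetical name; the general loop it bypasses is only hinted at.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleizeSmall shows the two early returns: one 32-byte chunk is its
// own root, and two chunks need exactly one hash of their concatenation,
// so neither needs the general merkleization loop.
func merkleizeSmall(data []byte) [32]byte {
	switch len(data) {
	case 32: // single chunk: already the root, just copy it out
		var root [32]byte
		copy(root[:], data)
		return root
	case 64: // two chunks: one hash call, no loop
		return sha256.Sum256(data)
	default:
		panic("general merkleizeImpl path (omitted in this sketch)")
	}
}

func main() {
	chunk := make([]byte, 32)
	root := merkleizeSmall(chunk)
	fmt.Println(root[0] == 0) // root equals the input chunk
	fmt.Printf("%x\n", merkleizeSmall(make([]byte, 64)))
}
```

The later 8-chunk and 16-chunk paths extend the same idea with three and four levels of direct hash operations, which is only safe at exact power-of-2 chunk counts where no zero-hash padding is needed.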
Skip the AppendBytes32 call and its modulo check for the common case of exactly 32-byte inputs (Hash32, Root, WithdrawalCredentials). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 3-step append (MarshalUint64 + output + zeroBytes[:24]) with a single append of 32 zero bytes + direct binary.LittleEndian.PutUint64 write. Reduces from 3 appends to 1 append + 1 direct write. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
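A sketch of the single-append pattern, assuming the hasher appends zero-padded 32-byte chunks; `putUint64` is a hypothetical free-function version of the method described above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

var zero32 [32]byte

// putUint64 appends one zeroed 32-byte chunk, then writes the value into
// its first 8 bytes in place: 1 append + 1 direct write, instead of
// encoding into a tmp buffer and padding with separate appends.
func putUint64(buf []byte, v uint64) []byte {
	buf = append(buf, zero32[:]...)
	binary.LittleEndian.PutUint64(buf[len(buf)-32:], v)
	return buf
}

func main() {
	b := putUint64(nil, 0x0102)
	fmt.Printf("% x\n", b[:8]) // 02 01 00 00 00 00 00 00
}
```

The remaining 24 bytes of the chunk stay zero, which is exactly the padding `AppendBytes32` would have produced.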
Add early return for 3-4 chunk (96-128 bytes) inputs in Merkleize, avoiding the full merkleizeImpl call. Uses two direct hash operations instead of the loop-based approach. This helps containers with 3-4 fields (like Fork, Checkpoint). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add direct 3-hash-operation path for exactly 256 bytes (8 chunks), which is the common case for containers with 8 fields like Validator. This avoids the merkleizeImpl function call overhead and its loop. Only safe for exact power-of-2 chunk counts where no zero-hash padding is needed at intermediate levels. Benchmarks show ~4.5% additional improvement on StateMainnet HTR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add direct 4-hash-operation path for exactly 512 bytes (16 chunks). Useful for containers with 16 fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use sszutils.HashUint64Slice() for bulk memory copy when hashing uint64 vectors (like Slashings), instead of per-element AppendUint64. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Return ReflectionCtx by value from NewReflectionCtx to keep it on the stack (eliminates 1 heap alloc per operation)
- Inline the BufferDecoder limits stack as a [16]int array to avoid a separate slice allocation (saves 128 bytes per unmarshal)
- Reuse existing slice backing arrays in unmarshalList/unmarshalVector/unmarshalDynamicList/unmarshalDynamicVector when capacity is sufficient (avoids reflect.MakeSlice for repeated unmarshal on the same target)
- ExpandSlice: reuse existing capacity instead of always allocating a new slice

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
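The ExpandSlice change above can be sketched as a grow-in-place helper. This is an assumed shape, not the actual `sszutils.ExpandSlice` signature: the point is only that the existing backing array is reused whenever `cap` permits.

```go
package main

import "fmt"

// expandSlice grows s to length n, reusing the existing backing array
// when capacity allows; it allocates only when capacity is insufficient.
func expandSlice(s []byte, n int) []byte {
	if cap(s) >= n {
		return s[:n] // no allocation: reslice within existing capacity
	}
	out := make([]byte, n)
	copy(out, s)
	return out
}

func main() {
	s := make([]byte, 2, 64)
	grown := expandSlice(s, 48)
	fmt.Println(len(grown), cap(grown)) // 48 64
}
```

On repeated unmarshal into the same target, the first call pays for the allocation and every later call of equal or smaller size becomes a reslice.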
Replace sourceValue.Interface().([]uint64) with unsafe.Pointer + unsafe.Slice to access uint64 slice data directly, avoiding the heap allocation from reflect.Value.Interface() boxing. Marshal: 5 → 2 allocs/op (−3), −72 B/op MarshalWriter: 6 → 3 allocs/op (−3), −70 B/op HTR: 5 → 4 allocs/op (−1) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Embed a [32]byte array in StreamEncoder and use it as the backing for the scratch slice, avoiding a separate make([]byte, 0, 32) heap allocation per MarshalWriter call. MarshalWriter: 3 → 2 allocs/op, 2178 → 2176 B/op Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set defaultHasherBufSize to 4MB for new hashers. This is enough for most BeaconState HTR operations (100K validators × 32 bytes = 3.2MB) without triggering buffer regrowth. When the hasher is pooled and reused, the capacity is retained. When GC evicts it, the new hasher starts with sufficient capacity. HTR: 4 allocs/op → 2 allocs/op (stable), ~1MB → ~233KB B/op Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
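A sketch of the pooling behavior this commit relies on. `defaultHasherBufSize` is named in the commit; the `hasher` struct and pool wiring below are illustrative assumptions about how such a pool is typically built with `sync.Pool`.

```go
package main

import (
	"fmt"
	"sync"
)

const defaultHasherBufSize = 4 << 20 // 4 MB: ~100K validators x 32 B = 3.2 MB

type hasher struct{ buf []byte }

// pool keeps hashers (and their buffers) alive across operations; a fresh
// hasher starts at full capacity, so HTR never triggers buffer regrowth.
var pool = sync.Pool{New: func() any {
	return &hasher{buf: make([]byte, 0, defaultHasherBufSize)}
}}

func main() {
	h := pool.Get().(*hasher)
	fmt.Println(cap(h.buf)) // 4194304
	h.buf = h.buf[:0]       // reset length, keep capacity
	pool.Put(h)
}
```

If the GC evicts pooled hashers, the `New` function recreates them at full size, so the alloc count stays stable rather than varying with regrowth.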
The bulk uint64 unmarshal path was creating make([]uint64, n) and setting it via reflect.ValueOf, which panics when the target slice has a defined element type (e.g. type Gwei uint64 vs Gwei = uint64). Fix by using reflect.MakeSlice with the correct target type, then using unsafe to get a []uint64 view for the bulk decode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add DefinedUint64 (type DefinedUint64 uint64, not alias) test cases for both list and vector operations. These catch bugs where bulk uint64 paths create []uint64 instead of the correct defined type, which causes reflect.Value.Set to panic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report ❌ Patch coverage is …
Additional details and impacted files:
@@ Coverage Diff @@
## master #132 +/- ##
==========================================
- Coverage 92.59% 91.94% -0.65%
==========================================
Files 44 44
Lines 8826 8956 +130
==========================================
+ Hits 8172 8235 +63
- Misses 397 444 +47
- Partials 257 277 +20
reflect.Value.Pointer() panics on array values — it only works on slices, pointers, maps, channels, and funcs. Add sourceType.Kind == reflect.Slice guard to all bulk uint64 fast paths that use Pointer() in marshal, unmarshal, and HTR. Fixes panic in buildRootFromVector for array-typed uint64 vectors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Let Claude iterate on reflection improvements for several hours.
Not for merge; more for cherry-picking ideas.
perf: improve reflection path performance and reduce allocations
This PR optimizes the reflection-based SSZ paths across all operations: marshal, unmarshal, hash tree root, and their streaming variants. Changes target three areas: bulk data handling for uint64 slices, hasher/merkleization internals, and heap allocation reduction.
Changes
Bulk uint64 fast paths (`reflection/marshal.go`, `reflection/unmarshal.go`, `reflection/treeroot.go`)

For `[]uint64` lists and vectors (Balances, InactivityScores, Slashings), bypass per-element reflection dispatch. Marshal/HTR use `unsafe.Pointer` + `unsafe.Slice` to access the slice data directly and call `EncodeUint64Slice`/`HashUint64Slice`. Unmarshal uses `reflect.MakeSlice` with the correct target type (supporting defined types like `type Gwei uint64`), then decodes via an `unsafe` view.

Hasher PutX optimization (`hasher/hasher.go`)

Rewrite `PutUint64`/`PutUint32`/`PutUint16`/`PutUint8`/`PutBool` to append 32 zero bytes and then write the value directly, instead of encoding into a `tmp` buffer and calling `AppendBytes32`.

Merkleize fast paths (`hasher/hasher.go`)

Add early returns in `Merkleize()` for the most common input sizes, avoiding the full `merkleizeImpl` loop:

- 1 chunk (32 bytes): return directly, the data is already in place
- 2 chunks (64 bytes): a single hash call
- 3-4 chunks (96-128 bytes): two direct hash operations (containers like Fork, Checkpoint)
- 8 chunks (256 bytes): three hash operations (8-field containers like Validator)
- 16 chunks (512 bytes): four hash operations (16-field containers)

MerkleizeWithMixin optimization (`hasher/hasher.go`)

Replace the 3-step mixin size encoding (`MarshalUint64` → append → pad) with a single 32-byte zero append + direct `PutUint64` write.

PutBytes / AppendBytes32 fast paths (`hasher/hasher.go`)

Skip the modulo-32 padding check for exact 32-byte inputs (Hash32, Root, WithdrawalCredentials).

Unsafe byte access for arrays (`reflection/marshal.go`, `reflection/treeroot.go`)

Use `unsafe.Slice((*byte)(sourceValue.UnsafeAddr()), len)` for addressable `[N]byte` arrays, avoiding `reflect.Value.Bytes()`, which takes a slow path for array types.

Fast size path for primitives (`reflection/sszsize.go`)

Return the pre-computed `TypeDescriptor.Size` directly for primitive types and byte-array vectors in `getSszValueSize`, skipping the full switch dispatch.

Allocation reductions (`reflection/common.go`, `sszutils/decoder_buffer.go`, `sszutils/encoder_stream.go`, `sszutils/unmarshal.go`, `reflection/unmarshal.go`)

- Return `ReflectionCtx` by value to keep it on the stack (−1 alloc per op)
- Inline `BufferDecoder` limits as an embedded `[16]int` array (−1 alloc, −128 B per unmarshal)
- Embed `StreamEncoder` scratch as a `[32]byte` array (−1 alloc per MarshalWriter)
- `ExpandSlice`: reuse existing capacity instead of always allocating
- Avoid `Interface()` boxing in bulk uint64 paths via `unsafe.Pointer` (−3 allocs on marshal)

Hasher buffer pre-allocation (`hasher/hasher.go`)

Pre-allocate a 4 MB buffer for new hashers, sized for 100K validators × 32 bytes. This eliminates buffer regrowth allocations during HTR and stabilizes the alloc count from a variable 0-5 to a consistent 2.
Benchmark Results — StateMainnet
Throughput (ns/op, average of 3 runs)
Allocations (per op)
Benchmark Results — BlockMainnet
Diff
No public API changes. All existing tests pass.