prevent excessive buffer allocation in stream marshal/unmarshal #166

Open
pk910 wants to merge 1 commit into master from pk910/fix-streaming-buffer-use

@pk910 pk910 commented Apr 7, 2026

Prevent excessive buffer allocation in stream marshal/unmarshal

Problem

When MarshalSSZWriter or UnmarshalSSZReader is used with a type that has generated buffer-based methods (DynamicMarshaler/DynamicUnmarshaler) but no streaming methods (DynamicEncoder/DynamicDecoder), the reflection layer immediately delegates to the buffer-based method for the entire object. For large types like BeaconState (200MB+), this defeats the purpose of streaming, because the full serialized object ends up in a single temporary buffer.

Solution

Introduce a max delegation buffer size threshold (default 200KB) that controls when the stream encoder/decoder skips buffer-based delegation and falls through to reflection-based field-by-field processing instead. This is decoupled from the existing initial buffer size (default 2KB) which controls the internal I/O buffer.

When a type's serialized size exceeds the max delegation buffer, individual fields are processed via reflection — each field may still delegate to buffer-based methods if it fits within the threshold, so only the top-level container avoids the large allocation.
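
The per-field fall-through described above can be pictured with a small sketch. It is not the library's code: `field`, `planEncoding`, and the sizes are hypothetical stand-ins, and "200KB" is assumed to mean 200 * 1024 bytes.

```go
package main

import "fmt"

// field is a hypothetical stand-in for one container field and its
// serialized size in bytes.
type field struct {
	name string
	size int
}

// planEncoding decides, per field, whether a buffer-based method would be
// used ("delegate") or whether the field would be processed further via
// reflection ("reflect"), based on the max delegation buffer size.
func planEncoding(fields []field, maxDelegation int) map[string]string {
	plan := make(map[string]string, len(fields))
	for _, f := range fields {
		if f.size <= maxDelegation {
			plan[f.name] = "delegate"
		} else {
			plan[f.name] = "reflect"
		}
	}
	return plan
}

func main() {
	const maxDelegation = 200 * 1024 // assumption: "200KB" read as 200 KiB

	// Illustrative sizes only; not measured from a real BeaconState.
	fields := []field{
		{"genesis_time", 8},
		{"validators", 150 << 20},
	}

	plan := planEncoding(fields, maxDelegation)
	fmt.Println(plan["genesis_time"], plan["validators"]) // delegate reflect
}
```

Only the oversized field takes the reflection path; the small one still uses the buffer-based method, which is the behavior the paragraph above describes.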

Changes

New Decoder/Encoder interface methods:

  • MaxDecodeBufferSize() int — returns the configured max buffer for delegation decisions
  • MaxEncodeBufferSize() int — returns the configured max buffer for delegation decisions
  • BufferDecoder/BufferEncoder return math.MaxInt (data already in memory, no concern)
  • StreamDecoder/StreamEncoder return the configured max (default 200KB)
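
As a rough illustration of the two return values listed above, the sketch below models only the new method. The `sizeLimiter` interface and both encoder types are simplified stand-ins, not the real sszutils types, which carry many more methods.

```go
package main

import (
	"fmt"
	"math"
)

// sizeLimiter models only the new method; the real Encoder interface
// has additional methods that are omitted here.
type sizeLimiter interface {
	MaxEncodeBufferSize() int
}

// bufferEncoder stands in for an in-memory encoder: the data is already
// in memory, so there is no limit on delegation.
type bufferEncoder struct{}

func (bufferEncoder) MaxEncodeBufferSize() int { return math.MaxInt }

// streamEncoder stands in for a streaming encoder with a configured limit.
type streamEncoder struct{ maxBufSize int }

func (e streamEncoder) MaxEncodeBufferSize() int { return e.maxBufSize }

// canDelegate reports whether a value of the given size may be handed to
// a buffer-based method.
func canDelegate(enc sizeLimiter, size int) bool {
	return size <= enc.MaxEncodeBufferSize()
}

func main() {
	size := 200 << 20 // roughly a 200MB object

	fmt.Println(canDelegate(bufferEncoder{}, size))                       // true
	fmt.Println(canDelegate(streamEncoder{maxBufSize: 200 * 1024}, size)) // false
}
```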

Decoupled buffer sizes in StreamDecoder and StreamEncoder:

  • NewStreamDecoder(reader, totalLen, bufSize, maxBufSize) — separate initial read buffer from max delegation buffer
  • NewStreamEncoder(writer, bufSize, maxBufSize) — separate internal write buffer from max delegation buffer
  • New constants: DefaultStreamDecoderMaxBufSize, DefaultStreamEncoderMaxBufSize (200KB)

New configuration options:

  • WithStreamReaderMaxBufferSize(size) — controls max delegation buffer for UnmarshalSSZReader
  • WithStreamWriterMaxBufferSize(size) — controls max delegation buffer for MarshalSSZWriter

Guard checks in reflection/unmarshal.go (3 delegation points):

  • DynamicViewUnmarshaler: skip if bufLen > decoder.MaxDecodeBufferSize()
  • FastsszUnmarshaler: skip if sszLen > decoder.MaxDecodeBufferSize() (except SszCustomType)
  • DynamicUnmarshaler: skip if sszLen > decoder.MaxDecodeBufferSize()

Guard checks in reflection/marshal.go (3 delegation points):

  • FastsszMarshaler: skip if sourceType.Size > encoder.MaxEncodeBufferSize() (except SszCustomType)
  • DynamicMarshaler: skip unless encoder is seekable or static size fits within limit
  • DynamicViewMarshaler: skip unless encoder is seekable or static size fits within limit

For the marshal path, dynamic types (unknown output size) on stream encoders conservatively skip delegation since the output size cannot be determined without marshalling. Buffer-based encoders (Seekable() == true) always delegate since the data is already in memory.

Breaking changes

  • NewStreamDecoder signature changed from (reader, totalLen, maxBufSize) to (reader, totalLen, bufSize, maxBufSize)
  • NewStreamEncoder signature changed from (writer, bufSize) to (writer, bufSize, maxBufSize)
  • Existing callers passing 0 for defaults need an additional 0 argument
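
For callers, the migration amounts to passing one more argument, for example going from `NewStreamEncoder(writer, 0)` to `NewStreamEncoder(writer, 0, 0)`. The sketch below shows one way the two independent sizes could fall back to their defaults when 0 is passed. `resolveSizes` and `streamSizes` are hypothetical helpers, and the byte values assume "2KB" and "200KB" mean 2 * 1024 and 200 * 1024.

```go
package main

import "fmt"

const (
	defaultBufSize    = 2 * 1024   // assumed value for the 2KB I/O buffer default
	defaultMaxBufSize = 200 * 1024 // assumed value for the 200KB delegation default
)

// streamSizes holds the two sizes that are now configured separately.
type streamSizes struct {
	bufSize    int // internal I/O buffer
	maxBufSize int // max delegation buffer
}

// resolveSizes substitutes the default for every size passed as 0.
func resolveSizes(bufSize, maxBufSize int) streamSizes {
	if bufSize == 0 {
		bufSize = defaultBufSize
	}
	if maxBufSize == 0 {
		maxBufSize = defaultMaxBufSize
	}
	return streamSizes{bufSize: bufSize, maxBufSize: maxBufSize}
}

func main() {
	fmt.Printf("%+v\n", resolveSizes(0, 0))        // both defaults
	fmt.Printf("%+v\n", resolveSizes(4096, 1<<20)) // both overridden
}
```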

Copilot AI review requested due to automatic review settings April 7, 2026 21:28

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 91.42857% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.70%. Comparing base (4f7b42e) to head (e97de62).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #166      +/-   ##
==========================================
- Coverage   95.75%   95.70%   -0.05%     
==========================================
  Files          47       47              
  Lines       10921    10947      +26     
==========================================
+ Hits        10457    10477      +20     
- Misses        290      296       +6     
  Partials      174      174              
| Components | Coverage Δ |
| --- | --- |
| dynssz | 99.50% <91.42%> (-0.12%) ⬇️ |
| dynsszgen | 92.21% <ø> (ø) |

Copilot AI left a comment

Pull request overview

This PR aims to preserve the benefits of streaming SSZ marshal/unmarshal by preventing the reflection layer from delegating to buffer-based (whole-object) marshal/unmarshal methods when doing so would force large temporary allocations.

Changes:

  • Added MaxEncodeBufferSize() / MaxDecodeBufferSize() to Encoder / Decoder and implemented them for buffer vs stream encoders/decoders.
  • Split StreamEncoder/StreamDecoder configuration into (1) internal I/O buffer size and (2) max delegation buffer size (default 200KB).
  • Added delegation guards in reflection marshal/unmarshal paths and new DynSsz options to configure the max delegation buffer.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| sszutils/encoder.go | Extends Encoder interface with max-delegation buffer sizing. |
| sszutils/encoder_stream.go | Adds max delegation buffer config + new StreamEncoder constructor signature. |
| sszutils/encoder_buffer.go | Implements MaxEncodeBufferSize() as unlimited for in-memory encoding. |
| sszutils/decoder.go | Extends Decoder interface with max-delegation buffer sizing. |
| sszutils/decoder_stream.go | Splits internal buffer size vs max delegation buffer size + new constructor signature. |
| sszutils/decoder_buffer.go | Implements MaxDecodeBufferSize() as unlimited for in-memory decoding. |
| reflection/marshal.go | Adds size-based guards to avoid large buffer-based delegation on stream encoders. |
| reflection/unmarshal.go | Adds size-based guards to avoid large buffer-based delegation on stream decoders. |
| options.go | Adds DynSsz options for stream max delegation buffer sizes. |
| dynssz.go | Wires new stream encoder/decoder constructor args from DynSsz options. |
| sszutils/stream_test.go | Updates constructor call sites to new signatures. |
| reflection/overflow_internal_test.go | Updates StreamEncoder constructor call site. |
| reflection/marshal_test.go | Updates StreamEncoder constructor call site. |
| docs/streaming.md | Updates docs for new constructor signatures and max-delegation buffer behavior. |

Comment thread reflection/marshal.go
Comment on lines 75 to 84
  if useFastSsz {
      if marshaller, ok := getPtr(sourceValue).Interface().(sszutils.FastsszMarshaler); ok {
-         newBuf, err := marshaller.MarshalSSZTo(encoder.GetBuffer())
-         if err != nil {
-             return err
+         if int(sourceType.Size) <= encoder.MaxEncodeBufferSize() || sourceType.SszType == ssztypes.SszCustomType {
+             newBuf, err := marshaller.MarshalSSZTo(encoder.GetBuffer())
+             if err != nil {
+                 return err
+             }
+             encoder.SetBuffer(newBuf)
+             return nil
          }

Copilot AI Apr 7, 2026

The FastSSZ delegation guard uses sourceType.Size to decide whether to call MarshalSSZTo, but for dynamic containers TypeDescriptor.Size is typically 0 (unknown), which will always pass the <= MaxEncodeBufferSize() check and still trigger a full temporary buffer allocation on stream encoders (the issue this PR is trying to prevent). Consider using the fastssz SizeSSZ() result (with an overflow check) when sourceType.Size == 0 (and encoder is non-seekable), or conservatively skipping delegation when the size is unknown.
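
One possible shape for the suggestion above, as a sketch rather than the actual fix: when the static size is 0 (unknown), ask the value for its size before deciding. `sizer`, `fixedSize`, and `shouldDelegate` are hypothetical names. The `SizeSSZ() int` method mirrors the one the comment refers to, and the SszCustomType exception from the PR is left out for brevity.

```go
package main

import "fmt"

// sizer models the one method needed here; it is a stand-in for the
// fastssz marshaler interface mentioned in the comment.
type sizer interface {
	SizeSSZ() int
}

// fixedSize is a test double that reports a fixed serialized size.
type fixedSize int

func (f fixedSize) SizeSSZ() int { return int(f) }

// shouldDelegate decides whether to call the buffer-based method.
// Seekable (in-memory) encoders always delegate. For stream encoders the
// static size is used when known, otherwise SizeSSZ() is consulted.
func shouldDelegate(staticSize uint32, v sizer, seekable bool, maxBuf int) bool {
	if seekable {
		return true
	}
	if maxBuf < 0 {
		return false
	}
	size := uint64(staticSize)
	if size == 0 {
		n := v.SizeSSZ()
		if n < 0 {
			return false // defensive: treat a negative size as "do not delegate"
		}
		size = uint64(n)
	}
	// Compare in uint64 so neither operand can wrap.
	return size <= uint64(maxBuf)
}

func main() {
	const maxBuf = 200 * 1024 // assumption: "200KB" read as 200 KiB

	fmt.Println(shouldDelegate(0, fixedSize(300<<20), false, maxBuf)) // false
	fmt.Println(shouldDelegate(0, fixedSize(1024), false, maxBuf))    // true
	fmt.Println(shouldDelegate(0, fixedSize(300<<20), true, maxBuf))  // true
}
```

The alternative the comment mentions, skipping delegation whenever the size is unknown, would replace the `SizeSSZ()` call with an early `return false`.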

Comment thread reflection/marshal.go
Comment on lines 96 to 106

  if useDynamicMarshal {
      if marshaller, ok := getPtr(sourceValue).Interface().(sszutils.DynamicMarshaler); ok {
-         newBuf, err := marshaller.MarshalSSZDyn(ctx.ds, encoder.GetBuffer())
-         if err != nil {
-             return err
+         if encoder.Seekable() || (sourceType.Size > 0 && int(sourceType.Size) <= encoder.MaxEncodeBufferSize()) {
+             newBuf, err := marshaller.MarshalSSZDyn(ctx.ds, encoder.GetBuffer())
+             if err != nil {
+                 return err
+             }
+             encoder.SetBuffer(newBuf)
+             return nil
          }

Copilot AI Apr 7, 2026

This block converts sourceType.Size (uint32) to int when comparing against MaxEncodeBufferSize() without guarding against platform overflow. On 32-bit platforms (or very large size hints), int(sourceType.Size) can wrap and incorrectly allow delegation. Consider checking sourceType.Size > math.MaxInt first (returning ErrPlatformOverflowFn) or performing comparisons in int64/uint64 safely.

Comment thread reflection/marshal.go
Comment on lines 218 to +227
  if useViewMarshaler {
      if marshaller, ok := getPtr(sourceValue).Interface().(sszutils.DynamicViewMarshaler); ok {
          if marshalFn := marshaller.MarshalSSZDynView(*sourceType.CodegenInfo); marshalFn != nil {
-             newBuf, err := marshalFn(ctx.ds, encoder.GetBuffer())
-             if err != nil {
-                 return true, err
+             if encoder.Seekable() || (sourceType.Size > 0 && int(sourceType.Size) <= encoder.MaxEncodeBufferSize()) {
+                 newBuf, err := marshalFn(ctx.ds, encoder.GetBuffer())
+                 if err != nil {
+                     return true, err
+                 }
+                 encoder.SetBuffer(newBuf)
+                 return true, nil

Copilot AI Apr 7, 2026

Same potential uint32 -> int overflow issue here: the condition converts with int(sourceType.Size) before comparing against MaxEncodeBufferSize(). To avoid incorrect delegation decisions on platforms where int is 32 bits wide and cannot represent every uint32 value, add an explicit overflow guard (or compare in a wider type) before converting.

Comment thread reflection/marshal.go
Comment on lines 75 to 106
@@ -94,12 +96,14 @@ func (ctx *ReflectionCtx) marshalType(sourceType *ssztypes.TypeDescriptor, sourc

  if useDynamicMarshal {
      if marshaller, ok := getPtr(sourceValue).Interface().(sszutils.DynamicMarshaler); ok {
-         newBuf, err := marshaller.MarshalSSZDyn(ctx.ds, encoder.GetBuffer())
-         if err != nil {
-             return err
+         if encoder.Seekable() || (sourceType.Size > 0 && int(sourceType.Size) <= encoder.MaxEncodeBufferSize()) {
+             newBuf, err := marshaller.MarshalSSZDyn(ctx.ds, encoder.GetBuffer())
+             if err != nil {
+                 return err
+             }
+             encoder.SetBuffer(newBuf)
+             return nil
          }

Copilot AI Apr 7, 2026

The new MaxEncodeBufferSize()-based delegation behavior (skipping buffer-based marshal methods on stream encoders when the payload is large) doesn’t appear to be covered by tests. Adding a unit test that asserts large values do not invoke FastsszMarshaler/DynamicMarshaler on StreamEncoder (but still do on BufferEncoder) would help prevent regressions.

Comment thread reflection/unmarshal.go
Comment on lines 79 to 90
  if useViewUnmarshaler {
      if unmarshaler, ok := targetValue.Addr().Interface().(sszutils.DynamicViewUnmarshaler); ok {
          if unmarshalFn := unmarshaler.UnmarshalSSZDynView(*targetType.CodegenInfo); unmarshalFn != nil {
              bufLen := decoder.GetLength()
-             buf, err := decoder.DecodeBytesBuf(bufLen)
-             if err != nil {
-                 return err
+             if bufLen <= decoder.MaxDecodeBufferSize() {
+                 buf, err := decoder.DecodeBytesBuf(bufLen)
+                 if err != nil {
+                     return err
+                 }
+                 return unmarshalFn(ctx.ds, buf)
              }
-             return unmarshalFn(ctx.ds, buf)
          }

Copilot AI Apr 7, 2026

The new MaxDecodeBufferSize()-based delegation guards in the view/fastssz/dynamic unmarshaler paths don’t appear to have test coverage. Consider adding a test where a type implements Dynamic(Un)marshaler (or view unmarshaler) and the input length is above/below the threshold to verify delegation is correctly skipped only when expected.

github-actions bot commented Apr 7, 2026

Benchmark Results

Library Benchmarks

                                                        │ base-lib.txt │             pr-lib.txt             │
                                                        │    sec/op    │    sec/op     vs base              │
TreeFromNodes/zero_limit_returns_empty_node-4             4.056n ± ∞ ¹   4.047n ± ∞ ¹  -0.22% (p=0.040 n=5)
TreeFromNodes/single_node_with_limit_2-4                  33.72n ± ∞ ¹   34.74n ± ∞ ¹  +3.02% (p=0.008 n=5)
TreeFromNodes/two_nodes_with_limit_2-4                    32.38n ± ∞ ¹   33.66n ± ∞ ¹  +3.95% (p=0.008 n=5)
TreeFromNodes/non-power_of_2_limit-4                      25.12n ± ∞ ¹   25.78n ± ∞ ¹  +2.63% (p=0.008 n=5)
TreeFromNodes/four_nodes_with_limit_8-4                   83.50n ± ∞ ¹   87.84n ± ∞ ¹  +5.20% (p=0.008 n=5)
TreeFromNodes/large_limit_with_few_nodes_does_not_OOM-4   324.3n ± ∞ ¹   345.2n ± ∞ ¹  +6.44% (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95
geomean                                                   179.0n         182.4n        +1.86%

                                                        │ base-lib.txt  │              pr-lib.txt               │
                                                        │     B/op      │     B/op       vs base                │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
geomean                                                               ³                  +0.00%               ³

                                                        │ base-lib.txt │             pr-lib.txt              │
                                                        │  allocs/op   │  allocs/op   vs base                │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
geomean                                                              ³                +0.00%               ³

Performance Benchmarks

                                          │ base-perf.txt │             pr-perf.txt             │
                                          │    sec/op     │    sec/op     vs base               │
Codegen_BlockMainnet/Unmarshal-4             58.21µ ± ∞ ¹   53.91µ ± ∞ ¹   -7.38% (p=0.008 n=5)
Codegen_BlockMainnet/UnmarshalReader-4       77.95µ ± ∞ ¹   76.38µ ± ∞ ¹   -2.01% (p=0.008 n=5)
Codegen_BlockMainnet/Marshal-4               21.30µ ± ∞ ¹   18.44µ ± ∞ ¹  -13.43% (p=0.008 n=5)
Codegen_BlockMainnet/HashTreeRoot-4          568.4µ ± ∞ ¹   570.1µ ± ∞ ¹   +0.30% (p=0.016 n=5)
Codegen_StateMainnet/Unmarshal-4             5.798m ± ∞ ¹   5.593m ± ∞ ¹   -3.53% (p=0.016 n=5)
Codegen_StateMainnet/UnmarshalReader-4       11.25m ± ∞ ¹   10.71m ± ∞ ¹   -4.79% (p=0.008 n=5)
Codegen_StateMainnet/Marshal-4               3.526m ± ∞ ¹   2.635m ± ∞ ¹  -25.25% (p=0.008 n=5)
Codegen_BlockMinimal/Unmarshal-4             61.31µ ± ∞ ¹   56.57µ ± ∞ ¹   -7.73% (p=0.032 n=5)
Codegen_BlockMinimal/Marshal-4               23.03µ ± ∞ ¹   20.85µ ± ∞ ¹   -9.45% (p=0.008 n=5)
Reflection_StateMainnet/MarshalWriter-4      5.093m ± ∞ ¹   5.422m ± ∞ ¹   +6.47% (p=0.008 n=5)
Reflection_StateMinimal/MarshalWriter-4      4.381m ± ∞ ¹   4.808m ± ∞ ¹   +9.73% (p=0.008 n=5)
Reflection_StateMinimal/HashTreeRoot-4       61.53m ± ∞ ¹   61.83m ± ∞ ¹   +0.49% (p=0.016 n=5)
¹ need >= 6 samples for confidence interval at level 0.95
geomean                                      757.5µ         742.9µ         -1.93%

                                          │ base-perf.txt │               pr-perf.txt               │
                                          │     B/op      │     B/op       vs base                  │
Codegen_BlockMainnet/MarshalWriter-4        2.422Ki ± ∞ ¹   2.438Ki ± ∞ ¹  +0.65% (p=0.008 n=5)
Codegen_StateMainnet/MarshalWriter-4        2.281Ki ± ∞ ¹   2.297Ki ± ∞ ¹  +0.68% (p=0.008 n=5)
Codegen_BlockMinimal/MarshalWriter-4        2.422Ki ± ∞ ¹   2.438Ki ± ∞ ¹  +0.65% (p=0.008 n=5)
Codegen_StateMinimal/MarshalWriter-4        2.281Ki ± ∞ ¹   2.297Ki ± ∞ ¹  +0.68% (p=0.008 n=5)
Reflection_BlockMainnet/MarshalWriter-4     2.422Ki ± ∞ ¹   2.438Ki ± ∞ ¹  +0.65% (p=0.008 n=5)
Reflection_StateMainnet/MarshalWriter-4     2.281Ki ± ∞ ¹   2.297Ki ± ∞ ¹  +0.68% (p=0.008 n=5)
Reflection_BlockMinimal/MarshalWriter-4     2.422Ki ± ∞ ¹   2.438Ki ± ∞ ¹  +0.65% (p=0.008 n=5)
Reflection_StateMinimal/MarshalWriter-4     2.281Ki ± ∞ ¹   2.297Ki ± ∞ ¹  +0.68% (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
⁴ ratios must be >0 to compute geomean
geomean                                     94.41Ki                        ?                    ³ ⁴

                                          │ base-perf.txt │             pr-perf.txt              │
                                          │   allocs/op   │  allocs/op    vs base                │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
geomean                                                 ³                 +0.00%               ³
