
feat: optimize rpc calls#5394

Open
sbackend123 wants to merge 9 commits into master from
feat/rpc-calls-optimisation

Conversation

@sbackend123
Contributor

@sbackend123 sbackend123 commented Mar 12, 2026

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Removed redundant RPC call patterns in the transaction flow. Added a cache layer for the BlockNumber RPC call.
Fixed lint issues in the new cache package and cleaned up minor lint findings in adjacent test/helper code.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

#5388

Screenshots (if appropriate):

Screenshot from 2026-03-16 13-29-22

The hit ratio is most likely low because of high-load errors (EOF), which are fixed by this PR.

@sbackend123 sbackend123 changed the title RPC calls optimisation Feat: RPC calls optimisation Mar 12, 2026
@sbackend123 sbackend123 force-pushed the feat/rpc-calls-optimisation branch from f8ca4e2 to e55ac11 Compare March 13, 2026 07:54
@sbackend123 sbackend123 force-pushed the feat/rpc-calls-optimisation branch from e55ac11 to 04f6520 Compare March 13, 2026 08:00
@sbackend123 sbackend123 changed the title Feat: RPC calls optimisation feat: optimize rpc calls Mar 13, 2026
@sbackend123 sbackend123 marked this pull request as ready for review March 16, 2026 12:31
blocktime,
true,
c.config.GetUint64(optionNameMinimumGasTipCap),
blocktime-2,
Member

Just to note that the current blocktime on our CI is 1s, so this way caching is not used there... How could we achieve using it there?

Contributor Author

In deploy.go, blockTime is hardcoded. But if we are talking about CI, the cacheTTL is normally about 85% of the block time, so even if it is 1 second, we will get 850 ms.

Member

This is very strange, as block time is defined as const blocktime = 15 on line 16, which is actually 15 nanoseconds. As that is passed as pollingInterval, it looks to me like a very short time; I suppose the block time should be in minutes, not nanoseconds. Luckily, this is only for the deploy command; Bee has its own block time passed from options. I believe that line 16 in this file should look like const blocktime = 15 * time.Minute, which is not related to this PR.

Also, what happens if the blocktime value is 1 (if someone changes the constant) and we get a negative duration? And if the constant is changed to a much longer value like 15*time.Minute, the literal 2 here would subtract only two nanoseconds from that large value.

t.Helper()

var expSegments [][]byte
expSegments := make([][]byte, 0, len(exp))
Member

These are fine but feel like they belong in a separate PR.

Contributor Author

Yes, but the linter was failing, so I decided to fix everything. Next time I will create a separate PR.

pollingInterval time.Duration,
chainEnabled bool,
minimumGasTipCap uint64,
blockCacheTTl time.Duration,
Member

Typo blockCacheTTl -> blockCacheTTL

}
}

func NewMetrics() Metrics {
Member

Seems not used (dead code).

c.metrics.LoadErrors.Inc()
return val, err
}
c.Set(val, time.Now())
Member

Should we use the now argument here?

Member

What could happen if the loader takes a while?

Contributor Author

I do not think that is a problem, because there is a difference between the time when we request the value and the time when we set the new value if we had to load it.

@gacevicljubisa
Member

Suggestion: Instead of a fixed TTL (`blocktime*85/100`, `blocktime-2`) + BlockNumber RPC, use a single `HeaderByNumber(nil)` call periodically to get both the block number and timestamp, then extrapolate between syncs with zero RPC calls.

How it works:

  • Every ~20 blocks, call `HeaderByNumber(nil)` (latest) — returns block number + timestamp in one RPC call. Store as anchor point.
  • Between syncs, estimate the current block from pure math: `currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime`. No RPC calls at all.
  • Cache TTL: `anchorTimestamp + (currentBlock - anchorBlock + 1) * blockTime - now`.

@@ -0,0 +1,107 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member

2026

@@ -0,0 +1,189 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member

2026

@@ -0,0 +1,10 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member

2026

@@ -0,0 +1,84 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member

2026

expiresAt time.Time

group singleflight.Group
key Key
Member

The key here is only used in metrics and as the key in c.group.Do, but that key is always the same, since the singleflight group and the key are the same for the same instance of the ExpiringSingleFlightCache type. I would suggest removing the Key type and just providing the metrics prefix string in the constructor NewExpiringSingleFlightCache.

}(i)
}

time.Sleep(50 * time.Millisecond)
Member

Could synctest be used here to avoid sleeping and a potentially flaky test if synchronization does not happen within 50ms?


c.metrics.Misses.Inc()

result, err, shared := c.group.Do(string(c.key), func() (any, error) {
Member

This group always calls the same key, so it can even be a static string.

b.metrics.TotalRPCCalls.Inc()
b.metrics.BlockNumberCalls.Inc()

blockNumber, err := b.backend.BlockNumber(ctx)
Member

The problem with golang.org/x/sync/singleflight is that it is not context.Context aware. The context passed by the first caller influences the end result for all other callers of this function. Meaning, if the first caller cancels or times out on the context, all other callers will receive that error even if they did not cancel and their own timeouts did not fire. This is why I created resenje.org/singleflight, which is context aware and will not terminate the execution of the function until all callers terminate their contexts. There are a few places where resenje.org/singleflight is used in bee; if you like, you can look at them and consider using it if you see it as important for this case. Basically, GetOrLoad would need to accept a context, pass it to singleflight.Do, and the context from the callback would be used in b.backend.BlockNumber.



c.value = value
c.valid = true
c.expiresAt = now.Add(c.ttl)
Member

Ideally, for block number caching, the expiration time should be the time when the next block number is expected, which is around the block time interval. This is to minimize the time during which the cache reports an older block number compared to the current one.

In this generalized caching implementation, it is hard to generalize such a requirement, as it is very specific to how the block number is increased (at some roughly fixed, configurable period). So the ideal situation would be to calculate the expected time of the next block and set expiresAt to that time, or a bit later, and to watch whether the block number has actually increased.

In the current implementation, the worst case is that the cache keeps a stale block number for up to the block time (15 minutes, for example, on some networks) after the actual block number has increased, if the value is set in the cache a moment before the new block. On average, statistically over a very large sample, that time would be half of the block time (7.5 minutes).

I am not sure if such precision is needed, and whether a delay of up to a whole block time for the new block number is acceptable. This is just my observation.

@gacevicljubisa
Member

gacevicljubisa commented Mar 24, 2026

I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.

If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).

The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping.

@janos wdyt?

@janos
Member

janos commented Mar 24, 2026

> I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.
>
> If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).
>
> The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping.
>
> @janos wdyt?

Yes, given that block numbers change very frequently, every 5 to 12s depending on the network (gnosis and sepolia), but not with exactly the same period every time. It is a very good suggestion to calculate the block number as you described, to reduce even further the frequency of the RPC call that gets the block number, and to estimate the block time via HeaderByNumber. In that case even singleflight is not needed, as internally block numbers will always be returned by their specific cache.

I would even go further and use the block time value calculated from HeaderByNumber instead of specifying it statically via options.

The consequence could be that it is required to get the block number at node startup, as both the block number and the block time are needed as known values. Maybe that is even good to do, and to exit the application if there are problems with the RPC endpoint when getting the block number on start. Just thinking.

@sbackend123
Contributor Author

sbackend123 commented Mar 25, 2026

> Suggestion: Instead of a fixed TTL (`blocktime*85/100`, `blocktime-2`) + BlockNumber RPC, use a single `HeaderByNumber(nil)` call periodically to get both the block number and timestamp, then extrapolate between syncs with zero RPC calls.
>
> How it works:
>
> * Every ~20 blocks, call `HeaderByNumber(nil)` (latest) — returns block number + timestamp in one RPC call. Store as anchor point.
> * Between syncs, estimate the current block from pure math: `currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime`. No RPC calls at all.
> * Cache TTL: `anchorTimestamp + (currentBlock - anchorBlock + 1) * blockTime - now`.

For me this sounds a bit like overengineering on one side, and a less flexible implementation on the other (maybe I am missing something?). I also thought: what if we would like to cache another type of value?

> I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.
>
> If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).
>
> The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping.
>
> @janos wdyt?

> Yes, given that block numbers change very frequently, every 5 to 12s depending on the network (gnosis and sepolia), but not with exactly the same period every time. It is a very good suggestion to calculate the block number as you described, to reduce even further the frequency of the RPC call that gets the block number, and to estimate the block time via HeaderByNumber. In that case even singleflight is not needed, as internally block numbers will always be returned by their specific cache.
>
> I would even go further and use the block time value calculated from HeaderByNumber instead of specifying it statically via options.
>
> The consequence could be that it is required to get the block number at node startup, as both the block number and the block time are needed as known values. Maybe that is even good to do, and to exit the application if there are problems with the RPC endpoint when getting the block number on start. Just thinking.

I have a couple of concerns:

  1. Drift over time: block production is not perfectly uniform, so we can accumulate error between syncs, especially if the resync interval is relatively large.

  2. Risk of overshooting: extrapolation may produce a block number that has not actually been produced yet.

  3. With this approach we solve the problem for the block_number call only. If in the future we need to reduce other RPC calls, we might need to add something else.

I think there can be other corner cases that would be more difficult to catch, since debugging will become more complex.
