use LRU eviction to prevent session key data loss in the gateway #2836

Open
zkokelj wants to merge 1 commit into main from ziga/fix_gw_sk_expiration_capacity_issue

zkokelj commented Mar 3, 2026

Why this change is needed
The SessionKeyActivityTracker has a hard limit of 100k entries. When this limit is reached, new session key activities are silently dropped (those keys will never expire, and funds …).

What changes were made as part of this PR
I replaced the simple map with an LRU cache. Instead of dropping new entries at capacity, the tracker now evicts the least recently used entry and persists it to CosmosDB via an async batch writer.
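A minimal sketch of the eviction path described here. It assumes a single-goroutine caller (no locking) and a hypothetical `entry` type holding a session key and a Unix-seconds timestamp. `LRU`, `Touch`, and the `onEvict` callback are illustrative names, not the PR's actual API. `container/list` keeps recency order so the next eviction candidate is always at the back, and the map gives O(1) lookup of a key's list node.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is a hypothetical record: a session key and its last-activity time
// (Unix seconds). The real tracker's types may differ.
type entry struct {
	key      string
	lastSeen int64
}

// LRU is a minimal least-recently-used cache. Not safe for concurrent use;
// a real tracker would guard it with a mutex.
type LRU struct {
	capacity int
	order    *list.List               // front = most recent, back = next to evict
	items    map[string]*list.Element // key -> list node for O(1) lookup
	onEvict  func(entry)              // receives each evicted entry
}

func NewLRU(capacity int, onEvict func(entry)) *LRU {
	if capacity < 1 {
		capacity = 1
	}
	return &LRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		onEvict:  onEvict,
	}
}

// Touch records activity for key. A known key is updated and moved to the
// front; a new key is inserted at the front after evicting the back entry if
// the cache is full.
func (c *LRU) Touch(key string, ts int64) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).lastSeen = ts
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		e := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, e.key)
		c.onEvict(*e) // hand off for persistence instead of dropping
	}
	c.items[key] = c.order.PushFront(&entry{key: key, lastSeen: ts})
}

func (c *LRU) Len() int { return c.order.Len() }

func main() {
	var evicted []entry
	c := NewLRU(2, func(e entry) { evicted = append(evicted, e) })
	c.Touch("a", 1)
	c.Touch("b", 2)
	c.Touch("a", 3) // "a" becomes most recent, so "b" is now least recent
	c.Touch("c", 4) // cache is full: "b" is evicted
	fmt.Println(c.Len(), evicted) // 2 [{b 2}]
}
```

Re-touching "a" before inserting "c" is what makes "b" the victim. A plain insertion-order queue would have evicted "a" instead.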

Changes:

session_key_activity.go — LRU cache using container/list with O(1) eviction; background goroutine batches evicted entries (100 items or 5s flush interval) and writes to CosmosDB
session_key_expiration.go — Expiration check now queries both in-memory cache AND CosmosDB, merging results to catch entries that were evicted from memory
session_key_activity_storage.go — Added SaveBatch(), ListOlderThan(), and Delete() to storage interface
cosmosdb/session_key_activity_storage.go — Implemented incremental upsert (read-merge-write) for batched persistence
Graceful shutdown flushes pending writes before exit
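The merge step in session_key_expiration.go can be pictured with the sketch below. It assumes both sources are reduced to maps of session key to last-activity timestamp (Unix seconds). `expiredKeys` is a hypothetical helper, and the persisted side would presumably come from ListOlderThan(). Letting the newer timestamp win prevents a stale persisted copy from expiring a key that has since become active again in memory.

```go
package main

import (
	"fmt"
	"sort"
)

// expiredKeys (hypothetical) merges in-memory and persisted last-activity
// timestamps and returns keys idle for at least ttl seconds as of now.
// If a key appears in both sources, the newer timestamp wins.
func expiredKeys(memory, persisted map[string]int64, now, ttl int64) []string {
	merged := make(map[string]int64, len(memory)+len(persisted))
	for k, ts := range persisted {
		merged[k] = ts
	}
	for k, ts := range memory {
		if prev, ok := merged[k]; !ok || ts > prev {
			merged[k] = ts
		}
	}
	var out []string
	for k, ts := range merged {
		if now-ts >= ttl {
			out = append(out, k)
		}
	}
	sort.Strings(out) // deterministic order for callers and tests
	return out
}

func main() {
	memory := map[string]int64{"a": 90, "b": 10}
	persisted := map[string]int64{"a": 5, "c": 20} // "c" was evicted from memory
	fmt.Println(expiredKeys(memory, persisted, 100, 50)) // [b c]
}
```

Key "a" has an old persisted copy (5) but recent in-memory activity (90), so it is not expired. Key "c" exists only in storage and is still caught.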
Write frequency impact: No additional CosmosDB writes unless cache exceeds 100k entries. When evictions occur, writes are batched efficiently (~1-2 shard writes per 100 evictions).

zkokelj force-pushed the ziga/fix_gw_sk_expiration_capacity_issue branch from 8d66b37 to fb1d92a on March 3, 2026 10:00
```go
func (t *sessionKeyActivityTracker) Stop() {
	t.stopOnce.Do(func() {
		close(t.stopChan)
		if t.persistQueue != nil {
```

I don't think we need to close this as well: isn't stopChan already closed in stop_control.go?


```go
// store all activities in the database to make them persistent and recoverable in case of restart
allActivities := s.activityTracker.ListAll()
_ = s.activityStorage.Save(allActivities)
```

I might be misunderstanding the code, but won't this overwrite entries that have been evicted but have not yet reached the expiration threshold?

  1. session key A is evicted from memory (because the cache hit its limit) and sent to Cosmos via the persist queue
  2. key A is in Cosmos but not in the in-memory tracker
  3. the expiration loop first calls sessionKeyExpirations(), but key A isn't past the expiration threshold yet
  4. ListAll() returns only the in-memory entries (key A is missing)
  5. Save() writes that list, overwriting the stored data and deleting key A?
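The scenario above can be reproduced with a toy model. This sketch uses a hypothetical in-memory `store` in place of CosmosDB. `saveAll` mimics a full overwrite like the Save(allActivities) call above. `upsertBatch` mimics the read-merge-write behavior the PR describes for batched persistence. It illustrates one possible fix, merging instead of overwriting, with expired keys then removed explicitly through something like the new Delete(). This is not necessarily how the PR resolves it.

```go
package main

import "fmt"

// store is a hypothetical stand-in for the persisted activity data:
// session key -> last-activity timestamp (Unix seconds).
type store struct{ data map[string]int64 }

// saveAll replaces everything, like persisting ListAll() as a full snapshot.
func (s *store) saveAll(all map[string]int64) {
	s.data = make(map[string]int64, len(all))
	for k, ts := range all {
		s.data[k] = ts
	}
}

// upsertBatch merges into existing data, keeping the newer timestamp per key.
// Keys absent from the batch are left untouched.
func (s *store) upsertBatch(batch map[string]int64) {
	for k, ts := range batch {
		if prev, ok := s.data[k]; !ok || ts > prev {
			s.data[k] = ts
		}
	}
}

func main() {
	// Steps 1-2: key A was evicted and persisted; only B is still in memory.
	memory := map[string]int64{"B": 50} // what ListAll() would return

	overwrite := &store{data: map[string]int64{"A": 10}}
	overwrite.saveAll(memory) // steps 4-5 with a full overwrite
	_, kept := overwrite.data["A"]
	fmt.Println("overwrite keeps A:", kept) // overwrite keeps A: false

	merged := &store{data: map[string]int64{"A": 10}}
	merged.upsertBatch(memory) // same steps with a merge
	_, kept = merged.data["A"]
	fmt.Println("upsert keeps A:", kept) // upsert keeps A: true
}
```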


2 participants