
[management] Prevent JWT reuse during peer login#6002

Merged
bcmmbaga merged 3 commits into main from fix/session-invalidation
Apr 29, 2026

Conversation

bcmmbaga (Contributor) commented Apr 27, 2026

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Added session-backed token tracking to prevent JWT reuse and enforce expirations during authentication.
    • Authentication now records per-peer token usage to improve session safety.
  • Tests

    • Added comprehensive tests for session store behavior, expiration handling, and token reuse prevention.
    • Updated test harnesses to exercise the new session/token flows.

coderabbitai Bot (Contributor) commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6b2486e2-15d6-4b52-a9ea-23fac159ff86

📥 Commits

Reviewing files that changed from the base of the PR and between a53cc20 and 359b7e8.

📒 Files selected for processing (2)
  • management/internals/server/boot.go
  • management/server/management_proto_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • management/internals/server/boot.go
  • management/server/management_proto_test.go

📝 Walkthrough

Walkthrough

Adds per-peer JWT claiming to the management gRPC server: a new SessionStore prevents token reuse via cache-backed registration, the server receives the sessionStore dependency, and tests/constructors updated to match the new nbgrpc.NewServer signature.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test helper updates: client/cmd/testutil_test.go, client/internal/engine_test.go, client/server/server_test.go, management/server/management_proto_test.go, management/server/management_test.go, shared/management/client/client_test.go | Updated nbgrpc.NewServer invocations to include one additional trailing parameter (nil) to match the new constructor arity. |
| Server initialization: management/internals/server/boot.go, management/internals/server/controllers.go | Added BaseServer.SessionStore() and pass its result into nbgrpc.NewServer(...) during server boot. |
| gRPC server token claiming: management/internals/shared/grpc/server.go | NewServer gains sessionStore *auth.SessionStore field/param; validateToken now accepts peerKey and registers tokens via the session store, mapping claim/registration failures to appropriate gRPC error codes. |
| Session store implementation: management/server/auth/session.go | New auth.SessionStore with NewSessionStore, RegisterToken, token hashing, TTL calculation, and exported errors ErrTokenAlreadyUsed / ErrTokenExpired. |
| Session store tests: management/server/auth/session_test.go | New tests verifying RegisterToken behavior, reuse prevention, expiry handling, TTL eviction, and deterministic hashing. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant gRPC as gRPC Server
    participant Auth as Auth Manager
    participant Store as SessionStore
    participant Cache as CacheStore

    Client->>gRPC: Request with jwtToken
    gRPC->>Auth: ValidateAndParseToken(jwtToken)
    Auth-->>gRPC: claims (expiresAt, subject...)
    gRPC->>Store: RegisterToken(jwtToken, expiresAt)
    Store->>Store: hashToken(jwtToken)
    Store->>Cache: Get(hash)
    alt entry exists
        Cache-->>Store: found
        Store-->>gRPC: ErrTokenAlreadyUsed
        gRPC-->>Client: Unauthenticated
    else expired (expiresAt <= now)
        Store-->>gRPC: ErrTokenExpired
        gRPC-->>Client: Unauthenticated
    else not found
        Store->>Cache: Set(hash -> marker, TTL)
        Cache-->>Store: OK
        Store-->>gRPC: nil
        gRPC-->>Client: proceed (OK)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • pascal-fischer
  • mlsmaycon

Poem

🐰 I hopped into code with a stash of cache,
I hashed each token in a lightning flash;
Once claimed, it's marked, no reuse allowed,
Expiry and reuse both make me proud. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description is largely incomplete with an empty 'Describe your changes' section and missing implementation details. Only checklist items and documentation selection are filled; issue tracking and stack information are absent. | Complete the 'Describe your changes' section with details on what was implemented and why. Add the issue ticket number/link and provide stack information if applicable. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing JWT reuse prevention during peer login, which aligns with the new SessionStore feature and token validation flow in the gRPC server. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
management/internals/shared/grpc/server.go (1)

542-578: ⚠️ Potential issue | 🔴 Critical

Retry loop is broken: a token claimed on attempt 1 makes attempt 2 always fail with ErrTokenAlreadyUsed.

claimLoginToken is invoked between ValidateAndParseToken and the downstream GetAccountIDFromUserAuth / EnsureUserAccessByJWTGroups / SyncUserJWTGroups calls. If any of those downstream calls returns a transient error, processJwtToken's retry loop re-invokes validateToken with the same JWT, and the second pass now fails at claimLoginToken with ErrTokenAlreadyUsed, which maps to codes.Unauthenticated.

Net effect:

  • The "IdP cache issue" retry the comment refers to no longer works for failures past the parse step.
  • A legitimate user can get permanently locked out of further login attempts with that JWT for the entire JWT TTL.
  • The error returned to the client is misleading (always “JWT already used”) rather than the original transient cause.

Fix: claim the token only once the full validation chain has succeeded, so retries on transient downstream errors remain idempotent. Concurrent peer logins with the same JWT are still rejected because RegisterToken is the gating operation just before returning success (assuming the TOCTOU race in SessionStore.RegisterToken is also fixed — see separate comment on session.go).

🐛 Proposed reordering
 func (s *Server) validateToken(ctx context.Context, peerKey, jwtToken string) (string, error) {
 	if s.authManager == nil {
 		return "", status.Errorf(codes.Internal, "missing auth manager")
 	}
 
 	userAuth, token, err := s.authManager.ValidateAndParseToken(ctx, jwtToken)
 	if err != nil {
 		return "", status.Errorf(codes.InvalidArgument, "invalid jwt token, err: %v", err)
 	}
 
-	if err := s.claimLoginToken(ctx, peerKey, jwtToken, token); err != nil {
-		return "", err
-	}
-
 	// we need to call this method because if user is new, we will automatically add it to existing or create a new account
 	accountId, _, err := s.accountManager.GetAccountIDFromUserAuth(ctx, userAuth)
 	if err != nil {
 		return "", status.Errorf(codes.Internal, "unable to fetch account with claims, err: %v", err)
 	}
 
 	if userAuth.AccountId != accountId {
 		log.WithContext(ctx).Debugf("gRPC server sets accountId from ensure, before %s, now %s", userAuth.AccountId, accountId)
 		userAuth.AccountId = accountId
 	}
 
 	userAuth, err = s.authManager.EnsureUserAccessByJWTGroups(ctx, userAuth, token)
 	if err != nil {
 		return "", status.Error(codes.PermissionDenied, err.Error())
 	}
 
 	err = s.accountManager.SyncUserJWTGroups(ctx, userAuth)
 	if err != nil {
 		log.WithContext(ctx).Errorf("gRPC server failed to sync user JWT groups: %s", err)
 	}
 
+	if err := s.claimLoginToken(ctx, peerKey, jwtToken, token); err != nil {
+		return "", err
+	}
+
 	return userAuth.UserId, nil
 }

Also applies to: 869-887

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/server.go` around lines 542 - 578, The
validateToken flow currently calls claimLoginToken before downstream checks,
causing retries to fail with ErrTokenAlreadyUsed; move the claimLoginToken call
to after successful GetAccountIDFromUserAuth, EnsureUserAccessByJWTGroups and
SyncUserJWTGroups (i.e., only claim/register the JWT once the full validation
chain has succeeded) so transient downstream errors can be retried idempotently;
update the same change for the duplicate block around the second occurrence
(lines noted in the review) and ensure the final success path performs the token
registration/claiming (keeping RegisterToken as the gating operation before
returning success).
🧹 Nitpick comments (2)
management/server/auth/session_test.go (1)

60-73: Time-based TTL eviction test may be flaky on slow CI.

A 50 ms TTL with a 120 ms sleep leaves only a ~70 ms safety margin, and any GC pause / scheduler hiccup on a busy CI runner can cause RegisterToken after the sleep to still see the entry and surface ErrTokenAlreadyUsed, failing the test intermittently. Consider widening the gap (e.g., 50 ms TTL + 500 ms sleep) or polling with a deadline rather than a single fixed sleep.

♻️ Proposed change
-	require.NoError(t, s.RegisterToken(ctx, token, time.Now().Add(50*time.Millisecond)))
+	ttl := 50 * time.Millisecond
+	require.NoError(t, s.RegisterToken(ctx, token, time.Now().Add(ttl)))
 
-	err := s.RegisterToken(ctx, token, time.Now().Add(50*time.Millisecond))
+	err := s.RegisterToken(ctx, token, time.Now().Add(ttl))
 	assert.ErrorIs(t, err, ErrTokenAlreadyUsed)
 
-	time.Sleep(120 * time.Millisecond)
+	time.Sleep(ttl + 500*time.Millisecond)
 
 	require.NoError(t, s.RegisterToken(ctx, token, time.Now().Add(time.Hour)))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/auth/session_test.go` around lines 60 - 73, The TTL
eviction test TestSessionStore_EntryEvictsAtTTLAndAllowsReRegistration is flaky
due to tight timing: widen the margin by either increasing the TTL and sleep
(e.g., use 50ms TTL + 500ms sleep or set both to larger values) or replace the
fixed sleep with a polling loop that retries RegisterToken until success or a
deadline expires; update calls to RegisterToken and assertions around
ErrTokenAlreadyUsed accordingly so the test reliably observes eviction before
attempting re-registration.
management/internals/shared/grpc/server.go (1)

839-842: Silently bypassing claim when sessionStore == nil weakens the protection in production.

If a future code path (or a misconfigured boot) constructs Server without a sessionStore, JWT reuse protection silently disappears with no signal. Since the production constructor in boot.go always wires one, consider treating a missing sessionStore as a programmer error in production builds (or at least logging once at startup) rather than silently no-oping per request.

If the nil is exclusively to keep tests compiling, an alternative is to make the constructor tolerate nil once at init time and have RegisterToken always be called via a non-nil interface (e.g., a no-op implementation injected in tests). Not blocking — keep current behavior if the test surface justifies it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/server.go` around lines 839 - 842, The
claimLoginToken function currently silently returns when s.sessionStore is nil,
removing JWT reuse protection; change this by making a clear, deterministic
handling: ensure the Server constructor (or boot.go wiring) requires a non-nil
sessionStore or injects a no-op implementation for tests, and add a startup-time
check that logs or fails fast if a real sessionStore is absent in production
builds; alternatively, if you prefer a per-request guard, replace the silent
return in claimLoginToken with an explicit log/error (and optionally a once-only
metric) when s.sessionStore is nil so missing wiring is visible; reference:
claimLoginToken, Server constructor/boot.go, and the sessionStore.RegisterToken
call to locate the change points.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/server/auth/session.go`:
- Around line 33-56: The RegisterToken flow in SessionStore uses a non-atomic
Get-then-Set (s.cache.Get + s.cache.Set) allowing a TOCTOU race; fix by ensuring
atomic check-and-set: for single-instance deployments add an in-process lock
(e.g., a sync.Mutex or sharded mutex on the usedTokenKeyPrefix+hashToken(token)
key) to serialize RegisterToken calls before calling s.cache.Get/Set, or for
multi-instance Redis-backed caches replace the unconditional cache.Set with an
atomic SET ... NX (with expiration) against Redis so the Set is only successful
when the key did not previously exist; update RegisterToken to use the chosen
approach and keep the existing error handling
(ErrTokenAlreadyUsed/ErrTokenExpired).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1f6646b0-8e40-46a3-a730-7be0ae8b93d4

📥 Commits

Reviewing files that changed from the base of the PR and between 154b816 and a53cc20.

📒 Files selected for processing (11)
  • client/cmd/testutil_test.go
  • client/internal/engine_test.go
  • client/server/server_test.go
  • management/internals/server/boot.go
  • management/internals/server/controllers.go
  • management/internals/shared/grpc/server.go
  • management/server/auth/session.go
  • management/server/auth/session_test.go
  • management/server/management_proto_test.go
  • management/server/management_test.go
  • shared/management/client/client_test.go

@bcmmbaga bcmmbaga merged commit df197d5 into main Apr 29, 2026
63 of 65 checks passed
@bcmmbaga bcmmbaga deleted the fix/session-invalidation branch April 29, 2026 12:04