Skip to content

Funky pagination error (vibe coded fix with Opus) on big Entra tenants with massive groups#189

Open
fjodoin wants to merge 2 commits intoSpecterOps:mainfrom
fjodoin:main
Open

Funky pagination error (vibe coded fix with Opus) on big Entra tenants with massive groups#189
fjodoin wants to merge 2 commits intoSpecterOps:mainfrom
fjodoin:main

Conversation

@fjodoin
Copy link
Copy Markdown

@fjodoin fjodoin commented Apr 22, 2026

Vibe coded a funky pagination error when using azurehound on an enormous Entra tenant with massive groups.

This pull request is to confirm if the patch is sound 😬

(also added .claude to the .gitignore)

ERROR:
ERR unable to continue processing members for this group error="map[error:map[code:Directory_ExpiredPageToken innerError:map[client-request-id:123...456 date:2026-04-21T17:58:12 request-id:123...456] message:The specified page token value has expired and can no longer be included in your request.]]" groupId=123...456

The "token value has expired" is not an OAuth expiration, but rather (according to Claude Opus), it is a pagination error.

"""
The error is Directory_ExpiredPageToken — not the OAuth access token you're thinking of (the 1h one). It's the Microsoft Graph pagination cursor embedded in the @odata.nextLink URL (an opaque $skiptoken). That cursor has a much shorter, undocumented TTL than your JWT, and it expires when a single group's pagination session stretches too long.

Three things in the AzureHound code conspire to stretch it:

  1. No default $top for group members. client/groups.go:63-72 doesn't set params.Top, so Graph's implicit default (~100) is used. A 100k-member group is ~1000 page round-trips. Every sibling endpoint (users, devices, SPs, app role assignments) sets Top=999 — group members is the odd one out.

  2. 25 parallel streams hammer Graph. cmd/list-group-members.go:70 fans out with config.ColStreamCount (default 25). Graph throttles, the 429 handler at client/rest/client.go:205 sleeps for Retry-After between every page. Enough elapsed wall-clock time between pages for that specific group and its cursor expires.

  3. Current behavior silently accepts partial data. cmd/list-group-members.go:112-114 logs item.Error but keeps going and emits whatever it had collected — so you got an incomplete membership set for group 123..., not just a noisy log.
    Also, client/rest/client.go:218-226 stringifies 4xx bodies via fmt.Errorf("%v", errRes), which is why the error code is buried in a stringified map — callers can't programmatically detect it to retry.
    """

  • Claude Opus

Summary by CodeRabbit

  • Bug Fixes
    • Group member enumeration now automatically retries when pagination token expiration occurs, ensuring more complete results for large groups.
  • Improvements
    • Enhanced error handling and reporting for Microsoft Graph API responses with detailed status codes and messages.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

Walkthrough

This PR introduces structured error handling for Microsoft Graph REST API responses and implements retry logic for handling expired pagination tokens during group member enumeration. A new GraphError type captures HTTP status code, error code, and message from Graph responses, while retry logic in group enumeration detects expired tokens and reattempts collection. The .claude/ directory is also added to gitignore.

Changes

Cohort / File(s) Summary
Error Infrastructure
client/rest/errors.go, client/rest/client.go
New GraphError type with StatusCode, Code, and Message fields implementing the error interface. Client error handler now parses response body into Graph-shaped error envelope, returns typed GraphError when code is present, and falls back to generic map-based error for malformed responses.
Pagination Retry Logic
cmd/list-group-members.go
Extended group member enumeration with retry mechanism (up to 4 attempts) that detects expired pagination tokens via IsExpiredPageToken, resets member collection on each retry, and logs progress info before retrying with exhausted token detection.
Configuration
.gitignore
Added .claude/ directory exclusion to gitignore.

Sequence Diagram(s)

sequenceDiagram
    participant Client as REST Client
    participant API as Microsoft Graph API
    participant Handler as Error Handler
    participant Detector as Token Expiration<br/>Detector
    participant Retry as Retry Logic

    Client->>API: Request group members
    API-->>Client: Error response (e.g., expired token)
    Client->>Handler: Parse response body
    Handler->>Detector: Create GraphError with code
    Detector->>Detector: Check if code ==<br/>"Directory_ExpiredPageToken"
    Detector-->>Retry: IsExpiredPageToken() = true
    Retry->>Retry: Increment attempt counter<br/>(< 4)
    Retry->>Client: Retry request
    Client->>API: Request group members (retry)
    API-->>Client: Success response
    Client-->>Retry: Members enumerated
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A token expires, the loop must retry,
Graph errors now structured, no mystery why,
Four chances to gather those members so dear,
With .claude/ hidden, the codebase is clear! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title is vague and unprofessional, using informal language ('Funky', 'vibe coded') that obscures the actual change: detecting and retrying on Directory_ExpiredPageToken errors. Use a clear, professional title like 'Add retry logic for Directory_ExpiredPageToken in group member enumeration' to accurately describe the substantive change.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/list-group-members.go (1)

73-80: ⚠️ Potential issue | 🟠 Major

Set the group-member page size to reduce cursor lifetime pressure.

Top: 0 leaves Graph on its default page size of 100 members per page, so large groups require substantially more pagination requests than necessary and are more likely to hit Directory_ExpiredPageToken. Microsoft's List group members documentation specifies this endpoint supports $top with a maximum of 999: https://learn.microsoft.com/en-us/graph/api/group-list-members?view=graph-rest-1.0

Additionally, the current retry logic (lines 117-152) sends collected data regardless of retry exhaustion status—after all 4 attempts fail with expired token errors, the incomplete membership data from the final attempt is still emitted as a successful result (lines 154-157).

🐛 Proposed fix
 		params  = query.GraphParams{
 			Select: unique(listGroupMembersSelect),
 			Filter: "",
 			Count:  false,
 			Search: "",
-			Top:    0,
+			Top:    999,
 			Expand: "",
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/list-group-members.go` around lines 73 - 80, Set the Graph page size in
the params GraphParams by changing Top from 0 to the max allowed 999 (so set
params.Top = 999 in the params = query.GraphParams{...} block referencing
listGroupMembersSelect) to reduce pagination and expired page-token pressure;
also fix the retry logic that currently emits partial membership data after
retry exhaustion by ensuring the member-emission code only runs on a successful
fetch—if all retry attempts fail (the loop that collects members and checks for
Directory_ExpiredPageToken), return/propagate an error instead of sending the
incomplete results so callers do not receive partial data.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/list-group-members.go`:
- Around line 113-156: The loop currently always sends `data`
(models.GroupMembers) via `pipeline.SendAny` even when enumeration was
incomplete; change the logic to only emit a successful `KindAZGroupMember` when
the enumeration fully completed. Add a completion flag (e.g., `complete :=
true`) before iterating results from `client.ListAzureADGroupMembers`, set
`complete = false` when you encounter any non-expired error or when
expired-page-token retries are exhausted (`expired` true and `attempt ==
maxAttempts`), and only call `pipeline.SendAny` with the `AzureWrapper{ Kind:
enums.KindAZGroupMember, Data: data }` when `complete` is true; otherwise send
an error/failed wrapper or skip sending the member result and log the failure so
downstream consumers cannot treat partial `data` as a complete group membership.

---

Outside diff comments:
In `@cmd/list-group-members.go`:
- Around line 73-80: Set the Graph page size in the params GraphParams by
changing Top from 0 to the max allowed 999 (so set params.Top = 999 in the
params = query.GraphParams{...} block referencing listGroupMembersSelect) to
reduce pagination and expired page-token pressure; also fix the retry logic that
currently emits partial membership data after retry exhaustion by ensuring the
member-emission code only runs on a successful fetch—if all retry attempts fail
(the loop that collects members and checks for Directory_ExpiredPageToken),
return/propagate an error instead of sending the incomplete results so callers
do not receive partial data.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e31c306-2ad2-4b04-9e9c-364a54dddd1b

📥 Commits

Reviewing files that changed from the base of the PR and between 1e211df and 5b80968.

📒 Files selected for processing (4)
  • .gitignore
  • client/rest/client.go
  • client/rest/errors.go
  • cmd/list-group-members.go

Comment thread cmd/list-group-members.go
Comment on lines 113 to 156
var (
data = models.GroupMembers{
GroupId: id,
}
count = 0
data models.GroupMembers
count int
)
for item := range client.ListAzureADGroupMembers(ctx, id, params) {
if item.Error != nil {
log.Error(item.Error, "unable to continue processing members for this group", "groupId", id)
} else {
groupMember := models.GroupMember{
Member: item.Ok,
GroupId: id,
for attempt := 1; attempt <= maxAttempts; attempt++ {
data = models.GroupMembers{GroupId: id}
count = 0
expired := false

for item := range client.ListAzureADGroupMembers(ctx, id, params) {
if item.Error != nil {
if rest.IsExpiredPageToken(item.Error) {
expired = true
log.Info("page token expired mid-enumeration, will retry group",
"groupId", id, "attempt", attempt, "collectedSoFar", count)
// Producer already returned after emitting the error;
// the channel will close and this range will exit.
continue
}
log.Error(item.Error, "unable to continue processing members for this group",
"groupId", id, "attempt", attempt)
} else {
groupMember := models.GroupMember{
Member: item.Ok,
GroupId: id,
}
log.V(2).Info("found group member", "groupId", groupMember.GroupId)
count++
data.Members = append(data.Members, groupMember)
}
log.V(2).Info("found group member", "groupId", groupMember.GroupId)
count++
data.Members = append(data.Members, groupMember)
}

if !expired {
break
}
if attempt == maxAttempts {
log.Error(fmt.Errorf("exhausted %d retries due to Directory_ExpiredPageToken", maxAttempts),
"group members may be incomplete", "groupId", id, "partialCount", count)
}
}

if ok := pipeline.SendAny(ctx.Done(), out, AzureWrapper{
Kind: enums.KindAZGroupMember,
Data: data,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t emit partial memberships as a successful result.

After a non-expired error, or after all expired-page-token attempts fail, data still gets sent as KindAZGroupMember. That preserves the original partial-data corruption mode: downstream consumers can’t distinguish incomplete group membership from a complete enumeration.

🐛 Proposed guard against sending incomplete data
 				var (
-					data  models.GroupMembers
-					count int
+					data      models.GroupMembers
+					count     int
+					completed bool
 				)
 				for attempt := 1; attempt <= maxAttempts; attempt++ {
 					data = models.GroupMembers{GroupId: id}
 					count = 0
 					expired := false
+					failed := false
 
 					for item := range client.ListAzureADGroupMembers(ctx, id, params) {
 						if item.Error != nil {
 							if rest.IsExpiredPageToken(item.Error) {
 								expired = true
 								log.Info("page token expired mid-enumeration, will retry group",
 									"groupId", id, "attempt", attempt, "collectedSoFar", count)
 								// Producer already returned after emitting the error;
 								// the channel will close and this range will exit.
 								continue
 							}
+							failed = true
 							log.Error(item.Error, "unable to continue processing members for this group",
 								"groupId", id, "attempt", attempt)
 						} else {
 							groupMember := models.GroupMember{
 								Member:  item.Ok,
@@
 						}
 					}
 
-					if !expired {
+					if !expired && !failed {
+						completed = true
 						break
 					}
+					if failed {
+						break
+					}
 					if attempt == maxAttempts {
-						log.Error(fmt.Errorf("exhausted %d retries due to Directory_ExpiredPageToken", maxAttempts),
+						log.Error(fmt.Errorf("exhausted %d attempts due to Directory_ExpiredPageToken", maxAttempts),
 							"group members may be incomplete", "groupId", id, "partialCount", count)
 					}
 				}
 
+				if !completed {
+					continue
+				}
 				if ok := pipeline.SendAny(ctx.Done(), out, AzureWrapper{
 					Kind: enums.KindAZGroupMember,
 					Data: data,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/list-group-members.go` around lines 113 - 156, The loop currently always
sends `data` (models.GroupMembers) via `pipeline.SendAny` even when enumeration
was incomplete; change the logic to only emit a successful `KindAZGroupMember`
when the enumeration fully completed. Add a completion flag (e.g., `complete :=
true`) before iterating results from `client.ListAzureADGroupMembers`, set
`complete = false` when you encounter any non-expired error or when
expired-page-token retries are exhausted (`expired` true and `attempt ==
maxAttempts`), and only call `pipeline.SendAny` with the `AzureWrapper{ Kind:
enums.KindAZGroupMember, Data: data }` when `complete` is true; otherwise send
an error/failed wrapper or skip sending the member result and log the failure so
downstream consumers cannot treat partial `data` as a complete group membership.

@fjodoin
Copy link
Copy Markdown
Author

fjodoin commented Apr 22, 2026

I have read the CLA Document and I hereby sign the CLA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant