Funky pagination error (vibe coded fix with Opus) on big Entra tenants with massive groups#189
Funky pagination error (vibe coded fix with Opus) on big Entra tenants with massive groups#189fjodoin wants to merge 2 commits intoSpecterOps:mainfrom
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
WalkthroughThis PR introduces structured error handling for Microsoft Graph REST API responses and implements retry logic for handling expired pagination tokens during group member enumeration. A new Changes
Sequence Diagram(s)sequenceDiagram
participant Client as REST Client
participant API as Microsoft Graph API
participant Handler as Error Handler
participant Detector as Token Expiration<br/>Detector
participant Retry as Retry Logic
Client->>API: Request group members
API-->>Client: Error response (e.g., expired token)
Client->>Handler: Parse response body
Handler->>Detector: Create GraphError with code
Detector->>Detector: Check if code ==<br/>"Directory_ExpiredPageToken"
Detector-->>Retry: IsExpiredPageToken() = true
Retry->>Retry: Increment attempt counter<br/>(< 4)
Retry->>Client: Retry request
Client->>API: Request group members (retry)
API-->>Client: Success response
Client-->>Retry: Members enumerated
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/list-group-members.go (1)
73-80:⚠️ Potential issue | 🟠 MajorSet the group-member page size to reduce cursor lifetime pressure.
Top: 0leaves Graph on its default page size of 100 members per page, so large groups require substantially more pagination requests than necessary and are more likely to hitDirectory_ExpiredPageToken. Microsoft's List group members documentation specifies this endpoint supports$topwith a maximum of 999: https://learn.microsoft.com/en-us/graph/api/group-list-members?view=graph-rest-1.0Additionally, the current retry logic (lines 117-152) sends collected data regardless of retry exhaustion status—after all 4 attempts fail with expired token errors, the incomplete membership data from the final attempt is still emitted as a successful result (lines 154-157).
🐛 Proposed fix
params = query.GraphParams{ Select: unique(listGroupMembersSelect), Filter: "", Count: false, Search: "", - Top: 0, + Top: 999, Expand: "", }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/list-group-members.go` around lines 73 - 80, Set the Graph page size in the params GraphParams by changing Top from 0 to the max allowed 999 (so set params.Top = 999 in the params = query.GraphParams{...} block referencing listGroupMembersSelect) to reduce pagination and expired page-token pressure; also fix the retry logic that currently emits partial membership data after retry exhaustion by ensuring the member-emission code only runs on a successful fetch—if all retry attempts fail (the loop that collects members and checks for Directory_ExpiredPageToken), return/propagate an error instead of sending the incomplete results so callers do not receive partial data.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/list-group-members.go`:
- Around line 113-156: The loop currently always sends `data`
(models.GroupMembers) via `pipeline.SendAny` even when enumeration was
incomplete; change the logic to only emit a successful `KindAZGroupMember` when
the enumeration fully completed. Add a completion flag (e.g., `complete :=
true`) before iterating results from `client.ListAzureADGroupMembers`, set
`complete = false` when you encounter any non-expired error or when
expired-page-token retries are exhausted (`expired` true and `attempt ==
maxAttempts`), and only call `pipeline.SendAny` with the `AzureWrapper{ Kind:
enums.KindAZGroupMember, Data: data }` when `complete` is true; otherwise send
an error/failed wrapper or skip sending the member result and log the failure so
downstream consumers cannot treat partial `data` as a complete group membership.
---
Outside diff comments:
In `@cmd/list-group-members.go`:
- Around line 73-80: Set the Graph page size in the params GraphParams by
changing Top from 0 to the max allowed 999 (so set params.Top = 999 in the
params = query.GraphParams{...} block referencing listGroupMembersSelect) to
reduce pagination and expired page-token pressure; also fix the retry logic that
currently emits partial membership data after retry exhaustion by ensuring the
member-emission code only runs on a successful fetch—if all retry attempts fail
(the loop that collects members and checks for Directory_ExpiredPageToken),
return/propagate an error instead of sending the incomplete results so callers
do not receive partial data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3e31c306-2ad2-4b04-9e9c-364a54dddd1b
📒 Files selected for processing (4)
.gitignoreclient/rest/client.goclient/rest/errors.gocmd/list-group-members.go
| var ( | ||
| data = models.GroupMembers{ | ||
| GroupId: id, | ||
| } | ||
| count = 0 | ||
| data models.GroupMembers | ||
| count int | ||
| ) | ||
| for item := range client.ListAzureADGroupMembers(ctx, id, params) { | ||
| if item.Error != nil { | ||
| log.Error(item.Error, "unable to continue processing members for this group", "groupId", id) | ||
| } else { | ||
| groupMember := models.GroupMember{ | ||
| Member: item.Ok, | ||
| GroupId: id, | ||
| for attempt := 1; attempt <= maxAttempts; attempt++ { | ||
| data = models.GroupMembers{GroupId: id} | ||
| count = 0 | ||
| expired := false | ||
|
|
||
| for item := range client.ListAzureADGroupMembers(ctx, id, params) { | ||
| if item.Error != nil { | ||
| if rest.IsExpiredPageToken(item.Error) { | ||
| expired = true | ||
| log.Info("page token expired mid-enumeration, will retry group", | ||
| "groupId", id, "attempt", attempt, "collectedSoFar", count) | ||
| // Producer already returned after emitting the error; | ||
| // the channel will close and this range will exit. | ||
| continue | ||
| } | ||
| log.Error(item.Error, "unable to continue processing members for this group", | ||
| "groupId", id, "attempt", attempt) | ||
| } else { | ||
| groupMember := models.GroupMember{ | ||
| Member: item.Ok, | ||
| GroupId: id, | ||
| } | ||
| log.V(2).Info("found group member", "groupId", groupMember.GroupId) | ||
| count++ | ||
| data.Members = append(data.Members, groupMember) | ||
| } | ||
| log.V(2).Info("found group member", "groupId", groupMember.GroupId) | ||
| count++ | ||
| data.Members = append(data.Members, groupMember) | ||
| } | ||
|
|
||
| if !expired { | ||
| break | ||
| } | ||
| if attempt == maxAttempts { | ||
| log.Error(fmt.Errorf("exhausted %d retries due to Directory_ExpiredPageToken", maxAttempts), | ||
| "group members may be incomplete", "groupId", id, "partialCount", count) | ||
| } | ||
| } | ||
|
|
||
| if ok := pipeline.SendAny(ctx.Done(), out, AzureWrapper{ | ||
| Kind: enums.KindAZGroupMember, | ||
| Data: data, |
There was a problem hiding this comment.
Don’t emit partial memberships as a successful result.
After a non-expired error, or after all expired-page-token attempts fail, data still gets sent as KindAZGroupMember. That preserves the original partial-data corruption mode: downstream consumers can’t distinguish incomplete group membership from a complete enumeration.
🐛 Proposed guard against sending incomplete data
var (
- data models.GroupMembers
- count int
+ data models.GroupMembers
+ count int
+ completed bool
)
for attempt := 1; attempt <= maxAttempts; attempt++ {
data = models.GroupMembers{GroupId: id}
count = 0
expired := false
+ failed := false
for item := range client.ListAzureADGroupMembers(ctx, id, params) {
if item.Error != nil {
if rest.IsExpiredPageToken(item.Error) {
expired = true
log.Info("page token expired mid-enumeration, will retry group",
"groupId", id, "attempt", attempt, "collectedSoFar", count)
// Producer already returned after emitting the error;
// the channel will close and this range will exit.
continue
}
+ failed = true
log.Error(item.Error, "unable to continue processing members for this group",
"groupId", id, "attempt", attempt)
} else {
groupMember := models.GroupMember{
Member: item.Ok,
@@
}
}
- if !expired {
+ if !expired && !failed {
+ completed = true
break
}
+ if failed {
+ break
+ }
if attempt == maxAttempts {
- log.Error(fmt.Errorf("exhausted %d retries due to Directory_ExpiredPageToken", maxAttempts),
+ log.Error(fmt.Errorf("exhausted %d attempts due to Directory_ExpiredPageToken", maxAttempts),
"group members may be incomplete", "groupId", id, "partialCount", count)
}
}
+ if !completed {
+ continue
+ }
if ok := pipeline.SendAny(ctx.Done(), out, AzureWrapper{
Kind: enums.KindAZGroupMember,
Data: data,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/list-group-members.go` around lines 113 - 156, The loop currently always
sends `data` (models.GroupMembers) via `pipeline.SendAny` even when enumeration
was incomplete; change the logic to only emit a successful `KindAZGroupMember`
when the enumeration fully completed. Add a completion flag (e.g., `complete :=
true`) before iterating results from `client.ListAzureADGroupMembers`, set
`complete = false` when you encounter any non-expired error or when
expired-page-token retries are exhausted (`expired` true and `attempt ==
maxAttempts`), and only call `pipeline.SendAny` with the `AzureWrapper{ Kind:
enums.KindAZGroupMember, Data: data }` when `complete` is true; otherwise send
an error/failed wrapper or skip sending the member result and log the failure so
downstream consumers cannot treat partial `data` as a complete group membership.
|
I have read the CLA Document and I hereby sign the CLA |
Vibe coded a funky pagination error when using azurehound on an enormous Entra tenant with massive groups.
This pull request is to confirm if the patch is sound 😬
(also added .claude to the .gitignore)
ERROR:
ERR unable to continue processing members for this group error="map[error:map[code:Directory_ExpiredPageToken innerError:map[client-request-id:123...456 date:2026-04-21T17:58:12 request-id:123...456] message:The specified page token value has expired and can no longer be included in your request.]]" groupId=123...456
The "token value has expired" is not an OAuth expiration, but rather (according to Claude Opus), it is a pagination error.
"""
The error is Directory_ExpiredPageToken — not the OAuth access token you're thinking of (the 1h one). It's the Microsoft Graph pagination cursor embedded in the @odata.nextLink URL (an opaque $skiptoken). That cursor has a much shorter, undocumented TTL than your JWT, and it expires when a single group's pagination session stretches too long.
Three things in the AzureHound code conspire to stretch it:
No default $top for group members. client/groups.go:63-72 doesn't set params.Top, so Graph's implicit default (~100) is used. A 100k-member group is ~1000 page round-trips. Every sibling endpoint (users, devices, SPs, app role assignments) sets Top=999 — group members is the odd one out.
25 parallel streams hammer Graph. cmd/list-group-members.go:70 fans out with config.ColStreamCount (default 25). Graph throttles, the 429 handler at client/rest/client.go:205 sleeps for Retry-After between every page. Enough elapsed wall-clock time between pages for that specific group and its cursor expires.
Current behavior silently accepts partial data. cmd/list-group-members.go:112-114 logs item.Error but keeps going and emits whatever it had collected — so you got an incomplete membership set for group 123..., not just a noisy log.
Also, client/rest/client.go:218-226 stringifies 4xx bodies via fmt.Errorf("%v", errRes), which is why the error code is buried in a stringified map — callers can't programmatically detect it to retry.
"""
Summary by CodeRabbit