BED-4600: Add Request Timeout by StranDutton · Pull Request #188 · SpecterOps/AzureHound

StranDutton · 2026-04-15T21:56:17Z

Problem: Customer reported AzureHound collections not completing. After analysis, it seems that the issue may have been that we were not failing gracefully if any api call hung - we would block indefinitely.

This PR addresses that behavior so we do not block the entire pipeline if a single call hangs.

An improvement I'd like to make as a result of this would be to implement proper status reporting back to BHE so the user knows that only a partial collection was completed and that some objects may not have been collected - that is outside of the scope of this ticket though.

Summary by CodeRabbit

Bug Fixes
- Added per-page request timeouts (2 minutes) for paginated network operations to avoid hanging requests and improve responsiveness.
Tests
- Added tests covering successful pagination, enforced per-page timeouts, parent-context cancellation handling, error propagation, and graceful termination of pagination.

coderabbitai · 2026-04-15T21:56:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5cb3533d-da82-451c-963e-64a79c99f2f1

📥 Commits

Reviewing files that changed from the base of the PR and between 9057951 and cdd9104.

📒 Files selected for processing (1)

client/client_test.go

✅ Files skipped from review due to trivial changes (1)

client/client_test.go

Walkthrough

The changes add a package-level 2-minute per-page request timeout in getAzureObjectList, deriving a pageCtx for each page request and using it for all network calls; pageCtx is cancelled on errors or after successful decode. New tests exercise success, timeout, and parent-cancellation behaviors.

Changes

Cohort / File(s)	Summary
Timeout Implementation `client/client.go`	Add `var pageRequestTimeout = 2 * time.Minute`. `getAzureObjectList` now creates a per-page `pageCtx` via `context.WithTimeout(ctx, pageRequestTimeout)` and uses it for all network calls; `pageCtx` is explicitly cancelled on error paths and after successful decode.
Test Coverage `client/client_test.go`	New tests with `fakeRestClient` covering: success (two items emitted), timeout (override `pageRequestTimeout`, simulate hung request and assert deadline), and parent cancellation (cancel parent ctx and assert error propagation and channel closure).

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant getAzureObjectList as getAzureObjectList()
    participant Context as Context Management
    participant HTTP as client.Get / client.Send
    participant Decoder as JSON Decoder
    participant Out as Result Channel

    Caller->>getAzureObjectList: invoke with parent ctx
    loop per page
        getAzureObjectList->>Context: WithTimeout(parent ctx, 2m) -> pageCtx
        getAzureObjectList->>HTTP: request using pageCtx
        alt response OK
            HTTP-->>getAzureObjectList: HTTP 200 + body
            getAzureObjectList->>Decoder: decode body
            Decoder-->>getAzureObjectList: items
            getAzureObjectList->>Context: cancel pageCtx
            getAzureObjectList->>Out: send items
            Out-->>Caller: items received
        else error / timeout
            HTTP--x getAzureObjectList: error / ctx.Done
            getAzureObjectList->>Context: cancel pageCtx
            getAzureObjectList->>Out: send error result
            Out-->>Caller: error received
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A two-minute hop per page I keep,

Through contexts deep and deadlines steep,
I cancel when decode is done,
Or when the hung requests have won,
Hooray—no stray calls in my heap!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly identifies the main change: adding request timeout functionality to handle API calls that hang indefinitely, addressing the core objective of preventing pipeline blocking.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch BED-4600-ingest-timeout-bug

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

client/client_test.go (2)
39-39: fakeRestClient.Send returns (nil, nil) — silent NPE risk for any future test exercising pagination.

The nextLink branch in getAzureObjectList (client.go lines 129-151) calls client.Send(req) and then dereferences res.Body. None of the current tests trigger that path, but a future test that returns a payload containing @odata.nextLink will nil-panic in the second iteration. Consider returning an explicit error so the failure mode is loud:
-func (s *fakeRestClient) Send(req *http.Request) (*http.Response, error) { return nil, nil }
+func (s *fakeRestClient) Send(req *http.Request) (*http.Response, error) {
+	return nil, fmt.Errorf("fakeRestClient.Send not implemented")
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/client_test.go` at line 39, The fakeRestClient.Send implementation
returns (nil, nil) which can cause a nil-pointer dereference when
getAzureObjectList calls client.Send(req) and reads res.Body; change
fakeRestClient.Send in client_test.go to return a non-nil error (e.g.,
fmt.Errorf with a clear message) when no response is intended, or return a valid
*http.Response with a non-nil Body for tests that exercise pagination; update
the Send method to either construct and return an appropriate http.Response
(with io.NopCloser) for success cases or return (nil, err) to fail loudly so
tests won't silently NPE in getAzureObjectList.
70-74: Mutating the package-level pageRequestTimeout makes these tests parallel-unsafe.

TestGetAzureObjectList_HungResponseTimesOut and TestGetAzureObjectList_StalledResponseBody_Integration both write to the package global pageRequestTimeout and restore it via defer. As written this works only because go test runs tests in the same package serially by default; adding t.Parallel() to any test in this package — or to a future test that reads the timeout — would introduce a data race that -race would flag.

Two cleaner options:

Inject the timeout via a function parameter or a struct field on a future azureClient so tests don't need to touch globals.

At minimum, document the constraint and avoid t.Parallel() here.

Also applies to: 142-145
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/client_test.go` around lines 70 - 74, The tests mutate the
package-level pageRequestTimeout (used in
TestGetAzureObjectList_HungResponseTimesOut and
TestGetAzureObjectList_StalledResponseBody_Integration), which is
parallel-unsafe; instead refactor the code so the timeout is injected (e.g., add
a timeout parameter to the function that fetches pages or add a timeout field to
the azureClient struct and use that value instead of the global), update the
tests to construct an azureClient with the shorter timeout (or call the function
with the timeout) rather than writing to pageRequestTimeout, and remove any
direct writes to the global to eliminate races (if refactoring is infeasible,
document the serial-only constraint and ensure these tests do not call
t.Parallel()).
client/client.go (1)
38-39: Consider making pageRequestTimeout configurable.

A hard-coded 2-minute timeout is reasonable for the reported BED-4600 hang scenario, but environments with large pages or high-latency tenants may legitimately exceed it (especially since the timeout also bounds body read/decode at line 161, not just header receipt). Exposing this via the existing config.Config (or a CLI flag) would let operators tune it without a code change. Out-of-scope deferral is fine; flagging for future iteration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/client.go` around lines 38 - 39, Replace the hard-coded var
pageRequestTimeout = 2 * time.Minute with a configurable value sourced from the
existing config (e.g., add a PageRequestTimeout time.Duration field to
config.Config with a default of 2m) and use that value wherever
pageRequestTimeout is referenced (including the body read/decode timeout usage).
Update the client initialization (where the client is constructed or config is
passed) to read config.Config.PageRequestTimeout and fall back to 2*time.Minute
if zero, and ensure any tests or constructors that instantiate the client accept
or pass the config so operators can tune the timeout without changing code.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/client_test.go`:
- Around line 92-100: The tests read from the out channel but skip assertions
when the channel is closed (ok==false), allowing false positives; update each
failing test (TestGetAzureObjectList_TimedOut,
TestGetAzureObjectList_ParentContextCanceled,
TestGetAzureObjectList_StalledResponseBody_Integration) to explicitly assert the
receiver succeeded before checking the result: when receiving from out do
"result, ok := <-out" then require.True(t, ok, "expected a value on out, channel
closed") and only after that run require.Error(t, result.Error, ...) and
require.ErrorIs(t, result.Error, context.DeadlineExceeded) (or the existing
error checks) so the test fails if the channel closed without a sent AzureResult
from getAzureObjectList.

---

Nitpick comments:
In `@client/client_test.go`:
- Line 39: The fakeRestClient.Send implementation returns (nil, nil) which can
cause a nil-pointer dereference when getAzureObjectList calls client.Send(req)
and reads res.Body; change fakeRestClient.Send in client_test.go to return a
non-nil error (e.g., fmt.Errorf with a clear message) when no response is
intended, or return a valid *http.Response with a non-nil Body for tests that
exercise pagination; update the Send method to either construct and return an
appropriate http.Response (with io.NopCloser) for success cases or return (nil,
err) to fail loudly so tests won't silently NPE in getAzureObjectList.
- Around line 70-74: The tests mutate the package-level pageRequestTimeout (used
in TestGetAzureObjectList_HungResponseTimesOut and
TestGetAzureObjectList_StalledResponseBody_Integration), which is
parallel-unsafe; instead refactor the code so the timeout is injected (e.g., add
a timeout parameter to the function that fetches pages or add a timeout field to
the azureClient struct and use that value instead of the global), update the
tests to construct an azureClient with the shorter timeout (or call the function
with the timeout) rather than writing to pageRequestTimeout, and remove any
direct writes to the global to eliminate races (if refactoring is infeasible,
document the serial-only constraint and ensure these tests do not call
t.Parallel()).

In `@client/client.go`:
- Around line 38-39: Replace the hard-coded var pageRequestTimeout = 2 *
time.Minute with a configurable value sourced from the existing config (e.g.,
add a PageRequestTimeout time.Duration field to config.Config with a default of
2m) and use that value wherever pageRequestTimeout is referenced (including the
body read/decode timeout usage). Update the client initialization (where the
client is constructed or config is passed) to read
config.Config.PageRequestTimeout and fall back to 2*time.Minute if zero, and
ensure any tests or constructors that instantiate the client accept or pass the
config so operators can tune the timeout without changing code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4ae43e88-83ee-4975-84d4-cf5513b889fd

📥 Commits

Reviewing files that changed from the base of the PR and between 96436c7 and c9365f2.

📒 Files selected for processing (2)

client/client.go
client/client_test.go

BED-4600: Add page request timeout

792556b

StranDutton and others added 2 commits April 15, 2026 16:56

Merge branch 'main' into BED-4600-ingest-timeout-bug

84f6db6

BED-4600: add comments

c9365f2

StranDutton self-assigned this Apr 17, 2026

BED-4600: Clean up unit tests

9057951

coderabbitai Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread client/client_test.go

StranDutton marked this pull request as ready for review April 17, 2026 16:14

StranDutton and others added 2 commits April 17, 2026 11:26

Merge branch 'main' into BED-4600-ingest-timeout-bug

8c6927e

BED-4600: Unit test refactor

cdd9104

definitelynotagoblin approved these changes Apr 17, 2026

View reviewed changes

Merge branch 'main' into BED-4600-ingest-timeout-bug

5207108

StranDutton merged commit 1e211df into main Apr 17, 2026
10 checks passed

github-actions Bot locked and limited conversation to collaborators Apr 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BED-4600: Add Request Timeout#188

BED-4600: Add Request Timeout#188
StranDutton merged 7 commits intomainfrom
BED-4600-ingest-timeout-bug

StranDutton commented Apr 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 15, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

StranDutton commented Apr 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

StranDutton commented Apr 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 15, 2026 •

edited

Loading