Summary
No rate-limiting middleware is registered anywhere in the control plane, so a single misbehaving client can exhaust the async worker pool and database connections on hot endpoints.
Context
A search for rate.NewLimiter, tollbooth, or httprate in control-plane/internal/server/ returns no matches. The endpoints most at risk are /api/v1/execute (spawns goroutines + DB writes), /api/v1/discovery (fan-out resolution), and /api/v1/nodes/status/bulk (N storage reads per call). Without throttling, a runaway agent or a scanning client can trigger connection-pool exhaustion and starve legitimate traffic. There is no circuit breaker or backpressure mechanism downstream to compensate.
Scope
In Scope
- Add per-API-key token-bucket rate limiting on, at minimum, the execute, discovery, and bulk-status endpoints.
- Add per-IP rate limiting for any unauthenticated public paths (e.g. health, registration before auth).
- Make limits configurable via agentfield.yaml / environment variables with sensible defaults.
- Return HTTP 429 with a Retry-After header when a limit is exceeded.
Out of Scope
- Distributed rate limiting across multiple control-plane replicas — in-process limiting is sufficient for v1.
- Adding circuit breakers on storage calls — that is a separate resilience concern.
- Rate limiting internal agent-to-agent calls that bypass the HTTP layer.
Files
- control-plane/internal/server/middleware/ — add new ratelimit.go middleware using golang.org/x/time/rate or a Gin-compatible library
- control-plane/internal/server/routes.go — apply rate-limit middleware to execute, discovery, and bulk-status route groups
- control-plane/internal/config/config.go — add RateLimit config struct with per-endpoint RequestsPerSecond and BurstSize fields
- control-plane/internal/server/middleware/ratelimit_test.go — tests: limit is enforced, 429 returned, Retry-After header present
Acceptance Criteria
- Requests over the configured limit receive HTTP 429 with a Retry-After header.
- Tests pass (go test ./control-plane/...).
- Lint passes (make lint).
Notes for Contributors
Severity: MEDIUM
golang.org/x/time/rate is already a transitive dependency via otel — check go.sum before adding a new library. Use a sync.Map keyed by API key to hold per-key rate.Limiter instances. The limiter map should evict idle entries (e.g. via a background sweep) to avoid unbounded growth in deployments with many short-lived API keys.