seatbelt: review and addresses some of the AI feedback

# Seatbelt Crate Analysis
 
Analysis of the `seatbelt` crate covering performance, correctness, and usability.
 
## Performance
 
### 1. Circuit Breaker Engine Map Uses `Mutex<HashMap>` — Consider a Concurrent Map
 
**File:** `breaker/engine/engines.rs:15-17`
 
`Engines` wraps its map in a `Mutex<HashMap<BreakerId, Arc<Engine>>>`. Every `execute()` call acquires this mutex to look up (or insert) an engine by `BreakerId`. Under high concurrency with per-host breaker IDs, this serializes all lookups through a single lock.
 
**Suggestion:** Replace with a concurrent map (e.g., `dashmap` or a sharded lock scheme) to allow concurrent reads without contention. Alternatively, use a `RwLock` so that the common read-path (engine already exists) doesn't block other readers.
 
### 2. Telemetry Attribute Arrays Are Allocated on Every Event
 
**Files:** `breaker/engine/engine_telemetry.rs`, `timeout/service.rs:200-208`, `retry/service.rs:236-248`
 
Each telemetry report constructs a fresh `&[KeyValue]` array with `Cow::clone()` on every event. For hot paths (every retry attempt, every circuit breaker enter/exit, every timeout), this causes small but repeated heap allocations for the `Cow::Owned` pipeline/strategy names.
 
**Suggestion:** Pre-allocate a base `Vec<KeyValue>` containing the static pipeline/strategy/event attributes in the `*Shared` struct at construction time. On each call, clone or extend only the dynamic attributes (e.g., attempt index). This avoids re-cloning `Cow<str>` values on every invocation.
 
### 3. `BackoffOptions` Is Cloned on Every `delays()` Call
 
**File:** `retry/backoff.rs:49-55`
 
`DelayBackoff::delays()` clones the entire `BackoffOptions` (including the `Rnd` enum which contains an `Arc`) into each `DelaysIter`. Since a new iterator is created per retry cycle, this is a clone per request.
 
**Impact:** Minor — the clone is cheap (a few fields + an `Arc::clone`). Noting it only because in extreme throughput scenarios it's a small inefficiency. No change recommended unless profiling shows it as a hotspot.
 
### 4. `HealthMetrics` Window Cleanup Is Linear
 
**File:** `breaker/health.rs:119-124`
 
The `record()` method removes expired windows with a `while` loop calling `pop_front()`. With 10 fixed windows this is bounded and acceptable. However, the `health_info()` method iterates all windows to sum successes/failures on every call.
 
**Impact:** Negligible with `WINDOW_COUNT = 10`. If the window count ever becomes configurable and grows large, consider maintaining running totals.
 
## Correctness
 
### 5. Metrics `VERSION` Constant Is Stale
 
**File:** `metrics.rs:8`
 
```rust
const VERSION: &str = "v0.1.0";
```
 
The crate version in `Cargo.toml` is `0.3.0`, but the OpenTelemetry instrumentation scope reports `v0.1.0`. This means metrics consumers see the wrong version. This hardcoded constant will silently drift on every release.
 
**Suggestion:** Either use `env!("CARGO_PKG_VERSION")` to derive it automatically, or at minimum update it to `"v0.3.0"` and add a test that asserts consistency.
 
### 6. `failure_threshold` Clamps High Values but Allows Negative Values
 
**File:** `breaker/layer.rs:157-159`
 
```rust
pub fn failure_threshold(mut self, threshold: f32) -> Self {
    self.failure_threshold = threshold.min(1.0);
    self
}
```
 
The threshold is clamped to `1.0` on the upper end, but negative values (e.g., `-0.5`) pass through unchecked. A negative threshold would cause the circuit breaker to open immediately on any traffic exceeding `min_throughput`, since `failure_rate >= negative_threshold` is always true.
 
**Suggestion:** Clamp to `threshold.clamp(0.0, 1.0)` or at minimum `threshold.max(0.0).min(1.0)`.
 
### 7. `Window::update` Doesn't Use Saturating Arithmetic
 
**File:** `breaker/health.rs:169-174`
 
```rust
fn update(&mut self, result: ExecutionResult) {
    match result {
        ExecutionResult::Success => self.successes += 1,
        ExecutionResult::Failure => self.failures += 1,
    }
}
```
 
While `HealthInfo::new` and `Stats` methods use `saturating_add`, individual `Window` counters use plain `+=`. Under extreme traffic within a single window, this could theoretically overflow in debug builds (panic) or wrap in release builds.
 
**Suggestion:** Use `saturating_add(1)` for consistency with the rest of the crate.
 
### 8. Typo in Telemetry Field Name: `successfull`
 
**File:** `breaker/engine/engine_telemetry.rs:162`
 
```rust
circuit_breaker.probes.successfull = stats.probes_successes,
```
 
The log field name is misspelled as `successfull` (double `l`) instead of `successful`. Consumers querying or filtering logs by this field name will need to use the misspelled version.
 
### 9. `RecoveryKind` Wildcard Match May Mask New Variants
 
**File:** `retry/service.rs:184`
 
```rust
RecoveryKind::Never | RecoveryKind::Unknown | _ => false,
```
 
The wildcard `_` silently handles any future `RecoveryKind` variants as non-recoverable. If a new variant is added that should be recoverable, this match arm would silently suppress it. The `#[non_exhaustive]` attribute on `RecoveryKind` makes this a practical concern.
 
**Suggestion:** Remove the wildcard and explicitly match all known variants, or add a comment explaining the deliberate catch-all design decision. Note: this is behind `mutants::skip` so it may be intentional.
 
### 10. `ExecutionResult::from_recovery` Treats `Unknown` as Success
 
**File:** `breaker/execution_result.rs:27-33`
 
```rust
match recovery.kind() {
    RecoveryKind::Retry | RecoveryKind::Unavailable => Self::Failure,
    _ => Self::Success,
}
```
 
`RecoveryKind::Unknown` maps to `Success`, which means the circuit breaker won't count unknown recoveries as failures. This might be intentional (conservative: don't trip breaker on unknowns), but it's worth documenting the design choice explicitly since it differs from how one might intuitively expect "unknown" to be treated.
 
## Usability
 
### 11. `ResilienceContext` Requires Explicit Type Parameters
 
**File:** `context.rs:16`
 
```rust
pub struct ResilienceContext<In, Out> { ... }
```
 
The `In` and `Out` phantom type parameters exist only to carry type information for the middleware pipeline. Users must specify them explicitly in some contexts (e.g., `ResilienceContext::<String, Result<String, String>>::new(&clock)`), which is verbose. The types are only used to constrain layers at build time, not at construction time.
 
**Suggestion:** Consider whether `ResilienceContext` could be type-erased and have the layer methods constrain `In`/`Out` instead, or provide a convenience constructor that infers types from usage.
 
### 12. No `Default` Implementation for `HalfOpenMode`
 
**File:** `breaker/half_open_mode.rs`
 
`HalfOpenMode` doesn't implement `Default`, even though `BreakerLayer` defaults to `progressive(None)`. Adding `impl Default for HalfOpenMode` with `progressive(None)` would let users write `HalfOpenMode::default()` and align with the documented default.
 
### 13. `BreakerLayer::layer()` Can Panic at Runtime Despite Type-State Pattern
 
**File:** `breaker/layer.rs:348-349`
 
```rust
recovery: self.recovery.clone().expect("recovery must be set in Ready state"),
rejected_input: self.rejected_input.clone().expect("rejected_input must be set in Ready state"),
```
 
The type-state pattern (`Set`/`NotSet`) prevents calling `layer()` without setting required fields, but the actual values are stored as `Option<T>`. The `.expect()` calls should never fire, but they represent a mismatch between the type-level and value-level guarantees. The same pattern exists in `RetryLayer` and `TimeoutLayer`.
 
**Suggestion:** This is a minor internal concern (not a user-facing bug) since the type system prevents the invalid state. However, storing the values directly (not wrapped in `Option`) in the `Set` state would eliminate the `.expect()` calls entirely.
 
### 14. `min_throughput` Accepts 0 Without Warning
 
**File:** `breaker/layer.rs:170-173`
 
Setting `min_throughput(0)` means the circuit breaker can trip on the very first failure, regardless of how little traffic there is. This is likely never the intended behavior and could cause immediate circuit opening in low-traffic scenarios.
 
**Suggestion:** Either enforce a minimum of 1, or document the `0` behavior explicitly.
 
### 15. No Way to Inspect or Reset Circuit Breaker State
 
**File:** `breaker/service.rs`, `breaker/engine/engines.rs`
 
Users cannot programmatically query the circuit breaker's current state (open/closed/half-open) or manually reset it. This makes operational tasks difficult — for example, after deploying a fix for a downstream service, operators cannot force the circuit closed to resume traffic immediately.
 
**Suggestion:** Expose a `state()` or `is_open()` method on `Breaker`, and consider a `reset()` method for operational needs.
 
### 16. Circuit Breaker Engine Map Grows Without Bound
 
**File:** `breaker/engine/engines.rs:32-42`
 
The `Engines` map never evicts entries. If `BreakerId` values are derived from dynamic input (e.g., per-host circuit breakers and hosts come and go), the map will grow indefinitely. The doc comment on `BreakerId` warns about this, but there's no runtime protection.
 
**Suggestion:** Consider adding an optional eviction policy (LRU, TTL, or max-entries cap) to prevent unbounded memory growth in long-running services.
 
## Summary
 
| # | Category | Severity | Item |
|---|----------|----------|------|
| 1 | Performance | Medium | `Mutex<HashMap>` in engine map serializes lookups |
| 2 | Performance | Low | Telemetry attributes re-allocated on every event |
| 3 | Performance | Negligible | `BackoffOptions` cloned per retry cycle |
| 4 | Performance | Negligible | `HealthMetrics` linear window scan |
| 5 | Correctness | Medium | Metrics `VERSION` hardcoded at `v0.1.0`, crate is at `0.3.0` |
| 6 | Correctness | Low | `failure_threshold` allows negative values |
| 7 | Correctness | Low | `Window::update` uses non-saturating arithmetic |
| 8 | Correctness | Low | Typo `successfull` in telemetry field |
| 9 | Correctness | Low | Wildcard match on `RecoveryKind` may mask new variants |
| 10 | Correctness | Info | `Unknown` recovery treated as success in breaker |
| 11 | Usability | Low | `ResilienceContext` requires explicit type parameters |
| 12 | Usability | Low | No `Default` for `HalfOpenMode` |
| 13 | Usability | Info | Type-state pattern still uses `.expect()` internally |
| 14 | Usability | Low | `min_throughput(0)` has surprising behavior |
| 15 | Usability | Medium | No way to inspect or reset circuit breaker state |
| 16 | Usability | Medium | Engine map grows without bound |

## Human Notes

Some of the points are valid ones, will get through these and try to apply some reasoning and then fix ones that are relevant.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

seatbelt: review and addresses some of the AI feedback #284

Seatbelt Crate Analysis

Performance

1. Circuit Breaker Engine Map Uses `Mutex<HashMap>` — Consider a Concurrent Map

2. Telemetry Attribute Arrays Are Allocated on Every Event

3. `BackoffOptions` Is Cloned on Every `delays()` Call

4. `HealthMetrics` Window Cleanup Is Linear

Correctness

5. Metrics `VERSION` Constant Is Stale

6. `failure_threshold` Clamps High Values but Allows Negative Values

7. `Window::update` Doesn't Use Saturating Arithmetic

8. Typo in Telemetry Field Name: `successfull`

9. `RecoveryKind` Wildcard Match May Mask New Variants

10. `ExecutionResult::from_recovery` Treats `Unknown` as Success

Usability

11. `ResilienceContext` Requires Explicit Type Parameters

12. No `Default` Implementation for `HalfOpenMode`

13. `BreakerLayer::layer()` Can Panic at Runtime Despite Type-State Pattern

14. `min_throughput` Accepts 0 Without Warning

15. No Way to Inspect or Reset Circuit Breaker State

16. Circuit Breaker Engine Map Grows Without Bound

Summary

Human Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

#	Category	Severity	Item
1	Performance	Medium	`Mutex<HashMap>` in engine map serializes lookups
2	Performance	Low	Telemetry attributes re-allocated on every event
3	Performance	Negligible	`BackoffOptions` cloned per retry cycle
4	Performance	Negligible	`HealthMetrics` linear window scan
5	Correctness	Medium	Metrics `VERSION` hardcoded at `v0.1.0`, crate is at `0.3.0`
6	Correctness	Low	`failure_threshold` allows negative values
7	Correctness	Low	`Window::update` uses non-saturating arithmetic
8	Correctness	Low	Typo `successfull` in telemetry field
9	Correctness	Low	Wildcard match on `RecoveryKind` may mask new variants
10	Correctness	Info	`Unknown` recovery treated as success in breaker
11	Usability	Low	`ResilienceContext` requires explicit type parameters
12	Usability	Low	No `Default` for `HalfOpenMode`
13	Usability	Info	Type-state pattern still uses `.expect()` internally
14	Usability	Low	`min_throughput(0)` has surprising behavior
15	Usability	Medium	No way to inspect or reset circuit breaker state
16	Usability	Medium	Engine map grows without bound

seatbelt: review and addresses some of the AI feedback #284

Description

Seatbelt Crate Analysis

Performance

1. Circuit Breaker Engine Map Uses Mutex<HashMap> — Consider a Concurrent Map

2. Telemetry Attribute Arrays Are Allocated on Every Event

3. BackoffOptions Is Cloned on Every delays() Call

4. HealthMetrics Window Cleanup Is Linear

Correctness

5. Metrics VERSION Constant Is Stale

6. failure_threshold Clamps High Values but Allows Negative Values

7. Window::update Doesn't Use Saturating Arithmetic

8. Typo in Telemetry Field Name: successfull

9. RecoveryKind Wildcard Match May Mask New Variants

10. ExecutionResult::from_recovery Treats Unknown as Success

Usability

11. ResilienceContext Requires Explicit Type Parameters

12. No Default Implementation for HalfOpenMode

13. BreakerLayer::layer() Can Panic at Runtime Despite Type-State Pattern

14. min_throughput Accepts 0 Without Warning

15. No Way to Inspect or Reset Circuit Breaker State

16. Circuit Breaker Engine Map Grows Without Bound

Summary

Human Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

1. Circuit Breaker Engine Map Uses `Mutex<HashMap>` — Consider a Concurrent Map

3. `BackoffOptions` Is Cloned on Every `delays()` Call

4. `HealthMetrics` Window Cleanup Is Linear

5. Metrics `VERSION` Constant Is Stale

6. `failure_threshold` Clamps High Values but Allows Negative Values

7. `Window::update` Doesn't Use Saturating Arithmetic

8. Typo in Telemetry Field Name: `successfull`

9. `RecoveryKind` Wildcard Match May Mask New Variants

10. `ExecutionResult::from_recovery` Treats `Unknown` as Success

11. `ResilienceContext` Requires Explicit Type Parameters

12. No `Default` Implementation for `HalfOpenMode`

13. `BreakerLayer::layer()` Can Panic at Runtime Despite Type-State Pattern

14. `min_throughput` Accepts 0 Without Warning