73 changes: 40 additions & 33 deletions docs/incident_detection/tests/3.api_calls_data_loading_flows.md
## 3. CRITICAL: Data Loading – API Call Bugs

**Automation Status**: PARTIALLY AUTOMATED (Sections 3.1, 3.2, and 3.4)

### Prerequisites: Test Data Setup for Data Loading Tests

**CSV file**: [`simulate_scenarios/data-loading-silences.csv`](../simulate_scenarios/data-loading-silences.csv)

**CSV Format** - These alerts test resolution, short duration, and silence logic (creates incidents F, G, I, J):

```csv
start,end,alertname,namespace,severity,silenced,labels
…
```

…
- Verify the `end` time parameter of the latest query is within the last 5 minutes
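Per the Prometheus HTTP API, the `end` parameter of a `query_range` request may be a Unix timestamp in seconds or an RFC 3339 string. A minimal sketch of this check for a request URL copied from the browser's Network tab; `endIsRecent` is a hypothetical helper, and GET requests (parameters in the URL) are assumed:

```typescript
// Hypothetical helper: returns true when the `end` query parameter of a
// Prometheus query_range URL lies within the last `maxAgeMinutes` minutes.
function endIsRecent(requestUrl: string, nowMs: number, maxAgeMinutes = 5): boolean {
  const end = new URL(requestUrl).searchParams.get("end");
  if (end === null || end.trim() === "") {
    return false;
  }
  // Prometheus accepts Unix seconds (optionally fractional) or RFC 3339.
  const asSeconds = Number(end);
  const endMs = Number.isFinite(asSeconds) ? asSeconds * 1000 : Date.parse(end);
  if (Number.isNaN(endMs)) {
    return false;
  }
  const ageMs = nowMs - endMs;
  return ageMs >= 0 && ageMs <= maxAgeMinutes * 60 * 1000;
}
```

Usage: `endIsRecent(copiedRequestUrl, Date.now())`.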


### 3.4 Many Alerts Break API Request (OU-632)
**BUG**: When an incident contains many alerts (100+), the Prometheus query for the Alerts endpoint becomes too large, resulting in a "Request Header Fields Too Large" (431) error. No alerts are rendered for that incident.
**Automation Status**: NOT AUTOMATED (requires live injected data on cluster)

**Data Setup**:
Use the simulation script from the `cluster-health-analyzer` repository (`local/simulate.sh`) with the following CSVs from `docs/incident_detection/simulate_scenarios/`:
- `100-alerts-14-days.csv` — 100 alerts across 14 days (single incident, triggers the bug)
- `1000-alerts-15-min.csv` — 1000 alerts in 15 minutes (extreme stress scenario)

- [ ] **100 Alerts Load Successfully**: Inject `100-alerts-14-days.csv`
  - Navigate to Observe → Incidents
  - Select the incident containing 100 alerts
  - Verify alerts are rendered in the alerts chart (no blank view)
  - Open browser console: verify no "Request Header Fields Too Large" error

- [ ] **1000 Alerts Load Successfully**: Inject `1000-alerts-15-min.csv`
  - Navigate to Observe → Incidents
  - Select the incident containing 1000 alerts
  - Verify alerts are rendered (may be slow but must not fail with a 431 error)
  - Open browser console: verify no "Request Header Fields Too Large" error

### 3.4 15-Day Data Loading with "Last N Days" Filtering
**FEATURE**: The UI always loads 15 days of data (one `query_range` call per day) and then filters it client-side according to the "Last N Days" selection.
**Automation Status**: AUTOMATED
**Test file**: `web/cypress/e2e/incidents/regression/03.reg_15day_data_loading.cy.ts`
**Fixture**: `web/cypress/fixtures/incident-scenarios/19-15-day-data-loading.yaml`
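The loading pattern described above can be sketched as follows: split the 15-day range into 15 one-day windows (one `query_range` request each), then apply the "Last N Days" filter to the merged data on the client. This is an illustration only; the function names, the `Sample` shape, the base URL, and the 300 s step are assumptions, not the plugin's actual code.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const TOTAL_DAYS = 15;

interface TimeWindow { startMs: number; endMs: number; }
// Hypothetical shape of one data point after the daily responses are merged.
interface Sample { timestampMs: number; value: number; }

// Splits [now - 15 days, now] into consecutive one-day windows, oldest first.
// Each window backs one query_range request.
function dailyWindows(nowMs: number, days = TOTAL_DAYS): TimeWindow[] {
  const windows: TimeWindow[] = [];
  for (let i = days; i > 0; i--) {
    windows.push({ startMs: nowMs - i * DAY_MS, endMs: nowMs - (i - 1) * DAY_MS });
  }
  return windows;
}

// Builds a query_range URL for one window (start/end in Unix seconds).
// The 300 s step is a placeholder value.
function queryRangeUrl(baseUrl: string, query: string, w: TimeWindow, stepSeconds = 300): string {
  const params = new URLSearchParams({
    query,
    start: String(w.startMs / 1000),
    end: String(w.endMs / 1000),
    step: String(stepSeconds),
  });
  return `${baseUrl}/api/v1/query_range?${params.toString()}`;
}

// Client-side "Last N Days" filter over the full 15-day dataset.
function filterLastNDays(samples: Sample[], nowMs: number, n: number): Sample[] {
  const cutoff = nowMs - n * DAY_MS;
  return samples.filter((s) => s.timestampMs >= cutoff);
}
```

For example, `dailyWindows(Date.now()).map((w) => queryRangeUrl(promBaseUrl, "ALERTS{}", w))` yields the 15 request URLs. In this sketch, changing "Last N Days" only re-runs `filterLastNDays` and issues no new range queries.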

**Background**:
- Before: data was loaded only for the selected "Last N Days" window, so Start dates were relative to that window rather than to the incident's actual start
- After: Start shows an absolute date, even when the incident began before the selected "Last N Days" window
- Limit: Start is capped at 15 days ago (the maximum supported range)

**Fix Implementation**:
The absolute start date of an incident/alert is always displayed, regardless of the selected "Last N Days" filter.

The solution adds a new API call:
- Absolute timestamps are retrieved with an **instant query** to Prometheus
- For incidents: `min_over_time(timestamp(cluster_health_components_map{}))`
- For alerts: `min_over_time(timestamp(ALERTS{}))`
- For each matching series, this returns the timestamp of its first data point
- The result is saved in the Redux store and matched to the related incident or alert to update the Start date displayed in the tooltip
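A minimal sketch of the instant-query step, assuming the standard Prometheus `/api/v1/query` endpoint and its documented `vector` response, where each sample value is a string and `timestamp()` yields seconds. PromQL's `min_over_time` expects a range vector, so this sketch writes the expression as a subquery; the `[15d:5m]` range and resolution, the label keys used for matching, and all function names are assumptions for illustration.

```typescript
// Shape of a Prometheus instant-query response with resultType "vector".
interface VectorResult {
  metric: Record<string, string>;
  value: [number, string]; // [evaluation time in seconds, sample value as a string]
}
interface InstantQueryResponse {
  status: string;
  data: { resultType: string; result: VectorResult[] };
}

// Assumed subquery form: min_over_time needs a range vector.
function firstSeenQuery(selector: string, range = "15d", resolution = "5m"): string {
  return `min_over_time(timestamp(${selector})[${range}:${resolution}])`;
}

function instantQueryUrl(baseUrl: string, query: string, timeSeconds: number): string {
  const params = new URLSearchParams({ query, time: String(timeSeconds) });
  return `${baseUrl}/api/v1/query?${params.toString()}`;
}

// Maps a key (joined values of `keyLabels`) to the earliest timestamp in ms.
function firstSeenByKey(resp: InstantQueryResponse, keyLabels: string[]): Map<string, number> {
  const out = new Map<string, number>();
  if (resp.status !== "success" || resp.data.resultType !== "vector") {
    return out;
  }
  for (const r of resp.data.result) {
    const key = keyLabels.map((l) => r.metric[l] ?? "").join("|");
    const ms = parseFloat(r.value[1]) * 1000; // timestamp() returns seconds
    const prev = out.get(key);
    // Several series can share a key; keep the earliest one.
    if (prev === undefined || ms < prev) {
      out.set(key, ms);
    }
  }
  return out;
}

// Prefers the absolute first-seen time; falls back to the start derived
// from the loaded range data when no match exists.
function absoluteStart(key: string, firstSeen: Map<string, number>, fallbackMs: number): number {
  return firstSeen.get(key) ?? fallbackMs;
}
```

Keeping the earliest value per key mirrors the "first data point" semantics when several series map to the same incident or alert.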

**Manual Testing Data**:
Use `docs/incident_detection/simulate_scenarios/long-incident-15-days.csv`, which creates an incident spanning 15 days, to test the absolute start date display.

**Automated Coverage** (tests 1–4 in the test file):
- [x] **Absolute Start Date Display** — AUTOMATED
  - Tests 1–3 switch between the 15-day, 7-day, and 3-day filters on an ongoing 14-day incident
  - Collects start dates from four surfaces (incident table, alert table, incident tooltip, alert tooltip)
  - Verifies all four start dates are identical and non-empty regardless of the selected filter (proving the dates are absolute, not relative to the filter window)
- [x] **Escalating Severity Segment Stability** — AUTOMATED (test 4)
  - Collects segment start dates at the 15-day baseline, then switches to 7 days
  - Verifies the critical segment's start date is unchanged when the info→warning boundary scrolls out of the visible window
  - Verifies the warning segment's start date equals either the true segment start or the overall incident start when the info→warning boundary (10 days ago) is outside the 7-day window
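The pass criteria above reduce to two small predicates. A sketch in plain TypeScript, independent of Cypress; the function names and the choice to compare dates as the strings read from the UI are assumptions:

```typescript
// Tests 1–3: the start dates read from the four UI surfaces (incident table,
// alert table, incident tooltip, alert tooltip) are non-empty and identical.
function startDatesConsistent(dates: string[]): boolean {
  return dates.length === 4 && dates.every((d) => d.trim() !== "" && d === dates[0]);
}

// Test 4, warning segment after switching to 7 days: when the info→warning
// boundary lies outside the visible window, the displayed start may be either
// the true segment start or the overall incident start.
function warningStartAcceptable(observed: string, trueSegmentStart: string, incidentStart: string): boolean {
  return observed === trueSegmentStart || observed === incidentStart;
}
```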

**Remaining Manual Steps**:
- [ ] **API Call Pattern Verification**: Monitor network requests on initial page load (requires a real Prometheus endpoint)
  - Verify 15 `query_range` calls are made on initial page load (one per day)
  - Verify instant query calls are made for `min_over_time(timestamp(cluster_health_components_map{}))` and `min_over_time(timestamp(ALERTS{}))`
  - Verify the time ranges cover the full 15-day window regardless of the "Last N Days" selection
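To make this manual check repeatable, the request URLs can be copied from the browser's Network tab (or a HAR export) and summarized with a small script. A sketch, assuming GET requests to paths ending in the standard Prometheus `/api/v1/query_range` and `/api/v1/query` (possibly behind a console proxy prefix) with `start`/`end` sent as Unix seconds; `summarizeCalls` and its fields are hypothetical. Counts are grouped per PromQL expression, since incidents and alerts may each get their own 15 calls.

```typescript
interface CallSummary {
  // Number of query_range calls per PromQL expression.
  rangeCallsByQuery: Map<string, number>;
  // PromQL expressions sent to the instant-query endpoint.
  instantQueries: string[];
  // Days between the earliest `start` and the latest `end` of all query_range calls.
  coveredDays: number;
}

const SECONDS_PER_DAY = 86400;

function summarizeCalls(urls: string[]): CallSummary {
  const rangeCallsByQuery = new Map<string, number>();
  const instantQueries: string[] = [];
  let minStart = Infinity;
  let maxEnd = -Infinity;
  for (const raw of urls) {
    const u = new URL(raw);
    const query = u.searchParams.get("query") ?? "";
    if (u.pathname.endsWith("/api/v1/query_range")) {
      rangeCallsByQuery.set(query, (rangeCallsByQuery.get(query) ?? 0) + 1);
      const start = u.searchParams.get("start");
      const end = u.searchParams.get("end");
      if (start !== null) minStart = Math.min(minStart, Number(start));
      if (end !== null) maxEnd = Math.max(maxEnd, Number(end));
    } else if (u.pathname.endsWith("/api/v1/query")) {
      instantQueries.push(query);
    }
  }
  const coveredDays = maxEnd > minStart ? (maxEnd - minStart) / SECONDS_PER_DAY : 0;
  return { rangeCallsByQuery, instantQueries, coveredDays };
}
```

Run it once per "Last N Days" selection: in this sketch, the expected result is 15 range calls per expression and a `coveredDays` of about 15 each time.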

### 3.5 Data Integrity
**NEW, NOT AUTOMATED, TODO COO 1.4**
…
- [ ] Component lists combined for same group_id
- [ ] Watchdog alerts filtered out
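A sketch of what these two checks expect, using a simplified record shape. The `IncidentSeries` and `AlertSeries` types and their fields are hypothetical; only `group_id` and the `Watchdog` alert name come from the checklist. Watchdog is an always-firing heartbeat alert, so it should never show up as an incident alert.

```typescript
interface IncidentSeries { group_id: string; components: string[]; }
interface AlertSeries { alertname: string; }

// Merges the component lists of all series that share a group_id,
// removing duplicates and sorting for stable comparison.
function combineComponents(series: IncidentSeries[]): Map<string, string[]> {
  const byGroup = new Map<string, Set<string>>();
  for (const s of series) {
    const set = byGroup.get(s.group_id) ?? new Set<string>();
    s.components.forEach((c) => set.add(c));
    byGroup.set(s.group_id, set);
  }
  const out = new Map<string, string[]>();
  byGroup.forEach((set, id) => {
    out.set(id, Array.from(set).sort());
  });
  return out;
}

// Drops the always-firing Watchdog alert.
function withoutWatchdog<T extends AlertSeries>(alerts: T[]): T[] {
  return alerts.filter((a) => a.alertname !== "Watchdog");
}
```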

### 3.6 Permission Denied Handling (OU-1213)
**BUG**: The page should handle 403 Forbidden responses from API endpoints gracefully by showing an access-denied state.
**Automation Status**: AUTOMATED in `03.reg_api_calls.cy.ts`
- Uses mock: `cy.mockPermissionDenied({ rules: true, silences: true, prometheus: true })`
- Manual replication: Apply resources from [`docs/incident_detection/resources/`](../resources/)

- [ ] **403 Forbidden Response**: Create a user with limited permissions (username `testuser`, password `password123`)
  - Apply `htpasswd-secret.yaml`, `oauth-htpasswd.yaml`, and `limited-permissions-user.yaml`
  - Log in as `testuser` and navigate to Observe → Incidents
  - Expected: `<EmptyState data-test="access-denied">` with the text "Restricted access"