Skip to content
58 changes: 58 additions & 0 deletions adr.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# ADR-0534: Add #[must_use] to render_sensor_report()

## Status
Proposed

## Context

The `render_sensor_report()` function in `crates/diffguard-core/src/sensor.rs` returns a `SensorReport` containing critical sensor data (findings, verdict, run metadata, and artifacts). This function is part of the R2 Library Contract for Cockpit ecosystem integration — it is exported publicly from `diffguard-core` and used as the primary data contract for the sensor system.

Unlike other render functions that return `String` (where discarding the result produces visible empty output), `render_sensor_report()` returns a structured `SensorReport`. If a caller discards this value, the entire sensor report is silently lost with no compiler warning. In a governance/sensor system where findings, verdict status, and run metadata are critical, silent data loss is a serious failure mode.

The codebase already uses `#[must_use]` extensively on similar functions:
- `diffguard-diff/src/unified.rs` — `is_submodule()`, `is_added()`, etc.
- `diffguard-domain/src/suppression.rs` — `suppress()` variants
- `diffguard-domain/src/overrides.rs` — `override_finding()`
- `diffguard-types/src/lib.rs` — various type methods
- `diffguard-analytics/src/lib.rs` — multiple functions

## Decision

Add `#[must_use]` attribute to the `render_sensor_report()` function in `crates/diffguard-core/src/sensor.rs` at line 44.

```rust
/// Renders a CheckReceipt as a SensorReport.
#[must_use]
pub fn render_sensor_report(receipt: &CheckReceipt, ctx: &SensorReportContext) -> SensorReport {
```

## Consequences

### Benefits
- **Compile-time enforcement against silent data loss**: Any caller that discards the `SensorReport` will now receive a compiler warning, making the bug immediately visible rather than silently failing.
- **Consistency with codebase conventions**: The attribute aligns with the established pattern of `#[must_use]` usage across the monorepo.
- **Strengthens the R2 Library Contract**: The `SensorReport` is explicitly the stable integration surface for Cockpit/BusyBox consumers. Making it a compile-time error to discard the result reinforces this contract.
- **Purely additive change**: No runtime behavior change. No existing correct code is affected.

### Tradeoffs/Risks
- **Potential warnings in existing code**: If any existing code (inside or outside `diffguard-core`) calls `render_sensor_report()` and discards the result, it will now produce a compiler warning. However, this only exposes pre-existing bugs — such code was already silently losing sensor data.
- **CI lint gates**: If other work items or CI pipelines have `#[deny(warnings)]`, newly warned code could cause failures. However, this is the intended behavior — those callers need to be fixed.

### No impact on:
- `render_sensor_json()` — returns `Result<String, serde_json::Error>`, which has implicit `#[must_use]` on `Result` in Rust 2018+.
- Runtime behavior — `#[must_use]` is purely compile-time.
- Backward compatibility — only affects code that was already incorrect.

## Alternatives Considered

### 1. Do nothing (leave without #[must_use])
- **Rejected because**: Silent data loss remains a latent defect. As Cockpit ecosystem integration grows, more callers might emerge that accidentally discard the result, and the bug would only be discovered when sensor dashboards go blank or integration tests silently pass with no findings.

### 2. Add #[must_use] to multiple render_* functions
- **Rejected because**: The issue specifically calls out `render_sensor_report()`. Other render functions (`render_csv_for_receipt()`, `render_junit_for_receipt()`) return `String`, which produces visible empty output if discarded. `render_sensor_report()` returns a structured `SensorReport` with no visible indication when discarded — a qualitatively different failure mode.

### 3. Document the requirement in code comments only
- **Rejected because**: Documentation can be ignored. The `#[must_use]` attribute provides compile-time enforcement that comments cannot.

## Dependencies
- None. The change is self-contained and does not require any other work items or external dependencies.
31 changes: 29 additions & 2 deletions crates/diffguard-core/src/sensor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
//!
//! This module converts CheckReceipt to the `sensor.report.v1` format.

use std::collections::{BTreeMap, HashMap};
use std::collections::{BTreeMap, BTreeSet, HashMap};

use diffguard_types::{
Artifact, CHECK_ID_PATTERN, CapabilityStatus, CheckReceipt, RunMeta, SENSOR_REPORT_SCHEMA_V1,
Expand Down Expand Up @@ -41,6 +41,22 @@ pub struct RuleMetadata {
}

/// Renders a CheckReceipt as a SensorReport.
///
/// The `#[must_use]` attribute is enforced by the R2 Library Contract — callers
/// MUST NOT discard the returned `SensorReport`. Discarding it would cause silent
/// sensor data loss, which is unacceptable for Cockpit ecosystem integration where
/// every run must produce an immutable, auditable sensor record.
///
/// # Returns
///
/// A `SensorReport` with schema `sensor.report.v1` containing:
/// - Tool identity and version
/// - Run metadata (timestamps, duration, capabilities)
/// - The original verdict (status and counts)
/// - All findings mapped to `SensorFinding` format with fingerprints
/// - Artifacts produced during the run
/// - Diff metadata and diffguard-specific stats (rules matched, suppressed count, etc.)
#[must_use]
pub fn render_sensor_report(receipt: &CheckReceipt, ctx: &SensorReportContext) -> SensorReport {
let findings = receipt
.findings
Expand Down Expand Up @@ -70,7 +86,7 @@ pub fn render_sensor_report(receipt: &CheckReceipt, ctx: &SensorReportContext) -

// Count distinct rule_ids across findings
let rules_matched = {
let mut seen = std::collections::BTreeSet::new();
let mut seen = BTreeSet::new();
for f in &receipt.findings {
seen.insert(&f.rule_id);
}
Expand All @@ -97,6 +113,8 @@ pub fn render_sensor_report(receipt: &CheckReceipt, ctx: &SensorReportContext) -
"rules_total": ctx.rules_total,
});

// Only include tags_matched when non-empty to keep the sensor payload clean.
// Omitting an empty tags_matched avoids sending null/empty values to the Cockpit backend.
if !tags_matched.is_empty() {
diffguard_data["tags_matched"] =
serde_json::to_value(&tags_matched).expect("serialize tags_matched");
Expand Down Expand Up @@ -131,6 +149,15 @@ pub fn render_sensor_report(receipt: &CheckReceipt, ctx: &SensorReportContext) -
}

/// Renders a CheckReceipt as a sensor.report.v1 JSON string.
///
/// This is a convenience wrapper around [`render_sensor_report`] that serializes
/// the result to a pretty-printed JSON string. Use this when you need the JSON
/// representation directly (e.g., for file output or HTTP responses).
///
/// # Errors
///
/// Returns a [`serde_json::Error`] if serialization fails (e.g., if the `SensorReport`
/// contains values that cannot be serialized to JSON).
pub fn render_sensor_json(
receipt: &CheckReceipt,
ctx: &SensorReportContext,
Expand Down
24 changes: 24 additions & 0 deletions crates/diffguard-diff/src/unified.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,19 @@ use std::path::Path;

use diffguard_types::Scope;

/// The kind of change represented by a diff line.
///
/// This is used to distinguish between:
/// - `Added`: a line that was added (starts with `+`)
/// - `Changed`: a line that was added but immediately followed a removed line (changed context)
/// - `Deleted`: a line that was removed (starts with `-`)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeKind {
/// A line that was added in the new version.
Added,
/// An added line that appears in a changed hunk (directly follows a removed line).
Changed,
/// A line that was deleted in the new version.
Deleted,
}

Expand Down Expand Up @@ -99,17 +108,32 @@ pub fn parse_rename_to(line: &str) -> Option<String> {
parse_rename_path(rest)
}

/// A single line extracted from a unified diff.
///
/// Returned by [`parse_unified_diff`] as part of the result vector.
/// Each `DiffLine` represents one line of content with its
/// file path, 1-based line number in the post-image, and change kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiffLine {
/// The path of the file this line belongs to (post-image path for renames).
pub path: String,
/// The 1-based line number in the post-image file.
pub line: u32,
/// The content of the line (without the leading `+`, `-`, or ` ` sigil).
pub content: String,
/// The kind of change this line represents.
pub kind: ChangeKind,
}

/// Aggregate statistics from parsing a unified diff.
///
/// Returned by [`parse_unified_diff`] alongside the extracted lines.
/// Use this to determine how many files and lines were processed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct DiffStats {
/// The number of unique files that had changes.
pub files: u32,
/// The total number of lines extracted (not the total lines in the diff).
pub lines: u32,
}

Expand Down
16 changes: 16 additions & 0 deletions crates/diffguard-domain/src/evaluate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -574,12 +574,28 @@ fn trim_snippet(s: &str) -> String {
out
}

/// Extracts a substring from `s` in the range `[start, end)`, with bounds clamping.
///
/// `end` is first clamped to `s.len()`, then `start` is clamped to the
/// adjusted `end`. This guarantees `start <= end <= s.len()`, making the
/// range always valid for direct indexing.
///
/// Returns the substring as a new `String`.
fn safe_slice(s: &str, start: usize, end: usize) -> String {
// Clamp end first, then clamp start to the adjusted end.
// After these two lines: start <= end <= s.len(), so the range is always valid.
let end = end.min(s.len());
let start = start.min(end);
s.get(start..end).unwrap_or("").to_string()
}

/// Converts a byte index to a 1-based column number (character count).
///
/// Returns `None` if `byte_idx` exceeds the string length, otherwise returns
/// the number of characters in `s[..byte_idx]` plus one (to get 1-based column).
///
/// Uses direct slicing `s[..byte_idx]` because the guard on line 590 guarantees
/// `byte_idx <= s.len()`, making the range always valid.
fn byte_to_column(s: &str, byte_idx: usize) -> Option<usize> {
if byte_idx > s.len() {
return None;
Expand Down
Loading
Loading