incident-management: tighten IR template structure and pipeline runbook #424

Open
frameworks-volunteer wants to merge 2 commits into security-alliance:develop from frameworks-volunteer:matta/ir-template-tightening

@frameworks-volunteer
Contributor

Summary

This PR is a first pass on the recently added Incident Response Template section.

The goal is not to expand the section broadly, but to make it clearer, tighter, and more operationally credible without adding filler or speculative content.

This pass focuses on three things:

  1. clarifying the distinction between framework guidance, templates, runbooks, and playbooks
  2. reducing a few over-absolute statements
  3. upgrading the weakest runbook in the set (build-pipeline-compromise) into something more responder-oriented

What changed

1) Clarified content taxonomy

Added concise framing so readers can understand what each layer is for:

  • incident-management/overview.mdx
      • clarifies that the section now contains both:
          • framework guidance
          • operational templates
  • incident-management/playbooks/overview.mdx
      • reframes playbooks as reference material, not drop-in internal operating procedures
      • points readers to the template/runbook sections for copy-and-adapt operational docs
  • incident-response-template/overview.mdx
      • clarifies that the broader incident-management pages explain concepts/practices
      • clarifies that the template section is intended to be copied/customized for internal use
      • distinguishes:
          • policy / roles / communications / contacts
          • templates
          • runbooks
  • incident-response-template/templates/overview.mdx
      • clarifies when to use templates vs runbooks vs policy pages
  • incident-response-template/runbooks/overview.mdx
      • clarifies that runbooks are operational procedures, distinct from framework playbooks and blank templates

2) Tightened a few absolute statements

  • incident-response-template/incident-response-policy.mdx
      • changed "Monitor for at least a week" to "Monitor based on residual risk, blast radius, and incident type"
  • incident-response-template/roles-and-staffing.mdx
      • changed "These people should be reachable 24/7" to "There should be a 24/7 escalation path to these people"

These changes are meant to make the guidance more realistic and less doctrinal.

3) Upgraded the build pipeline compromise runbook

incident-response-template/runbooks/build-pipeline-compromise.mdx was previously a thin stub. This PR upgrades it into a more credible example runbook by adding:

  • better identification criteria
  • scope questions
  • differentiation from adjacent incident classes
  • immediate actions that reflect actual responder priorities:
      • freeze pipeline
      • preserve evidence
      • rotate credentials by blast radius
      • stop trusting recent outputs
  • investigation questions focused on access path, permissions, credential exposure, and affected outputs
  • containment / recovery options:
      • rebuild from known-good commit using clean pipeline
      • rollback to last known-good release
      • keep service paused until trust is re-established
  • a verification gate before normal delivery resumes
  • a concise hardening checklist after the incident
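
For illustration only, the "verification gate" idea above can be sketched as a small all-checks-must-pass function. This is a minimal sketch, not the runbook's actual content: the `GateCheck` type, the `delivery_may_resume` function, and the check names are all hypothetical placeholders that a team would replace with its own criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    """One condition that must hold before delivery resumes (hypothetical)."""
    name: str
    passed: bool


def delivery_may_resume(checks: list[GateCheck]) -> tuple[bool, list[str]]:
    """Return (ok, names_of_failing_checks).

    The gate only opens when at least one check exists and none fail,
    so an empty checklist can never approve a release by accident.
    """
    failing = [c.name for c in checks if not c.passed]
    ok = bool(checks) and not failing
    return ok, failing


if __name__ == "__main__":
    # Hypothetical check names, loosely mirroring the recovery options above.
    checks = [
        GateCheck("pipeline_rebuilt_from_known_good_commit", True),
        GateCheck("exposed_credentials_rotated", True),
        GateCheck("recent_outputs_reverified", False),
    ]
    ok, failing = delivery_may_resume(checks)
    print("resume delivery:", ok, "| still failing:", failing)
```

Treating an empty checklist as a failure is a deliberate choice in this sketch: a gate that passes by default tends to be skipped under pressure.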

What this PR does not do

Intentionally out of scope for this first pass:

  • broad content expansion
  • adding new Web3-specific runbooks just to fill gaps
  • renaming sections or restructuring the sidebar deeply
  • inventing protocol-specific operational steps without high confidence

I would rather leave gaps visible than fill them with weak or speculative guidance.

Why this scope

The Incident Response Template addition is already valuable, but right now it mixes:

  • framework/reference material
  • internal templates
  • runbooks

This first pass tries to make that structure easier to understand, while also strengthening one page that felt materially underdeveloped.

Follow-up ideas (not included here)

Possible future passes, if useful:

  • strengthen frontend-compromise and dependency-attack
  • add battle-tested Web3-native scenarios only where confidence is high
  • revisit naming/IA if the team wants clearer labels than the current playbook/runbook/template split

@github-actions

github-actions bot commented Mar 21, 2026

built with Refined Cloudflare Pages Action

⚡ Cloudflare Pages Deployment

| Name | Status | Preview | Last Commit |
| --- | --- | --- | --- |
| frameworks | ✅ Ready (View Log) | Visit Preview | 56913d1 |

@frameworks-volunteer
Contributor Author

Second pass update

This second pass keeps the same philosophy as the first one:

  • no broad expansion
  • no filler
  • no speculative Web3-specific guidance
  • only tighten pages where the operational value is clear and high-confidence

What changed in this pass

This pass focuses on the two runbooks that still felt materially underpowered:

  • incident-response-template/runbooks/frontend-compromise.mdx
  • incident-response-template/runbooks/dependency-attack.mdx

1) Strengthened frontend-compromise

This page now better reflects how frontend incidents actually behave in practice, especially in Web3, where a frontend compromise often becomes a user-signing or approval-theft incident very quickly.

Changes include:

  • clearer identification and scope questions
  • stronger focus on stopping service quickly
  • explicit emphasis on warning users early and clearly
  • preserving evidence before cleanup
  • tighter framing around identifying the real trust-boundary failure:
      • DNS
      • CDN/hosting
      • dependency
      • build pipeline
  • improved recovery conditions before restoring service
  • more practical affected-user support guidance

The goal here was to make the page more useful during the first minutes of an actual incident, not just more complete on paper.
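
As a hedged illustration of one way to check whether served frontend assets still match what was built, the sketch below compares asset bytes against a known-good manifest of SHA-256 digests. It is not taken from the runbook: the function names and the manifest shape (path to hex digest) are assumptions, and it presumes a trusted manifest from a clean build exists.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def find_tampered_assets(
    known_good: dict[str, str], served: dict[str, bytes]
) -> dict[str, str]:
    """Compare served assets against a known-good manifest.

    known_good maps asset path -> expected SHA-256 hex digest (assumed to
    come from a trusted build). served maps asset path -> bytes actually
    fetched from production. Returns path -> finding for every mismatch.
    """
    findings: dict[str, str] = {}
    for path, expected in known_good.items():
        if path not in served:
            findings[path] = "missing"
        elif sha256_hex(served[path]) != expected:
            findings[path] = "modified"
    # Files that are served but were never part of the trusted build.
    for path in served.keys() - known_good.keys():
        findings[path] = "unexpected"
    return findings
```

A mismatch here only shows that served content differs from the build; it does not by itself say whether DNS, CDN/hosting, a dependency, or the build pipeline was the failing boundary.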

2) Strengthened dependency-attack

This page was still too close to a stub. It now better distinguishes between a generic vulnerable package and a dependency incident that may have affected real build outputs, releases, or users.

Changes include:

  • better scope questions:
      • production vs build-only exposure
      • build-time vs runtime execution
      • possible credential / artifact impact
  • clearer differentiation from:
      • frontend compromise
      • build pipeline compromise
  • stronger immediate actions:
      • freeze releases
      • identify the exact package/version path
      • stop trusting recent outputs
      • preserve evidence
  • improved investigation questions
  • more credible containment / recovery options
  • a verification gate before resuming normal delivery
  • tighter prevention guidance focused on dependency discipline and build trust
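
To make "identify the exact package/version path" concrete, here is a minimal sketch that lists every dependency chain leading from an application to a suspect package version. It assumes the resolved dependency graph has already been extracted into a plain dictionary (for example from a lockfile); the function name and the package identifiers in the usage example are hypothetical.

```python
from __future__ import annotations


def find_paths_to_package(
    graph: dict[str, list[str]], root: str, target: str
) -> list[list[str]]:
    """Return every simple path from root to target in a dependency graph.

    graph maps "name@version" -> list of direct dependencies, also written
    as "name@version". Uses depth-first search and skips nodes already on
    the current path so circular dependencies cannot loop forever.
    """
    paths: list[list[str]] = []

    def walk(node: str, trail: list[str]) -> None:
        if node == target:
            paths.append(trail)
            return
        for dep in graph.get(node, []):
            if dep not in trail:
                walk(dep, trail + [dep])

    walk(root, [root])
    return paths


if __name__ == "__main__":
    # Hypothetical graph: two routes pull in the same suspect version.
    graph = {
        "app@1.0.0": ["ui-kit@2.1.0", "http-lib@3.0.0"],
        "ui-kit@2.1.0": ["left-util@1.4.2"],
        "http-lib@3.0.0": ["left-util@1.4.2"],
    }
    for path in find_paths_to_package(graph, "app@1.0.0", "left-util@1.4.2"):
        print(" -> ".join(path))
```

Listing all paths rather than the first one matters during scoping: pinning or removing one parent does not help if another route still resolves to the affected version.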

What I intentionally did not change

Still intentionally out of scope:

  • adding new runbooks just to close every possible gap
  • speculative guidance for scenarios that need deeper expertise or stronger repo context
  • touching pages that did not clearly benefit from high-confidence tightening

For example, I left key-compromise unchanged in this pass rather than make lower-confidence edits.

Why this is the last pass

At this point, the highest-value weak spots in the imported IR template section have been addressed without turning the PR into a broad rewrite.

This keeps the contribution focused on:

  • clearer information architecture
  • more realistic wording
  • stronger responder-oriented runbooks where they were obviously too thin

@mattaereal mattaereal requested a review from scode2277 March 22, 2026 00:54
@mattaereal mattaereal self-assigned this Mar 22, 2026