rwyn

rwyn ("arwin") means run what you need.

rwyn is a stage-aware planner and executor for change-driven verification. Given a code change, it determines which repository requirements are plausibly at risk, gathers and weighs the relevant evidence, constructs the smallest practical plan it can justify for the current stage of the code lifecycle, and executes that plan automatically.

The goal is simple: get the confidence you need with the least unnecessary work.

Install And Get Started

Install rwyn with any one of the following:

curl -fsSL https://get.rwyn.dev/install.sh | sh
brew install rwyn
cargo install rwyn

Initialize a repository:

cd your-repo
rwyn init
rwyn run --stage save
rwyn plan --stage merge
rwyn explain

For contributors or local development from source:

cargo install --path .

Or build a release binary locally:

cargo build --release
./target/release/rwyn --help

Set Up A Repo

Set up a repository in five steps:

install the CLI
run rwyn init
review .rwyn/config.yaml
run rwyn doctor
run rwyn run --stage save

rwyn init creates the initial repository model:

.rwyn/config.yaml
an initial set of stages
an initial set of steps
obvious repository structure and toolchain assumptions
suggested CI wiring

A minimal config looks like:

requirements:
  - id: tests-pass
    description: TypeScript tests pass

stages:
  save:
    default_confidence: medium
  commit:
    default_confidence: high
  merge:
    default_confidence: certain

steps:
  - id: test
    kind: test
    command: bun test
    inputs:
      - "src/**/*.ts"
    satisfies:
      - tests-pass

Set Up With An Agent

Bootstrap is the heavy-agent phase of the loop described in How The Model Is Built And Improved; after the first session, the same skill drives lighter ongoing iteration.

If you are using Claude Code or Codex, the best initial setup flow is:

ask the agent to inspect the repo and scaffold .rwyn/config.yaml
have it add declarative plugins for obvious repo-specific structure
have it run rwyn doctor
have it run rwyn plan --stage save
have it explain any surprising selections

The agent uses the rwyn skill or plugin surface for setup. Repository truth lives in config and plugins, not in prompts.

Improving The Model

The model improves as the repository gives rwyn more information:

a minimal working config
declared prerequisites and hard relationships
dynamic evidence such as coverage
plugins for hidden structure
gaps, replay, and compare for ongoing refinement

Practical habits that move the model forward:

keep steps narrow and scopeable
declare obvious prerequisites and hard relationships explicitly
collect coverage so test scoping and confidence improve
model generated artifacts and hidden dependencies with plugins
treat repeated expensive early-stage work as a sign that the repo needs a cheaper signal

Collect Better Evidence

Coverage tells rwyn what code a step actually exercises, which sharpens scoping and confidence beyond what declared and static evidence can give.

Use this loop to keep dynamic evidence current:

rwyn coverage status
rwyn coverage refresh
rwyn coverage collect --kind bun-typescript --step bun-test

Useful evidence is:

incremental
scope-aware
fresh enough to trust
shared across local runs, CI, and agents when possible

Add Plugins When The Repo Has Hidden Structure

Plugins capture repository truth that is real but not obvious from plain file layout:

generated-artifact relationships
hidden dependency edges
interface-to-implementation links
path-derived scopes
repository-specific structure that affects relevance or confidence

Why `rwyn` Exists

Most teams still treat verification as tribal knowledge.

Developers learn rules like "if you touch this area, run these tests." CI pipelines encode partial logic in scattered configs and scripts. Agents either miss important repo-specific checks or do far too much by falling back to "run everything."

rwyn exists to replace that folklore with a repository model. Instead of teaching every human and every agent what to run for every kind of change, the repository declares what it cares about once, and rwyn plans and executes from that model everywhere.

Fundamentals

rwyn is built around a small set of concepts:

requirement A property the repository wants to hold, such as formatting being correct, generated artifacts being current, relevant builds succeeding, or relevant tests passing.
step An executable action that provides evidence about, verifies, satisfies, or helps satisfy one or more requirements.
evidence The raw information rwyn uses to decide what is relevant, which steps are useful, and when a plan is sufficient.
plan A stage-specific decision about which steps to run, in what order, at what scope, for which requirements.
stage A repo-defined lifecycle checkpoint with a default confidence target for relevant requirements.
confidence A global concept applied per requirement. Different requirements do not redefine what confidence means; they differ in what evidence is needed to reach it.

The model is many-to-many:

a requirement may be supported by multiple steps
a step may support multiple requirements
some steps fully satisfy a requirement
some steps provide only partial evidence

That lets the repository express realities like:

a formatter satisfying a formatting requirement
a non-mutating formatting step verifying the same requirement
a narrow unit test providing partial evidence about a broader integration risk
a generation step satisfying an artifact-freshness requirement that later verification depends on

The same logical requirement may admit different operational strategies at different stages. The repository model defines those choices explicitly.

How `rwyn` Thinks

rwyn is fundamentally an evidence system.

For a given change, rwyn first asks which requirements have non-zero plausible risk. Then it asks which steps provide the best next evidence for those requirements at the current stage. Planning stops when every relevant requirement reaches its effective confidence target.

This means rwyn distinguishes between two phases:

selection Which requirements are plausibly in play for this change?
planning Which steps should run now so each relevant requirement reaches the confidence needed for this stage?

A plan gathers enough evidence so that every relevant requirement reaches its target with the least unnecessary work. The planning question is:

what is the cheapest evidence I can gather now that reduces the chance of later-stage failure enough for this stage?

If a slow step is genuinely necessary at an early stage, rwyn runs it. Repeated expensive early-stage work is diagnostic: the repo is missing a cheaper earlier signal for that risk.

Evidence Model

Evidence remains inspectable. rwyn can explain:

why a requirement is relevant
why a step is useful
why the selected plan is sufficient

Those are three distinct layers of evidence:

requirement evidence for relevance
step evidence for usefulness
plan evidence for sufficiency

Relevance is computed from a stack of evidence sources, with stronger evidence preferred before weaker evidence:

declared repository knowledge
static structural evidence
semantic or AST-level evidence
dynamic execution evidence such as coverage or traces
historical empirical evidence
heuristics and priors last

Freshness, scope, reliability, cost, contradiction, and recency are all inputs to the calculation.
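
The preference order above can be sketched as a ranked enum. This is a minimal Python illustration, not rwyn's implementation: the tier names and the by_strength helper are hypothetical, and only the ordering comes from the list above. The sample evidence strings are made up, and the sketch ignores freshness, reliability, and the other inputs.

```python
from enum import IntEnum


class EvidenceTier(IntEnum):
    """Evidence sources, strongest first (lower value = stronger)."""
    DECLARED = 0    # declared repository knowledge
    STATIC = 1      # static structural evidence
    SEMANTIC = 2    # semantic or AST-level evidence
    DYNAMIC = 3     # dynamic execution evidence such as coverage or traces
    HISTORICAL = 4  # historical empirical evidence
    HEURISTIC = 5   # heuristics and priors, consulted last


def by_strength(evidence):
    """Sort (tier, description) pairs so stronger evidence is considered first."""
    return sorted(evidence, key=lambda item: item[0])


# Hypothetical evidence gathered for one requirement.
found = [
    (EvidenceTier.HEURISTIC, "file name looks like a test"),
    (EvidenceTier.DYNAMIC, "coverage shows bun-test executes src/foo.ts"),
    (EvidenceTier.DECLARED, "step inputs include src/**/*.ts"),
]
ordered = by_strength(found)
print([tier.name for tier, _ in ordered])  # ['DECLARED', 'DYNAMIC', 'HEURISTIC']
```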

Confidence Model

Confidence is the probability, for a given requirement, that the selected subset of relevant checks catches what the full set of relevant checks would catch, measured against observed outcomes.

A target of 0.75 for a requirement means: calibrated against observed history, the selected subset is expected to catch the same set of failures the full set would catch with at least 0.75 probability. 1 - 0.75 = 0.25 is the acceptable probability that a failure surfaces later.

Confidence is tracked per relevant requirement, not as one global score for the whole change.

For each change:

rwyn identifies the requirements with non-zero plausible risk.
Each relevant requirement gets a confidence estimate from the evidence the planner has, the priors it carries, and the calibration accumulated from prior runs.
Candidate steps are evaluated by how much useful evidence they provide relative to cost.
rwyn keeps selecting steps until every relevant requirement reaches its effective confidence target.
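
The loop above can be sketched as a greedy selection. This is an illustrative Python sketch under stated assumptions, not rwyn's planner: the source does not say how evidence combines, so the sketch assumes a noisy-OR rule (1 - (1 - c) * (1 - lift)), and the step names, costs, and lift values are invented for the example.

```python
def plan(targets, steps):
    """Greedy sketch: pick steps until every requirement meets its target.

    targets: {requirement_id: target in [0, 1]}
    steps:   {step_id: {"cost": float, "lift": {requirement_id: float}}}
    Assumption: evidence combines as 1 - (1 - c) * (1 - lift).
    """
    confidence = {req: 0.0 for req in targets}
    selected = []
    remaining = dict(steps)

    def gain(step):
        # Count only progress toward targets that are still unmet.
        total = 0.0
        for req, lift in step["lift"].items():
            if req in targets and confidence[req] < targets[req]:
                after = 1 - (1 - confidence[req]) * (1 - lift)
                total += min(after, targets[req]) - confidence[req]
        return total

    while any(confidence[r] < targets[r] for r in targets):
        scored = [
            (gain(s) / s["cost"], sid) for sid, s in remaining.items() if gain(s) > 0
        ]
        if not scored:
            break  # no remaining step helps; some target is unreachable
        _, best = max(scored)
        step = remaining.pop(best)
        for req, lift in step["lift"].items():
            if req in confidence:
                confidence[req] = 1 - (1 - confidence[req]) * (1 - lift)
        selected.append(best)
    return selected, confidence


targets = {"tests-pass": 0.75, "formatting-clean": 0.50}
steps = {
    "fmt-check": {"cost": 1, "lift": {"formatting-clean": 1.0}},
    "typecheck": {"cost": 1, "lift": {"tests-pass": 0.4}},
    "unit-test": {"cost": 2, "lift": {"tests-pass": 0.6}},
    "full-test": {"cost": 20, "lift": {"tests-pass": 1.0}},
}
selected, confidence = plan(targets, steps)
print(selected)  # ['fmt-check', 'typecheck', 'unit-test']
```

In this example the expensive full-test step is never selected, because two cheaper steps together clear the 0.75 target. Under the assumed combination rule, confidence never decreases as steps are added.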

The stage ladder is a probability budget spread across the lifecycle: early-stage checks accept a higher probability of missed failures because later stages re-verify at higher targets.

Confidence targets inherit cleanly:

stage default -> requirement override

A stage supplies a default confidence target for the requirements relevant at that lifecycle point, declared with the default_confidence: field in .rwyn/config.yaml. Every stage must declare one; missing defaults are surfaced by rwyn doctor.
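
A minimal Python sketch of that inheritance rule follows. It assumes the config has already been parsed into plain dictionaries; the helper names are hypothetical, and the check for missing defaults only mirrors what rwyn doctor is described as reporting.

```python
def missing_stage_defaults(stages):
    """Return stage names that do not declare default_confidence."""
    return [name for name, cfg in stages.items() if "default_confidence" not in cfg]


def effective_target(stage, requirement, stages):
    """A requirement override wins; otherwise fall back to the stage default."""
    if "confidence" in requirement:
        return requirement["confidence"]
    return stages[stage]["default_confidence"]


stages = {
    "save": {"default_confidence": 0.50},
    "commit": {"default_confidence": 0.75},
    "merge": {"default_confidence": 1.00},
}
tests = {"id": "tests-pass"}
security = {"id": "security-checks-pass", "confidence": 1.00}

print(effective_target("save", tests, stages))     # 0.5
print(effective_target("save", security, stages))  # 1.0
print(missing_stage_defaults({"nightly": {}}))     # ['nightly']
```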

Confidence is configured on one global scale. Repositories can use either named labels or numeric values, and both resolve to the same underlying targets.

The built-in confidence labels map to:

Label	Numeric target
`low`	`0.25`
`medium`	`0.50`
`high`	`0.75`
`very_high`	`0.90`
`certain`	`1.00`

Numeric values use the same 0.00 to 1.00 scale. For example, confidence: 0.85 sets a stricter target than high and a looser target than very_high.
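
Resolving a label or a number onto the shared scale can be sketched as below. The label values come from the table above; the function name and the error handling are assumptions made for illustration.

```python
LABELS = {
    "low": 0.25,
    "medium": 0.50,
    "high": 0.75,
    "very_high": 0.90,
    "certain": 1.00,
}


def resolve_confidence(value):
    """Resolve a named label or a numeric value to a target in [0, 1]."""
    if isinstance(value, str):
        if value not in LABELS:
            raise ValueError(f"unknown confidence label: {value!r}")
        return LABELS[value]
    target = float(value)
    if not 0.0 <= target <= 1.0:
        raise ValueError(f"confidence must be between 0.00 and 1.00, got {target}")
    return target


# 0.85 sits between high and very_high on the same scale.
print(resolve_confidence("high") < resolve_confidence(0.85) < resolve_confidence("very_high"))
```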

Within a single planning pass, confidence accumulation is monotonic: adding valid evidence only maintains or increases confidence for a requirement.

Calibration is empirical. In a fresh repo, targets are reached using declared evidence and priors; as run history accumulates, calibration sharpens. The planner reports per requirement whether its confidence number is calibrated against history or still relying on priors.

How The Model Is Built And Improved

The planner is one artifact of rwyn's verification model. Building and maintaining that model is the rest of the system.

The model is built and improved in three modes that share the same machinery:

Bootstrap. Turn a fresh repo into a usable model in one good agent session. Sources include programmatic analysis (file structure, AST, language detection), dynamic evidence (coverage when collected), and AI-elicited declared knowledge (the strongest tier in the evidence stack). The only evidence source legitimately missing at this point is historical outcomes; everything else can be in place from the first session.
Iteration. When rwyn misses a failure or pays too much for confidence, the diagnostic surface (gaps, explain, replay) describes the gap as honestly as it can: clean attribution where the data supports it, candidate causes where it does not, and explicit "I cannot tell" where it cannot. The agent reads that report, proposes a model change, validates with replay, and commits.
Calibration. Background sharpening of probability estimates from accumulated runs. The planner's predictions become more honest as outcomes flow back into the model.

Declared And Learned

rwyn combines declared semantics with empirical evidence.

Users declare what they know for sure:

explicit requirement and step relationships
prerequisites
obvious full-satisfaction cases
stage configuration
scope rules
repo-specific structure

Everything else is learned empirically over time:

how predictive a step really is for a requirement
which early steps substitute well for broader later steps
how much confidence a scoped run really buys
which failure surfaces are under-modeled
where the repo is missing a cheaper earlier signal

Declared configuration remains authoritative for planning and execution. When observed outcomes repeatedly contradict declared assumptions, rwyn surfaces the divergence through warnings, reports, and recommendations.

Misses Are Typed

When a step fails at a later stage, rwyn looks backward to find the earlier stages where the same step could have run but did not. The miss type comes from why it did not run:

Selection miss. The relevance gate filtered the step out wrongly. Fix: tighten relevance.
Weight miss. The step was a candidate, but the planner believed another step substituted for it. Fix: adjust evidence weights.
Set miss. The step was not a candidate for the relevant requirement at the earlier stage. Fix: declare or learn the link.
Link miss. The change-to-step relationship was not modeled at the earlier stage. Fix: add a plugin or declared edge.

Failures where no current step would have caught the problem (a novel failure mode, an unmodeled risk) are a different gap class — "no earlier signal exists for this failure type" — and surface separately, not as miss attribution.
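
The attribution above can be sketched as a lookup from a recorded skip reason to a miss type. This is a hypothetical Python illustration: the skip-reason strings and the decision-record fields are invented, and only the four miss types and their fixes come from the list above.

```python
from enum import Enum


class Miss(Enum):
    """Miss types; each value is the suggested fix."""
    SELECTION = "tighten relevance"
    WEIGHT = "adjust evidence weights"
    SET = "declare or learn the link"
    LINK = "add a plugin or declared edge"


# Hypothetical skip reasons, as an earlier-stage decision record might store them.
SKIP_REASON_TO_MISS = {
    "filtered_by_relevance": Miss.SELECTION,
    "substituted_by_other_step": Miss.WEIGHT,
    "not_candidate_for_requirement": Miss.SET,
    "change_to_step_link_unmodeled": Miss.LINK,
}


def attribute_miss(earlier_decisions, failed_step):
    """Return (stage, Miss) pairs for earlier stages that skipped failed_step."""
    misses = []
    for decision in earlier_decisions:
        if decision["step"] == failed_step and not decision["selected"]:
            misses.append((decision["stage"], SKIP_REASON_TO_MISS[decision["skip_reason"]]))
    return misses


decisions = [
    {"stage": "save", "step": "integration-test", "selected": False,
     "skip_reason": "substituted_by_other_step"},
    {"stage": "commit", "step": "integration-test", "selected": False,
     "skip_reason": "filtered_by_relevance"},
    {"stage": "commit", "step": "unit-test", "selected": True, "skip_reason": None},
]
for stage, miss in attribute_miss(decisions, "integration-test"):
    print(stage, miss.name, "->", miss.value)
```

An empty result corresponds to the separate gap class: no earlier decision skipped a step that would have caught the failure.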

The Skill Is The Loop's Driver

rwyn produces diagnostics and accepts model changes. The orchestration of "read gap → propose change → validate → commit" lives in the skill. The bundled Claude Code and Codex skills are reference drivers; the loop they implement is one example among many.

JSON outputs (rwyn gaps --json, rwyn explain --json, rwyn plan --json) are the public APIs the loop writes against. Their schemas are stable across versions.

What `rwyn` Produces

For each change, rwyn produces a run record — the durable artifact that powers replay, compare, gaps, and calibration over time.

A run record contains:

identity: change ref (commit, diff, or range), stage, environment, timestamp, rwyn version, model state hash
plan: the selected steps, the candidate steps that were skipped, the per-requirement confidence reached, scopes
decisions: for each candidate step, why it was selected or skipped — the data that powers explain and miss attribution
outcomes: per executed step, pass/fail, duration, exit code, captured evidence (coverage paths, traces)
provenance: source of the record (a local rwyn run, a CI run, or an external ingest)

Plans are proposals before execution and records after. The same object survives both phases, so intent, outcomes, and later attribution all reference the same artifact.
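
One way to picture a single object that serves as both proposal and record is a dataclass whose outcomes start empty. The Python sketch below is illustrative only: the field names are hypothetical and do not reflect rwyn's actual JSON schema, and all values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    # identity
    change_ref: str
    stage: str
    environment: str
    timestamp: str
    rwyn_version: str
    model_state_hash: str
    # plan
    selected_steps: list = field(default_factory=list)
    skipped_steps: list = field(default_factory=list)
    confidence_reached: dict = field(default_factory=dict)
    scopes: dict = field(default_factory=dict)
    # decisions: step id -> why it was selected or skipped
    decisions: dict = field(default_factory=dict)
    # outcomes: step id -> result details; empty until the plan is executed
    outcomes: dict = field(default_factory=dict)
    # provenance: "local", "ci", or "ingest"
    provenance: str = "local"

    def executed(self):
        """A plan is a proposal until outcomes are attached."""
        return bool(self.outcomes)


record = RunRecord(
    change_ref="origin/main...HEAD",
    stage="merge",
    environment="local",
    timestamp="2024-01-01T00:00:00Z",  # placeholder
    rwyn_version="0.0.0",              # placeholder
    model_state_hash="abc123",         # placeholder
    selected_steps=["rust-test"],
    decisions={"rust-test": "inputs match changed files"},
)
print(record.executed())  # False

# After execution, the same object carries the outcomes.
record.outcomes["rust-test"] = {"passed": True, "duration_s": 12.5, "exit_code": 0}
print(record.executed())  # True
serialized = json.dumps(asdict(record), indent=2)
```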

Storage And Sync

By default, run records live locally in .rwyn/runs/ as JSON files, one per run, and that directory is gitignored. The schema is stable across versions and admits external sources, so any record — local, CI, future hosted — can be ingested by any environment.

The engine ships two primitives:

rwyn export runs — write records out for transport (CI artifacts, archival, manual sharing)
rwyn ingest runs <path> — bring records from elsewhere into the local model

Local↔remote sync is orchestrated by the skill. A typical flow: CI uploads .rwyn/runs/ as a build artifact at the end of a stage; the skill, on git pull or session start, downloads new artifacts and ingests them.

An opt-in runs_storage: git_branch mode stores records on a parallel branch like rwyn/runs. The tradeoffs (repo bloat, paths and outcomes in git history, a tool writing to a branch) are why it is opt-in.

How It Works

rwyn works like this:

Model the repository.
Map a change onto that model.
Select requirements with non-zero plausible risk.
Evaluate candidate steps as evidence.
Build the smallest practical sufficient plan for the stage.
Execute that plan with ordering, prerequisites, and environment contracts preserved.
Record outcomes and feed them back into future planning.

Typical Workflow

During development:

rwyn run --stage save
rwyn run --stage commit

When work is pushed remotely:

rwyn run --stage push

Before or during integration:

rwyn run --stage merge

When the result surprises you:

rwyn explain
rwyn gaps

When the repository model needs work:

rwyn doctor
rwyn gaps

Stage Model

rwyn is stage-aware, but stages are repo-defined lifecycle checkpoints, not platform nouns like "PR" or "merge queue".

A stage provides:

a default confidence target for relevant requirements
a lifecycle marker that steps reference to declare when they apply

The planner's objective is already cheapest sufficient evidence, so cost lives in the planner, not in stage configuration. Which steps run when is decided by step-level stage applicability.

The default stage vocabulary is:

save
commit
push
merge
post_merge
release

These are examples, not a universal lifecycle. Repos define the stages that match how they actually work, including names like:

nightly
staging
hotfix
perf
security
deploy

Stages are also flexible enough to support immediate local or operational goals, such as keeping the workspace healthy, validating post-merge behavior, or preparing release artifacts.

Command Overview

The command surface is small and role-oriented.

Shared Inputs

Most commands operate on the same core inputs:

stage Which lifecycle checkpoint you are planning for, such as save, commit, push, merge, or a repo-defined custom stage.
change The change under consideration. By default this is the current local diff, but it can also be a base/head range, a commit, a pushed change, or an explicit diff artifact.
scope overrides Optional narrowing or explicit step selection when a user wants to override automatic planning.
output mode Human-readable explanation by default, with machine-readable output available for CI, agents, and tooling.

Most commands use flags in the shape of:

--stage <stage>
--base <rev>
--head <rev>
--change <change-ref>
--step <step-id>
--scope <scope>
--json

Setup And Health

`rwyn init`

Bootstrap rwyn in a repository.

init initializes the repository model and gets the repo to a usable baseline quickly.

rwyn init is responsible for:

detecting languages, tools, and common repo patterns
inferring an initial set of requirements and steps
creating .rwyn/config.yaml
suggesting stage defaults
suggesting CI wiring
optionally scaffolding plugins for common repo-specific structure

Examples:

rwyn init
rwyn init --yes
rwyn init --stage-defaults save,commit,push,merge

`rwyn doctor`

Validate installation, repo model, tools, environment, evidence state, and integrations.

doctor is the trust and diagnosis command. It answers questions like:

is rwyn installed correctly?
did the repo load the configuration I expect?
are required tools available?
are required environment contracts satisfied?
is the repository model stale or broken?
is coverage or other evidence missing or obviously inconsistent?

Examples:

rwyn doctor
rwyn doctor --json
rwyn doctor --stage merge

`rwyn build`

Build or refresh repository structure and derived evidence indexes.

build refreshes the repository model itself. In a mature setup it may happen automatically when needed; the explicit command is for debugging, CI bootstrap, and large repo changes.

Examples:

rwyn build
rwyn build --full
rwyn build --refresh

Planning And Execution

`rwyn run`

Plan and execute the right steps for the current change and stage.

run is the primary command; most day-to-day use goes through it.

run is responsible for:

selecting relevant requirements
evaluating candidate steps as evidence
building a sufficient plan for the requested stage
executing the selected steps in the right order
recording results for replay, comparison, analytics, and learning

Examples:

rwyn run --stage save
rwyn run --stage commit
rwyn run --stage merge
rwyn run --stage merge --json

run also supports explicit user intent when needed:

rwyn run --stage save --step rust-test
rwyn run --stage commit --scope src/foo.ts
rwyn run --stage merge --change origin/main...HEAD

`rwyn plan`

Show the selected plan without executing it.

plan shows:

what would run
why it would run
what is being scoped
what prerequisites would be pulled in
what confidence targets are driving the decision

Examples:

rwyn plan --stage save
rwyn plan --stage merge
rwyn plan --stage merge --json
rwyn plan --stage merge --change origin/main...HEAD

`rwyn explain`

Explain a single planning decision.

explain operates on one decision at a time — the most recent plan, or a specific target like a file, requirement, or step. It answers:

why a requirement is relevant
why a step was selected
why a scope was chosen
why a broader or cheaper alternative was not chosen
why the final plan is sufficient for the stage

For model-wide introspection — where the model itself is wrong, weak, or contradicted by observed outcomes — use gaps.

Examples:

rwyn explain
rwyn explain path/to/file.ts
rwyn explain --step integration-tests
rwyn explain --requirement formatting

Evidence And Diagnostics

`rwyn coverage ...`

Manage coverage and related dynamic execution evidence.

Coverage is one evidence source among many. These commands let the repo inspect, refresh, collect, and ingest coverage without treating it as the whole system.

Examples:

rwyn coverage status
rwyn coverage refresh
rwyn coverage collect --kind bun-typescript --step bun-test
rwyn coverage ingest path/to/lcov.info

`rwyn ingest ...`

Ingest external evidence or historical results.

This command family brings externally generated evidence into rwyn's model, including coverage, execution reports, CI artifacts, and learned priors.

Examples:

rwyn ingest coverage path/to/lcov.info
rwyn ingest runs path/to/run-records/
rwyn ingest evidence path/to/report.json

`rwyn replay`

Re-evaluate historical changes against the current model.

replay answers: if the current planner had existed in the past, what would it have chosen, and what would it have missed?

This matters for:

validating model changes
measuring recall
understanding regressions
improving trust before changing policy

Examples:

rwyn replay
rwyn replay --stage merge
rwyn replay --since 30d

`rwyn compare`

Compare behavior across stages, environments, or time.

compare helps answer questions like:

what changed between local and CI behavior?
what changed after a policy update?
why does merge run more than commit here?
where do plans diverge in ways that matter?

Examples:

rwyn compare --group change
rwyn compare --stage commit --stage merge
rwyn compare --environment local --environment ci

`rwyn gaps`

Surface where the model itself is wrong, weak, or contradicted.

Where explain introspects a single decision, gaps introspects the model against ground truth and accumulated outcomes. It surfaces two classes of gaps:

correctness gaps Missing early signals, contradicted declarations, under-modeled requirements, weak evidence paths.
efficiency gaps Expensive early-stage work, broad steps that need narrower scopes, missing cheaper proxies, repeated unnecessary evidence gathering.

Calibration of evidence weights from observed outcomes happens automatically as runs accumulate; gaps is how that calibration surfaces.

Examples:

rwyn gaps
rwyn gaps --stage commit
rwyn gaps --kind efficiency
rwyn gaps --json

Configuration And Integration

`rwyn config ...`

Inspect, validate, and edit effective configuration.

This command family answers questions like:

what config is actually in effect?
where did this setting come from?
how are stage defaults resolving?
what does this requirement or step currently look like?

Examples:

rwyn config show
rwyn config show --effective
rwyn config explain stages.merge
rwyn config validate

`rwyn plugin ...`

Manage declarative repository-model extensions.

Plugins define repository-specific structure and evidence logic in the repo model.

Examples:

rwyn plugin list
rwyn plugin validate
rwyn plugin scaffold relation

`rwyn ci ...`

Scaffold, inspect, and validate CI integration.

Examples:

rwyn ci init github-actions
rwyn ci init circleci
rwyn ci doctor
rwyn ci show

Configuration Model

The primary config surface is .rwyn/config.yaml.

It describes:

requirements
steps
stages
plugins
runtime paths
evidence and learning policy

Split files are fine for larger repos, but the default experience is one obvious entry point.

Example:

graph: .rwyn/graph.json
coverage_data: .rwyn/coverage-data
runs_dir: .rwyn/runs

requirements:
  - id: rust-tests-pass
    description: Rust unit and integration tests pass

  - id: typescript-tests-pass
    description: TypeScript tests pass

  - id: bindings-current
    description: Generated Go bindings match the Solidity sources

plugins:
  - id: solidity-interface-link
    type: relation
    from: "interfaces/**/*.sol"
    to: "src/**/*.sol"
    edge: imports
    match_rule:
      by: normalized_basename
      from_strip_prefix: I

  - id: solidity-bindings
    type: generate
    from: "src/**/*.sol"
    to: "bindings/**/*.go"
    match_rule:
      by: normalized_basename

stages:
  save:
    default_confidence: medium
  commit:
    default_confidence: high
  merge:
    default_confidence: certain

steps:
  - id: rust-test
    name: Rust tests
    kind: test
    language: rust
    command: cargo test --all-targets --all-features
    tools: [cargo]
    inputs:
      - "src/**/*.rs"
    satisfies:
      - rust-tests-pass

  - id: bun-test
    name: Bun tests
    kind: test
    language: typescript
    command: bun test
    scopeable: true
    scope_flag: ""
    scope_type: test_paths
    tools: [bun]
    inputs:
      - "src/**/*.ts"
      - "src/**/*.tsx"
    coverage:
      kind: bun-typescript
      pass_scopes: true
    satisfies:
      - typescript-tests-pass

Explicit CLI flags still override config when needed.

Requirements

Requirements are first-class declared objects. Each one names a property the repository wants to hold; steps reference requirements to declare what they provide evidence for.

A requirement describes:

identity (id, optional human-readable name)
description
optional confidence override (replaces the stage default for this requirement when relevant)

Example:

requirements:
  - id: rust-tests-pass
    description: Rust unit and integration tests pass

  - id: security-checks-pass
    description: All critical security checks pass
    confidence: certain    # always certain, regardless of stage default

  - id: bindings-current
    description: Generated Go bindings match the Solidity sources

Steps reference requirements by id, with relationship strength:

satisfies: — the step's success fully addresses the requirement
evidence_for: — the step is candidate evidence for the requirement; its lift is learned from outcomes

steps:
  - id: cargo-fmt-check
    satisfies:
      - formatting-clean

  - id: rust-test
    satisfies:
      - rust-tests-pass
    evidence_for:
      - bindings-current     # rust tests indirectly exercise generated bindings

evidence_for contributes zero confidence until the planner has enough observed outcomes to calibrate the lift. The declaration marks the step as candidate evidence: when it runs (because it satisfies something else, or because it is cheap), its outcomes accumulate against the requirement and a learned weight emerges over time.

A mutating step (a formatter applying fixes) and a non-mutating step (a formatter in check mode) are two different steps. Each can declare stage applicability — stages: [list] to limit to specific stages, exclude_stages: [list] to remove specific ones, or neither to apply at every stage. The planner picks from stage-eligible steps:

steps:
  - id: cargo-fmt
    kind: format
    mutating: true
    stages: [save, commit]
    satisfies:
      - formatting-clean

  - id: cargo-fmt-check
    kind: format
    stages: [merge, push]
    satisfies:
      - formatting-clean

Mutation is a step property recorded on the step itself; behavior across stages is controlled by which step is listed where.
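
The stage-eligibility rule can be sketched in a few lines of Python. The helper names are hypothetical, and the behavior when a step declares both stages: and exclude_stages: is an assumption (the allowlist is applied first, then exclusions), since the text above does not cover that case.

```python
def applies_at(step, stage):
    """stages: allowlists, exclude_stages: blocklists, neither means every stage."""
    allowed = step.get("stages")
    if allowed is not None and stage not in allowed:
        return False
    return stage not in step.get("exclude_stages", [])


def eligible(steps, stage):
    """Step ids the planner may pick from at this stage."""
    return [step["id"] for step in steps if applies_at(step, stage)]


steps = [
    {"id": "cargo-fmt", "mutating": True, "stages": ["save", "commit"]},
    {"id": "cargo-fmt-check", "stages": ["merge", "push"]},
    {"id": "rust-test"},  # no stage keys: applies at every stage
]
print(eligible(steps, "save"))   # ['cargo-fmt', 'rust-test']
print(eligible(steps, "merge"))  # ['cargo-fmt-check', 'rust-test']
```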

Steps And Execution

A step describes:

identity and kind
command
inputs and outputs
explicit prerequisites for non-file dependencies (requires:)
which requirements it satisfies or provides evidence_for
stage applicability (stages: to allowlist, exclude_stages: to blocklist; defaults to all stages)
whether it mutates (mutating: true)
whether and how it can be scoped
toolchain requirements
required environment variables
optional evidence collectors such as coverage

Explicit step invocation uses the normal planner and executor, so prerequisites, layering, and evidence rules still apply.

Examples:

rwyn plan --step rust-test
rwyn plan --step bun-test --scope src/foo.test.ts
rwyn plan --step lint --step test --step-scope test=src/foo.ts

rwyn run always executes; rwyn plan never does. They accept the same arguments, so previewing any run is the same invocation with plan in place of run.

Step ordering is derived from declared inputs and outputs by default. If step B's inputs include a path that step A's outputs produce — directly, or via a generate-type plugin relationship — the planner runs A before B without anything explicit.

For dependencies that are not file-based (a service that must be running, a setup script that exports environment, a remote resource that must be initialized), declare them explicitly with the requires: field:

steps:
  - id: db-migrate
    kind: setup
    command: ./scripts/migrate.sh
    stages: [save, commit, merge]

  - id: integration-test
    kind: test
    command: bun test --integration
    requires: [db-migrate]
    inputs:
      - "src/**/*.ts"
    satisfies:
      - integration-tests-pass

The planner combines implicit (file-derived) and explicit (requires:) ordering into a single dependency graph and executes steps in valid topological order. Cycles are surfaced by rwyn doctor.
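
The combined dependency graph can be sketched with Python's standard graphlib module (Python 3.9+). This is an illustration rather than rwyn's executor: the codegen step is hypothetical, and fnmatch stands in for real glob semantics (its * also matches path separators).

```python
from fnmatch import fnmatch
from graphlib import CycleError, TopologicalSorter


def execution_order(steps):
    """Order steps from file-derived and explicit (requires) dependencies."""
    # Start with explicit prerequisites: step id -> set of steps it depends on.
    graph = {step["id"]: set(step.get("requires", [])) for step in steps}
    for consumer in steps:
        for producer in steps:
            if producer is consumer:
                continue
            # Implicit edge: a producer output matches a consumer input pattern.
            if any(
                fnmatch(out, pattern)
                for out in producer.get("outputs", [])
                for pattern in consumer.get("inputs", [])
            ):
                graph[consumer["id"]].add(producer["id"])
    # Raises CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(graph).static_order())


steps = [
    {"id": "integration-test", "requires": ["db-migrate"],
     "inputs": ["src/*.ts", "gen/*.ts"]},
    {"id": "codegen", "inputs": ["schema/*.json"], "outputs": ["gen/client.ts"]},
    {"id": "db-migrate"},
]
order = execution_order(steps)
print(order[-1])  # integration-test

try:
    execution_order([{"id": "a", "requires": ["b"]}, {"id": "b", "requires": ["a"]}])
except CycleError:
    print("cycle detected")
```

Only the relative order is meaningful here: db-migrate and codegen have no dependency on each other, so either may come first.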

Steps can also declare environment contracts:

steps:
  - id: slice-v5
    name: Slice adapter v5
    kind: test
    language: typescript
    command: bun run scripts/integration-run.ts --adapter v5
    tools: [bun]
    required_env:
      - FELDERA_API_URL
      - FELDERA_API_TOKEN

CI Integration

rwyn integrates with existing CI systems:

CI remains the execution substrate
rwyn becomes the planner and executor
local development, agents, and CI all use the same verification model

rwyn works when adopted entirely locally. When both local development and CI are routed through it, plans, evidence, and outcomes reinforce each other over time.

A CI setup looks like:

- name: Install rwyn
  run: curl -fsSL https://get.rwyn.dev/install.sh | sh

- name: Run merge-stage verification
  run: rwyn run --stage merge

CI bootstrap commands look like:

rwyn ci init github-actions
rwyn ci init circleci
rwyn ci doctor

Coverage And Dynamic Evidence

rwyn treats coverage as one dynamic execution signal among many, used for scoping and confidence updates.

Coverage and other evidence are:

incremental
scope-aware
freshness-aware
reusable across local runs, CI, and agents

Examples:

rwyn coverage status
rwyn coverage refresh
rwyn coverage collect --kind bun-typescript --step bun-test

Executed plans also produce normalized run records that feed replay, compare, and gaps. Calibration of evidence weights from those records happens automatically; the loop that uses them to improve the model over time is described in How The Model Is Built And Improved.

Repository Modeling And Plugins

Repo-specific structure lives in declarative repository knowledge: hidden dependency relationships, generated-artifact relationships, path-to-scope derivation, and any repository-specific structure that affects relevance or confidence.

The plugin DSL is extensible. New types can be added as the engine learns new repository patterns; the existing types remain stable.

`relation`

Declares an edge between two sets of files. When a file in from: changes, files matched in to: are treated as semantically affected, and the planner uses the edge during relevance computation. The edge: label is a free-form string that surfaces in explain output ("touched via interface link") but does not drive planning logic — the planner cares that an edge exists, not what it is named.

- id: solidity-interface-link
  type: relation
  from: "interfaces/**/*.sol"
  to: "src/**/*.sol"
  edge: imports
  match_rule:
    by: normalized_basename
    from_strip_prefix: I

`generate`

Declares that files in from: produce files in to:. This has two effects: a generator step runs before any step that consumes the output, and changes to from: invalidate the freshness of the corresponding to: files until regenerated.

- id: solidity-bindings
  type: generate
  from: "src/**/*.sol"
  to: "bindings/**/*.go"
  match_rule:
    by: normalized_basename
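
The two effects of a generate rule can be pictured with a small Python sketch. This is only a model of the behavior described above, not rwyn's planner: the step names (gen-bindings, go-test, lint), the file paths, and both helper functions are hypothetical.

```python
def stale_outputs(changed, generated_from):
    """Return generated files whose source is in the changed set."""
    return sorted(out for out, src in generated_from.items() if src in changed)


def order_steps(steps, generator, consumers):
    """Place the generator step ahead of the first step that consumes its output."""
    ordered = [s for s in steps if s != generator]
    first_consumer = min(
        (ordered.index(c) for c in consumers if c in ordered),
        default=len(ordered),
    )
    ordered.insert(first_consumer, generator)
    return ordered


# Hypothetical mapping from each generated file to the source that produces it.
generated_from = {
    "bindings/Vault.go": "src/Vault.sol",
    "bindings/Oracle.go": "src/Oracle.sol",
}

stale = stale_outputs({"src/Vault.sol"}, generated_from)
plan = order_steps(["go-test", "lint", "gen-bindings"], "gen-bindings", {"go-test"})
print(stale)
print(plan)
```

In this model only bindings/Vault.go is considered stale after the change, and the generator is moved in front of the consuming test step so the test never runs against outdated bindings.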

`scope`

Derives an execution scope for a scopeable step from a change. When changed files match from:, the scope of the step named in target_step: becomes the matching to: paths. This lets the planner narrow a broad step to the part of the repo a change actually affects instead of running it across everything.

- id: typescript-tests-by-module
  type: scope
  target_step: bun-test
  from: "src/**/*.ts"
  to: "tests/**/*.test.ts"
  match_rule:
    by: normalized_basename

`match_rule.by` Modes

match_rule.by controls how from: and to: glob matches are paired. Three modes ship today; more may be added as the engine grows.

Mode	Behavior	Notes
`normalized_basename`	Match by filename, stripped of optional prefix/suffix	Use `from_strip_prefix` / `from_strip_suffix` to normalize before comparison
`directory_path`	Match by directory path	Useful for "src/X/* maps to tests/X/*" style mappings
`regex`	Capture groups in `from:`, substitution in `to:`	Most flexible escape hatch; use when neither basename nor directory matching fits
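
The plugin examples in this section all use normalized_basename, so the following Python sketch illustrates the idea behind the other two modes. It is an approximation under stated assumptions: the function names, the explicit root arguments, and the sample paths are hypothetical, and the regex variant uses Python's re syntax for capture groups and substitution, which may differ from what rwyn accepts.

```python
import re
from pathlib import PurePosixPath


def rel_dir(path, root):
    """Directory of path, expressed relative to root."""
    return PurePosixPath(path).parent.relative_to(root)


def pair_by_directory(changed, to_files, from_root, to_root):
    """Pair a changed file with to-files that sit in the same relative directory."""
    wanted = rel_dir(changed, from_root)
    return [t for t in to_files if rel_dir(t, to_root) == wanted]


def pair_by_regex(changed, pattern, template):
    """Build the paired path from capture groups, or None if nothing matches."""
    match = re.fullmatch(pattern, changed)
    return match.expand(template) if match else None


# Hypothetical test files, used only to show the pairing.
tests = ["tests/auth/login.test.ts", "tests/billing/invoice.test.ts"]

by_dir = pair_by_directory("src/auth/login.ts", tests, "src", "tests")
by_regex = pair_by_regex(
    "src/auth/login.ts",
    r"src/(.+)/([^/]+)\.ts",
    r"tests/\1/\2.test.ts",
)
print(by_dir)
print(by_regex)
```

Directory matching pairs everything under the same relative folder, while the regex form can rebuild an exact target path, which is why it serves as the escape hatch when the other modes do not fit.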

The goal is to keep repository truth in the repository model itself.

Plugins

Claude Code

This repo includes an official Claude Code plugin and marketplace layout:

marketplace: .claude-plugin/marketplace.json
plugin manifest: plugins/rwyn/.claude-plugin/plugin.json

Local testing:

claude --plugin-dir ./plugins/rwyn

Public install after adding this repo as a marketplace:

claude plugin marketplace add smartcontracts/rwyn
claude plugin install rwyn@rwyn-plugins

Codex

This repo also includes a Codex plugin scaffold:

marketplace entry: .agents/plugins/marketplace.json
plugin manifest: plugins/rwyn/.codex-plugin/plugin.json

The bundled skill content lives under plugins/rwyn/skills/.

These skills are reference drivers for the loop described in How The Model Is Built And Improved. They demonstrate one good bootstrap-and-iteration flow against rwyn's diagnostic surface; users can replace them with their own.

Notable bundled skills:

rwyn: Operate and debug an existing rwyn workflow.
setup: Inspect a repo, scaffold .rwyn/config.yaml, and add declarative transforms.
doctor: Diagnose a repo's rwyn setup and verification surface.
select: Explain and inspect the chosen plan for a change.
plan: Preview what rwyn would execute without running it.
explain: Explain why a file or change selected a given plan item.

Benchmarking

There is a parity harness at scripts/benchmark-parity.sh.

It compares rwyn against a legacy selector on a commit corpus and reports:

selected item count
missing selections vs legacy
extra selections vs legacy
per-commit runtime

Development

Run the core checks locally:

cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets --all-features

License

MIT, see LICENSE.
