Skip to content

Add agent cli#682

Draft
waltsims wants to merge 4 commits intomasterfrom
add-agent-cli
Draft

Add agent cli#682
waltsims wants to merge 4 commits intomasterfrom
add-agent-cli

Conversation

@waltsims
Copy link
Copy Markdown
Owner

@waltsims waltsims commented Mar 27, 2026

Greptile Summary

This PR introduces an agent-first CLI (kwave) for k-Wave simulations, allowing LLM agents (or humans) to drive the full simulation workflow — session management, phantom generation, source/sensor definition, pre-flight planning, and execution — entirely through structured JSON-in / JSON-out commands. It also adds a thin progress_callback hook into the Python solver so the run command can stream real-time progress events to callers.

Key changes:

  • New kwave/cli/ package: session.py (file-backed state), schema.py (JSON envelope + error types), main.py (Click group wiring), and six command modules (session, phantom, source, sensor, plan, run).
  • kspaceFirstOrder and kspace_solver.Simulation.run extended with an optional progress_callback(step, total) — only the Python backend honours it; the cpp backend silently ignores it.
  • pyproject.toml gains an optional [cli] extra (click>=8.0), a kwave console script entry point, and the new packages added to the wheel target.
  • Two integration test suites (test_e2e.py, test_1d_ivp.py) do bit-exact comparisons between CLI output and direct Python API calls, giving strong correctness coverage.

Findings:

  • traceback is imported but unused in schema.py.
  • _parse_int_tuple / _resolve_scalar_or_path in phantom.py call int() / float() on raw user input without catching ValueError, so invalid inputs bypass the structured CLIError response contract.
  • Session file writes in session.py._save() are not atomic; an interrupt mid-write leaves a corrupt session.json.
  • progress_callback is silently ignored when --backend cpp is used; callers receive the started event and then silence until completion.
  • progress_callback is undocumented in kspaceFirstOrder's docstring.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/polish issues with no impact on correctness of simulation results.

All five findings are P2. Core simulation logic is unchanged and two integration tests verify bit-exact parity between CLI and direct Python API paths. The new CLI is gated behind an optional [cli] extra so it cannot regress existing users.

kwave/cli/session.py (non-atomic writes), kwave/cli/commands/phantom.py (unguarded int/float conversions), kwave/cli/commands/run.py (silent cpp progress gap).

Important Files Changed

Filename Overview
kwave/cli/session.py Core session state manager backed by ~/.kwave/session.json; non-atomic writes risk corruption on interrupt, and no file locking guards against concurrent CLI processes.
kwave/cli/schema.py Defines structured JSON envelope, exit codes, and json_command decorator; unused traceback import, otherwise clean.
kwave/cli/commands/run.py Emits streaming progress events then a final CLIResponse; progress_callback is silently ignored when --backend cpp is used.
kwave/cli/commands/phantom.py Phantom load/generate commands; unguarded int()/float() calls on user input will emit raw Python tracebacks instead of structured CLIError JSON on bad values.
kwave/cli/main.py CLI entry point wiring all command groups; deferred imports below the group definition cleanly avoid circular import issues.
kwave/kspaceFirstOrder.py Adds progress_callback parameter forwarded to the Python solver; missing docstring entry and silently no-ops for cpp backend.
kwave/solvers/kspace_solver.py Minimal surgical change: run() now accepts and invokes an optional progress_callback(step, total) after each simulation step.
tests/test_cli/test_e2e.py Good end-to-end coverage: session lifecycle, phantom generation, sensor definition, plan, and a full run compared bit-for-bit against the direct Python API.
tests/test_cli/test_1d_ivp.py 1D CLI test with heterogeneous medium arrays, custom CFL, and sparse sensor; results compared exactly against the Python API.
pyproject.toml Adds optional [cli] extra (click>=8.0), kwave console script entry point, and includes kwave.cli packages in wheel build targets.
tests/test_cli/conftest.py Shared fixtures and invoke() helper that correctly extracts the last top-level JSON object from mixed streaming output.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant CLI as kwave CLI
    participant Session as session.json
    participant Solver as kspace_solver

    Agent->>CLI: kwave session init
    CLI->>Session: write {id, state: defaults}
    CLI-->>Agent: {"status":"ok","result":{"session_id":...}}

    Agent->>CLI: kwave phantom generate --type disc ...
    CLI->>Session: load()
    CLI->>Session: update_many({grid, medium, source})
    CLI-->>Agent: {"status":"ok","result":{grid_size, p0_max, ...}}

    Agent->>CLI: kwave sensor define --mask full-grid
    CLI->>Session: load() then update("sensor", {...})
    CLI-->>Agent: {"status":"ok","result":{mask_type, record}}

    Agent->>CLI: kwave plan
    CLI->>Session: load() + assert_ready()
    CLI->>CLI: make_grid() / make_medium()
    CLI-->>Agent: {"status":"ok","result":{grid, cfl, memory_mb, ...}}

    Agent->>CLI: kwave run
    CLI->>Session: load() + assert_ready()
    CLI->>Solver: kspaceFirstOrder(..., progress_callback)
    loop every 5% progress
        Solver-->>CLI: callback(step, Nt)
        CLI-->>Agent: {"event":"progress","pct":...}
    end
    Solver-->>CLI: result dict
    CLI->>Session: update("result_path", data_dir)
    CLI-->>Agent: {"event":"completed",...}
    CLI-->>Agent: {"status":"ok","result":{"outputs":{p:{path,shape},...}}}
Loading

Reviews (1): Last reviewed commit: "Skip CLI tests when click is not install..." | Re-trigger Greptile

Greptile also left 5 inline comments on this PR.

waltsims and others added 4 commits March 27, 2026 10:35
Structured JSON I/O at every step, deterministic exit codes,
file-backed session model. Commands: session init/status/reset,
phantom generate, sensor define, plan, run.

Acceptance test: new_api_ivp_2D.py runs entirely via CLI with
exact numerical parity to the Python API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- phantom load: accepts .npy files for heterogeneous medium properties
- source define: loads custom p0 from .npy file
- sensor define: supports file-based masks (sparse sensors)
- Grid materializer passes full sound speed array to makeTime (not just max)
- Custom CFL support via --cfl flag
- Test: ivp_1D_simulation.py replicated via CLI with exact parity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Normalize medium state to unified {field}_scalar/{field}_path schema
  (phantom generate now uses same format as phantom load)
- Remove bogus PPW check from plan (Nyquist-based PPW was a false positive
  for all valid CFL values)
- Remove dead code: unused min_wavelength, unused sys import, unused Optional
- Add Session.assert_ready() and Session.update_many()
- Rename _completeness() to completeness() (public API)
- Extract _invoke helper to tests/test_cli/conftest.py
- Shrink e2e test grid from 128x128 to 48x48
- Extract _parse_int_tuple and _resolve_scalar_or_path in phantom.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Uses pytest.importorskip so test collection succeeds in CI
environments that don't install the cli extra.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 88.46154% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.97%. Comparing base (785c0bd) to head (9ec566e).

Files with missing lines Patch % Lines
kwave/cli/session.py 84.89% 9 Missing and 12 partials ⚠️
kwave/cli/commands/phantom.py 80.55% 10 Missing and 4 partials ⚠️
kwave/cli/schema.py 89.23% 6 Missing and 1 partial ⚠️
kwave/cli/__init__.py 33.33% 4 Missing ⚠️
kwave/cli/commands/plan.py 93.54% 1 Missing and 1 partial ⚠️
kwave/cli/commands/run.py 95.12% 1 Missing and 1 partial ⚠️
kwave/cli/commands/session_cmd.py 95.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #682      +/-   ##
==========================================
+ Coverage   74.23%   74.97%   +0.74%     
==========================================
  Files          56       66      +10     
  Lines        8012     8453     +441     
  Branches     1566     1603      +37     
==========================================
+ Hits         5948     6338     +390     
- Misses       1441     1473      +32     
- Partials      623      642      +19     
Flag Coverage Δ
3.10 74.22% <33.33%> (-0.02%) ⬇️
3.11 74.22% <33.33%> (-0.02%) ⬇️
3.12 74.97% <88.46%> (+0.74%) ⬆️
3.13 74.22% <33.33%> (-0.02%) ⬇️
macos-latest 74.19% <33.33%> (-0.02%) ⬇️
ubuntu-latest 74.19% <33.33%> (-0.02%) ⬇️
windows-latest 74.96% <88.46%> (+0.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@waltsims
Copy link
Copy Markdown
Owner Author

@greptile-app


import json
import sys
import traceback
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unused traceback import

traceback is imported but never referenced anywhere in the module. The exception handler currently serializes str(e) into the error suggestion field and never calls traceback.format_exc().

Suggested change
import traceback
from dataclasses import asdict, dataclass, field

Comment on lines +34 to +44
@click.option("--cfl", type=float, default=None, help="CFL number for time step calculation")
@pass_session
@json_command("phantom.load")
def load(sess, grid_size, spacing, sound_speed, density, cfl):
"""Load medium properties from scalar values or .npy files."""
sess.load()

grid_n = _parse_int_tuple(grid_size)
ndim = len(grid_n)
grid_spacing = (spacing,) * ndim

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unguarded int() and float() conversions produce unformatted errors

_parse_int_tuple and _resolve_scalar_or_path call int() / float() on raw user input without catching ValueError. An invalid value like --grid-size 128,abc or --sound-speed notanumber raises a bare Python traceback rather than a structured ValidationError / CLIError JSON response, breaking the agent-parseable contract that the rest of the CLI upholds.

Consider wrapping these with explicit error handling:

def _parse_int_tuple(s: str) -> tuple[int, ...]:
    try:
        return tuple(int(x) for x in s.split(","))
    except ValueError:
        raise ValidationError(CLIError(
            code="INVALID_GRID_SIZE",
            field="grid_size",
            value=s,
            constraint="must be comma-separated integers, e.g. 128 or 128,128",
        ))

Comment on lines +220 to +226
def _save(self):
raw = {
"id": self._id,
"created_at": self._created_at,
"state": self._state,
}
self.session_file.write_text(json.dumps(raw, indent=2, default=str))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Non-atomic session file write can corrupt state on interrupt

write_text is not atomic — if the process is killed (SIGINT, SIGKILL, OOM) during the write the session file will be partially written and unreadable on the next CLI invocation. Using a write-to-temp-then-rename pattern eliminates this race:

def _save(self):
    import os
    raw = {
        "id": self._id,
        "created_at": self._created_at,
        "state": self._state,
    }
    tmp = self.session_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(raw, indent=2, default=str))
    os.replace(tmp, self.session_file)

os.replace is atomic on POSIX and Windows (same filesystem).

Comment on lines +57 to +66
result = kspaceFirstOrder(
kgrid,
medium,
source,
sensor,
backend=backend,
device=device,
quiet=True,
progress_callback=progress_callback,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 progress_callback silently ignored for cpp backend

progress_callback is wired into the Python backend path in kspace_solver.py, but when --backend cpp is selected the callback is accepted by kspaceFirstOrder and then never forwarded to the C++ solver. The caller will receive only the {"event": "started", ...} line and then wait silently until the simulation finishes, with no intermediate progress events.

Consider emitting a warning event when backend != "python" and a callback is provided:

if backend != "python":
    _emit_event({"event": "warning", "detail": "progress_callback is only supported with the python backend"})

debug: bool = False,
num_threads: Optional[int] = None,
device_num: Optional[int] = None,
progress_callback=None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing docstring entry for progress_callback

The new parameter has no documentation in the existing docstring block, which documents all other keyword arguments. Agents or users introspecting help(kspaceFirstOrder) will have no description of the expected signature or when it is invoked.

Add to the docstring Keyword Args section:

        progress_callback: Optional callable ``(step: int, total: int) -> None``.
            Called after each simulation step. Only supported with the
            ``"python"`` backend; silently ignored for ``"cpp"``.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant