
Add problem automation contract and validator #63

Open
saarang123 wants to merge 1 commit into tensara:main from saarang123:codex/problem-automation-contract


saarang123 commented Mar 31, 2026

Summary

  • add a backward-compatible authoring contract for def.py and problem.md
  • add a validation contract covering structural CI, local CUDA validation, and Modal/product-runtime validation
  • add a generic scripts/validate_problem.py validator with text and JSON output
  • add templates for new agent-authored problems
  • add PR-time structural validation in GitHub Actions

Why

The goal is to make tensara/problems easier for agents to author against and to support reliable, automated growth of the problem set.

This PR is the first layer:

  • stable authoring format
  • stable validation format
  • machine-readable diagnostics
  • CI enforcement that does not break the current corpus

It is intentionally contract-first and backward-compatible, not a broad migration.

Included

  • docs/problem-authoring-contract.md
  • docs/problem-validation-contract.md
  • docs/problem-automation-roadmap.md
  • scripts/validate_problem.py
  • templates/problem-template.def.py
  • templates/problem-template.md
  • .github/workflows/validate-problems.yml
  • README updates for the new contract and validation flow

Validation Model

This PR defines a 3-tier validation model:

  1. structural validation in normal CI
  2. local CUDA validation on real GPUs such as Together H100
  3. Modal/product-runtime validation as the authoritative final gate

Cheap local GPU checks remain useful, but in the long term the authoritative runtime result should come from the same Modal-backed path that the real product uses.

What validate_problem.py does

Structural mode checks:

  • required files exist
  • required frontmatter exists
  • slug consistency
  • Problem subclass exists
  • required methods exist
  • method signatures match the stable contract
  • parameters/signature contract is present
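To make the structural checks concrete, here is a minimal sketch of how the "Problem subclass exists" and "required methods exist" checks could be done with Python's `ast` module, without importing the problem file. It is an illustration only, not the code in scripts/validate_problem.py. The method names in `REQUIRED_METHODS` and the helper names are hypothetical; the real list is whatever docs/problem-authoring-contract.md specifies.

```python
import ast

# Hypothetical method names for illustration; the real contract is defined
# in docs/problem-authoring-contract.md, not here.
REQUIRED_METHODS = {"reference_solution", "generate_test_cases", "verify_result"}


def _base_name(node):
    """Return the trailing name of a base-class expression, or None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def check_structure(source):
    """Return a list of structural error messages for a def.py source string."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"def.py does not parse: {exc.msg}"]

    # Top-level classes that list Problem among their bases.
    classes = [
        node
        for node in tree.body
        if isinstance(node, ast.ClassDef)
        and any(_base_name(base) == "Problem" for base in node.bases)
    ]
    if not classes:
        return ["no class deriving from Problem found"]

    cls = classes[0]
    defined = {
        item.name
        for item in cls.body
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return [
        f"{cls.name} is missing required method {name}()"
        for name in sorted(REQUIRED_METHODS - defined)
    ]
```

Working on the syntax tree rather than importing the module keeps a check like this safe to run in ordinary CI, since no problem code or GPU dependency is executed.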

Runtime mode is designed to validate problem behavior, not just schema:

  • load the problem
  • run sample / generated cases
  • run the reference path
  • confirm the verifier accepts correct outputs
  • confirm perturbed wrong outputs are rejected
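The accept/reject idea in the last two bullets can be sketched in plain Python. This is an assumption-laden illustration rather than the validator's actual runtime path: `reference_relu`, `verify`, `perturb`, and `runtime_check` are hypothetical names, and plain lists stand in for GPU tensors.

```python
import random


def reference_relu(xs):
    """Hypothetical reference path: element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in xs]


def verify(expected, actual, tol=1e-6):
    """Hypothetical verifier: same length and element-wise match within tol."""
    return len(expected) == len(actual) and all(
        abs(e - a) <= tol for e, a in zip(expected, actual)
    )


def perturb(values, delta=1.0):
    """Return a copy with the first element shifted, so it is clearly wrong."""
    out = list(values)
    out[0] += delta
    return out


def runtime_check(reference, verifier, cases):
    """Check that the verifier accepts correct outputs and rejects perturbed ones.

    Every case is assumed to be non-empty.
    """
    failures = []
    for i, case in enumerate(cases):
        expected = reference(case)
        if not verifier(expected, reference(case)):
            failures.append(f"case {i}: verifier rejected the reference output")
        if verifier(expected, perturb(expected)):
            failures.append(f"case {i}: verifier accepted a perturbed output")
    return failures


rng = random.Random(0)
cases = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
failures = runtime_check(reference_relu, verify, cases)
```

The rejection half is what separates this from a schema check: a verifier that accepts everything passes the first test on every case and fails the second.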

Today:

  • structural validation is fully wired and CI-safe
  • local runtime validation is supported
  • Modal/product-runtime validation is the intended authoritative path and should be the next acceptance layer that automation relies on

Backward Compatibility

  • existing published problems are not forced to adopt new metadata immediately
  • optional metadata such as source, authoring, and validation is additive
  • current corpus passes structural validation without breaking changes
  • one legacy warning remains:
    • problems/mse-loss/problem.md uses mse_loss instead of mse-loss
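A slug-consistency check of the kind that produces this warning could look roughly like the sketch below. It assumes the frontmatter is a `---`-delimited block containing a `slug:` key and that the expected slug is the problem's directory name; both are assumptions for illustration, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def frontmatter_slug(markdown):
    """Return the slug value from a leading '---' frontmatter block, or None."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter, no slug key found
        key, sep, value = line.partition(":")
        if sep and key.strip() == "slug":
            return value.strip().strip("\"'")
    return None


def slug_warning(problem_md_path, markdown):
    """Return a warning string if the slug and directory name differ, else None."""
    directory = PurePosixPath(problem_md_path).parent.name
    slug = frontmatter_slug(markdown)
    if slug is None:
        return f"{problem_md_path}: no slug in frontmatter"
    if slug != directory:
        return f"{problem_md_path} uses {slug} instead of {directory}"
    return None
```

Reporting the mismatch as a warning rather than an error is what lets the existing corpus keep passing while the legacy entry is still flagged.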

Validation

Ran:

python3 scripts/validate_problem.py --runtime none --format text

Result on current main corpus:

  • 84 problems checked
  • 0 errors
  • 1 warning
  • 84 infos

Also verified:

python3 scripts/validate_problem.py relu --runtime none --format json

The validator therefore works both as a human-readable structural checker and as a machine-readable surface for agents.
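One way to serve both audiences from a single set of findings is sketched below. The `Diagnostic` fields, the JSON layout, and the exit-code rule are assumptions made for illustration; they are not the actual output schema of scripts/validate_problem.py.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class Diagnostic:
    """Hypothetical diagnostic record; field names are illustrative only."""
    severity: str  # "error", "warning", or "info"
    problem: str
    message: str


def render(diagnostics, fmt="text"):
    """Render the same diagnostics as human-readable text or as JSON."""
    counts = Counter(d.severity for d in diagnostics)
    summary = {s: counts.get(s, 0) for s in ("error", "warning", "info")}
    if fmt == "json":
        payload = {
            "summary": summary,
            "diagnostics": [asdict(d) for d in diagnostics],
        }
        return json.dumps(payload, indent=2)
    lines = [f"[{d.severity}] {d.problem}: {d.message}" for d in diagnostics]
    lines.append(
        f"{summary['error']} errors, {summary['warning']} warnings, "
        f"{summary['info']} infos"
    )
    return "\n".join(lines)


def exit_code(diagnostics):
    """Fail only on errors, so warnings do not block CI."""
    return 1 if any(d.severity == "error" for d in diagnostics) else 0
```

Keeping rendering separate from the checks means an agent can parse the JSON form while a reviewer reads the text form of the same run.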

Follow-up

This PR does not yet make Modal validation the default enforcement path in CI. The next step should be wiring the real Modal/sample/checker runtime into the automation flow so product-runtime validation becomes part of the actual merge pipeline.

