Harden e2e CI tests and copy job schema at MCP startup#252

Merged
nhorton merged 13 commits into main from
claude/fix-e2e-ci-failures-stream-json
Mar 5, 2026

@nhorton nhorton commented Mar 4, 2026

Summary

  • MCP server copies job.schema.json to .deepwork/job.schema.json on startup so agents always have a stable reference path regardless of install location. Overwrites stale copies. Added formal requirement (JOBS-REQ-001.1.13) and 6 tests.
  • Added schema instructions to common_job_info in deepwork_jobs/job.yml — agents are now told to read the JSON schema before creating/editing any job.yml, with explicit callouts for commonly-misused fields (oneOf inputs, no type/path fields, etc.)
  • Fixed 9 stale references in define.md and implement.md to the removed root-level description field (replaced by common_job_info_provided_to_all_steps_at_runtime)
  • Improved repair hint in tools.py to tell agents to fix files directly rather than starting the repair workflow
  • Added .deepreview rule (job_schema_instruction_compatibility) to catch future schema-instruction drift
  • Prompt clarity improvements in define.md and implement.md based on review feedback
  • E2e CI hardening: --dangerously-skip-permissions, max turns lowered from 30→20, explicit output path guidance, model upgraded to Sonnet 4.6, conditional PR runs, stream-json debugging

Test plan

  • 6 new unit tests for schema copy behavior (tests/unit/jobs/mcp/test_server.py)
  • All existing tests pass (uv run pytest)
  • Ruff lint + format clean
  • DeepWork reviews pass (job_definition, prompt_best_practices, python_code_review, python_lint, schema_compatibility, doc_sync, suggest_new_reviews, requirements_traceability)
  • E2e CI passes in merge queue

🤖 Generated with Claude Code

nhorton and others added 4 commits March 3, 2026 17:07
The claude-code-e2e job has been failing since Mar 2 — Claude exits after
~60s with no output, likely due to AskUserQuestion in --print mode. This
adds stream-json output for tool call visibility, --max-turns 30 to
prevent early exit, stronger prompt guardrails (no AskUserQuestion, must
complete all tool calls), the missing go_to_step MCP permission, and
PR-level runs when the workflow file itself changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
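
For readers wondering how stream-json output gives tool call visibility: a minimal sketch of tallying tool calls from a captured log, so a stray AskUserQuestion call stands out. The event shape used here (an `assistant` event whose `message.content` list holds `tool_use` blocks with a `name`) is an assumption about the stream format, and `count_tool_calls` is a hypothetical helper, not code from this PR.

```python
import json
from collections import Counter


def count_tool_calls(stream_lines):
    """Tally tool names from newline-delimited JSON events.

    Assumed event shape (not confirmed by this PR):
    {"type": "assistant",
     "message": {"content": [{"type": "tool_use", "name": "..."}]}}
    """
    counts = Counter()
    for raw in stream_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise such as plain log lines
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        message = event.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "<unknown>")] += 1
    return counts
```

Feeding the saved CI log through this helper line by line would show which tools the agent reached for before exiting, which is the kind of visibility the commit is after.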
@nhorton nhorton added this pull request to the merge queue Mar 4, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2026
@nhorton nhorton added this pull request to the merge queue Mar 4, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2026
nhorton and others added 6 commits March 4, 2026 14:21
The merge queue failure showed the agent hitting schema validation errors
during job creation, then switching to the "repair" workflow instead of
fixing the schema and resubmitting. Add explicit instruction to never
start repair/learn workflows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The old hint said "This project likely needs /deepwork:repair" first,
which led agents to start the repair workflow even when they had just
created the file and could fix it themselves. Reword to lead with "fix
it directly if you edited it" and only suggest repair as a fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The define.md and implement.md instruction files still referenced a root-level
"description" field that was replaced by "common_job_info_provided_to_all_steps_at_runtime"
in the schema. This caused e2e CI agents to create invalid job.yml files with
additionalProperties errors. Fixed 4 references in define.md and 5 in implement.md.

Also added a .deepreview rule (job_schema_instruction_compatibility) that reviews
deepwork_jobs instruction files against the job schema whenever either changes,
preventing this drift from recurring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
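
To illustrate why a leftover root-level "description" trips additionalProperties: a minimal sketch, assuming the job schema sets `additionalProperties: false` at the root. The trimmed schema below is hypothetical (only `common_job_info_provided_to_all_steps_at_runtime` comes from this PR; `name` is a placeholder), and the check deliberately ignores `patternProperties` and nested objects.

```python
def unexpected_root_keys(document, schema):
    """Return root keys that a schema with additionalProperties=false rejects."""
    if schema.get("additionalProperties", True) is not False:
        return []  # extra keys are allowed, nothing to report
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in document if key not in allowed)


# Hypothetical, heavily trimmed stand-in for job.schema.json.
TRIMMED_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},  # placeholder field, assumed
        "common_job_info_provided_to_all_steps_at_runtime": {"type": "string"},
    },
    "additionalProperties": False,
}

# A job written from the stale instructions still uses the removed field.
stale_job = {"name": "fruits", "description": "Sort fruits"}
fixed_job = {
    "name": "fruits",
    "common_job_info_provided_to_all_steps_at_runtime": "Sort fruits",
}

print(unexpected_root_keys(stale_job, TRIMMED_SCHEMA))
print(unexpected_root_keys(fixed_job, TRIMMED_SCHEMA))
```

The stale job reports `description` as an unexpected key, while the fixed job reports none.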
Address prompt best practices review findings:
- Define <job_dir> and [job_dir] placeholders upfront before first use
- Consolidate overlapping guideline sections in implement.md
- Make completion checklist verifiable with concrete criteria
- Nest sub-sections under Step 2 (H4) to fix step numbering hierarchy
- Move mandatory review requirement to top of Step 4 in define.md
- Make "rich context" guideline specific about what to include
- Add bridging note connecting patterns to Q&A flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The example showed adding a changelog field to job.yml, but changelog
was removed from the schema. The fix_jobs repair step already instructs
removing changelog sections — this aligns the iterate example.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MCP server now copies job.schema.json to .deepwork/job.schema.json on
every startup, giving agents a stable reference path regardless of where
DeepWork is installed. The common_job_info in deepwork_jobs/job.yml now
includes rigorous instructions to read the schema before creating or
editing any job.yml, with explicit callouts for commonly-misused fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhorton nhorton force-pushed the claude/fix-e2e-ci-failures-stream-json branch from 2addf99 to 00b4be1 March 4, 2026 23:40
Adds formal requirement that the MCP server copies job.schema.json to
.deepwork/job.schema.json on startup (overwriting stale copies). Includes
6 tests covering: copy behavior, overwrite of existing files, directory
creation, graceful failure with warning, content fidelity, and
integration with create_server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
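
The startup copy described above could look roughly like this. It is a sketch under stated assumptions, not the server's actual code: `copy_job_schema` and its parameters are hypothetical names, and only the target path `.deepwork/job.schema.json` and the listed behaviors (create the directory, overwrite stale copies, warn instead of failing) come from this PR.

```python
import logging
import shutil
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def copy_job_schema(schema_source: Path, project_root: Path) -> Optional[Path]:
    """Copy the bundled schema to <project_root>/.deepwork/job.schema.json.

    Returns the target path, or None if the copy failed. A failure is
    logged as a warning so that startup can continue.
    """
    target = project_root / ".deepwork" / "job.schema.json"
    try:
        # Create .deepwork/ if it does not exist yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        # copyfile replaces any existing file, so stale copies are overwritten.
        shutil.copyfile(schema_source, target)
    except OSError as exc:
        logger.warning("Could not copy job schema to %s: %s", target, exc)
        return None
    return target
```

Catching `OSError` covers a missing source file and an unwritable target alike, which matches the "graceful failure with warning" case in the test list; whether the real implementation catches the same exception type is not stated here.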
@nhorton nhorton force-pushed the claude/fix-e2e-ci-failures-stream-json branch from 00b4be1 to afe3387 March 4, 2026 23:41
nhorton and others added 2 commits March 4, 2026 15:55
Observed turn counts from the last 2 successful runs: 10-12 for job creation,
7-10 for workflow execution. Lowering from 30 to 20 catches runaway agents
faster while leaving headroom. Adding --dangerously-skip-permissions removes
permission prompts that waste turns in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
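
As a rough illustration of the flags discussed in this commit: a sketch of assembling the CLI invocation with the lowered turn limit. `build_claude_args` is a hypothetical helper, the real workflow file may pass these flags differently, and the `--verbose` flag is included on the assumption that stream-json output in print mode needs it.

```python
def build_claude_args(prompt, max_turns=20):
    """Build an argument list for a non-interactive CI run (illustrative only)."""
    return [
        "claude",
        "--print",                         # non-interactive mode
        "--output-format", "stream-json",  # one JSON event per line for debugging
        "--verbose",                       # assumed to be required with stream-json
        "--max-turns", str(max_turns),     # stop runaway agents early
        "--dangerously-skip-permissions",  # no permission prompts in CI
        prompt,
    ]


# Highest turn count reported in the commit message above.
MAX_OBSERVED_TURNS = 12
args = build_claude_args("Create the fruits job")
headroom = int(args[args.index("--max-turns") + 1]) - MAX_OBSERVED_TURNS
print(headroom)
```

With the observed maximum of 12 turns, a limit of 20 leaves 8 turns of headroom.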
Agent was writing output files to .deepwork/jobs/fruits/ instead of
./fruits/ relative to project root. Add explicit instruction to write
outputs relative to the working directory, not inside .deepwork/jobs/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhorton nhorton enabled auto-merge March 5, 2026 00:20
@nhorton nhorton changed the title from "Fix failing e2e CI tests with debugging and prompt hardening" to "Harden e2e CI tests and copy job schema at MCP startup" Mar 5, 2026
@nhorton nhorton added this pull request to the merge queue Mar 5, 2026
Merged via the queue into main with commit 10b2f58 Mar 5, 2026
5 checks passed
@nhorton nhorton deleted the claude/fix-e2e-ci-failures-stream-json branch March 5, 2026 00:28

1 participant