Summary
When orch serve runs multiple agents in parallel, tasks that complete successfully (subtype: success in logs) are incorrectly marked as failed. The run JSON also retains status: running instead of being updated to completed.
Steps to Reproduce
- Create 3+ agents and assign them independent tasks (no dependencies)
- Start orch serve --verbose
- All agents start in parallel (confirmed by orch task list showing multiple in_progress)
- Wait for agents to complete their work
Expected Behavior
- Run JSON: status: completed
- Task status: review or done
- Agent output captured in task result
Actual Behavior
- Run JSONL log shows: {"type":"done","data":"{\"type\":\"result\",\"subtype\":\"success\",...}"}
- Run JSON stays: status: running, exit_code: null
- Task status: failed (no failure_reason)
- Task result: empty
Evidence
Three parallel runs — all show success in logs but fail in state:
# JSONL (last line of each run) — all SUCCESS
run_CA9htig: type=done, data.subtype=success
run_caCyEJC: type=done, data.subtype=success
run_BbF5_Nn: type=done, data.subtype=success
# Run JSON — all still RUNNING
run_CA9htig: status=running, exit_code=null
run_caCyEJC: status=running, exit_code=null
run_BbF5_Nn: status=running, exit_code=null
# Tasks — all FAILED
tsk_t56iAPO: status=failed, failure_reason=none
tsk_AomcObW: status=failed, failure_reason=none
tsk_AVsqW5_: status=failed, failure_reason=none
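The mismatch above can be checked mechanically. The following is a minimal sketch, assuming the shapes seen in this report: a JSONL event whose data field is itself a JSON-encoded string, and a run JSON with status and exit_code fields. The RunState interface and the isStaleSuccess helper are hypothetical names introduced for illustration and are not part of orch.

```typescript
// Hypothetical shape inferred from this report; the real orch schema may differ.
interface RunState {
  status: string;
  exit_code: number | null;
}

// Returns true when the last JSONL event reports success but the run JSON
// still says "running", which is the inconsistency described in this issue.
function isStaleSuccess(lastLogLine: string, run: RunState): boolean {
  try {
    const event = JSON.parse(lastLogLine) as { type?: string; data?: unknown };
    if (event.type !== "done" || typeof event.data !== "string") return false;
    // The data field is a JSON document encoded as a string, so parse it again.
    const result = JSON.parse(event.data) as { subtype?: string };
    return result.subtype === "success" && run.status === "running";
  } catch {
    // A malformed line is not evidence of this particular bug.
    return false;
  }
}
```

Calling this with the last log line of each run and the parsed run JSON would flag all three runs listed above.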
Hypothesis
The orch serve tick loop may have a race condition when processing multiple concurrent run completions. Possibly:
- The completion handler doesn't update run JSON when multiple runs finish in the same tick
- The stale status: running in run JSON causes the orchestrator to treat the run as timed out / crashed, marking the task as failed
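To illustrate the class of bug this hypothesis describes, here is a minimal sketch of a lost update. It is not orch's actual code: FakeStore, completeUnsafe, and SerialQueue are hypothetical names, and the assumption is that completion handlers do an unserialized read-modify-write of shared state. When several handlers run concurrently, each reads the same snapshot and the last write overwrites the others.

```typescript
type Status = "running" | "completed";
type Store = Record<string, Status>;

// In-memory stand-in for a state file. Reads and writes yield to the event
// loop, as real file I/O would.
class FakeStore {
  private data: Store;
  constructor(initial: Store) {
    this.data = { ...initial };
  }
  async read(): Promise<Store> {
    await Promise.resolve();
    return { ...this.data };
  }
  async write(next: Store): Promise<void> {
    await Promise.resolve();
    this.data = { ...next };
  }
  snapshot(): Store {
    return { ...this.data };
  }
}

// Unsafe when called concurrently: read-modify-write of the whole snapshot.
async function completeUnsafe(store: FakeStore, runId: string): Promise<void> {
  const snap = await store.read();
  snap[runId] = "completed";
  await store.write(snap);
}

// One possible mitigation: chain each update onto the previous one so that
// only a single read-modify-write is in flight at a time.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();
  enqueue(job: () => Promise<void>): Promise<void> {
    const next = this.tail.then(job);
    // Keep the chain alive even if a job rejects.
    this.tail = next.catch(() => undefined);
    return next;
  }
}

async function demo(): Promise<{ unsafe: Store; safe: Store }> {
  const ids = ["run_a", "run_b", "run_c"];
  const initial: Store = { run_a: "running", run_b: "running", run_c: "running" };

  // Three completions handled concurrently, as in a single tick.
  const racy = new FakeStore(initial);
  await Promise.all(ids.map((id) => completeUnsafe(racy, id)));

  // The same completions, serialized through the queue.
  const serial = new FakeStore(initial);
  const queue = new SerialQueue();
  await Promise.all(ids.map((id) => queue.enqueue(() => completeUnsafe(serial, id))));

  return { unsafe: racy.snapshot(), safe: serial.snapshot() };
}
```

In the concurrent case some runs remain running even though every handler finished without error, which would match the stale run JSON reported here. Whether orch serve actually has this pattern needs to be confirmed against its source.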
Workaround
Run tasks one at a time with orch run <task-id> instead of using orch serve.
Impact
Critical — this breaks the core value proposition of parallel agent orchestration. Multi-agent teams cannot function via orch serve.
Environment
- orch: latest (npm @oxgeneral/orch)
- OS: macOS Darwin 25.2.0
- Node: v20.19.5
- Agents: 4 concurrent (claude adapter, claude-opus-4-6)