
Bug: parallel runs via orch serve — successful runs marked as failed #8

@oxgeneral


Summary

When orch serve runs multiple agents in parallel, tasks that complete successfully (subtype: success in logs) are incorrectly marked as failed. The run JSON also retains status: running instead of being updated to completed.

Steps to Reproduce

  1. Create 3+ agents and assign them independent tasks (no dependencies)
  2. Start orch serve --verbose
  3. All agents start in parallel (confirmed by orch task list showing multiple tasks as in_progress)
  4. Wait for agents to complete their work

Expected Behavior

  • Run JSON: status: completed
  • Task status: review or done
  • Agent output captured in task result

Actual Behavior

  • Run JSONL log shows: {"type":"done","data":"{\"type\":\"result\",\"subtype\":\"success\",...}"}
  • Run JSON stays: status: running, exit_code: null
  • Task status: failed (no failure_reason)
  • Task result: empty

Evidence

Three parallel runs, all of which show success in the logs but failure in the stored state:

# JSONL (last line of each run): all SUCCESS
run_CA9htig: type=done, data.subtype=success
run_caCyEJC: type=done, data.subtype=success
run_BbF5_Nn: type=done, data.subtype=success

# Run JSON: all still RUNNING
run_CA9htig: status=running, exit_code=null
run_caCyEJC: status=running, exit_code=null
run_BbF5_Nn: status=running, exit_code=null

# Tasks: all FAILED
tsk_t56iAPO: status=failed, failure_reason=none
tsk_AomcObW: status=failed, failure_reason=none
tsk_AVsqW5_: status=failed, failure_reason=none
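The mismatch above can be checked mechanically. The sketch below (TypeScript, since orch ships on npm) parses a run's last JSONL line, whose data field is itself a JSON-encoded string as in the log excerpt, and flags runs whose log ends in a success result while the run JSON still has status: running and exit_code: null. The RunState and LogEvent types and both function names are hypothetical: they model only the fields quoted in this report, not orch's own types, and reading the files from disk is omitted because their paths are not given here.

```typescript
// Hypothetical shapes: only the fields quoted in this issue are modeled.
interface RunState {
  status: string;            // e.g. "running" or "completed"
  exit_code: number | null;
}

interface LogEvent {
  type: string;              // the final line of a finished run has type "done"
  data: string;              // for "done", a JSON-encoded result object
}

// Returns the result subtype from a run's last JSONL line,
// or null when that line is not a "done" event wrapping a result.
function finalSubtype(lastLine: string): string | null {
  const event = JSON.parse(lastLine) as LogEvent;
  if (event.type !== "done") return null;
  const result = JSON.parse(event.data) as { type?: string; subtype?: string };
  return result.type === "result" ? (result.subtype ?? null) : null;
}

// True for the state reported in this issue: the log ends in success,
// but the run JSON was never finalized.
function isStaleSuccess(lastLine: string, run: RunState): boolean {
  return finalSubtype(lastLine) === "success" && run.status === "running" && run.exit_code === null;
}

const sample = '{"type":"done","data":"{\\"type\\":\\"result\\",\\"subtype\\":\\"success\\"}"}';
console.log(isStaleSuccess(sample, { status: "running", exit_code: null }));  // true
console.log(isStaleSuccess(sample, { status: "completed", exit_code: 0 }));   // false
```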

Hypothesis

The orch serve tick loop may have a race condition when several runs complete concurrently. Possible mechanisms:

  • The completion handler does not update the run JSON when multiple runs finish in the same tick
  • The stale status: running in the run JSON then leads the orchestrator to treat the run as timed out or crashed, so it marks the task as failed
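A minimal simulation of the first mechanism, assuming (not confirmed from orch's source) that each run completion does an asynchronous read-modify-write of shared state. FakeStore, completeUnsafe, makeSerialized and countCompleted are hypothetical names, and the store only stands in for the persisted run state. When three completions overlap, each reads the same snapshot before any write lands, so the last write overwrites the others and finished runs stay running. Chaining the updates through a single promise queue makes them run one at a time.

```typescript
type Status = "running" | "completed";
type State = Record<string, Status>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for persisted run state: async reads return a copy,
// async writes replace the whole object (like rewriting a file).
class FakeStore {
  private data: State;
  constructor(initial: State) {
    this.data = { ...initial };
  }
  async load(): Promise<State> {
    await sleep(1);
    return { ...this.data };
  }
  async save(next: State): Promise<void> {
    await sleep(1);
    this.data = next;
  }
  snapshot(): State {
    return { ...this.data };
  }
}

// Unsafe: read-modify-write with no coordination between concurrent completions.
async function completeUnsafe(store: FakeStore, id: string): Promise<void> {
  const s = await store.load();
  s[id] = "completed";
  await store.save(s);
}

// Serialized: every completion is chained onto one promise, so cycles never overlap.
function makeSerialized(store: FakeStore): (id: string) => Promise<void> {
  let queue: Promise<void> = Promise.resolve();
  return (id) => {
    const next = queue.then(() => completeUnsafe(store, id));
    queue = next.catch(() => {}); // a failed update must not block later ones
    return next;
  };
}

// Completes runs a, b and c concurrently and counts how many end up "completed".
async function countCompleted(serialized: boolean): Promise<number> {
  const store = new FakeStore({ a: "running", b: "running", c: "running" });
  const complete = serialized ? makeSerialized(store) : (id: string) => completeUnsafe(store, id);
  await Promise.all(["a", "b", "c"].map((id) => complete(id)));
  return Object.values(store.snapshot()).filter((s) => s === "completed").length;
}

countCompleted(false).then((n) => console.log("unsafe:", n));     // below 3: overlapping writes clobber each other
countCompleted(true).then((n) => console.log("serialized:", n));  // 3
```

Whether orch's completion path actually has this shape would need to be checked against its tick loop; the sketch only shows why the symptom (successful runs left as running) is consistent with a lost-update race.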

Workaround

Run tasks one at a time with orch run <task-id> instead of using orch serve.
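To script the workaround, a sketch that runs each task with orch run <task-id> and waits for it to exit before starting the next. It assumes orch is on PATH and that orch run exits non-zero when a task fails, which this report does not confirm. runSequentially is a hypothetical helper, and stopping at the first failure is a choice made here, not orch behavior.

```typescript
import { spawnSync } from "node:child_process";

// Runs `<command> run <id>` for each task in order, waiting for each to exit.
// Returns 0 if every run succeeded, otherwise the first non-zero status (1 if none).
function runSequentially(taskIds: string[], command = "orch"): number {
  for (const id of taskIds) {
    const result = spawnSync(command, ["run", id], { stdio: "inherit" });
    if (result.status !== 0) {
      const why = result.error ? result.error.message : `exit status ${result.status}`;
      console.error(`${command} run ${id} failed: ${why}`);
      return result.status ?? 1;
    }
  }
  return 0;
}

// Example, using the task IDs from the evidence above:
// process.exitCode = runSequentially(["tsk_t56iAPO", "tsk_AomcObW", "tsk_AVsqW5_"]);
```

Since the tasks in this report are independent, collecting failures and continuing would also be reasonable; stopping early just keeps a failure from being buried under later output.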

Impact

Critical: this breaks the core value proposition of parallel agent orchestration. Multi-agent teams cannot work through orch serve.

Environment

  • orch: latest (npm @oxgeneral/orch)
  • OS: macOS Darwin 25.2.0
  • Node: v20.19.5
  • Agents: 4 concurrent (claude adapter, claude-opus-4-6)
