
Catalyst sandbox exec/job create intermittently returns 500 while still creating queued executions or pending jobs #1370

@rblalock

Description


The Catalyst sandbox control plane intermittently returns 500 Internal Server Error for exec and job create requests, yet still
creates the execution/job record, which is then left stuck (a queued execution or a pending job).

CLI Repro A
This reproduces the exec failure directly.

ORG_ID=org_2u8RgDTwcZWrZrZ3sZh24T5FCtz

SANDBOX=$(
  agentuity cloud sandbox create \
    --org-id "$ORG_ID" \
    --runtime bun:1 \
    --name repro-exec \
    --memory 512Mi \
    --cpu 500m \
    --disk 1Gi \
    --network \
    --json | jq -r '.sandboxId'
)

echo "SANDBOX=$SANDBOX"

agentuity cloud sandbox exec "$SANDBOX" --timeout 30s -- bash -lc 'echo exec-ok'

agentuity cloud sandbox execution list "$SANDBOX" --json
agentuity cloud sandbox events "$SANDBOX" --reverse --limit 20 --json

Observed result:

  • agentuity cloud sandbox exec intermittently fails with 500 Internal Server Error.
  • Even though the CLI got a 500, execution list still shows a newly created execution stuck in status: "queued".

I reproduced that today with:

  • sandbox: sbx_096157e80bc61f5da9e37cb10ec9c85f7bf4688e88d36511fde8a8de3067
  • execution left behind: exe_17c5885e791c6776616eefe8fe9a98af075dc301b8039ccf85c99ecbfe34
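A quick way to confirm the orphaned record after a failed exec is to filter the execution list JSON for entries still in "queued". This is a minimal sketch, not a CLI feature: queued_executions is a hypothetical helper name, and it assumes the --json output is either a top-level array or an object with an executions array, with IDs under executionId or id. Only the status: "queued" value comes from the observed output above.

```shell
# Hypothetical helper: print the IDs of executions still in status "queued".
# ASSUMPTION: the JSON is an array of executions, or an object with an
# "executions" array; each entry's ID is in "executionId" or "id".
queued_executions() {
  jq -r '
    (if type == "array" then . else (.executions // []) end)
    | .[]
    | select(.status == "queued")
    | (.executionId // .id)
  '
}
```

Usage: agentuity cloud sandbox execution list "$SANDBOX" --json | queued_executions should print the leftover exe_... ID, provided the assumed JSON shape holds.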

CLI Repro B
This reproduces the background job failure directly.

ORG_ID=org_2u8RgDTwcZWrZrZ3sZh24T5FCtz

SANDBOX=$(
  agentuity cloud sandbox create \
    --org-id "$ORG_ID" \
    --runtime bun:1 \
    --name repro-job \
    --memory 512Mi \
    --cpu 500m \
    --disk 1Gi \
    --network \
    --json | jq -r '.sandboxId'
)

echo "SANDBOX=$SANDBOX"

agentuity cloud sandbox exec "$SANDBOX" --timeout 30s -- bash -lc 'echo warmup-ok'

agentuity cloud sandbox job create "$SANDBOX" -- bash -lc 'echo job-ok; sleep 5'

agentuity cloud sandbox job list "$SANDBOX" --json
agentuity cloud sandbox events "$SANDBOX" --reverse --limit 20 --json

Observed result:

  • agentuity cloud sandbox job create intermittently fails with 500 Internal Server Error.
  • Even though the CLI got a 500, job list still shows a newly created job stuck in:
    • status: "pending"
    • startedAt: null
    • exitCode: null
    • error: null

I reproduced that today with:

  • sandbox: sbx_2a78f762bed43396c4429f5a7ce1c0364dd23f807ff132fc37810e6af644
  • job left behind: job_7c83f4f40ac5cc534411f5c4
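The orphaned job can be picked out of the job list JSON by matching the state listed above (status: "pending" with startedAt: null). A minimal sketch, assuming the --json output is an array of jobs or an object with a jobs array and that IDs sit under jobId or id; pending_unstarted_jobs is a hypothetical helper, not a CLI feature.

```shell
# Hypothetical helper: print the IDs of jobs that are "pending" and never
# started (startedAt is null or absent).
# ASSUMPTION: the JSON is an array of jobs, or an object with a "jobs"
# array; each entry's ID is in "jobId" or "id".
pending_unstarted_jobs() {
  jq -r '
    (if type == "array" then . else (.jobs // []) end)
    | .[]
    | select(.status == "pending" and .startedAt == null)
    | (.jobId // .id)
  '
}
```

A job that was genuinely just queued would also match, so it is worth re-running the check after a short wait before treating a hit as orphaned.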

One nuance:

  • plain create -> job create succeeded once for me
  • create -> exec -> job create has triggered the failure more reliably, and it mirrors the Hub bootstrap path more closely

If the first run doesn’t fail, retry on a fresh sandbox. I hit both variants multiple times today.
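Because the failure is intermittent, the "retry on a fresh sandbox" advice can be scripted. The sketch below reuses the exact commands from Repro B, but it assumes the CLI exits non-zero when the control plane returns 500, which is not confirmed here; repro_once and repro_until_failure are hypothetical helper names. Sandboxes created along the way are not cleaned up.

```shell
# Hypothetical helpers for repeating Repro B on fresh sandboxes.
# ASSUMPTION: agentuity exits non-zero when the API responds with a 500.
ORG_ID=${ORG_ID:-org_2u8RgDTwcZWrZrZ3sZh24T5FCtz}

# Returns 1 if exec or job create failed (the bug), 2 if sandbox create
# produced no sandbox ID (setup problem), 0 if every step succeeded.
repro_once() {
  local name=$1 sandbox
  sandbox=$(agentuity cloud sandbox create \
    --org-id "$ORG_ID" \
    --runtime bun:1 \
    --name "$name" \
    --memory 512Mi \
    --cpu 500m \
    --disk 1Gi \
    --network \
    --json | jq -r '.sandboxId')
  if [ -z "$sandbox" ] || [ "$sandbox" = "null" ]; then
    return 2
  fi
  echo "SANDBOX=$sandbox"
  agentuity cloud sandbox exec "$sandbox" --timeout 30s -- bash -lc 'echo warmup-ok' || return 1
  agentuity cloud sandbox job create "$sandbox" -- bash -lc 'echo job-ok; sleep 5' || return 1
  return 0
}

# Try up to $1 fresh sandboxes (default 5); return 0 once the bug reproduces.
repro_until_failure() {
  local max=${1:-5} i=1 rc
  while [ "$i" -le "$max" ]; do
    repro_once "repro-job-$i"
    rc=$?
    if [ "$rc" -eq 1 ]; then
      echo "reproduced on attempt $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no failure after $max attempts"
  return 1
}
```

For example, source the helpers and run repro_until_failure 5. A failed sandbox create is reported separately (return code 2) so that it is not mistaken for the exec/job create bug.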

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
