Workers and signal files
When /catalyst-dev:orchestrate runs a wave, it dispatches one worker per ticket. Each worker is a separate Claude Code subprocess running /catalyst-dev:oneshot inside a dedicated git worktree. The worker communicates progress back to the orchestrator exclusively through its signal file.
Worker lifecycle
    Orchestrator                      Worker subprocess
    │                                 │
    │─ creates signal file ──────────>│
    │─ launches humanlayer ──────────>│ starts /catalyst-dev:oneshot
    │                                 │ Phase 1: researching
    │                                 │ Phase 2: planning
    │                                 │ Phase 3: implementing
    │                                 │ Phase 4: validating
    │                                 │ Phase 5: shipping (opens PR, arms auto-merge)
    │                                 │
    │<─ signal file: pr-created ──────│
    │                                 │ exits
    │
    │ (Phase 4 poll loop, orchestrator side)
    │ gh pr view → merged? ──> yes: writes mergedAt, Linear=Done

The split between worker and orchestrator matters: the worker subprocess reliably exits at its final tool-use, before merge completes. Polling until merged is the orchestrator’s responsibility, not the worker’s. A worker that claims to poll-until-merged burns tokens and produces false signals.
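The orchestrator-side merge check could look roughly like the sketch below. It assumes the orchestrator runs `gh pr view <number> --json state,mergedAt` and passes the JSON output to a pure function. `apply_merge_observation` is a hypothetical name, and reusing the merge time as `completedAt` is an assumption, not something this page specifies.

```python
import json


def apply_merge_observation(signal: dict, gh_output: str) -> dict:
    """Return a copy of a signal file updated from `gh pr view --json state,mergedAt`.

    Hypothetical helper; the real orchestrator may structure this differently.
    """
    observed = json.loads(gh_output)
    updated = json.loads(json.dumps(signal))  # deep copy of plain JSON data
    if observed.get("state") == "MERGED":
        # The worker has already exited, so only the orchestrator records the merge.
        updated["pr"]["mergedAt"] = observed.get("mergedAt")
        updated["status"] = "done"
        updated["linearState"] = "Done"
        # Assumption: the merge time doubles as the completion time.
        updated["completedAt"] = observed.get("mergedAt")
    return updated
```

If the PR is still open, the function returns the signal unchanged, so the caller can simply try again on the next poll.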
The signal file
The signal file lives at <orchestrator-dir>/workers/<ticket>.json. The orchestrator creates an empty skeleton; the worker writes into it.
Fields
| Field | Type | Description |
|---|---|---|
| ticket | string | The ticket ID (e.g., CTL-48) |
| orchestrator | string | Orchestrator ID |
| workerName | string | Human-readable worker name (used for tmux titles, etc.) |
| status | string | Current status; see the state machine below |
| phase | number | 0–6, matching oneshot phases |
| startedAt | ISO string | When the worker was dispatched |
| updatedAt | ISO string | Last write to the signal file |
| lastHeartbeat | ISO string | Most recent heartbeat (~60s cadence during long phases) |
| completedAt | ISO string or null | Set when terminal state reached |
| worktreePath | string | Absolute path to the worker’s git worktree |
| phaseTimestamps | object | Map of status → ISO timestamp; populated at each transition |
| pr | object or null | Populated at Phase 5 PR creation |
| linearState | string or null | Current Linear state name |
| definitionOfDone | object | Populated at Phase 4 + 5 with real results |
| pid | number | Worker’s Claude process PID |
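As an illustration of the field list, here is a minimal sketch of what a freshly created skeleton might contain. The function name `new_signal_skeleton` and every default value (for example `phase` 0, an empty `definitionOfDone`, and a null `pid` before launch) are assumptions made for the example; only the field names and types come from the table.

```python
from datetime import datetime, timezone


def new_signal_skeleton(ticket: str, orchestrator: str,
                        worker_name: str, worktree_path: str) -> dict:
    """Build a hypothetical initial signal file for one worker."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "ticket": ticket,
        "orchestrator": orchestrator,
        "workerName": worker_name,
        "status": "dispatched",
        "phase": 0,                      # assumed starting phase
        "startedAt": now,
        "updatedAt": now,
        "lastHeartbeat": now,
        "completedAt": None,             # set only at a terminal state
        "worktreePath": worktree_path,
        "phaseTimestamps": {"dispatched": now},
        "pr": None,                      # populated at Phase 5
        "linearState": None,
        "definitionOfDone": {},          # populated at Phase 4 + 5
        "pid": None,                     # assumed unknown until the process starts
    }
```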
The pr subobject
    {
      "number": 123,
      "url": "https://github.com/org/repo/pull/123",
      "ciStatus": "pending",
      "prOpenedAt": "2026-04-14T19:15:30Z",
      "autoMergeArmedAt": "2026-04-14T19:15:32Z",
      "mergedAt": null
    }

- ciStatus: pending | passing | failing | unknown | merged
- prOpenedAt: set by the worker the moment the PR is created
- autoMergeArmedAt: set by the worker after `gh pr merge --squash --auto`
- mergedAt: always set by the orchestrator (or standalone /merge-pr), never by the worker
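The ownership rules for the pr subobject can be expressed as a small guard. This is a sketch only: `update_pr` and the `writer` argument are hypothetical, and the real tooling may enforce these rules differently or not at all.

```python
CI_STATUSES = {"pending", "passing", "failing", "unknown", "merged"}


def update_pr(pr: dict, writer: str, **fields) -> dict:
    """Return a copy of the pr subobject with `fields` applied.

    `writer` is "worker" or "orchestrator" (hypothetical labels).
    """
    if writer not in {"worker", "orchestrator"}:
        raise ValueError(f"unknown writer: {writer}")
    if "mergedAt" in fields and writer == "worker":
        # mergedAt belongs to the orchestrator (or standalone /merge-pr).
        raise PermissionError("mergedAt is never set by the worker")
    if "ciStatus" in fields and fields["ciStatus"] not in CI_STATUSES:
        raise ValueError(f"invalid ciStatus: {fields['ciStatus']}")
    return {**pr, **fields}
```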
State machine
    dispatched → researching → planning → implementing → validating → shipping → pr-created
                                                                                     │
                                                                           (orchestrator polls)
                                                                                     │
                                                                                     v
                                                                                  merging → done
                                                                                     │
                                                                                     v
                                                                    (at any stage) → failed | stalled

The worker writes statuses up through pr-created. The orchestrator writes merging and done (or failed/stalled if the wave times out or verification fails).
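One way to read the state machine is as a transition check. The sketch below assumes the happy path advances exactly one step at a time and that failed and stalled are reachable from any non-terminal status; `can_transition` is a hypothetical helper, not part of the tooling described here.

```python
WORKER_PATH = ["dispatched", "researching", "planning", "implementing",
               "validating", "shipping", "pr-created"]
ORCHESTRATOR_PATH = ["merging", "done"]
HAPPY_PATH = WORKER_PATH + ORCHESTRATOR_PATH
TERMINAL = {"done", "failed", "stalled"}


def can_transition(current: str, nxt: str) -> bool:
    """Return True if moving from `current` to `nxt` fits the state machine."""
    if current in TERMINAL:
        return False                 # nothing follows a terminal state
    if nxt in {"failed", "stalled"}:
        return True                  # reachable at any stage
    if current not in HAPPY_PATH or nxt not in HAPPY_PATH:
        return False
    # Assumption: statuses advance strictly one step along the happy path.
    return HAPPY_PATH.index(nxt) == HAPPY_PATH.index(current) + 1
```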
The global state
In parallel with the signal file, workers also write to ~/catalyst/state.json via catalyst-state.sh worker. This is the fleet-wide aggregate that the dashboard reads; it unifies workers across multiple orchestrators. Writes are atomic (jq + flock).
Schema:

    {
      "orchestrators": {
        "orch-2026-04-14-abc123": {
          "project": "CTL",
          "startedAt": "...",
          "lastHeartbeat": "...",
          "wave": 2,
          "totalWaves": 3,
          "workers": {
            "CTL-48": { /* same shape as signal file */ }
          },
          "attention": [
            { "type": "verification-failed", "ticket": "CTL-48", "message": "..." }
          ]
        }
      }
    }

The attention array is the orchestrator’s way of flagging something that needs a human decision. Attention items are never auto-resolved by the orchestrator itself.
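A dashboard-style consumer might flatten the attention arrays across all orchestrators. The following is a minimal sketch that operates on an already-parsed state.json; `collect_attention` is a hypothetical name, and tagging each item with its orchestrator ID is an illustrative choice.

```python
def collect_attention(state: dict) -> list:
    """Gather attention items from every orchestrator in the global state."""
    items = []
    for orch_id, orch in state.get("orchestrators", {}).items():
        for item in orch.get("attention", []):
            # Tag each item with its orchestrator so a human can trace it back.
            items.append({"orchestrator": orch_id, **item})
    return items
```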
Heartbeats
During long phases (implementation, CI waits), workers update lastHeartbeat without changing status. The orch-monitor treats a worker as stalled if now - lastHeartbeat > 15 minutes. A stalled worker is never auto-restarted; it becomes an attention item.
If you’re writing a custom worker, heartbeat at least once every ~60s. More often is fine; long gaps between heartbeats risk false stalled detections.
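The stall rule is a single comparison. The sketch below assumes heartbeats are UTC ISO strings with a trailing Z, as in the examples on this page; `is_stalled` and `parse_iso` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

STALL_THRESHOLD = timedelta(minutes=15)


def parse_iso(timestamp: str) -> datetime:
    """Parse an ISO timestamp such as 2026-04-14T19:15:30Z into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def is_stalled(last_heartbeat: str, now: datetime) -> bool:
    """Apply the rule: stalled if now - lastHeartbeat > 15 minutes."""
    return now - parse_iso(last_heartbeat) > STALL_THRESHOLD
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the check easy to test.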
Terminal states
A worker reaches a terminal state in one of three ways:
| State | Means | Signal writer |
|---|---|---|
| done | PR merged, Linear=Done | Orchestrator (after observing merge) |
| failed | Unrecoverable error, quality gates exhausted, or human escalation | Worker (writes attention reason), or the orchestrator if the wave times out or verification fails |
| stalled | No heartbeat / no progress for 15+ min | Orchestrator |
Terminal states set completedAt. No further writes to the signal file happen once terminal is reached.
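The two rules above (terminal states set completedAt, and nothing is written afterwards) can be captured in a small guard. This is a sketch with a hypothetical `write_status` helper; it also updates updatedAt and phaseTimestamps as described in the field list.

```python
TERMINAL_STATES = {"done", "failed", "stalled"}


def write_status(signal: dict, status: str, now_iso: str) -> dict:
    """Return a copy of the signal with a new status, refusing writes after terminal."""
    if signal.get("status") in TERMINAL_STATES:
        raise RuntimeError(f"signal is already terminal: {signal['status']}")
    updated = dict(signal)
    updated["status"] = status
    updated["updatedAt"] = now_iso
    # Record when this status was entered.
    updated["phaseTimestamps"] = {**signal.get("phaseTimestamps", {}), status: now_iso}
    if status in TERMINAL_STATES:
        updated["completedAt"] = now_iso
    return updated
```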
Why signal files and not IPC
File-based signals are intentionally boring:
- Debuggable with `cat workers/*.json | jq`
- Survive process death on both sides: neither the worker crashing nor the orchestrator restarting destroys state
- Atomic on POSIX via tmp+rename writes
- Pickled history — old signal files live in archived orchestrator dirs, so you can audit past waves
The cost is polling latency — the orch-monitor uses fs.watch to avoid polling, but some consumers do poll. That’s fine for a one-machine setup; for multi-host you’d front this with a real event bus.
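The tmp+rename write mentioned in the list above can be sketched as follows. `write_signal_atomic` is a hypothetical helper, not the actual implementation; it relies on `os.replace`, which renames atomically on POSIX when the temp file and the target are on the same filesystem.

```python
import json
import os
import tempfile


def write_signal_atomic(path: str, signal: dict) -> None:
    """Write a signal file so readers see either the old or the new content, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".signal-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(signal, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the data to disk before renaming
        os.replace(tmp_path, path)   # atomic swap into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # do not leave stray temp files behind
        raise
```

A reader that runs `cat` or `jq` on the file mid-write still sees the previous complete version.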