
Event architecture

Catalyst’s observability layer has three tiers of durability, and frontends (web UI, terminal UI, scripts) consume a fused stream built from all three.

| Source | Kind | Writer | Lifetime |
| --- | --- | --- | --- |
| Worker signal file (`<orch-dir>/workers/<ticket>.json`) | Mutable snapshot | Worker, orchestrator | Archived with the orchestrator dir |
| Global state (`~/catalyst/state.json`) | Mutable snapshot | `catalyst-state.sh worker` / orchestrator | Persists across all orchestrations |
| Global event log (`~/catalyst/events.jsonl`) | Append-only | `catalyst-state.sh event` | Never truncated automatically |

Snapshots answer “what is the state right now?” — the event log answers “what happened and when?”. Both matter.
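Concretely, each question is a one-liner with `jq`. The field names below are illustrative (the exact `state.json` shape is not spelled out on this page), but the division of labor is the point: the snapshot is queried in place, the log is filtered by history:

```shell
# "What is the state right now?" — query the mutable snapshot.
# (Assumed shape: orchestrators -> workers, each with ticket/status fields.)
jq -r '.orchestrators[].workers[] | "\(.ticket)\t\(.status)"' ~/catalyst/state.json

# "What happened and when?" — filter the append-only log.
jq -r 'select(.worker == "CTL-48" and .status == "implementing") | .ts' \
  ~/catalyst/events.jsonl
```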

An earlier design used only the signal file. It was insufficient because:

  1. Workers exit before merge — the subprocess running /oneshot reliably terminates at its final tool-use, which happens before PR merge completes. If pr.mergedAt lived only in the signal file, it would never be written by the worker itself.
  2. Multi-orchestrator aggregation — a single signal file describes one worker. The dashboard needs to query across all orchestrators to show “how many waves are active right now?”
  3. Audit trails — status snapshots overwrite each other. The event log keeps the full history (researching → planning → implementing → ...) even when the worker is long gone.

So signal files handle the first layer (per-worker snapshot), global state handles the second (fleet snapshot), and the event log handles the third (append-only history).

Using a status transition as the example:

```
Worker writes ──> Signal file (local snapshot, atomic via tmp+mv)
             └──> catalyst-state.sh worker ──> Global state (atomic via jq+flock)
                                          └──> Event (appended to events.jsonl)

orch-monitor ──> fs.watch on signal files ──> Recomputes snapshot
            └──> fs.watch on state.json   ──> Fan out via SSE
            └──> tail -f on events.jsonl  ──> SSE event stream
```

The monitor never writes — it only reads. Remediation (advancing a ticket, re-dispatching a worker) always goes through the skill layer, which in turn goes through catalyst-state.sh to maintain the write ordering invariants.
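The two atomicity tricks named in the diagram can be sketched in a few lines of shell. This is a hypothetical reconstruction of what a writer does, not the actual contents of `catalyst-state.sh`:

```shell
# 1. Signal file: write a temp file, then rename. rename(2) is atomic on the
#    same filesystem, so a watcher sees either the old or the new snapshot,
#    never a half-written one.
signal="$ORCH_DIR/workers/CTL-48.json"
printf '%s\n' '{"status":"implementing","phase":3}' > "$signal.tmp"
mv "$signal.tmp" "$signal"

# 2. Global state: serialize writers with flock, then let jq rewrite the
#    document. The lock prevents two workers from interleaving their
#    read-modify-write cycles; the tmp+mv keeps readers torn-free.
(
  flock 9
  jq '.workers["CTL-48"].status = "implementing"' \
    ~/catalyst/state.json > ~/catalyst/state.json.tmp
  mv ~/catalyst/state.json.tmp ~/catalyst/state.json
) 9> ~/catalyst/state.lock
```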

The orch-monitor exposes GET /events as a Server-Sent Events stream. Events are JSON objects following the same schema as events.jsonl:

```
event: worker-update
data: {"orchestrator":"orch-...","worker":"CTL-48","status":"implementing","phase":3,"ts":"2026-04-14T19:03:01Z"}

event: pr-update
data: {"orchestrator":"orch-...","worker":"CTL-48","pr":123,"ciStatus":"passing","ts":"2026-04-14T19:20:44Z"}

event: liveness-change
data: {"orchestrator":"orch-...","worker":"CTL-48","alive":false,"pid":63709,"ts":"2026-04-14T19:22:00Z"}

event: snapshot
data: {"orchestrators":[...],"generatedAt":"2026-04-14T19:22:00Z"}
```
| Event | Source | When |
| --- | --- | --- |
| `snapshot` | Generated by monitor | On connect, every 60s, and on any state change |
| `worker-update` | Signal file change | Worker writes a new status or phase |
| `pr-update` | GitHub poll (every 30s) | PR state or CI status changed |
| `liveness-change` | PID check (every 5s) | A worker's PID stopped responding |
| `attention-raised` | Global state change | Orchestrator added an attention item |
| `wave-completed` | Event log tail | Orchestrator emitted `wave-completed` |

Clients (web UI, terminal UI, custom dashboards) subscribe once and render incrementally. No polling.

Any SSE-capable client works. Example in Node:

```js
import { EventSource } from 'eventsource';

const es = new EventSource('http://localhost:7400/events');

es.addEventListener('worker-update', (e) => {
  const { worker, status } = JSON.parse(e.data);
  console.log(`${worker}: ${status}`);
});

es.addEventListener('pr-update', (e) => {
  const { worker, pr, ciStatus } = JSON.parse(e.data);
  if (ciStatus === 'failing') notifySlack(`${worker} PR #${pr} CI failed`);
});
```

Or in Bash, for quick ad-hoc piping:

```sh
curl -N http://localhost:7400/events \
  | grep -E '^event: (worker-update|pr-update)' -A 1 \
  | grep ^data:
```

If a client disconnects (network blip, process restart), it reconnects automatically — retry is part of the SSE spec, and the browser or EventSource library handles it. On reconnect the monitor immediately sends a fresh snapshot event so the client can reconcile any missed updates.
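Note that plain `curl` does not retry the way EventSource does. For the ad-hoc bash client above, a small wrapper loop (a sketch, assuming the same port used throughout this page) gives equivalent behavior — and because every fresh connection starts with a `snapshot` event, reconnecting is also how the bash client reconciles:

```shell
# Reconnect forever; each new connection opens with a `snapshot` event.
while true; do
  curl -sN http://localhost:7400/events
  sleep 2   # brief backoff before retrying
done
```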

The event log (events.jsonl) is the durable fallback: if a client missed events while disconnected, it can replay from the log by its last-seen timestamp:

```sh
awk -F'"ts":' '$2 > "\"2026-04-14T19:00:00Z\"" {print}' ~/catalyst/events.jsonl
```
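The awk one-liner works because UTC ISO-8601 timestamps sort lexicographically, but it depends on `"ts":` appearing as the last field. A `jq` equivalent parses the JSON instead of splitting on a substring, so it is insensitive to field order:

```shell
# Replay every event strictly after the client's last-seen timestamp.
jq -c 'select(.ts > "2026-04-14T19:00:00Z")' ~/catalyst/events.jsonl
```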

File-based append-only logs and filesystem watches are intentionally boring. They:

  • Require no additional process (no Redis, no Kafka)
  • Survive monitor restarts (events.jsonl is the source of truth)
  • Are debuggable with cat, tail, and jq
  • Work offline

The cost is that you’re limited to one machine — if you need multi-host aggregation, pipe events.jsonl into your regular log shipping stack (Vector, Fluent Bit, whatever you already run). The schema is stable.
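As a sketch of that shipping route, a minimal Vector pipeline tailing the log might look like the following. The component names and paths are illustrative, and the sink is a placeholder — check the Vector `file` source docs for your version before relying on this:

```toml
[sources.catalyst_events]
type    = "file"
include = ["/home/you/catalyst/events.jsonl"]  # ~ is not expanded in Vector config

[sinks.downstream]
type           = "console"            # swap for your real sink (loki, elasticsearch, ...)
inputs         = ["catalyst_events"]
encoding.codec = "json"
```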