
Event architecture

Catalyst’s observability layer has three tiers of durability, and frontends (web UI, terminal UI, scripts) consume a fused stream built from all three.

| Source | Kind | Writer | Lifetime |
| --- | --- | --- | --- |
| Worker signal file (`<orch-dir>/workers/<ticket>.json`) | Mutable snapshot | Worker, orchestrator | Archived with the orchestrator dir |
| Global state (`~/catalyst/state.json`) | Mutable snapshot | `catalyst-state.sh worker` / orchestrator | Persists across all orchestrations |
| Global event log (`~/catalyst/events.jsonl`) | Append-only | `catalyst-state.sh event` | Never truncated automatically |

Snapshots answer “what is the state right now?” — the event log answers “what happened and when?”. Both matter.
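Concretely, each question is a one-liner with `jq`. The field names below are illustrative (the exact `state.json` shape is not spelled out on this page), but the division of labor is the point: the snapshot is queried in place, the log is filtered by history:

```shell
# "What is the state right now?" — query the mutable snapshot.
# (Assumed shape: orchestrators -> workers, each with ticket/status fields.)
jq -r '.orchestrators[].workers[] | "\(.ticket)\t\(.status)"' ~/catalyst/state.json

# "What happened and when?" — filter the append-only log.
jq -r 'select(.worker == "CTL-48" and .status == "implementing") | .ts' \
  ~/catalyst/events.jsonl
```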

An earlier design used only the signal file. It was insufficient because:

  1. Workers exit before merge — the subprocess running /oneshot reliably terminates at its final tool-use, which happens before PR merge completes. If pr.mergedAt lived only in the signal file, it would never be written by the worker itself.
  2. Multi-orchestrator aggregation — a single signal file describes one worker. The dashboard needs to query across all orchestrators to show “how many waves are active right now?”
  3. Audit trails — status snapshots overwrite each other. The event log keeps the full history (researching → planning → implementing → ...) even when the worker is long gone.

So signal files handle the first layer (per-worker snapshot), global state handles the second (fleet snapshot), and the event log handles the third (append-only history).

Using a status transition as the example:

```
Worker writes ──> Signal file (local snapshot, atomic via tmp+mv)
             └──> catalyst-state.sh worker ──> Global state (atomic via jq+flock)
                                          └──> Event (appended to events.jsonl)

orch-monitor ──> fs.watch on signal files ──> Recomputes snapshot
            └──> fs.watch on state.json   ──> Fan out via SSE
            └──> tail -f on events.jsonl  ──> SSE event stream
```

The monitor never writes — it only reads. Remediation (advancing a ticket, re-dispatching a worker) always goes through the skill layer, which in turn goes through catalyst-state.sh to maintain the write ordering invariants.
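The two atomicity tricks named in the diagram can be sketched in a few lines of shell. This is a hypothetical reconstruction of what a writer does, not the actual contents of `catalyst-state.sh`:

```shell
# 1. Signal file: write a temp file, then rename. rename(2) is atomic on the
#    same filesystem, so a watcher sees either the old or the new snapshot,
#    never a half-written one.
signal="$ORCH_DIR/workers/CTL-48.json"
printf '%s\n' '{"status":"implementing","phase":3}' > "$signal.tmp"
mv "$signal.tmp" "$signal"

# 2. Global state: serialize writers with flock, then let jq rewrite the
#    document. The lock prevents two workers from interleaving their
#    read-modify-write cycles; the tmp+mv keeps readers torn-free.
(
  flock 9
  jq '.workers["CTL-48"].status = "implementing"' \
    ~/catalyst/state.json > ~/catalyst/state.json.tmp
  mv ~/catalyst/state.json.tmp ~/catalyst/state.json
) 9> ~/catalyst/state.lock
```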

The orch-monitor exposes GET /events as a Server-Sent Events stream. Events are JSON objects following the same schema as events.jsonl:

```
event: worker-update
data: {"orchestrator":"orch-...","worker":"CTL-48","status":"implementing","phase":3,"ts":"2026-04-14T19:03:01Z"}

event: pr-update
data: {"orchestrator":"orch-...","worker":"CTL-48","pr":123,"ciStatus":"passing","ts":"2026-04-14T19:20:44Z"}

event: liveness-change
data: {"orchestrator":"orch-...","worker":"CTL-48","alive":false,"pid":63709,"ts":"2026-04-14T19:22:00Z"}

event: snapshot
data: {"orchestrators":[...],"generatedAt":"2026-04-14T19:22:00Z"}
```
| Event | Source | When |
| --- | --- | --- |
| `snapshot` | Generated by monitor | On connect, every 60s, and on any state change |
| `worker-update` | Signal file change | Worker writes a new status or phase |
| `pr-update` | GitHub poll (every 30s) | PR state or CI status changed |
| `liveness-change` | PID check (every 5s) | A worker's PID stopped responding |
| `attention-raised` | Global state change | Orchestrator added an attention item |
| `wave-completed` | Event log tail | Orchestrator emitted `wave-completed` |

Clients (web UI, terminal UI, custom dashboards) subscribe once and render incrementally. No polling.

Any SSE-capable client works. Example in Node:

```js
import { EventSource } from 'eventsource';

const es = new EventSource('http://localhost:7400/events');

es.addEventListener('worker-update', (e) => {
  const { worker, status } = JSON.parse(e.data);
  console.log(`${worker}: ${status}`);
});

es.addEventListener('pr-update', (e) => {
  const { worker, pr, ciStatus } = JSON.parse(e.data);
  if (ciStatus === 'failing') notifySlack(`${worker} PR #${pr} CI failed`);
});
```

Or in Bash, for quick ad-hoc piping:

```sh
curl -N http://localhost:7400/events \
  | grep -E '^event: (worker-update|pr-update)' -A 1 \
  | grep ^data:
```

If a client disconnects (network blip, process restart), it reconnects automatically — retry is part of the SSE spec, and the browser or EventSource library handles it. On reconnect the monitor immediately sends a fresh snapshot event so the client can reconcile any missed updates.
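Note that plain `curl` does not retry the way EventSource does. For the ad-hoc bash client above, a small wrapper loop (a sketch, assuming the same port used throughout this page) gives equivalent behavior — and because every fresh connection starts with a `snapshot` event, reconnecting is also how the bash client reconciles:

```shell
# Reconnect forever; each new connection opens with a `snapshot` event.
while true; do
  curl -sN http://localhost:7400/events
  sleep 2   # brief backoff before retrying
done
```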

The event log (events.jsonl) is the durable fallback: if a client missed events while disconnected, it can replay from the log by its last-seen timestamp:

```sh
awk -F'"ts":' '$2 > "\"2026-04-14T19:00:00Z\"" {print}' ~/catalyst/events.jsonl
```
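The awk one-liner works because UTC ISO-8601 timestamps sort lexicographically, but it depends on `"ts":` appearing as the last field. A `jq` equivalent parses the JSON instead of splitting on a substring, so it is insensitive to field order:

```shell
# Replay every event strictly after the client's last-seen timestamp.
jq -c 'select(.ts > "2026-04-14T19:00:00Z")' ~/catalyst/events.jsonl
```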

File-based append-only logs and filesystem watches are intentionally boring. They:

  • Require no additional process (no Redis, no Kafka)
  • Survive monitor restarts (events.jsonl is the source of truth)
  • Are debuggable with cat, tail, and jq
  • Work offline

The cost is that you’re limited to one machine — if you need multi-host aggregation, pipe events.jsonl into your regular log shipping stack (Vector, Fluent Bit, whatever you already run). The schema is stable.
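As a sketch of that shipping route, a minimal Vector pipeline tailing the log might look like the following. The component names and paths are illustrative, and the sink is a placeholder — check the Vector `file` source docs for your version before relying on this:

```toml
[sources.catalyst_events]
type    = "file"
include = ["/home/you/catalyst/events.jsonl"]  # ~ is not expanded in Vector config

[sinks.downstream]
type           = "console"            # swap for your real sink (loki, elasticsearch, ...)
inputs         = ["catalyst_events"]
encoding.codec = "json"
```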