Observability Overview
Catalyst ships with an agent observability stack so you can see what autonomous workers are doing in real time — which phase they’re in, whether their PR is still open, whether their process is still alive, and how long each phase took.
The stack has four layers:
| Layer | What it does | Where it runs |
|---|---|---|
| Instrumentation | Claude Code emits OTLP telemetry (events, metrics, logs) via claude-code-otel | Per-worker, in the Claude Code process |
| Signal files + global state | Workers write JSON status to workers/<ticket>.json; orchestrator aggregates into ~/catalyst/state.json | Filesystem |
| orch-monitor web dashboard | Aggregates signal files, polls GitHub, serves a live web UI with SSE event streams | Single Bun process |
| Terminal UI | ANSI-rendered compact dashboard for headless environments | Same process (optional) |
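To make the signal-file layer concrete, here is a minimal sketch of how per-worker signals could be folded into one global state object. The `WorkerSignal` and `GlobalState` shapes and the `aggregateSignals` helper are hypothetical: this overview does not show the real schema of workers/<ticket>.json or ~/catalyst/state.json, so treat every field name as an assumption.

```typescript
// Hypothetical shape of one worker's signal file (workers/<ticket>.json).
// The real schema is not shown in this overview; these fields are assumptions.
interface WorkerSignal {
  ticket: string;
  phase: "researching" | "planning" | "implementing" | "validating" | "shipping";
  pid: number;
  updatedAt: string; // ISO 8601 timestamp
}

// Hypothetical shape of the aggregated global state.
interface GlobalState {
  generatedAt: string;
  workers: Record<string, WorkerSignal>;
}

// Fold a list of signals into one state object, keeping the newest signal per ticket.
function aggregateSignals(signals: WorkerSignal[], now: Date = new Date()): GlobalState {
  const workers: Record<string, WorkerSignal> = {};
  for (const signal of signals) {
    const existing = workers[signal.ticket];
    if (!existing || Date.parse(signal.updatedAt) > Date.parse(existing.updatedAt)) {
      workers[signal.ticket] = signal;
    }
  }
  return { generatedAt: now.toISOString(), workers };
}
```

Keeping the aggregation a pure function of its inputs means the same logic could serve both the web dashboard and the terminal UI, with file reading handled separately.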
What You’ll See
When you run a /catalyst-dev:orchestrate wave, the observability stack lets you answer:
- Is the worker’s Claude process still alive? (PID liveness check every 5s)
- What phase is it in — researching, planning, implementing, validating, shipping?
- How long has the PR been open? How long from open to merged?
- Did CI pass, fail, or is it still running?
- Are there unresolved review threads blocking merge?
- Where is the wave stuck — which ticket, which phase, for how long?
All of this is visible without tailing logs or attaching to the Claude session.
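The PID liveness check in the list above can be sketched as follows. This is an illustration of the general technique, not the monitor's actual code: `isProcessAlive` and `watchPid` are hypothetical names, and only the 5-second interval comes from this page. It relies on the standard `process.kill(pid, 0)` call, which tests for a process without sending it a signal.

```typescript
// Returns true if a process with this PID exists.
// Signal 0 performs the existence and permission check without delivering a signal.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we may not signal it, so it is alive.
    // Any other error (typically ESRCH) is treated as "not running".
    return (err as { code?: string }).code === "EPERM";
  }
}

// Poll a PID every intervalMs and call onDead once when it disappears.
// Returns a function that stops the polling.
function watchPid(pid: number, onDead: () => void, intervalMs = 5000): () => void {
  const timer = setInterval(() => {
    if (!isProcessAlive(pid)) {
      clearInterval(timer);
      onDead();
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

A caller would keep the returned stop function and invoke it once the worker finishes normally, so the timer does not keep the monitor process alive.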
Quick Start
Section titled “Quick Start”If you just want to get the dashboard running for an active orchestrator:
```bash
# In a separate terminal (from any worktree)
bun run plugins/dev/scripts/orch-monitor/server.ts
# Dashboard: http://localhost:7400
```
The server watches ~/catalyst/wt/ for orchestrator directories (matching orch-*) and reads their signal files automatically.
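As a rough illustration of the directory discovery described above, the sketch below lists subdirectories of a root whose names match orch-*. The helper names `isOrchestratorDir` and `listOrchestratorDirs` are hypothetical, and this does a one-off scan rather than the continuous watching the server performs; how the server actually watches the directory is not covered here.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Prefix taken from the orch-* pattern mentioned in the docs.
const ORCH_PREFIX = "orch-";

// True when a directory name matches the orch-* pattern.
function isOrchestratorDir(name: string): boolean {
  return name.startsWith(ORCH_PREFIX);
}

// One-off scan: return full paths of orchestrator directories under root.
// Defaults to ~/catalyst/wt; returns an empty list if the root does not exist.
function listOrchestratorDirs(root: string = join(homedir(), "catalyst", "wt")): string[] {
  if (!existsSync(root)) return [];
  return readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && isOrchestratorDir(entry.name))
    .map((entry) => join(root, entry.name))
    .sort();
}
```

Calling `listOrchestratorDirs()` with no arguments scans the default root; passing a path makes the helper easy to point at a temporary directory in tests.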
For headless environments, pass --terminal to get an ANSI-rendered view in the same process:
```bash
bun run plugins/dev/scripts/orch-monitor/server.ts --terminal
```
Deeper Dives
- Setting up the OTel stack: instrument Claude Code with claude-code-otel for full OpenTelemetry export
- Making agents record data properly: shell wrapper, resource attributes, environment variables
- Using the web monitor: dashboard walkthrough, API endpoints, filters
- Using the terminal UI: when to prefer ANSI over the browser
- Event architecture: how SSE streams and the global event log fit together
When Not to Enable Observability
The OTel stack adds process startup overhead and network I/O. For single-ticket, single-session work, such as running /catalyst-dev:oneshot or manual phase-by-phase skills, you usually don’t need it. Observability becomes essential when:
- You run /catalyst-dev:orchestrate with multiple workers in parallel
- You want post-hoc phase timelines (Gantt-style) for retros
- You need to page an on-call human when a worker stalls
- You’re debugging reward-hacking or verification failures across many runs
For everything else, the default signal file + global state layer is enough — no external infrastructure required.
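For the post-hoc phase timelines mentioned in the list above, the core computation is turning a sequence of phase transitions into spans with a start and a duration. The sketch below assumes a hypothetical `PhaseTransition` record with a phase name and an ISO timestamp; the actual event format used by Catalyst is not shown on this page.

```typescript
// Hypothetical record of a worker entering a phase; field names are assumptions.
interface PhaseTransition {
  phase: string;
  enteredAt: string; // ISO 8601 timestamp
}

// One bar of a Gantt-style timeline.
interface PhaseSpan {
  phase: string;
  startMs: number;    // epoch milliseconds when the phase began
  durationMs: number; // time spent in the phase
}

// Convert transitions into spans. Each phase ends when the next one begins;
// the last phase ends at endAt (for example, the merge time or "now").
function phaseSpans(transitions: PhaseTransition[], endAt: string): PhaseSpan[] {
  const sorted = [...transitions].sort(
    (a, b) => Date.parse(a.enteredAt) - Date.parse(b.enteredAt),
  );
  return sorted.map((transition, index) => {
    const startMs = Date.parse(transition.enteredAt);
    const next = sorted[index + 1];
    const endMs = next ? Date.parse(next.enteredAt) : Date.parse(endAt);
    return { phase: transition.phase, startMs, durationMs: endMs - startMs };
  });
}
```

The same spans could also drive stall detection: a final span whose duration exceeds some chosen threshold would be a candidate for paging a human.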