
Telemetry

Two streams of evidence. The firm watches itself — does discourse converge, does delivery ship, do bets win more often than they lose. The firm also watches what it ships, so the prediction attached to a Strategic Intent has something real to measure against.

What it is

Telemetry inside firmd splits into two realms, on purpose.

Firm telemetry — the meta level. Captures how firmd itself is operating. Discourse signals (convergence, participation balance, judge verdicts), delivery signals (lifecycle transitions, build and deploy outcomes, retry counts), and roll-up metrics (win rate across bets, prediction-error trends across missions). This is the substrate eval reports are built on, and the lens through which firmd improves its own defaults.

Deliverable telemetry — the tenant level. Captures what the firm's shipped product is doing in the real world. Tenant-bound: configured through the tenant's DeliveryProfile, pointed at the actual deployed surface. These signals close the prediction-error loop — did the Strategic Intent move the outcome it predicted, by how much, with what tail?

Why it exists

An agentic firm without telemetry is a firm that cannot learn. The two realms answer two different questions, and neither answer is dispensable.

Firm telemetry tells the firm whether its own machinery is working — whether discourse converges, whether delivery ships, whether the loop closes. Without it, defaults stop improving and the system drifts. Deliverable telemetry tells the firm whether its bets pay off in the real world. Without it, every shipped increment evaporates into anecdote and the causal model never sharpens.

How it works today

Firm telemetry — OpenTelemetry across the stack

firmd emits OTLP signals (traces, metrics, and business events) from every part of itself. Local dev points them at a Grafana LGTM container; production-style deployments place an OpenTelemetry Collector between firmd and the observability backend. Reasoning is observable, not just uptime: what agents argued, where missions converged, how often the human intervened.

Firm telemetry — eval reports as the substrate

At the end of a mission, firm-level signals are aggregated into a per-mission evaluation report — convergence, challenge density, token economy, judge verdicts, retries. The report is what drives the next round of defaults; most rebalances on this site started as a finding in a report. See Evaluation.
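As an illustration of that aggregation step, here is a minimal sketch in Python. The field names, the report keys, and the way each ratio is defined are assumptions made for this example; firmd's actual report schema is not described on this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MissionSignals:
    """Raw firm-level signals for one mission (all field names hypothetical)."""
    rounds_to_converge: int
    challenges: int
    messages: int
    tokens_used: int
    judge_verdicts: List[str]  # assumed labels, e.g. "pass" / "fail"
    retries: int


def build_report(s: MissionSignals) -> Dict[str, float]:
    """Roll raw signals up into a per-mission evaluation report."""
    passes = sum(1 for v in s.judge_verdicts if v == "pass")
    return {
        "convergence_rounds": s.rounds_to_converge,
        # Assumed definition: challenges per message exchanged.
        "challenge_density": s.challenges / s.messages if s.messages else 0.0,
        # Assumed definition of token economy: tokens spent per message.
        "tokens_per_message": s.tokens_used / s.messages if s.messages else 0.0,
        "judge_pass_rate": passes / len(s.judge_verdicts) if s.judge_verdicts else 0.0,
        "retries": s.retries,
    }


signals = MissionSignals(
    rounds_to_converge=4,
    challenges=6,
    messages=24,
    tokens_used=48000,
    judge_verdicts=["pass", "pass", "fail", "pass"],
    retries=2,
)
report = build_report(signals)
```

Keeping the report a flat dictionary of numbers makes it easy to compare missions side by side, which is the property a defaults rebalance would rely on.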

Deliverable telemetry — configured per tenant

The tenant's DeliveryProfile names the observation adapter. Today that adapter is a webhook, a "poor man's observation": a small script injected into the sandbox that posts events to a firmd endpoint. It is useful when the tenant has no dedicated telemetry stack.
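A webhook script of this kind could look roughly like the sketch below, written with the Python standard library only. The endpoint URL, the payload fields, and the function names are all hypothetical; the real firmd endpoint and event schema are not specified here.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical endpoint: the real firmd URL is not given in this document.
FIRMD_ENDPOINT = "https://firmd.example/observations"


def build_event(tenant_id, intent_id, metric, value):
    """Assemble one observation event (payload shape is an assumption)."""
    return {
        "tenant_id": tenant_id,
        "intent_id": intent_id,
        "metric": metric,
        "value": value,
        "observed_at": datetime.now(timezone.utc).isoformat(),
    }


def build_request(event, endpoint=FIRMD_ENDPOINT):
    """Wrap the event in a JSON POST request without sending it."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_event(event, endpoint=FIRMD_ENDPOINT, timeout=5.0):
    """Send the event and return the HTTP status code."""
    request = build_request(event, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating build_request from post_event lets the payload be inspected or tested inside the sandbox without any network access.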

Planned: tenant-grade observability paths. The webhook is fine for the pilot but thin for tenants that already run a real observability stack. A fuller adapter will integrate with the tenant's own telemetry rather than handing it back through a script.

Deliverable telemetry — closes the prediction loop

When a tenant's deliverable emits its signals, the parent Strategic Intent's reflection phase compares the measured outcome against the prediction. The gap is Prediction Error. Big gap → the causal model updates; small gap → confidence in this edge of the graph grows.
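The comparison in the reflection phase can be sketched as follows. The relative-gap threshold, the fixed confidence step, and the action labels are assumptions chosen for illustration; this page does not say how firmd decides that a gap is big or small, nor how much confidence grows.

```python
from typing import Tuple


def prediction_error(predicted: float, measured: float) -> float:
    """Signed gap: positive means the outcome exceeded the prediction."""
    return measured - predicted


def reflect(confidence: float, predicted: float, measured: float,
            tolerance: float = 0.1, step: float = 0.1) -> Tuple[float, str]:
    """Compare a measured outcome with its prediction.

    tolerance and step are hypothetical tuning knobs, not firmd settings.
    Returns the (possibly raised) confidence and a suggested action.
    """
    error = prediction_error(predicted, measured)
    # Relative gap, guarded against a zero prediction.
    relative = abs(error) / max(abs(predicted), 1e-9)
    if relative <= tolerance:
        # Small gap: confidence in this edge of the graph grows.
        return min(1.0, confidence + step), "reinforce"
    # Big gap: leave confidence alone and flag the causal model for an update.
    return confidence, "update_model"


small_gap = reflect(0.5, predicted=10.0, measured=10.5)
big_gap = reflect(0.5, predicted=10.0, measured=14.0)
```

The sketch treats overshoot and undershoot alike by taking the absolute gap; a real reflection phase might weigh them differently.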

In the product
Screenshot pending: Firm trace of a mission
Screenshot pending: Prediction Error on a deliverable bet
Screenshot pending: Deliberation health (firm-level)