AIObserver — Watching machines think.

From Postgres to ClickHouse to a custom columnar store. Each migration earned its keep. The numbers, in the order we learned them.

I re-ran 38,000 evals with a confidence-interval-first methodology. 22% of "wins" became ties. Publishing the protocol.

The cost-aggregation pipeline silently dropped 0.3% of spans. The trace pattern that exposed it and the property test written in response.

A practical walkthrough of distributional, behavioral, and self-consistency drift signals — with reference implementations.

A surprisingly long-lived architecture, the bottlenecks it hit along the way, and the indexes that earned their salary.

A short essay on signal-to-noise and the costs of pretty data viz that doesn't change anyone's behavior.

Reconciling provider invoices against my own counters surfaced a 1.7% gap. The story of where the missing tokens went.