Why we replaced our trace database three times in two years.
From Postgres to ClickHouse to a custom columnar store. Each migration earned its keep. The numbers, in the order we learned them.
Pairwise model judging is statistically dishonest. Here's the fix.
I re-ran 38,000 evals with a confidence-interval-first methodology. 22% of "wins" became ties. Publishing the protocol.
A two-hour outage caused by a single tokenizer regex.
The cost-aggregation pipeline silently dropped 0.3% of spans. The trace pattern that exposed it, and the property test written in response.
Drift detection without ground truth.
A practical walkthrough of distributional, behavioral, and self-consistency drift signals — with reference implementations.
Streaming traces at 14k/sec on a single Postgres replica.
A surprisingly long-lived architecture, the bottlenecks it hit along the way, and the indexes that earned their keep.
Why my dashboard has no charts on the homepage.
A short essay on signal-to-noise and the costs of pretty data viz that doesn't change anyone's behavior.
Token counts lie. Here's how I measure spend now.
Reconciling provider invoices against my own counters surfaced a 1.7% gap. The story of where the missing tokens went.