blog.fosspilled.dev
I solve infrastructure problems that don't have documented solutions.
I've built production Kubernetes on bare metal with custom observability (full LGTM stack), migrated workloads across CPU architectures, and revived abandoned open-source tooling by rewriting its build system. Most of my work involves distributed systems, production reliability, and the kind of operational complexity where "just use the managed service" isn't an option.
~10 years building this stuff. Free and open-source software enthusiast.
Featured Articles
Metrics are the first signal of system health, but collecting them at scale is only the start. In my bare-metal Talos Linux cluster, I needed to instrument everything from Longhorn and Postgres to the monitoring stack itself. The challenge isn’t just collecting metrics; it’s also structuring, aggregating, and alerting on them in a way that surfaces meaningful operational insight without overwhelming the system.
Below is a live dashboard for this setup, which currently handles ~500k active series at ~20k samples/sec.
What follows is everything that went into standing this up.