eBPF Tracing

Traces are a core pillar of observability, providing end-to-end visibility into how requests move through a system. On a fully bare-metal Kubernetes cluster, where everything from Longhorn storage to validation webhooks and operators runs in-house, tracing surfaces interactions and bottlenecks that metrics alone can’t reveal—making it critical for both debugging and performance validation.

That need for visibility shaped the tracing pipeline itself: how services are instrumented, how trace data flows through the cluster, and where it is stored. The sections below walk through the design choices I implemented.

Goals

The objective was to implement a tracing system that operates seamlessly across the entire Kubernetes cluster, without requiring manual instrumentation of each individual service. Any service deployed—whether a custom operator, a webhook, or a stateless microservice—should automatically emit traces that integrate into a unified pipeline.

At the same time, the system needed to support interactive querying and analysis. It should be possible to quickly identify long-running requests, isolate performance issues by namespace or service, and trace dependencies across components. This balance between automated coverage and query efficiency informed decisions around trace collection, aggregation, and storage, ensuring that the observability layer provides both depth and speed without imposing operational overhead.
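As an illustration of the kind of query this goal implies, here is a minimal sketch that builds a filter for slow requests, scoped to a namespace or service. It assumes the trace backend is queried with TraceQL and that spans carry the `k8s.namespace.name` and `service.name` resource attributes. The helper name `slow_requests_query` and the example namespace are hypothetical.

```python
def slow_requests_query(min_duration, namespace=None, service=None):
    """Build a TraceQL query selecting spans slower than min_duration.

    Assumes spans carry the k8s.namespace.name and service.name
    resource attributes; adjust the attribute names if yours differ.
    """
    conditions = []
    if namespace is not None:
        conditions.append(f'resource.k8s.namespace.name = "{namespace}"')
    if service is not None:
        conditions.append(f'resource.service.name = "{service}"')
    conditions.append(f"duration > {min_duration}")
    return "{ " + " && ".join(conditions) + " }"


# Example: requests in a (hypothetical) namespace that took longer than 2 seconds.
print(slow_requests_query("2s", namespace="longhorn-system"))
```

The resulting string can be pasted into a TraceQL query editor. Adding the `service` argument narrows the same query to a single component.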

Design

Designing the tracing pipeline required careful consideration of both operational overhead and cluster complexity. The cluster already runs Grafana Alloy in clustered mode for metrics scraping, but it is not deployed as a daemonset. Extending Alloy to also receive traces would have increased its responsibility significantly, or required deploying a separate Alloy instance as a daemonset. Either approach risked confusion during high-stress scenarios: in the middle of an incident I would have to work out which Alloy instance handled which function, and the extra moving parts would introduce additional points of failure.

To avoid these complications, the OpenTelemetry Collector, running as a per-node agent, was chosen as the trace receiver. Its operator simplifies deployment and management across the cluster, so traces from every node are collected through a single, consistently managed component. Following a similar pattern, Grafana Beyla was selected as the trace generator. Beyla can run as a daemonset, ensuring node-local instrumentation, which reduces cross-node traffic and preserves performance. Its eBPF-based tracing also satisfies the requirement that services need no manual instrumentation, enabling automated, cluster-wide coverage with minimal disruption to running workloads.

This design balances operational clarity, efficiency, and observability coverage, providing a scalable, low-friction tracing solution for a complex Kubernetes environment.

Overall, the tracing architecture looked like this:

graph LR
  subgraph deployment
    alloy
  end
  subgraph "node"
    subgraph bar[daemonset]
      otelcol[OTEL collector]
      beyla
    end
    pod[application pod]
  end
  subgraph monitoring
    tempo
    mimir
    loki
  end
  pod -.trace info .-> beyla --traces, metrics, log association--> otelcol --traces--> tempo
  otelcol --RED metrics--> mimir
  otelcol --log association ---> loki
  pod --container logs, metrics--> alloy --container logs --> loki
  tempo --service graphs--> mimir
  alloy -- metrics --> mimir
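To make the collector's role in the diagram concrete, the sketch below models a collector configuration as a Python dictionary and checks that every pipeline only references components that are defined. It is not the configuration used on this cluster. The endpoints are hypothetical placeholders, the log-association path to Loki is omitted, and the `prometheusremotewrite` exporter assumes a collector distribution that ships it (such as the contrib build).

```python
import json

# Hypothetical in-cluster endpoints; replace with the real service addresses.
TEMPO_OTLP_ENDPOINT = "tempo.monitoring.svc:4317"
MIMIR_REMOTE_WRITE_URL = "http://mimir.monitoring.svc/api/v1/push"

collector_config = {
    # Beyla pushes traces and metrics to the node-local collector over OTLP.
    "receivers": {
        "otlp": {"protocols": {"grpc": {}, "http": {}}},
    },
    "exporters": {
        # Traces go to Tempo over OTLP/gRPC (plaintext assumed inside the cluster).
        "otlp/tempo": {"endpoint": TEMPO_OTLP_ENDPOINT, "tls": {"insecure": True}},
        # RED metrics go to Mimir through Prometheus remote write.
        "prometheusremotewrite": {"endpoint": MIMIR_REMOTE_WRITE_URL},
    },
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["otlp/tempo"]},
            "metrics": {"receivers": ["otlp"], "exporters": ["prometheusremotewrite"]},
        }
    },
}


def undefined_components(config):
    """Return (pipeline, kind, component) tuples for references with no definition."""
    missing = []
    for name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append((name, kind, component))
    return missing


if __name__ == "__main__":
    assert undefined_components(collector_config) == []
    print(json.dumps(collector_config, indent=2))
```

A check like this catches a pipeline that names an exporter nobody defined, a common mistake when a configuration is edited by hand.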

Implementation

Deploying the tracing pipeline required careful tuning to respect the cluster’s existing resource constraints. Both the OpenTelemetry Collector and Grafana Beyla daemonsets were assigned dedicated priority classes, so the scheduler can preempt lower-priority workloads on their behalf if node resources become scarce. This keeps trace collection and generation reliable even under high utilization, and protects the observability layer from being starved.
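A minimal sketch of what a dedicated priority class involves is shown below. It builds the manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The class name, the priority value, and the helper functions are hypothetical; the actual values used on the cluster are not stated here.

```python
import json


def priority_class(name, value, description):
    """Build a PriorityClass manifest (scheduling.k8s.io/v1)."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        # Pods in this class may evict lower-priority pods to get scheduled.
        "preemptionPolicy": "PreemptLowerPriority",
        "description": description,
    }


def with_priority(pod_spec, class_name):
    """Return a copy of a pod spec that references the given priority class."""
    return {**pod_spec, "priorityClassName": class_name}


if __name__ == "__main__":
    # Hypothetical name and value, chosen only for illustration.
    tracing = priority_class(
        "tracing-agents", 1_000_000, "Node-local tracing daemonsets"
    )
    print(json.dumps(tracing, indent=2))

    # The daemonset's pod template (spec.template.spec) then opts in by name.
    pod_spec = with_priority({"containers": []}, "tracing-agents")
    print(json.dumps(pod_spec, indent=2))
```

The value only matters relative to the other priority classes in the cluster, so it needs to sit above ordinary workloads and below anything the cluster cannot run without.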

For storage, Grafana Tempo was configured to forward trace metrics to an existing Mimir installation. Tempo’s trace metrics generated roughly 60,000 series, a modest addition (about 12%) to the ~500,000 series already in Mimir, so no additional scaling or partitioning was required. This allowed me to reuse existing infrastructure for long-term retention and analysis of trace-derived metrics, keeping operational complexity minimal while providing comprehensive, cluster-wide observability.
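The capacity reasoning above comes down to simple arithmetic, sketched here using the two approximate series counts from the text. The function name `series_growth` is only illustrative.

```python
def series_growth(existing_series, added_series):
    """Return (total series, relative growth) after adding new series."""
    total = existing_series + added_series
    growth = added_series / existing_series
    return total, growth


if __name__ == "__main__":
    # Approximate figures quoted in the text.
    total, growth = series_growth(existing_series=500_000, added_series=60_000)
    print(f"{total:,} series in total ({growth:.0%} growth)")
```

With the quoted figures this comes to about 560,000 series, roughly 12% more than before.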