Alloy Migration
When Grafana Alloy's helm chart reached its 1.0 release, I had already been experimenting with it in parallel to my existing metrics pipeline. My Prometheus-based setup worked, but it had grown increasingly high-friction: CPU usage was high, ingestion into Mimir required HA deduplication logic, and tools like cAdvisor were struggling as my cluster scaled.
In this post, I’ll walk through why and how I replaced that setup with a tiered metrics pipeline built on Grafana Alloy, OpenTelemetry, and Kubernetes-native metrics. The goal was to reduce complexity, improve resilience, and retain observability without relying on heavyweight components like Prometheus or cAdvisor.
The Legacy Stack
My original metrics setup used Prometheus deployed via kube-prometheus-stack, along with cAdvisor for container-level metrics. Prometheus was deployed in a highly-available manner, with three replicas total. While functional, this stack became harder to manage and scale cleanly over time.
The problems were clear:
- Prometheus configuration (especially relabeling logic) was difficult to templatize. The kube-prometheus-stack chart tends to centralize large swaths of configuration, making reuse and small overrides awkward or brittle.
- As the cluster grew, cAdvisor became a bottleneck. Because it traverses the entire cgroup tree to collect container stats, response times ballooned to ~30 seconds with only ~50 pods. This not only delayed scrape cycles, but significantly increased network traffic across the cluster.
- Running Prometheus in HA mode also increased resource usage on the ingestion side. Mimir's distributors and ingesters had to handle triple the metrics traffic, wasting CPU cycles on deduplication. Even slight network latency differences (amplified due to aforementioned cAdvisor latency) between Prometheus replicas led to differing metrics with identical timestamps, which made deduplication noisy for metrics like CPU usage.
Over time, it became clear that this monitoring stack was causing more pain than I was willing to accept.
The Modern Setup
At a high level, the stack I settled on is structured in two tiers:
- Grafana Alloy (in clustered mode) handles all general metrics scraping. It discovers targets via Kubernetes EndpointSlices and scrapes the kubelet's built-in /metrics/cadvisor endpoint.
- OpenTelemetry Collector runs as a DaemonSet and scrapes containerd metrics directly from the local socket. This fills in some of the detail gaps that kubelet metrics omit, while eliminating cross-node traffic when collecting these metrics.
Below is a diagram illustrating the overall architecture; the arrows indicate the direction of metrics flow.
Grafana Alloy offered several operational advantages that made the system easier to maintain and reason about:
- Clustered scraping means scrape targets are redistributed automatically across Alloy agents. If a node fails, targets are rebalanced without any need for deduplication or replay logic.
- No more HA ingestion overhead: With Alloy’s internal coordination, I no longer have to run multiple Prometheus replicas that duplicate scrape traffic, only for Mimir to discard extra copies.
- Kubernetes autodiscovery feels cleaner and more precise. The discovery.kubernetes block supports EndpointSlices and lets me apply relabeling before the scrape happens, which results in smaller, more efficient scrape sets.
- Smaller, modular config files reduce cognitive load. Each file has a narrow responsibility (e.g. discovery, relabeling, scraping, remote write) which makes the system easier to reason about and safer to update incrementally, while also allowing code reuse.
- Contextual labels (__meta_kubernetes_*) are automatically added, making relabeling cleaner and more expressive than raw Prometheus.
- The built-in GUI and tools, especially target inspection and live debugging, are surprisingly useful when diagnosing configuration mistakes or understanding scrape failures.
For more granular container-level metrics, I wanted a solution that didn't have the overhead of cAdvisor. Since I use Talos Linux, which can expose containerd metrics, it made sense to take advantage of that built-in telemetry source. Here’s why I went with OpenTelemetry Collector to scrape containerd:
- Local-only access; each collector instance scrapes its own node's containerd port, avoiding the need to expose it over the network. This keeps the surface area small and avoids potential cross-node access issues.
- Clear separation of responsibilities: Alloy handles scrape orchestration and general node/pod metrics, while OTEL is scoped solely to containerd. This separation makes it easier to reason about and debug each component.
- Future flexibility and locality: I plan to introduce Grafana Beyla in the future, which will export eBPF-based application metrics and traces. Having OTEL already present on each node means I can route Beyla’s output directly into OTEL (and then to Tempo), minimizing cross-node traffic and keeping the pipeline consistent.
This overall setup gives me high-resolution container metrics without compromising on security or simplicity, and it integrates cleanly with how Talos is designed to operate. It also reduces load on Mimir’s distributors and ingesters, since scrape traffic is naturally partitioned across nodes and each individual remote write request is smaller.
Implementation
With the design nailed down, I started by setting up Grafana Alloy in clustered mode, using kustomize to deploy the base helm chart and a config map from the alloy configuration fragments:
```yaml
resources:
  - route.yml

configMapGenerator:
  - name: alloy-config
    files:
      - discovery.alloy
      - metrics.alloy
      - write.alloy
      - opts.alloy
    options:
      disableNameSuffixHash: true

patchesJson6902:
  - target:
      kind: ClusterRole
      name: alloy
      version: v1
    path: patch-clusterrole.json
```
The patchesJson6902 section is necessary to grant Alloy permission to scrape the kubelet; the patch file simply allows get on the nodes/metrics resource.
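For reference, a minimal patch-clusterrole.json matching that description could look like the following; this is a sketch, and it assumes appending a new rule to the end of the ClusterRole's rules array:

```json
[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": [""],
      "resources": ["nodes/metrics"],
      "verbs": ["get"]
    }
  }
]
```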
The included files are fairly basic discovery.kubernetes and prometheus.scrape configurations, so I'll only mention the non-standard parts:
- There is a discovery.relabel config that adds the "standard" labels used in most community dashboards, directly from the meta labels:

  ```alloy
  discovery.relabel "endpointslices_containers" {
    targets = discovery.kubernetes.endpointslices.targets

    rule {
      source_labels = ["__meta_kubernetes_endpointslice_label_app_kubernetes_io_component"]
      target_label  = "component"
    }

    rule {
      source_labels = ["__meta_kubernetes_pod_container_name"]
      target_label  = "container"
    }

    ...
  }
  ```

- The kubelet scrape job needs to supply the token, as well as an alternative path:

  ```alloy
  prometheus.scrape "nodes_kubelet" {
    scheme            = "https"
    bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    metrics_path      = "/metrics/cadvisor"
    ...
  }
  ```

- Finally, in some cases it is necessary to relabel the container_id label, removing the containerd:// prefix for consistency with other metrics sources:

  ```alloy
  rule {
    source_labels = ["container_id"]
    target_label  = "container_id"
    regex         = "^containerd://(.*)$"
    replacement   = "${1}"
  }
  ```
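The write.alloy fragment referenced in the kustomization is just a standard remote write block; a minimal sketch, where the in-cluster Mimir push endpoint is an assumption rather than my exact URL:

```alloy
prometheus.remote_write "mimir" {
  endpoint {
    // Assumed in-cluster Mimir distributor endpoint; adjust to your deployment.
    url = "http://mimir-distributor.mimir.svc:8080/api/v1/push"
  }
}
```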
Altogether, in the UI, it looked like this:

After getting Alloy working, I set up the OpenTelemetry collector. The first step was updating my Talos machine configuration to enable containerd metrics.
Talos Linux documentation provides an example; I simply changed the bind address to 127.0.0.1.
```yaml
machine:
  files:
    - content: |
        [metrics]
          address = "127.0.0.1:11234"
      path: /etc/cri/conf.d/20-customization.part
      op: create
```
Since applying this machine config patch required a restart, I rolled out the changes one node at a time.
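As a sketch of that rollout (the node IP and patch file name below are placeholders, not my actual values), each node can be patched individually with talosctl, waiting for it to come back before moving on:

```shell
# Hypothetical example: apply the containerd metrics patch to a single node.
# 10.0.0.11 and containerd-metrics.yaml are placeholders.
talosctl patch machineconfig \
  --nodes 10.0.0.11 \
  --patch @containerd-metrics.yaml

# Wait for the node to report Ready again before patching the next one.
kubectl wait --for=condition=Ready node/worker-1 --timeout=10m
```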
Afterwards, installing the OpenTelemetry collector as a daemonset was simple, as was adding a receivers.prometheus section for scraping 127.0.0.1:11234.
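The relevant part of the collector configuration is short; here is a minimal sketch, where the scrape interval and Mimir endpoint are assumptions rather than my exact values:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: containerd
          scrape_interval: 15s  # assumed; match your Alloy interval
          static_configs:
            - targets: ["127.0.0.1:11234"]

exporters:
  prometheusremotewrite:
    # Assumed in-cluster Mimir push endpoint.
    endpoint: http://mimir-distributor.mimir.svc:8080/api/v1/push

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```

Note that reaching the node's loopback address from a pod generally requires hostNetwork: true on the DaemonSet.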
Results
The new setup has been running smoothly, with significantly lower scrape latency and reduced resource consumption on both the metric collection and the Mimir ingestion side. The impact of the change was significant:
| aspect | previous setup | current setup |
|---|---|---|
| scrape duration | >40s | <2s |
| scrape frequency | every 30s | every 15s |
| distributor write latency (p99) | 25s | 500ms |
| distributor transmit bandwidth | 40MB/s | 30MB/s |
Quite interestingly, I'm receiving more metrics, with higher resolution, but with less overall network traffic. This can be directly attributed to the difference between Alloy's clustering and Prometheus's high-availability mode; I'm sending more discrete data points, but only one instance of each.
Another interesting data point that doesn't map 1-to-1 is overall CPU and RAM usage. Prometheus and cAdvisor, summed across all 14 nodes of the cluster, used a total of ~13 vCPU and 20Gi RAM. Alloy and the OTEL collector account for ~1 vCPU and ~3Gi RAM.
One point of friction with this setup is actually querying the containerd metrics, which are scraped in the following form:
```
container_cpu_throttled_usec_microseconds{
  container_id="00c4eb855d53564f59a4a11763d3371d5a0c3adf65f5bd201c32e85e2da058fe",
  instance="127.0.0.1:11234",
  job="containerd",
  namespace="k8s.io",
  runtime="io.containerd.runc.v2"
}
```
The lack of labels aside from container_id makes it hard to query for containers as a human would, by dimensions like pod, namespace, etc.
To solve this I deployed kube-state-metrics to the cluster.
It provides the kube_pod_container_info series, which includes these labels, among others:
```
kube_pod_container_info{
  container_id="00c4eb855d53564f59a4a11763d3371d5a0c3adf65f5bd201c32e85e2da058fe",
  exported_container="node-driver-registrar",
  exported_namespace="secrets-store",
  exported_pod="secrets-store-csi-driver-xnr6j",
  uid="dfa53b5f-e062-4543-a08d-72753c636ae8"
}
```
With this, I was able to query based on the pod's attributes without knowing container ids:
```
container_cpu_throttled_usec_microseconds
  * on(container_id) group_left(exported_namespace, exported_container, exported_pod)
    kube_pod_container_info{container_id!="", exported_namespace="$namespace", exported_pod="$pod"}
```
Closing Thoughts
This migration was absolutely worth the effort. Grafana Alloy and OpenTelemetry have given me a more stable, more efficient, and more transparent metrics pipeline, with significantly less operational friction. Prometheus was already starting to feel redundant in my stack, and since I was using Mimir for remote storage, I didn’t need Prometheus’s embedded TSDB. Managing three full replicas just to push duplicate data upstream never sat right either. Alloy’s clustering model finally gave me a clean alternative that does one job well: scrape and forward.
More importantly, the overall system is more resilient. The old stack had enough moving parts to trigger cascading failures when one component degraded, like cAdvisor hanging or usage spikes in Mimir's distributors and ingesters when handling conflicting samples. With Alloy and OTEL, the architecture is simpler, resource usage is lower, and failure domains are narrower.
This setup feels like a better foundation to build on, and I don't miss Prometheus at all.