Observability Logging

Logs are a core component of observability. They provide detailed insight into application behavior, system events, and failure modes that are often difficult to understand through metrics or traces alone.

In the course of operating a bare-metal Kubernetes cluster, I needed a reliable way to collect and centralize logs from both cluster infrastructure and running workloads. This required designing a logging pipeline that could ingest logs from multiple nodes, process them consistently, and store them in a system suitable for querying and analysis.

This is the approach I used to implement log ingestion for that environment, along with the design considerations and trade-offs involved.

Goals

The logging solution needed to meet several practical requirements for operating a Kubernetes cluster in production:

Automatic coverage: Logs should be collected from every pod running in the cluster without requiring additional configuration per application. New workloads should automatically have their logs ingested as they are deployed.
Observable logging pipeline: The logging system itself should expose metrics and health signals so that failures, backlogs, or degraded performance can be detected and investigated.
Resilience to temporary outages: If the logging backend becomes temporarily unavailable, logs should be buffered and delivered once connectivity is restored rather than being lost during the outage window, where feasible.
Human-friendly querying: Logs should include Kubernetes metadata so they can be queried and filtered by dimensions such as namespace, pod, and container. This allows operators to quickly narrow investigations to relevant workloads or environments.

Constraints

The cluster environment imposed a few important constraints that influenced the logging design. They are fundamentally the same as those outlined in my article about implementing a metrics pipeline. To reiterate, low node headroom is the primary concern, followed by high iowait times.

Technology Selection

Several logging backends are common in Kubernetes environments, including ELK/EFK, Loki, and VictoriaLogs.

Elasticsearch was ruled out due to its high resource requirements and its indexing model, which is designed for broad full-text log search. That capability is unnecessary for typical Kubernetes queries, which already narrow results by metadata and time.

Given the choice between Loki and VictoriaLogs, I opted for Loki. I was already deeply familiar with how it works under the hood, particularly indexed labels vs structured metadata and unpacking at query time, from my earlier work enabling queryable logs in a Docker-based environment.

The log collector is the OpenTelemetry collector. This was chosen because it is easily managed as a DaemonSet through the OpenTelemetry operator, and it consumes fewer resources than Grafana Alloy, which is essential when node headroom is already low. It also has the added benefit of taking pressure off the control plane, as Alloy needs to query the API server to enumerate pods, whereas node-local filesystem collection avoids this entirely.

Design

graph LR
  subgraph node
    subgraph filesystem
      pod["/var/log/containers/*"]
      other["/var/log/*"]
    end
    otel[otel collector]
  end
  subgraph loki
  end
  pod --inject service.name--> otel
  other --regex parse--> otel
  otel --enriched logs--> loki --archive--> s3["s3 filestore"]

The design is straightforward: the OTel collector tails logs under /var/log/* and uses the file names to enrich them with metadata. Container log filenames can be parsed with the following expression:

'^((?P<pod>.*?)-?(?P<templateHash>[0-9a-fA-F]{3,})?)-?(?P<podSuffix>[^-]{4,})?_(?P<namespace>.*)_(?P<containerName>.*)-(?P<containerId>.*)\.log'
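
To make the capture groups concrete, here is a minimal Python sketch that applies the same expression to two file names. The file names and the helper `parse_container_log_name` are made up for illustration. The collector evaluates the expression with its own regex engine, so treat this only as a way to see which part of a name lands in which group.

```python
import re

# Expression copied from above; split across two raw strings for readability.
CONTAINER_LOG_RE = re.compile(
    r'^((?P<pod>.*?)-?(?P<templateHash>[0-9a-fA-F]{3,})?)-?(?P<podSuffix>[^-]{4,})?'
    r'_(?P<namespace>.*)_(?P<containerName>.*)-(?P<containerId>.*)\.log'
)


def parse_container_log_name(filename):
    """Return the named groups as a dict, or None if the name does not match."""
    match = CONTAINER_LOG_RE.match(filename)
    return match.groupdict() if match else None


# Hypothetical file names: a Deployment-managed pod and a DaemonSet-managed pod.
deployment_pod = parse_container_log_name(
    "coredns-5d78c9869d-abcde_kube-system_coredns-0123456789abcdef.log"
)
daemonset_pod = parse_container_log_name(
    "kube-proxy-x7k2p_kube-system_kube-proxy-0123456789abcdef.log"
)

for fields in (deployment_pod, daemonset_pod):
    print(fields)
```

For the Deployment-style name, the ReplicaSet hash lands in templateHash. For the DaemonSet-style name there is no hash segment, so templateHash comes back as None. Names with other shapes (static pods, for example) may split differently and are worth checking against real files on a node.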

Implementation

Implementation is also straightforward: the OTel collector only needs access to /var/log, and everything is forwarded to Loki as expected.

For Loki, there was one non-default configuration option that needed additional consideration. Loki receives all logs from the OTel collector with the additional fields stored as structured metadata rather than as index labels, so as to avoid cardinality explosions.

One of the fields I am exporting is k8s.node.name; I have found it immensely valuable to query by specific node to determine node-level vs cluster-level issues. This means the field must be a label rather than structured metadata, so that irrelevant streams can be pruned at query time rather than parsed and then dropped. It is not in the default list, so Loki had to be configured to rewrite this as a label.
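
The difference between pruning streams and filtering entries can be illustrated with a toy model. This is not how Loki is implemented. The functions `build_streams` and `query` and the sample data are hypothetical, and the model only counts how many entries have to be read for the same query under the two layouts.

```python
from collections import defaultdict


def build_streams(entries, label_keys):
    """Group entries into streams keyed by the attributes chosen as labels.

    Attributes that are not labels stay attached to each entry, standing in
    for structured metadata.
    """
    streams = defaultdict(list)
    for attrs, line in entries:
        labels = tuple(sorted((k, v) for k, v in attrs.items() if k in label_keys))
        metadata = {k: v for k, v in attrs.items() if k not in label_keys}
        streams[labels].append((metadata, line))
    return streams


def query(streams, key, value):
    """Return the matching lines and the number of entries that were read."""
    scanned, hits = 0, []
    for labels, rows in streams.items():
        label_map = dict(labels)
        if key in label_map and label_map[key] != value:
            continue  # the whole stream is skipped without reading its entries
        for metadata, line in rows:
            scanned += 1
            if label_map.get(key, metadata.get(key)) == value:
                hits.append(line)
    return hits, scanned


# Made-up data: three nodes, two log lines each, all in one namespace.
entries = [
    ({"k8s.namespace.name": "default", "k8s.node.name": node}, f"{node} line {i}")
    for node in ("node-a", "node-b", "node-c")
    for i in range(2)
]

as_metadata = build_streams(entries, {"k8s.namespace.name"})
as_label = build_streams(entries, {"k8s.namespace.name", "k8s.node.name"})

hits_metadata, scanned_metadata = query(as_metadata, "k8s.node.name", "node-a")
hits_label, scanned_label = query(as_label, "k8s.node.name", "node-a")
print(scanned_metadata, scanned_label)
```

Both layouts return the same two lines, but the label layout reads only the entries belonging to node-a. The cost is one extra stream per node, which is acceptable here because the number of nodes is small and stable.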

Monitoring / Alerting

Monitoring and alerting for the logging stack required very little additional configuration because the Mimir and Alloy metrics pipeline mentioned earlier was already in place.

To recap, Alloy is responsible for discovering PrometheusRule resources in the cluster and loading their alerting and recording rules. This mechanism was originally implemented to support metrics ingestion for Grafana Mimir, allowing rule definitions to be managed through standard Kubernetes resources.

The Loki Helm chart includes a set of pre-defined monitoring rules that cover common operational concerns such as ingestion failures, component restarts, and storage backpressure. When monitoring is enabled in the chart, these rules are exposed automatically as PrometheusRule resources.

Because Alloy was already configured to discover these resources cluster-wide, the Loki alerts were automatically loaded into the existing monitoring pipeline without requiring any additional configuration. As a result, the logging stack immediately benefited from a baseline set of operational alerts with essentially zero additional integration work.

Conclusion

Building a reliable logging pipeline for a bare-metal Kubernetes cluster requires careful attention to operational constraints, resource limitations, and maintainability. By selecting Loki as the backend and using the OpenTelemetry collector for log collection, it was possible to achieve automatic, cluster-wide log ingestion without adding operational complexity or requiring changes to application workloads.

The combination of metadata-aware log processing, hot-reloadable collector configuration, and integration with the existing metrics and alerting system provides a robust, observable solution. This approach ensures that logs are available when needed for troubleshooting, while minimizing resource overhead and administrative effort.

What I Would Do Differently

There is one glaring issue with the setup I've selected: promoting fields to labels requires configuring Loki rather than the collection agent. If there were more node headroom, I'd deploy Grafana Alloy instead of the OTel collector, enriching logs at the source and passing them to Loki. This would also allow for more granular normalization. However, normalizing kernel logs, apid logs, etc., isn't particularly high-value because queries against them are relatively infrequent.

A middle ground worth exploring would be forwarding logs from the OTel collector to Alloy, enriching them there, and then passing them to Loki.

Backlinks

Production-Grade Bare Metal Kubernetes