Initial Setup

There's a bootstrapping problem when building a Kubernetes cluster from scratch: you need DNS to provision nodes, but you can't run DNS in the cluster until you have nodes. Same with configuration management, load balancing, and discovery services. The standard approach—cloud provider DNS, managed load balancers, cloud-init—wasn't an option for a completely self-hosted setup.

So I built a bootstrap node: a single external machine running the supporting infrastructure needed to get the cluster off the ground. Once Kubernetes was operational, these services could migrate into the cluster itself. But first, something had to exist outside the cluster to bring it into existence.

Setting up the Bootstrap Server

An HAProxy instance on the bootstrap node acted as a load balancer in front of the control plane nodes. This made it possible to run Talos client commands without targeting a specific node, enabling more flexible and resilient bootstrapping. A simple diagram of this setup is below.

flowchart LR
    User([User])
    subgraph k8s [k8s cluster]
        CP1[karina.aespa]
        CP2[seulgi.redvelvet]
        CP3[irene.redvelvet]
    end
    subgraph bootstrap node
        haproxy
        disc[talos discovery]
        minio
        coredns
    end
    User -->|talosctl request| haproxy
    haproxy -->|internal request| CP1
    haproxy -->|internal request| CP2
    haproxy -->|internal request| CP3
    minio --node config--> k8s
    coredns --"dns answers"--> k8s
    disc --"cluster info"--> k8s

An excerpt of the relevant HAProxy config is below; the resolvers block is needed so HAProxy can resolve the custom internal DNS names.

resolvers talosnameserver
    nameserver ns1 xx.xx.xx.xx:53

backend talos_controlplane
    mode tcp
    balance roundrobin
    server kube-cp-01 karina.aespa.sm.infra.testlab.kube:50000 resolvers talosnameserver
    server kube-cp-02 irene.redvelvet.sm.infra.testlab.kube:50000 resolvers talosnameserver
    server kube-cp-03 seulgi.redvelvet.sm.infra.testlab.kube:50000 resolvers talosnameserver

To provide internal DNS resolution, I ran CoreDNS on the bootstrap node. It resolved internal hostnames such as karina.aespa.sm.infra.testlab.kube to their corresponding IP addresses within the network, allowing Talos and Kubernetes components to operate without relying on external DNS.
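For reference, a minimal Corefile for this kind of setup might look like the following. This is a sketch, not the exact config: the use of the hosts plugin and the zone layout are my assumptions, and addresses are redacted in the same style as the rest of the post.

```
sm.infra.testlab.kube {
    # Static A records for the cluster nodes.
    hosts {
        xx.xx.xx.xx karina.aespa.sm.infra.testlab.kube
        xx.xx.xx.xx seulgi.redvelvet.sm.infra.testlab.kube
        fallthrough
    }
    # Anything not matched above is forwarded upstream.
    forward . /etc/resolv.conf
}
```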

MinIO was also hosted on the bootstrap node and used as a remote configuration store. Talos nodes fetched their machine configurations from it during initialization, enabling consistent configuration delivery across the cluster.
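Config delivery with MinIO can be sketched roughly as follows, using the MinIO mc client. The alias, bucket name, credentials, and file names here are placeholders, and making the bucket anonymously downloadable is my assumption about how nodes fetched configs at boot.

```shell
# Register the bootstrap MinIO instance and create a config bucket.
mc alias set bootstrap http://xx.xx.xx.xx:9000 <access-key> <secret-key>
mc mb bootstrap/talos-configs

# Upload the generated machine configs.
mc cp controlplane.yaml worker.yaml bootstrap/talos-configs/

# Allow unauthenticated downloads so nodes can fetch configs during boot.
mc anonymous set download bootstrap/talos-configs
```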

To streamline node provisioning, I built two custom Talos ISO images (one for control planes, and one for workers) that embedded a few key parameters:

  • The IP address of the CoreDNS service.
  • The remote configuration URL hosted on MinIO.
  • Environment-specific customizations needed for local hardware.
  • Required system extensions such as iscsi-tools and btrfs (needed later for volume provisioning).

With this setup, any bare-metal node could be booted using the custom ISO and automatically connect to the correct services to initialize and join the cluster.
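I haven't reproduced the exact image build here, but a sketch with the Talos imager container might look like this. The flags, extension image tag, and config URL are assumptions based on the setup described above.

```shell
# Build a control-plane ISO with the config URL baked into the kernel args
# and the required system extensions included.
docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:v1.8.3 iso \
  --extra-kernel-arg "talos.config=http://xx.xx.xx.xx:9000/talos-configs/controlplane.yaml" \
  --system-extension-image ghcr.io/siderolabs/iscsi-tools:<version>
```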

The final step was to patch the hostname on each node, running talosctl patch mc against the following configuration:

- op: add
  path: /machine/network/hostname
  value: "karina.aespa.sm.infra.testlab.kube"
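Applying the patch per node can be sketched as follows; the node address and patch file name are placeholders.

```shell
# Apply the hostname patch to a single node.
talosctl patch machineconfig \
  --nodes xx.xx.xx.xx \
  --endpoints xx.xx.xx.xx \
  --patch @hostname-patch.yaml
```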

This allowed the certificate SANs to match the hostnames of the nodes. Below is the final output of talosctl get members, after all patches were applied:

> talosctl get members --endpoints seulgi.redvelvet.sm.infra.testlab.kube --nodes seulgi.redvelvet.sm.infra.testlab.kube --talosconfig=./talosconfig
NODE                                      NAMESPACE   TYPE     ID         VERSION   HOSTNAME                                  MACHINE TYPE   OS               ADDRESSES
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   giselle    1         giselle.aespa.sm.infra.testlab.kube      worker         Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   irene      1         irene.redvelvet.sm.infra.testlab.kube    controlplane   Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   joy        1         joy.redvelvet.sm.infra.testlab.kube      worker         Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   karina     1         karina.aespa.sm.infra.testlab.kube       controlplane   Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   ningning   1         ningning.aespa.sm.infra.testlab.kube     worker         Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   seulgi     1         seulgi.redvelvet.sm.infra.testlab.kube   controlplane   Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   wendy      1         wendy.redvelvet.sm.infra.testlab.kube    worker         Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   winter     1         winter.aespa.sm.infra.testlab.kube       worker         Talos (v1.8.3)   ["xx.xx.xx.xx"]
seulgi.redvelvet.sm.infra.testlab.kube   cluster     Member   yeri       1         yeri.redvelvet.sm.infra.testlab.kube     worker         Talos (v1.8.3)   ["xx.xx.xx.xx"]

Bootstrapping ArgoCD

With the Kubernetes control plane operational, the next step was to deploy ArgoCD to manage the cluster using GitOps.

I started with a simple, non-HA installation of ArgoCD using helmfile. This initial deployment was enough to get the UI and API up and running, allowing basic interaction and configuration.
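A minimal helmfile for such a non-HA install might look like this; the values shown (disabling redis-ha, single controller replica) are my assumptions about what "non-HA" meant here, not the exact file used.

```yaml
repositories:
  - name: argo
    url: https://argoproj.github.io/argo-helm

releases:
  - name: argocd
    namespace: argocd
    chart: argo/argo-cd
    values:
      # Keep the bootstrap install small; HA comes later via GitOps.
      - redis-ha:
          enabled: false
        controller:
          replicas: 1
```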

Once ArgoCD was available, I connected it to a Forgejo instance running on the bootstrap node. This repository served as the source of truth for cluster configuration and workloads. All manifests and Helm charts were stored here, enabling version-controlled, reproducible infrastructure management.
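Registering a repository with ArgoCD can be done declaratively with a labeled Secret; a sketch is below, where the name, URL, and credentials are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-repo
  namespace: argocd
  labels:
    # This label tells ArgoCD to treat the Secret as a repository definition.
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://forgejo.example.com/infra/cluster.git
  username: <username>
  password: <token>
```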

After confirming connectivity and repository sync, I applied a manifest to ArgoCD that defined its own high-availability configuration. This included multiple replicas and appropriate resource definitions, effectively allowing ArgoCD to manage itself via GitOps. Once this was in place, the bootstrap install could be replaced entirely by the version managed through the repo, completing the loop.
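A hedged sketch of what such a self-managing Application could look like; the repo URL and path are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://forgejo.example.com/infra/cluster.git
    path: argocd
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    # Let ArgoCD reconcile (and upgrade) itself from Git.
    automated:
      prune: true
      selfHeal: true
```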

This transition marked the handoff from manual setup to fully declarative infrastructure management, with ArgoCD continuously reconciling the desired state of the cluster from Git.

I also added the following to the configs.cm section of the ArgoCD Helm values:

application.resourceTrackingMethod: annotation+label
kustomize.buildOptions: --enable-helm
resource.customizations: |
  argoproj.io/Application:
    health.lua: |
      hs = {}
      hs.status = "Progressing"
      hs.message = ""
      if obj.status ~= nil then
        if obj.status.health ~= nil then
          hs.status = obj.status.health.status
          if obj.status.health.message ~= nil then
            hs.message = obj.status.health.message
          end
        end
      end
      return hs

The kustomize.buildOptions section is necessary because some Helm charts (such as the one for kube-prometheus) have CRDs that exceed the default maximum character limit. The resource.customizations section gives Application resources a health check derived from their own reported status, so that in the app-of-apps pattern unhealthy child apps block the parent from progressing, which is important when using the sync-wave annotation.
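To illustrate, a child Application pinned to a sync wave might look like the following hypothetical example; with the health check above, later waves won't start until it reports healthy. The repo URL and paths are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: storage
  namespace: argocd
  annotations:
    # Synced in wave 1; apps in later waves wait for this one to be healthy.
    argocd.argoproj.io/sync-wave: "1"
spec:
  project: default
  source:
    repoURL: https://forgejo.example.com/infra/cluster.git
    path: apps/storage
  destination:
    server: https://kubernetes.default.svc
    namespace: storage
```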
