Sunsetting Contabo
For a little over a year, I've been running Kubernetes via Talos Linux on Contabo VPSes.
I originally chose Contabo because it was the cheapest provider on which Talos Linux images worked out of the box. At the time, I also found value in having unreliable nodes to passively test the resilience of high-availability setups.
However, as my workloads grew, Contabo stopped being a good fit. The biggest problem was absurdly high steal time (>90%!), which caused timeouts, especially in systems that must maintain consensus (e.g. Longhorn volume replication, etcd, Kafka). On top of that, network bandwidth was capped at 100 Mbit/s, forcing aggressive downsampling of metrics and logs, which was at odds with my desire to add even more instrumentation via eBPF, with tools like Parca.
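Steal time like this is easy to confirm straight from `/proc/stat`: the eighth value on each `cpu` line counts ticks the hypervisor withheld from the guest. A minimal sketch (the helper name and sample values are mine, not from any real node):

```python
# Compute steal percentage between two samples of a /proc/stat "cpu" line.
# Fields after "cpu": user nice system idle iowait irq softirq steal guest guest_nice.
def steal_percent(before: str, after: str) -> float:
    def parse(line: str):
        fields = [int(x) for x in line.split()[1:]]
        total = sum(fields[:8])  # guest time is already folded into user/nice
        return total, fields[7]  # (total ticks, steal ticks)

    t0, s0 = parse(before)
    t1, s1 = parse(after)
    delta = t1 - t0
    return 100.0 * (s1 - s0) / delta if delta else 0.0

# Two made-up samples, one second apart, on an oversubscribed VPS:
before = "cpu 100 0 50 800 10 0 5 35 0 0"
after = "cpu 110 0 55 810 11 0 6 128 0 0"
print(f"{steal_percent(before, after):.1f}% steal")  # 77.5% steal
```

In practice `vmstat` or `top` (the `st` column) report the same number, but reading it directly makes it easy to alert on from a sidecar or exporter.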
I'm not going to pretend this wasn't a significant amount of work. In total, this was a 14-node cluster spanning three control planes and eleven workers, managing 90 vCPUs, 230GB of RAM, and 8.8TB of storage across three separate node pools.
So, before I close this chapter, here's how my setup progressed:
SM Cluster
Aespa nodepool
I started my self-hosted Kubernetes journey with this nodepool, setting up a single control plane node and three workers as the foundation for my core workloads.
| name | role | vCPU | RAM (GB) | Storage |
|---|---|---|---|---|
| Karina | control plane | 6 | 12 | 200GB |
| Giselle | worker | 8 | 24 | 1.2TB |
| Ningning | worker | 8 | 24 | 1.2TB |
| Winter | worker | 8 | 24 | 1.2TB |
Red Velvet nodepool
As the cluster grew and I needed more capacity, I added this nodepool, expanding the control plane to three nodes and adding three more workers.
| name | role | vCPU | RAM (GB) | Storage |
|---|---|---|---|---|
| Irene | control plane | 6 | 12 | 200GB |
| Seulgi | worker (was control plane) | 4 | 6 | 400GB |
| Wendy | worker | 12 | 48 | 1.6TB |
| Joy | worker | 6 | 16 | 400GB |
| Yeri | worker | 6 | 16 | 400GB |
Hearts2Hearts nodepool
By early 2025, the cluster's demands had grown, particularly on the control plane. API server load was increasing, especially during deploy-heavy periods and when running resource-intensive workloads like Prometheus or ArgoCD. To keep things responsive and maintain headroom, I brought in a third node pool with beefier control-plane nodes and additional workers.
This also allowed me to spread the control plane across all three node pools, improving availability and reducing the blast radius of any single failure domain.
| name | role | vCPU | RAM (GB) | Storage |
|---|---|---|---|---|
| Jiwoo | control plane | 6 | 12 | 400GB |
| Stella | worker | 6 | 12 | 400GB |
| Ian | worker | 6 | 12 | 400GB |
| Yuha | worker | 4 | 6 | 400GB |
| Yeon | worker | 4 | 6 | 400GB |