
Target environment

This chapter describes the exact stack we deploy onto and the reasoning behind each piece. The constraints these choices impose are covered in the next chapter.

The cluster: k3s on Hetzner via Terraform

We provision the cluster with the terraform-hcloud-kube-hetzner module. It is a well-maintained, opinionated way to run Kubernetes cheaply on Hetzner Cloud. Key facts:

  • k3s: a certified, lightweight Kubernetes distribution. It tracks upstream Kubernetes closely; the module currently targets a recent release (k3s ~v1.35, i.e. Kubernetes 1.35), which is comfortably new enough for every feature in this guide. Verify yours with kubectl version.
  • openSUSE MicroOS: an immutable, auto-updating container OS. The root filesystem is read-only, and updates are transactional with automatic rollback.
  • Auto-upgrades: node OS reboots are coordinated by Kured, and k3s upgrades are handled by the system-upgrade-controller. This is a feature, but for a stateful database it has real consequences (covered in the next chapter).

```mermaid
flowchart TB
    tf["kube.tf (Terraform/OpenTofu)"] -->|terraform apply| hc["Hetzner Cloud API"]
    hc --> n1["Node 1 (amd64, MicroOS)"]
    hc --> n2["Node 2 (amd64, MicroOS)"]
    hc --> n3["Node 3 (amd64, MicroOS)"]
    n1 & n2 & n3 --> k3s["k3s cluster"]
    k3s --> addons["cert-manager + Longhorn<br/>(installed by the module)"]
```

Our cluster is three amd64 nodes in a single Hetzner location. Using the same architecture everywhere (no x86/ARM mix) and a single location keeps storage simple; both choices are explained in Constraints & decisions.

Storage: Longhorn on node storage

The module can give you either Hetzner's block-storage CSI driver or Longhorn, a distributed block storage system for Kubernetes. We chose Longhorn, configured to use node-local storage (fast) rather than attached Hetzner volumes (slower). The module's own example file recommends exactly this for databases.

Why Longhorn and not the Hetzner CSI?

  • The Hetzner CSI driver does not support volume snapshots, removing a useful recovery option.
  • Longhorn does support CSI volume snapshots and backups to S3-compatible object storage (including R2), giving us an independent safety net.
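
As a rough sketch of what Longhorn's CSI snapshot support looks like in practice, the manifests below define a snapshot class and request one snapshot of a database volume. They assume the external-snapshotter CRDs and controller are present in the cluster. The names longhorn-snapshot and pg-data-snapshot, the default namespace, and the PVC name pg-1 are hypothetical placeholders, not values from this guide.

```yaml
# Assumption: the snapshot.storage.k8s.io CRDs and snapshot controller are installed.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot        # hypothetical name
driver: driver.longhorn.io       # Longhorn's CSI driver
deletionPolicy: Delete
parameters:
  type: snap                     # in-cluster snapshot; "bak" would send it to Longhorn's backup target
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snapshot         # hypothetical name
  namespace: default             # hypothetical namespace
spec:
  volumeSnapshotClassName: longhorn-snapshot
  source:
    persistentVolumeClaimName: pg-1   # hypothetical PVC of one PostgreSQL instance
```

A snapshot taken this way of a running database volume is crash-consistent at best, so it complements the PostgreSQL-level backups described later rather than replacing them.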

We deliberately keep Longhorn's redundancy low (1 replica) for the database, because PostgreSQL already keeps its own copies of the data. The reasoning and the trade-off are covered in the next chapter.
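
A single-replica Longhorn StorageClass for the database could look roughly like the sketch below. Only the name longhorn-postgres and the replica count come from this guide; the remaining fields are assumptions chosen to illustrate the idea, not the settings the module generates.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-postgres
provisioner: driver.longhorn.io
allowVolumeExpansion: true
# Assumption: delay binding so the volume is created where the pod is scheduled.
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "1"          # one Longhorn copy; PostgreSQL provides the redundancy
  # Assumption: keep the single replica on the same node as the pod.
  # Longhorn only accepts strict-local together with exactly one replica.
  dataLocality: "strict-local"
  staleReplicaTimeout: "30"      # minutes; an illustrative value
```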

TLS plumbing: cert-manager

cert-manager is the standard Kubernetes tool for issuing and renewing TLS certificates. We need it for one specific reason: the Barman Cloud Plugin requires cert-manager to secure the TLS channel between the plugin and the operator. The kube-hetzner module can install cert-manager for us, so it becomes part of Layer 1 (the platform layer, described below) rather than a manual step.

Backups: Cloudflare R2

Cloudflare R2 is S3-compatible object storage with no egress fees. CloudNativePG (via the Barman Cloud Plugin) writes base backups and archived WAL to it.
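
To make the backup path concrete, here is a hedged sketch of how a Cluster might enable WAL archiving through the plugin and how a ScheduledBackup might request base backups. The resource names pg, pg-daily and r2-store, the instance count, the volume size and the schedule are all hypothetical; r2-store stands for an ObjectStore resource that has to exist separately.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg                        # hypothetical cluster name
spec:
  instances: 3                    # illustrative
  storage:
    storageClass: longhorn-postgres
    size: 10Gi                    # illustrative
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true         # archive WAL through the plugin
      parameters:
        barmanObjectName: r2-store   # hypothetical ObjectStore pointing at the R2 bucket
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-daily                  # hypothetical name
spec:
  schedule: "0 0 2 * * *"         # six-field cron, seconds first: every day at 02:00:00
  cluster:
    name: pg
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
```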

```mermaid
flowchart LR
    pg["PostgreSQL Cluster"] -->|base backup + WAL| r2[("Cloudflare R2 bucket")]
    r2 -->|restore / PITR| new["A fresh Cluster<br/>(bootstrap from backup)"]
```
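
The restore arrow in the diagram corresponds to a new Cluster that bootstraps from the object store instead of starting empty. The following is a minimal sketch under stated assumptions: pg-restored, origin, r2-store and the original server name pg are hypothetical, and the sizes are illustrative.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-restored               # hypothetical name for the fresh cluster
spec:
  instances: 3                    # illustrative
  storage:
    storageClass: longhorn-postgres
    size: 10Gi                    # illustrative
  bootstrap:
    recovery:
      source: origin              # must match an entry in externalClusters
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: r2-store   # hypothetical ObjectStore for the R2 bucket
          serverName: pg               # name the backups were written under (assumed)
```

For point-in-time recovery, a recoveryTarget would be added under bootstrap.recovery; without one, recovery replays all available WAL.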

R2 caveat, stated up front

R2 works with the Barman Cloud Plugin but needs a known workaround for an S3 checksum incompatibility, and there is a reported failure restoring from R2 with the plugin. We use it, but we prove restore works before relying on it, and we keep Longhorn's own R2 backups as a fallback. Details in Disaster recovery.
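
For orientation, an ObjectStore pointing at R2 might look like the sketch below. The two environment variables are the checksum workaround commonly cited for S3-compatible providers; treat them as an assumption to check against the current plugin documentation, not as a confirmed fix for this setup. The resource name, the Secret name and its keys are hypothetical, and the bucket and account ID are left as placeholders.

```yaml
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: r2-store                  # hypothetical name
spec:
  configuration:
    destinationPath: "s3://<bucket-name>/"
    endpointURL: "https://<account-id>.r2.cloudflarestorage.com"
    s3Credentials:
      accessKeyId:
        name: r2-credentials      # hypothetical Secret
        key: ACCESS_KEY_ID        # hypothetical key
      secretAccessKey:
        name: r2-credentials
        key: SECRET_ACCESS_KEY    # hypothetical key
  instanceSidecarConfiguration:
    env:
      # Assumed workaround: only compute or validate checksums when the API requires it.
      - name: AWS_REQUEST_CHECKSUM_CALCULATION
        value: when_required
      - name: AWS_RESPONSE_CHECKSUM_VALIDATION
        value: when_required
```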

The three-layer model (how it all fits)

```mermaid
flowchart TB
    subgraph L1["Layer 1: Platform (Terraform / kube.tf)"]
        nodes["3 nodes + k3s"]
        cm["cert-manager"]
        lh["Longhorn + longhorn-postgres StorageClass"]
    end
    subgraph L2["Layer 2: Postgres stack (kubectl, then manifests)"]
        oper["CNPG operator"]
        bcp["Barman Cloud Plugin"]
        icat["ImageCatalog"]
        os["ObjectStore (R2)"]
        clu["Cluster + Pooler + ScheduledBackup + NetworkPolicy"]
    end
    subgraph L3["Layer 3: Full IaC (Kustomize / GitOps)"]
        fold["Layer 2 folded into code"]
        boot["Bootstrap-from-R2 on fresh clusters"]
    end
    L1 --> L2 --> L3
```

  • Layer 1 is declared in kube.tf from day one, so every fresh cluster comes up with storage and cert-manager ready.
  • Layer 2 we do by hand with kubectl while learning, one resource at a time, so each piece is understood.
  • Layer 3 folds Layer 2 into the module's extra-manifests (or a GitOps tool) so the whole stack — and a restore from R2 — comes up from nothing.
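
One way to picture Layer 3 is a single Kustomize entry point that lists the Layer 2 resources as files. This is only a sketch: every file name below is hypothetical, the operator and plugin installation manifests are deliberately left out, and how the module's extra-manifests mechanism consumes such a file is not shown here.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Hypothetical file names, one per Layer 2 resource from the diagram above.
resources:
  - image-catalog.yaml
  - object-store.yaml
  - cluster.yaml
  - pooler.yaml
  - scheduled-backup.yaml
  - network-policy.yaml
```

With a layout like this, kubectl apply -k on the directory would create the whole Postgres stack in one step, assuming the operator and plugin are already running.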

Your daily workflow (and why the layers help)

You plan to create the cluster each morning and destroy it each night. The layer split is what makes that bearable: Layer 1 is one terraform apply, and once Layer 2 becomes Layer 3, the entire database returns — data included — without manual steps. Until then, the guide's Layer 2 chapters are the repeatable "muscle memory" part.

Where to go deeper

Next: Constraints & decisions — the sharp edges of this environment and how they shaped the build.