1. Platform layer (kube.tf)
Goal of this chapter: declare the cluster, cert-manager, Longhorn, and a Postgres-tuned StorageClass in Terraform, so every fresh cluster comes up with the platform ready. This is Layer 1 — pure Infrastructure-as-Code.
Prerequisites
- A Hetzner Cloud project + API token (Read & Write).
- terraform/tofu, packer, kubectl, and hcloud installed.
- You have run the kube-hetzner create.sh script once to generate the MicroOS snapshot and a starter kube.tf. See the kube-hetzner Getting Started.
What we are declaring
flowchart TB
tf["kube.tf"] --> cluster["3 amd64 nodes, one location,<br/>all schedulable"]
tf --> certm["enable_cert_manager = true"]
tf --> longhorn["enable_longhorn = true<br/>(node storage)"]
sc["longhorn-postgres StorageClass<br/>(1 replica, disposable)"]
longhorn --> sc
Step 1.1: Pin the module
In your kube.tf, pin the module to an exact version. A floating version
constraint makes the infrastructure non-reproducible:
module "kube-hetzner" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.20.0" # (1)!
  # ... provider, hcloud_token, ssh keys, network_region ...
}
- Pin to the version you tested. Check the releases page and bump deliberately, not automatically.
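The pin can also be checked mechanically before each apply. The following is a minimal sketch, not part of kube-hetzner: the helper names `module_versions` and `check_pinned` are hypothetical, and the parsing assumes the file has been through `terraform fmt`, so that `module` and its closing brace start at column 0.

```shell
# Print the `version = ...` lines found inside top-level module blocks of a .tf file.
# Assumption: `terraform fmt` layout (module header and closing brace at column 0).
module_versions() {
  awk '
    /^module[ \t]/                       { in_module = 1 }
    in_module && /^[ \t]*version[ \t]*=/ { print }
    /^}/                                 { in_module = 0 }
  ' "$1"
}

# Succeed only if every module version is an exact X.Y.Z pin (no ~>, >=, or ranges).
check_pinned() {
  versions=$(module_versions "$1")
  if [ -z "$versions" ]; then
    echo "no module version found"
    return 1
  fi
  # grep -v selects lines that are NOT an exact pin; -q makes it a yes/no answer.
  if printf '%s\n' "$versions" | grep -Evq '=[[:space:]]*"[0-9]+\.[0-9]+\.[0-9]+"'; then
    echo "floating module version"
    return 1
  fi
  echo "ok: exact pin"
}

# Usage: check_pinned kube.tf
```

Provider constraints under `required_providers` are deliberately ignored here; only module blocks are inspected.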
Step 1.2: Nodes: 3 × amd64, one location, schedulable
# All three nodes are control-planes AND run workloads (learning-phase layout).
allow_scheduling_on_control_plane = true # (1)!
control_plane_nodepools = [
  {
    name        = "cp"
    server_type = "cpx31" # (2)! amd64; adjust size to your workload
    location    = "nbg1"  # (3)! one location for all three
    labels = [
      "node.longhorn.io/create-default-disk=true", # (4)!
      "node.kubernetes.io/server-usage=storage",
    ]
    taints               = []
    count                = 3
    longhorn_volume_size = 0 # (5)! 0 = node storage (fast, recommended for DBs)
  }
]
agent_nodepools = []
- Production should use dedicated agent nodes instead; see Toward full IaC. For daily create/destroy, this is fine.
- amd64 only. cpx31 is an AMD shared-vCPU type. Never use cax* (ARM) here, or you risk a mixed-architecture cluster CNPG cannot run.
- A single location avoids zonal-volume scheduling problems.
- Tells Longhorn to create a default disk on these nodes.
- longhorn_volume_size = 0 uses the node's own disk (faster) instead of an attached Hetzner volume, which is the module's documented recommendation for databases.
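Once the cluster is up, the architecture and location constraints from the notes above can be checked from the node labels. This is a sketch under assumptions: `check_nodes` is a hypothetical helper, and it assumes the Hetzner cloud controller publishes the location in the standard `topology.kubernetes.io/region` label.

```shell
# Read "NAME ARCH LOCATION" lines on stdin; fail on any non-amd64 node
# or when the nodes are spread over more than one location.
check_nodes() {
  awk '
    BEGIN { bad = 0 }
    $2 != "amd64" { print "non-amd64 node: " $1 " (" $2 ")"; bad = 1 }
    { locations[$3] = 1 }
    END {
      n = 0
      for (l in locations) n++
      if (NR == 0) { print "no nodes listed"; bad = 1 }
      if (n > 1)   { print "nodes span " n " locations"; bad = 1 }
      if (!bad)    print "ok: " NR " amd64 nodes in one location"
      exit bad
    }'
}

# Usage (needs a working KUBECONFIG):
# kubectl get nodes --no-headers \
#   -o custom-columns='NAME:.metadata.name,ARCH:.metadata.labels.kubernetes\.io/arch,LOCATION:.metadata.labels.topology\.kubernetes\.io/region' \
#   | check_nodes
```

A node with a missing label shows up as `<none>` in the kubectl output and is therefore reported as a failure rather than silently accepted.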
Step 1.3: Enable cert-manager and Longhorn
# cert-manager: required by the Barman Cloud Plugin's TLS channel.
enable_cert_manager = true
# cert_manager_version = "v1.x.y" # pin after first deploy (see Versions)
# Longhorn: our storage layer.
enable_longhorn = true
# longhorn_version = "vX.Y.Z" # pin after first deploy
longhorn_values = <<-EOT
  defaultSettings:
    defaultDataPath: /var/lib/longhorn
  persistence:
    defaultClassReplicaCount: 2 # (1)!
EOT
- This is the cluster-wide default Longhorn class replica count, used by things other than the database. Postgres gets its own class with 1 replica in the next step.
Enabling Longhorn also enables iscsid
Longhorn needs iscsid on the nodes. The module turns it on automatically
when enable_longhorn = true, so there is nothing extra to do on MicroOS.
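To confirm that the replica count from longhorn_values actually reached the cluster, the StorageClass can be queried directly. A small sketch: `sc_replicas` is a hypothetical helper, and the default class name `longhorn` is an assumption about the Longhorn chart's defaults, so adjust it if your cluster differs.

```shell
# Print numberOfReplicas for a StorageClass (defaults to the class named "longhorn",
# which is assumed to be the one the Longhorn chart creates).
sc_replicas() {
  kubectl get storageclass "${1:-longhorn}" \
    -o jsonpath='{.parameters.numberOfReplicas}'
}

# Usage:
# sc_replicas                     # cluster-wide default class
# sc_replicas longhorn-postgres   # the Postgres class, once Step 1.4 is applied
```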
Step 1.4: A StorageClass tuned for Postgres
PostgreSQL owns redundancy, so the database volumes use 1 Longhorn replica
(disposable storage). Save this as longhorn-postgres.yaml. During the learning
phase apply it with kubectl; in Layer 3 it moves into the module's
extra-manifests.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-postgres
provisioner: driver.longhorn.io
allowVolumeExpansion: true # (1)!
reclaimPolicy: Delete # (2)!
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "1" # (3)!
  staleReplicaTimeout: "2880"
  dataLocality: "best-effort" # (4)!
  fsType: "ext4"
- Lets a PVC grow later. Shrinking is never supported, so size generously.
- Delete removes the volume when the PVC is deleted, which is appropriate for disposable storage where R2 is the durable copy. Use Retain for the production cluster if you want volumes to survive accidental deletion.
- One replica. If the node dies, the operator rebuilds that instance from the primary. PostgreSQL is the redundancy.
- Keep a replica on the same node as the pod → lower latency.
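A throwaway claim is one way to see the class in action before the database uses it. The sketch below only writes a manifest; the helper `write_test_pvc` and the claim name `longhorn-postgres-smoke` are hypothetical, and the 1Gi size is arbitrary.

```shell
# Write a minimal PVC manifest that requests a volume from the longhorn-postgres class.
write_test_pvc() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-postgres-smoke   # hypothetical name, delete after the test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn-postgres
  resources:
    requests:
      storage: 1Gi
EOF
}

# Usage (after the StorageClass has been applied):
# write_test_pvc /tmp/pvc-smoke.yaml
# kubectl apply -f /tmp/pvc-smoke.yaml
# kubectl get pvc longhorn-postgres-smoke
# kubectl delete -f /tmp/pvc-smoke.yaml
```

Because the class uses `volumeBindingMode: WaitForFirstConsumer`, the claim is expected to stay `Pending` until a pod mounts it. That is normal and not a provisioning failure.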
Step 1.5: Apply and verify
cd <your-project-folder>
terraform init --upgrade
terraform validate
terraform apply # review the plan, then approve
export KUBECONFIG=$PWD/<clustername>_kubeconfig.yaml
kubectl get nodes -o wide # 3 nodes, all Ready, amd64
kubectl get pods -n cert-manager # cert-manager Running
kubectl get pods -n longhorn-system # Longhorn Running
kubectl apply -f longhorn-postgres.yaml
kubectl get storageclass # longhorn-postgres listed
Confirm cert-manager's API is actually ready (the Barman plugin will need it):
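One possible readiness check, offered as a sketch rather than the original command for this step: wait for the deployments in the `cert-manager` namespace to become Available, then confirm that the core CRDs are registered. The function name `cert_manager_ready` and the 180-second timeout are assumptions.

```shell
# Succeed only when cert-manager's deployments are Available and its core CRDs exist.
cert_manager_ready() {
  # Blocks until every deployment in the namespace reports Available, or times out.
  kubectl wait --for=condition=Available deployment --all \
    -n cert-manager --timeout=180s || return 1
  # The Certificate and Issuer kinds must be known to the API server.
  kubectl get crd certificates.cert-manager.io issuers.cert-manager.io \
    >/dev/null || return 1
  echo "cert-manager API ready"
}

# Usage: cert_manager_ready
```

If cert-manager's `cmctl` CLI is installed, `cmctl check api` performs a more thorough check by exercising the webhook end to end.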
What could go wrong
- cax server type by mistake → mixed/ARM cluster. Double-check every server_type is cx/cpx/ccx.
- Nodes in different locations → Postgres pods can get stuck Pending after a reschedule because their volume is in another location. Keep one location.
- cert-manager not ready before the plugin → plugin install fails to get certificates. Always verify cert-manager first.
- MicroOS auto-reboots during a long session → expected; your workloads should tolerate it. This is the failover reality from the constraints chapter.
Where to go deeper
Next: Operator & cnpg plugin — Layer 2 begins.