Constraints & decisions
Every choice in this guide traces back to a constraint of the environment. This chapter collects them so the manifests later make sense. Think of it as the "design rationale".
1. Hetzner block volumes are zonal and snapshot-less
Two properties of Hetzner's native block storage shaped our storage choice:
- Zonal: a volume lives in one Hetzner location. A pod using it can only run on a node in that same location. Spread your 3 nodes across locations and a database instance can only be rescheduled where its volume already is.
- No CSI snapshots: the Hetzner CSI driver has never implemented volume snapshots (long-standing open request). That removes a common recovery and cloning mechanism.
Decisions:
- Keep all 3 nodes in one location to avoid zonal-volume scheduling pain.
- Use Longhorn instead of the Hetzner CSI, because Longhorn provides both snapshots and S3 backups.
2. Postgres already replicates, so don't double-replicate heavily
Longhorn can keep multiple replicas of every volume. But our PostgreSQL cluster already keeps three copies of the data (one primary + two standbys, via streaming replication). If Longhorn also kept 3 copies of each volume, every write would be amplified across both layers — slow and wasteful.
```mermaid
flowchart TB
w["1 write"] --> pg["Postgres: replicated to<br/>2 standbys (3 copies)"]
pg --> badcase{"Longhorn replicas?"}
badcase -->|"3 (default)"| bad["≈ 9 physical copies<br/>high latency, high cost"]
badcase -->|"1 (our choice)"| good["3 copies total<br/>Postgres owns redundancy"]
```
Decision: numberOfReplicas: "1" for the Postgres StorageClass. Storage is
disposable; PostgreSQL is the single source of redundancy.
What 'disposable storage' implies
With 1 Longhorn replica, if a node dies the data on that node's volume is gone. That is fine: the operator detects the lost instance and rebuilds it from scratch from the primary (a fresh base copy). You trade a heavier rebuild for far less day-to-day write overhead and a simpler mental model. The cluster as a whole never loses data as long as a healthy primary or in-sync standby survives.
We also set Longhorn's dataLocality: best-effort so Longhorn tries to keep
each volume's replica on the same node as its pod, which minimizes latency.
Volumes live on node storage (not attached Hetzner volumes) because the
kube-hetzner module documents node storage as the right choice for databases.
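As a rough illustration, the two Longhorn settings above could be expressed in a StorageClass like the one below. This is a minimal sketch, not the guide's final manifest: the name `longhorn-postgres` and the expansion and reclaim settings are assumptions chosen for the example. Only `numberOfReplicas` and `dataLocality` come from the decisions described here.

```yaml
# Sketch of a Longhorn StorageClass for the Postgres volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-postgres        # hypothetical name
provisioner: driver.longhorn.io  # Longhorn's CSI provisioner
allowVolumeExpansion: true       # assumption, not stated in this chapter
reclaimPolicy: Delete            # assumption: storage is treated as disposable
parameters:
  numberOfReplicas: "1"          # Postgres owns redundancy, not Longhorn
  dataLocality: "best-effort"    # try to keep the replica next to the pod
```

StorageClass parameter values are strings, which is why the replica count is quoted.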
3. Same CPU architecture everywhere
CloudNativePG does not support a cluster with mixed CPU architectures. kube-hetzner allows mixing x86 (CX/CPX/CCX) and ARM (CAX) nodes, so this is a trap you must avoid.
Decision: all 3 nodes are amd64. Use cx/cpx/ccx server types, never
cax (those are ARM). This means no nodeSelector gymnastics and the
standard ...-system-trixie operand image just works.
4. The OS auto-reboots, so failover is not a rare event
With auto-upgrades enabled, Kured periodically drains and reboots nodes for OS updates, and the system-upgrade-controller updates k3s. Draining a node means its pods are evicted. For your database, that means switchovers and rebuilds happen routinely, not just in disasters.
```mermaid
sequenceDiagram
participant K as Kured
participant N as Node (primary's)
participant O as CNPG operator
participant App as Application
K->>N: cordon + drain for OS update
N->>O: primary pod evicted
O->>O: promote an in-sync standby
O->>App: rw Service now points to new primary
Note over App: brief reconnect; then normal
N->>N: reboot, rejoin
O->>N: rebuild the old primary as a standby
```
Decisions / consequences:
- The operator creates PodDisruptionBudgets that stop a drain from taking the primary and a standby down at the same time — keep them.
- Your application must reconnect gracefully (retry on connection drop). This is not optional in this environment.
- Testing failover is mandatory, because it will happen on its own. Better to see it in a controlled test first.
- For the eventual production cluster, consider scheduling Kured maintenance windows so reboots happen at quiet hours.
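The maintenance-window idea in the last bullet maps onto Kured's scheduling flags. The fragment below is a hedged sketch of the relevant container arguments. The days, hours, and time zone are placeholder values, and how these flags reach Kured (Helm values, module options, or a patched DaemonSet) depends on how it was installed in your cluster.

```yaml
# Fragment of a Kured DaemonSet container spec (illustrative values only).
containers:
  - name: kured
    args:
      - --reboot-days=sat,sun          # only reboot on these days
      - --start-time=2am               # window opens
      - --end-time=5am                 # window closes
      - --time-zone=Europe/Berlin      # zone used to interpret the window
```

Outside the window Kured still detects that a reboot is needed. It simply waits for the next allowed slot before draining the node.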
5. Three nodes, three instances
With exactly 3 nodes we run 3 Postgres instances, one per node, using the operator's pod anti-affinity so they spread out. Losing one node leaves a primary plus one in-sync standby — still safe.
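A minimal sketch of how this layout might look in a CloudNativePG `Cluster` manifest. The cluster name, storage size, and StorageClass name are hypothetical placeholders. The `affinity` fields are the operator's pod anti-affinity settings, and `required` is chosen here as an assumption so that two instances never share a node.

```yaml
# Sketch: three instances, one per node.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo                        # hypothetical name
spec:
  instances: 3                         # one primary + two standbys
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname  # spread across nodes
    podAntiAffinityType: required        # assumption; "preferred" is the softer option
  storage:
    storageClass: longhorn-postgres    # hypothetical StorageClass name
    size: 10Gi                         # placeholder size
```

With `required` on a three-node cluster, an instance whose node is down stays Pending until a node without another instance is available, rather than doubling up on a surviving node.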
Control-plane co-location is a learning-phase compromise
To keep costs at three nodes, the simplest layout makes all three nodes
schedulable control-planes (allow_scheduling_on_control_plane = true).
Running Longhorn and a busy database on the same nodes as etcd is fine
for learning and tearing down daily, but not ideal for production:
etcd is sensitive to disk latency. For the
final IaC, move the database to dedicated
agent nodes and keep control-planes clean.
6. The R2 backup caveats
R2 is S3-compatible but not perfectly so:
- A recent change in the AWS SDK's data-integrity checksums can cause an x-amz-content-sha256 error with S3-compatible providers. The fix is a pair of environment variables on the ObjectStore (we include them).
- More seriously, there is a reported case where backups upload to R2 but restore fails with the Barman Cloud Plugin.
Decisions:
- Apply the checksum workaround from the start.
- Validate a full backup → restore cycle early (in the PITR chapter), not on the day you need it.
- Keep Longhorn's own backup-to-R2 as an independent fallback, and be ready to switch the Barman target to a known-good S3 provider (AWS S3, or Backblaze B2 with the documented workaround) if validation fails.
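For orientation, the checksum workaround described above might look roughly like this on a Barman Cloud Plugin `ObjectStore`. This is a sketch under assumptions: it assumes the pair of variables are the AWS SDK settings `AWS_REQUEST_CHECKSUM_CALCULATION` and `AWS_RESPONSE_CHECKSUM_VALIDATION`, and the resource name, bucket, account ID, and Secret names are placeholders rather than values from this guide.

```yaml
# Sketch of an ObjectStore pointing at R2 with the checksum workaround.
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: r2-store                       # hypothetical name
spec:
  configuration:
    destinationPath: s3://<bucket>/    # placeholder bucket
    endpointURL: https://<account-id>.r2.cloudflarestorage.com
    s3Credentials:
      accessKeyId:
        name: r2-creds                 # hypothetical Secret
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: r2-creds
        key: ACCESS_SECRET_KEY
  instanceSidecarConfiguration:
    env:
      # Only compute/validate checksums when the operation requires it.
      - name: AWS_REQUEST_CHECKSUM_CALCULATION
        value: when_required
      - name: AWS_RESPONSE_CHECKSUM_VALIDATION
        value: when_required
```

Setting the variables does not address the reported restore failure, so the backup and restore validation above remains necessary either way.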
Summary table
| Constraint | Decision |
|---|---|
| Hetzner volumes zonal & snapshot-less | One location; use Longhorn |
| Postgres self-replicates | Longhorn numberOfReplicas: "1", disposable storage |
| No mixed architectures allowed | All amd64 (cx/cpx/ccx, never cax) |
| OS auto-reboots cause failovers | Keep PDBs; app must retry; test failover; maintenance windows later |
| 3 nodes | 3 instances, one per node, anti-affinity |
| R2 imperfect S3 + restore bug | Checksum workaround; validate restore; Longhorn-to-R2 fallback |
Next: the version matrix — exactly which versions we pin and why.