PostgreSQL on Kubernetes — A Study Guide

This guide teaches you how to run a production-grade PostgreSQL cluster on a small k3s Kubernetes cluster hosted on Hetzner Cloud, using the CloudNativePG operator. By the end you will have a database that heals itself when a node dies, takes automated backups to Cloudflare R2, and can be rebuilt from scratch by re-running infrastructure code.

It is written for someone who is comfortable on the command line but new to Kubernetes operators and Postgres replication. Concepts come before commands.

What you will build

```mermaid
flowchart TB
    subgraph cloud["Hetzner Cloud: 3 amd64 nodes, one location"]
        direction TB
        subgraph k3s["k3s cluster (deployed by terraform-hcloud-kube-hetzner)"]
            op["CloudNativePG operator<br/>(watches Cluster resources)"]
            subgraph pg["PostgreSQL Cluster (3 instances)"]
                p["Primary<br/>(read-write)"]
                s1["Standby 1<br/>(streaming replica)"]
                s2["Standby 2<br/>(streaming replica)"]
            end
            pool["PgBouncer (Pooler)"]
            ln["Longhorn<br/>(node storage, 1 replica)"]
        end
    end
    app["Your application"] --> pool --> p
    p -- "streaming replication" --> s1
    p -- "streaming replication" --> s2
    p -- "base backups + WAL" --> r2[("Cloudflare R2<br/>object storage")]
    op -. "reconciles desired state" .-> pg
    pg --- ln

    classDef store fill:#e8eaf6,stroke:#3949ab;
    class r2,ln store;
```

The application never talks to a specific pod. It connects through the PgBouncer Pooler to a Service that the operator keeps pointed at whichever instance is currently the primary. If the primary dies, the operator promotes a standby and re-points the Service, usually within seconds.
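
As a rough sketch of the desired state the operator reconciles, a minimal CloudNativePG Cluster manifest for the three-instance layout in the diagram might look like the following. The name pg-demo, the 10Gi volume size, and the longhorn storage class name are placeholders assumed for illustration, not values taken from this guide.

```yaml
# Minimal CloudNativePG Cluster sketch (names and sizes are assumptions).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo              # hypothetical cluster name
spec:
  instances: 3               # one primary plus two streaming standbys
  storage:
    size: 10Gi               # placeholder volume size
    storageClass: longhorn   # assumes Longhorn's default StorageClass name
```

For a Cluster with this name, the operator creates a Service called pg-demo-rw that always targets the current primary, alongside pg-demo-ro (standbys only) and pg-demo-r (any instance). Clients and the Pooler address those Service names rather than individual pods, which is why a failover does not require any client reconfiguration.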

The three layers

We build in three layers so the whole thing is reproducible from zero:

| Layer | Tool | What lives here |
|---|---|---|
| 1. Platform | Terraform / OpenTofu (kube.tf) | The 3 nodes, k3s, cert-manager, Longhorn |
| 2. Postgres stack | kubectl (learning), then manifests | Operator, backup plugin, the database Cluster, backups, security |
| 3. Full IaC | Kustomize / GitOps | Layer 2 folded into code so apply + restore-from-R2 rebuilds everything |

The end goal

A single terraform apply brings up the cluster, the operator deploys the database, and a bootstrap-from-backup step restores your data from R2 — a working database in minutes, from nothing.
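
The bootstrap-from-backup step is declared in the Cluster manifest itself. The following is a hedged sketch, assuming the Barman Cloud Plugin is installed and an ObjectStore resource already describes the R2 bucket. The names pg-demo-restored, origin, r2-store, and pg-demo are hypothetical, and the field names should be checked against the plugin version you actually install.

```yaml
# Sketch of a Cluster that bootstraps by restoring from object storage.
# All names below are illustrative assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo-restored            # hypothetical name for the rebuilt cluster
spec:
  instances: 3
  storage:
    size: 10Gi                      # placeholder volume size
    storageClass: longhorn          # assumes Longhorn's default StorageClass name
  bootstrap:
    recovery:
      source: origin                # must match an entry in externalClusters
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: r2-store  # hypothetical ObjectStore pointing at R2
          serverName: pg-demo         # name the original cluster backed up under
```

The bootstrap section only takes effect when the Cluster is first created, so this manifest describes a fresh cluster built from the backup rather than an in-place repair of an existing one.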

A note on honesty and safety

Two facts shape every backup decision in this guide, and we will keep returning to them:

  1. The Hetzner block-storage CSI driver does not support volume snapshots. That removes one common safety net, which is part of why we chose Longhorn.
  2. The Barman Cloud Plugin has a reported restore problem with Cloudflare R2. Backups upload fine; restoring has failed for some users. We therefore validate a full backup → restore cycle before trusting it (see Disaster recovery).

Drafts, not production secrets

Every manifest and Terraform snippet in this guide is a draft to review and adapt. Placeholders (bucket names, account IDs, server types, passwords) must be replaced with your own values, and you should read the linked upstream docs before applying anything to a real cluster.

Start with the Kubernetes primer if the words "operator", "CRD", or "reconcile" are unfamiliar. Otherwise jump to the target environment.