Why CloudNativePG

This chapter connects the previous two: it shows how an operator turns the PostgreSQL high-availability concepts into a few lines of declarative YAML.

The problem with doing it yourself

You could run Postgres on a plain Kubernetes StatefulSet. That gets you a pod with a persistent volume — and nothing else. You would still have to build and operate, by hand:

  • primary election and automatic failover when the primary dies,
  • streaming replication setup and standby bootstrapping,
  • backup scheduling, WAL archiving, and point-in-time recovery,
  • connection pooling,
  • TLS certificates and their rotation,
  • rolling minor and major version upgrades.

That is months of work and a permanent operational burden. An operator packages all of it.

What CloudNativePG is

CloudNativePG (CNPG) is an open-source Kubernetes operator dedicated to PostgreSQL. It is a CNCF project. You describe the database you want with a Cluster custom resource, and the operator continuously reconciles reality toward it:

```mermaid
flowchart TB
    spec["Cluster spec:<br/>instances: 3<br/>storage: 20Gi<br/>backup: → R2<br/>synchronous: 1"]
    spec --> op["CNPG operator"]
    op --> a["Bootstraps primary + 2 standbys"]
    op --> b["Wires streaming replication"]
    op --> c["Creates rw / ro / r Services"]
    op --> d["Runs base backups + WAL archive"]
    op --> e["Promotes a standby on failure"]
    op --> f["Manages TLS certificates"]
```
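
As a rough sketch, the spec in the diagram could look like the manifest below. The cluster name `pg-demo` is a placeholder, the `synchronous` stanza assumes a CNPG release that supports it, and the backup wiring is left out because it depends on resources introduced later.

```yaml
# Minimal sketch of a three-instance cluster; names are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo            # hypothetical cluster name
spec:
  instances: 3             # one primary plus two standbys
  storage:
    size: 20Gi             # one volume of this size per instance
  postgresql:
    synchronous:           # assumes a CNPG release with the `synchronous` stanza
      method: any
      number: 1            # commits wait for one synchronous standby
```

Applying a manifest like this is all the input the operator needs. Everything on the right-hand side of the diagram follows from reconciliation.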

Its capabilities, at a glance:

| Capability | What the operator does |
|---|---|
| High availability | Multi-instance clusters with automatic primary election |
| Failover | Promotes the most up-to-date standby in seconds, re-points the Service |
| Replication | Streaming, synchronous or asynchronous, configurable |
| Backup & PITR | Native integration with object storage (via the Barman Cloud Plugin) |
| Pooling | Built-in PgBouncer via the Pooler resource |
| TLS | Certificates issued and rotated automatically |
| Upgrades | Rolling minor upgrades; declarative major upgrades |

How CNPG does HA — and what it is not

This is worth getting right, because a lot of older tutorials say it wrong.

CloudNativePG does not use an external consensus system (no etcd, no Consul, no Raft) to decide who the primary is. Instead it uses the Kubernetes API server itself as the single source of truth, eliminating the need for external coordination tools. The operator observes instance health and records the authoritative primary in Kubernetes.
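
One way to picture "recorded in Kubernetes" is that the Cluster resource's status carries the name of the primary. The excerpt below is an illustrative sketch of what reading the resource back might show, not a manifest to apply. The instance names are hypothetical and a real status contains many more fields.

```yaml
# Illustrative excerpt of a Cluster's status as read back from the API server.
# The operator maintains this; you do not write it yourself.
status:
  currentPrimary: pg-demo-1   # instance currently acting as primary
  targetPrimary: pg-demo-1    # instance the operator wants as primary
  instances: 3
  readyInstances: 3
```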

Common misconception

If you read "CloudNativePG elects a primary via Raft consensus", that is incorrect. Raft is the consensus protocol inside etcd, which tools like Patroni typically rely on as an external store and which also backs the Kubernetes control plane. CNPG runs no consensus cluster of its own; it leans on the API server that is already there. Newer versions additionally offer an optional quorum-based failover to choose more safely which replica to promote. That is a promotion-safety feature, not a separate consensus cluster you run.

Another structural detail: CNPG manages PersistentVolumeClaims directly rather than using a StatefulSet. This gives it finer control over how instances and their volumes are created, replaced, and rejoined — which is exactly the kind of Postgres-aware behavior an operator exists to provide.

The resources you will create

Across this guide you will write a handful of CNPG custom resources. Meet them now so they are familiar later:

| Kind | API group | Purpose |
|---|---|---|
| Cluster | postgresql.cnpg.io | The database itself: instances, storage, config, backup wiring |
| Pooler | postgresql.cnpg.io | A PgBouncer deployment in front of the cluster |
| ScheduledBackup | postgresql.cnpg.io | A cron-like schedule for base backups |
| ImageCatalog / ClusterImageCatalog | postgresql.cnpg.io | Maps a Postgres major version to a specific operand image |
| ObjectStore | barmancloud.cnpg.io | Backup destination + credentials (provided by the Barman Cloud Plugin) |

```mermaid
flowchart LR
    cat["ClusterImageCatalog<br/>(which image)"] --> cl
    obj["ObjectStore<br/>(where backups go)"] --> cl
    cl["Cluster"] --> pooler["Pooler"]
    cl --> sched["ScheduledBackup"]
```
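
To make the relationships concrete, here is a hedged sketch of how these resources might reference one another. Every name (`pg-demo`, `r2-store`, `r2-creds`, the bucket path) is a placeholder, the R2 endpoint keeps its account ID elided, and the image catalog is omitted. Treat it as an outline under those assumptions rather than a ready-to-apply configuration.

```yaml
# Where backups go: provided by the Barman Cloud Plugin.
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: r2-store                      # hypothetical name
spec:
  configuration:
    destinationPath: s3://my-bucket/  # placeholder bucket
    endpointURL: https://<account-id>.r2.cloudflarestorage.com
    s3Credentials:
      accessKeyId:
        name: r2-creds                # hypothetical Secret
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: r2-creds
        key: ACCESS_SECRET_KEY
---
# The database: points at the ObjectStore through the plugin.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo
spec:
  instances: 3
  storage:
    size: 20Gi
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true             # plugin handles WAL archiving
      parameters:
        barmanObjectName: r2-store
---
# Base backups on a schedule (six-field cron, seconds first).
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-demo-daily
spec:
  schedule: "0 0 2 * * *"             # every day at 02:00:00
  cluster:
    name: pg-demo
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
---
# PgBouncer in front of the read-write endpoint.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: pg-demo-pooler-rw
spec:
  cluster:
    name: pg-demo
  instances: 2
  type: rw
  pgbouncer:
    poolMode: session
```

Note the direction of the references: the Cluster names the ObjectStore, while the Pooler and ScheduledBackup each name the Cluster.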

Why an operator fits your goal

Your aim is reproducible infrastructure that comes back from zero. An operator is ideal for that: the entire database — topology, replication, backup policy — is captured in a few YAML files you can store in Git. Re-applying them on a fresh cluster recreates the database, and a bootstrap-from-backup clause restores the data from R2. That is the whole pitch of full IaC.
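
A bootstrap-from-backup clause could look roughly like the sketch below. It assumes the backups were taken through the Barman Cloud Plugin into an ObjectStore named `r2-store` by a cluster called `pg-demo`. Both names, and the new cluster's name, are placeholders.

```yaml
# Sketch: recreate a cluster from the backups stored in object storage.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo-restored        # hypothetical name for the new cluster
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: origin            # refers to the externalClusters entry below
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: r2-store   # ObjectStore holding the backups
          serverName: pg-demo          # name the original cluster archived under
```

Because the file describes both the topology and where the data comes from, it can live in Git next to the other manifests.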

Where to go deeper

Next: the target environment — the specific stack we are deploying onto and why.