Why CloudNativePG

This chapter connects the two previous ones: it shows how an operator turns the PostgreSQL high-availability concepts into a few lines of declarative YAML.

The problem with doing it yourself

You could run Postgres on a plain Kubernetes StatefulSet. That gets you a pod with a persistent volume — and nothing else. You would still have to build and operate, by hand:

- primary election and automatic failover when the primary dies,
- streaming replication setup and standby bootstrapping,
- backup scheduling, WAL archiving, and point-in-time recovery,
- connection pooling,
- TLS certificates and their rotation,
- rolling minor and major version upgrades.

That is months of work and a permanent operational burden. An operator packages all of it.

What CloudNativePG is

CloudNativePG (CNPG) is an open-source Kubernetes operator dedicated to PostgreSQL. It is a CNCF project. You describe the database you want with a Cluster custom resource, and the operator continuously reconciles reality toward it:

flowchart TB
    spec["Cluster spec:<br/>instances: 3<br/>storage: 20Gi<br/>backup: → R2<br/>synchronous: 1"]
    spec --> op["CNPG operator"]
    op --> a["Bootstraps primary + 2 standbys"]
    op --> b["Wires streaming replication"]
    op --> c["Creates rw / ro / r Services"]
    op --> d["Runs base backups + WAL archive"]
    op --> e["Promotes a standby on failure"]
    op --> f["Manages TLS certificates"]
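
As a rough illustration, the spec in the diagram above could be written as the manifest below. This is a sketch, not the guide's final configuration: the names `pg-demo` and `r2-store` are placeholders, and the field layout assumes a recent CloudNativePG release with the Barman Cloud Plugin installed, so check it against the API reference for the version you run.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo                  # placeholder cluster name
spec:
  instances: 3                   # one primary plus two standbys
  storage:
    size: 20Gi                   # one volume of this size per instance
  postgresql:
    synchronous:
      method: any
      number: 1                  # commits wait for one synchronous standby
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true        # this plugin handles WAL archiving
      parameters:
        barmanObjectName: r2-store   # placeholder ObjectStore name
```

Applying a file like this with `kubectl apply -f` is the whole deployment step; the operator derives the pods, volumes, and Services from it.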

Its capabilities, at a glance:

| Capability | What the operator does |
| --- | --- |
| High availability | Multi-instance clusters with automatic primary election |
| Failover | Promotes the most up-to-date standby in seconds, re-points the Service |
| Replication | Streaming, synchronous or asynchronous, configurable |
| Backup & PITR | Native integration with object storage (via the Barman Cloud Plugin) |
| Pooling | Built-in PgBouncer via the `Pooler` resource |
| TLS | Certificates issued and rotated automatically |
| Upgrades | Rolling minor upgrades; declarative major upgrades |

How CNPG does HA, and what it is not

This is worth getting right, because a lot of older tutorials say it wrong.

CloudNativePG does not use an external consensus system (no etcd, no Consul, no Raft) to decide who the primary is. Instead it uses the Kubernetes API server itself as the single source of truth, eliminating the need for external coordination tools. The operator observes instance health and records the authoritative primary in Kubernetes.
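
One way to see this in practice is to look at the `status` of the Cluster resource, for example with `kubectl get cluster <name> -o yaml`. The excerpt below is illustrative and heavily abbreviated, assuming a hypothetical three-instance cluster named `pg-demo`; the values are made up and real output contains many more fields.

```yaml
# Illustrative excerpt of a Cluster's status (values are made up)
status:
  currentPrimary: pg-demo-1   # instance currently acting as primary
  targetPrimary: pg-demo-1    # instance the operator wants as primary
  instances: 3
  readyInstances: 3
```

Because this state lives in an ordinary Kubernetes object, there is no separate coordination store to deploy, back up, or monitor.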

Common misconception

If you read "CloudNativePG elects a primary via Raft consensus", that is incorrect. Raft is the consensus algorithm behind etcd, which setups such as Patroni-with-etcd and the Kubernetes control plane itself rely on. CNPG leans on the API server that is already there. Newer versions additionally offer an optional quorum-based failover to choose which replica to promote more safely, but that is a promotion-safety feature, not a separate consensus cluster you run.

Another structural detail: CNPG manages PersistentVolumeClaims directly rather than using a StatefulSet. This gives it finer control over how instances and their volumes are created, replaced, and rejoined — which is exactly the kind of Postgres-aware behavior an operator exists to provide.

The resources you will create

Across this guide you will write a handful of CNPG custom resources. Meet them now so they are familiar later:

| Kind | API group | Purpose |
| --- | --- | --- |
| `Cluster` | `postgresql.cnpg.io` | The database itself: instances, storage, config, backup wiring |
| `Pooler` | `postgresql.cnpg.io` | A PgBouncer deployment in front of the cluster |
| `ScheduledBackup` | `postgresql.cnpg.io` | A cron-like schedule for base backups |
| `ImageCatalog` / `ClusterImageCatalog` | `postgresql.cnpg.io` | Maps a Postgres major version to a specific operand image |
| `ObjectStore` | `barmancloud.cnpg.io` | Backup destination + credentials (provided by the Barman Cloud Plugin) |

flowchart LR
    cat["ClusterImageCatalog<br/>(which image)"] --> cl
    obj["ObjectStore<br/>(where backups go)"] --> cl
    cl["Cluster"] --> pooler["Pooler"]
    cl --> sched["ScheduledBackup"]
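
To make the supporting resources less abstract, here is a minimal sketch of each one next to a hypothetical `pg-demo` Cluster. Every name, the bucket, the endpoint, the Secret keys, and the image tag are placeholders; the endpoint form assumes "R2" means Cloudflare R2. Treat the field layout as an assumption to verify against the CloudNativePG and Barman Cloud Plugin references for your versions.

```yaml
# Where backups go (provided by the Barman Cloud Plugin)
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: r2-store
spec:
  configuration:
    destinationPath: "s3://<bucket>/"                                # placeholder
    endpointURL: "https://<account-id>.r2.cloudflarestorage.com"     # assumed R2 endpoint form
    s3Credentials:
      accessKeyId:
        name: r2-creds               # placeholder Secret
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: r2-creds
        key: ACCESS_SECRET_KEY
---
# Which image each Postgres major version resolves to
apiVersion: postgresql.cnpg.io/v1
kind: ClusterImageCatalog
metadata:
  name: postgresql
spec:
  images:
    - major: 17
      image: ghcr.io/cloudnative-pg/postgresql:17.5   # example tag; pin the one you use
---
# When base backups run (six-field cron: seconds come first)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-demo-daily
spec:
  schedule: "0 0 2 * * *"            # every day at 02:00:00
  cluster:
    name: pg-demo
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
---
# PgBouncer in front of the read-write Service
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: pg-demo-pooler-rw
spec:
  cluster:
    name: pg-demo
  instances: 2
  type: rw
  pgbouncer:
    poolMode: session
```

Each resource points at the Cluster by name or is referenced from it, which is the relationship the diagram above draws.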

Why an operator fits your goal

Your aim is reproducible infrastructure that comes back from zero. An operator is ideal for that: the entire database (topology, replication, backup policy) is captured in a few YAML files you can store in Git. Re-applying them on a fresh Kubernetes cluster recreates the database, and a bootstrap-from-backup clause restores the data from R2. That is the whole pitch of full infrastructure as code (IaC).
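
A bootstrap-from-backup clause might look roughly like the sketch below. It assumes the Barman Cloud Plugin is installed and that an ObjectStore named `r2-store` already holds backups archived under the server name `pg-demo`; both names, and the label `origin`, are placeholders.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: origin               # refers to the externalClusters entry below
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: r2-store   # placeholder ObjectStore name
          serverName: pg-demo          # name the backups were archived under
```

If the restored cluster also archives to the same store, it typically needs a different `serverName` for its own archive so that it does not collide with the old one.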

Where to go deeper

Next: the target environment — the specific stack we are deploying onto and why.