Why CloudNativePG
Now we connect the two previous chapters: how an operator turns the PostgreSQL high-availability concepts into a few lines of declarative YAML.
The problem with doing it yourself
You could run Postgres on a plain Kubernetes StatefulSet. That gets you pods
with stable names and persistent volumes, and nothing Postgres-specific. You
would still have to build and operate, by hand:
- primary election and automatic failover when the primary dies,
- streaming replication setup and standby bootstrapping,
- backup scheduling, WAL archiving, and point-in-time recovery,
- connection pooling,
- TLS certificates and their rotation,
- rolling minor and major version upgrades.
That is months of work and a permanent operational burden. An operator packages all of it.
What CloudNativePG is
CloudNativePG (CNPG) is an open-source Kubernetes operator dedicated to
PostgreSQL. It is a CNCF project.
You describe the database you want with a Cluster custom resource, and the
operator continuously reconciles reality toward it:
```mermaid
flowchart TB
    spec["Cluster spec:<br/>instances: 3<br/>storage: 20Gi<br/>backup: → R2<br/>synchronous: 1"]
    spec --> op["CNPG operator"]
    op --> a["Bootstraps primary + 2 standbys"]
    op --> b["Wires streaming replication"]
    op --> c["Creates rw / ro / r Services"]
    op --> d["Runs base backups + WAL archive"]
    op --> e["Promotes a standby on failure"]
    op --> f["Manages TLS certificates"]
```
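For orientation, the spec in the diagram might look roughly like the Cluster manifest below. This is a minimal sketch, not a tested configuration: the name `pg-main`, the ObjectStore name `r2-store`, and the sizes are illustrative assumptions, and it assumes a CNPG release recent enough to support the `synchronous` stanza, with the Barman Cloud Plugin already installed.

```yaml
# Minimal sketch of the Cluster from the diagram (names are hypothetical).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main              # hypothetical cluster name
spec:
  instances: 3               # one primary + two standbys
  storage:
    size: 20Gi               # PVC size per instance
  postgresql:
    synchronous:
      method: any            # quorum-style: any N standbys may confirm a commit
      number: 1              # N = 1, the "synchronous: 1" in the diagram
  plugins:
    - name: barman-cloud.cloudnative-pg.io   # the Barman Cloud Plugin
      isWALArchiver: true                     # archive WAL to object storage
      parameters:
        barmanObjectName: r2-store            # hypothetical ObjectStore pointing at R2
```

Everything not written here (the rw / ro / r Services, TLS certificates, replication wiring) comes from the operator's defaults, which is the point of the diagram.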
Its capabilities, at a glance:
| Capability | What the operator does |
|---|---|
| High availability | Multi-instance clusters with automatic primary election |
| Failover | Promotes the most up-to-date standby in seconds, re-points the Service |
| Replication | Streaming, synchronous or asynchronous, configurable |
| Backup & PITR | Native integration with object storage (via the Barman Cloud Plugin) |
| Pooling | Built-in PgBouncer via the Pooler resource |
| TLS | Certificates issued and rotated automatically |
| Upgrades | Rolling minor upgrades; declarative major upgrades |
How CNPG does HA, and what it is not
This is worth getting right, because many older tutorials get it wrong.
CloudNativePG does not use an external consensus system (no separate etcd, Consul, or Raft cluster) to decide which instance is the primary. Instead it treats the Kubernetes API server itself as the single source of truth, so no external coordination tool is needed. The operator observes instance health and records the authoritative primary in the status of the Cluster resource.
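You can see that record on the Cluster object itself, for example with `kubectl get cluster pg-main -o yaml` for a hypothetical cluster named `pg-main`. The excerpt below is illustrative and heavily trimmed; real output contains many more fields.

```yaml
# Illustrative, trimmed excerpt of a Cluster's status (values are hypothetical)
status:
  currentPrimary: pg-main-1   # instance currently acting as primary
  targetPrimary: pg-main-1    # instance the operator wants to be primary
  readyInstances: 3           # instances currently reporting ready
```

During a switchover or failover the two primary fields briefly differ: the operator sets `targetPrimary` first, and `currentPrimary` follows once the promotion has completed.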
Common misconception
If you read "CloudNativePG elects a primary via Raft consensus", that is incorrect. Raft is the consensus protocol inside etcd, which is what Patroni (when configured with an etcd backend) and the Kubernetes control plane itself rely on. CNPG simply leans on the API server that is already there. Newer versions additionally offer an optional quorum-based failover that makes the choice of which replica to promote safer, but that is a promotion-safety feature, not a separate consensus cluster you run.
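As a rough illustration, the quorum-based failover lives inside the synchronous replication settings of the Cluster, not in a separate system. This fragment assumes CNPG 1.27 or later, where the option is the `failoverQuorum` field (earlier releases exposed it through an experimental annotation); check the documentation for your exact version before relying on it.

```yaml
# Sketch: opt-in quorum-based failover (assumes CNPG >= 1.27; Cluster spec fragment)
spec:
  postgresql:
    synchronous:
      method: any
      number: 1
      failoverQuorum: true   # promote only if the quorum check shows no synchronously committed data is lost
```

Note that it only makes sense together with synchronous replication: the quorum check reasons about which standbys are guaranteed to hold every synchronously committed transaction.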
Another structural detail: CNPG manages PersistentVolumeClaims directly
rather than using a StatefulSet. This gives it finer control over how
instances and their volumes are created, replaced, and rejoined — which is
exactly the kind of Postgres-aware behavior an operator exists to provide.
The resources you will create
Across this guide you will write a handful of CNPG custom resources. Meet them now so they are familiar later:
| Kind | API group | Purpose |
|---|---|---|
| Cluster | postgresql.cnpg.io | The database itself: instances, storage, config, backup wiring |
| Pooler | postgresql.cnpg.io | A PgBouncer deployment in front of the cluster |
| ScheduledBackup | postgresql.cnpg.io | A cron-like schedule for base backups |
| ImageCatalog / ClusterImageCatalog | postgresql.cnpg.io | Maps a Postgres major version to a specific operand image |
| ObjectStore | barmancloud.cnpg.io | Backup destination + credentials (provided by the Barman Cloud Plugin) |
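To make the catalog idea concrete, here is a minimal sketch of a ClusterImageCatalog and the fragment of a Cluster spec that selects from it. The catalog name `postgresql` and the image tags are placeholder assumptions; in practice you would pin exact, tested tags.

```yaml
# Sketch: a cluster-wide catalog mapping major versions to images (tags are placeholders)
apiVersion: postgresql.cnpg.io/v1
kind: ClusterImageCatalog
metadata:
  name: postgresql                                  # hypothetical catalog name
spec:
  images:
    - major: 16
      image: ghcr.io/cloudnative-pg/postgresql:16   # placeholder; pin a specific tag
    - major: 17
      image: ghcr.io/cloudnative-pg/postgresql:17   # placeholder; pin a specific tag
---
# Cluster spec fragment: pick a major version from the catalog instead of naming an image
spec:
  imageCatalogRef:
    apiGroup: postgresql.cnpg.io
    kind: ClusterImageCatalog
    name: postgresql
    major: 17
```

The indirection means a minor-version bump is a one-line change in the catalog, while the Cluster only ever states which major version it wants.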
```mermaid
flowchart LR
    cat["ClusterImageCatalog<br/>(which image)"] --> cl
    obj["ObjectStore<br/>(where backups go)"] --> cl
    cl["Cluster"] --> pooler["Pooler"]
    cl --> sched["ScheduledBackup"]
```
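The satellite resources in the diagram might look like the sketch below, assuming a Cluster named `pg-main`, an R2 bucket, and a Secret holding the R2 keys. Every name here (`r2-store`, `my-backups`, `r2-credentials`, `ACCOUNT_ID`) is hypothetical and must be replaced with your own values.

```yaml
# Sketch: where backups go (Barman Cloud Plugin ObjectStore pointing at Cloudflare R2)
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: r2-store
spec:
  configuration:
    destinationPath: s3://my-backups/pg-main                 # hypothetical bucket/prefix
    endpointURL: https://ACCOUNT_ID.r2.cloudflarestorage.com  # R2's S3-compatible endpoint
    s3Credentials:
      accessKeyId:
        name: r2-credentials          # hypothetical Secret
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: r2-credentials
        key: ACCESS_SECRET_KEY
  retentionPolicy: "30d"              # keep roughly 30 days of recoverability
---
# Sketch: daily base backup through the plugin
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-main-daily
spec:
  schedule: "0 0 3 * * *"   # six fields, seconds first: every day at 03:00
  cluster:
    name: pg-main
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
---
# Sketch: PgBouncer in front of the primary
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: pg-main-pooler-rw
spec:
  cluster:
    name: pg-main
  instances: 2
  type: rw                  # route connections to the primary
  pgbouncer:
    poolMode: transaction
```

Transaction pooling is shown only as a common choice; session mode is safer for applications that rely on session state such as prepared statements or advisory locks.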
Why an operator fits your goal
Your aim is reproducible infrastructure that comes back from zero. An operator is ideal for that: the entire database (topology, replication, backup policy) is captured in a few YAML files you can store in Git. Re-applying them on a fresh cluster recreates the database, and a bootstrap-from-backup clause restores the data from R2. That is the whole point of full infrastructure as code.
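A bootstrap-from-backup clause might look like the following sketch, which recreates a cluster from the backups the Barman Cloud Plugin stored in R2. It assumes an ObjectStore named `r2-store` already exists on the new Kubernetes cluster and that the original backups were taken under the server name `pg-main`; both names are hypothetical.

```yaml
# Sketch: recreate pg-main on a fresh Kubernetes cluster from backups in R2
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: origin                 # refers to the externalClusters entry below
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: r2-store # hypothetical ObjectStore holding the backups
          serverName: pg-main        # server name the backups were taken under
```

To have the restored cluster resume archiving, you would typically also add back the `plugins` stanza, pointing it at a fresh `serverName` or path, because CNPG expects to start archiving WAL into an empty location.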
Where to go deeper
- CloudNativePG documentation (1.29)
- CloudNativePG architecture
- Operator capability levels
- Why not a StatefulSet? — Custom Pod Controller
Next: the target environment — the specific stack we are deploying onto and why.