6. The PostgreSQL Cluster

Goal: create the Cluster resource — the actual database. Everything so far was scaffolding; this is the centerpiece. We will read the manifest field by field, because each line maps to a concept from the Concepts section.

flowchart TB
    cl["Cluster: pg"] --> p["pg-1 (primary)"]
    cl --> s1["pg-2 (standby)"]
    cl --> s2["pg-3 (standby)"]
    cl -->|creates| rw["Service pg-rw → primary"]
    cl -->|creates| ro["Service pg-ro → standbys"]
    cl -->|creates| r["Service pg-r → any"]
    cl -->|plugin| os["ObjectStore pg-r2-store → R2"]
    cl -->|imageCatalogRef major 18| cat["ClusterImageCatalog"]
    p -. sync replication (number:1) .- s1

The full manifest

Save as cluster.yaml. Read the annotations — they are the lesson.

cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg
  namespace: production
spec:
  instances: 3                       # (1)!

  imageCatalogRef:                   # (2)!
    apiGroup: postgresql.cnpg.io
    kind: ClusterImageCatalog
    name: postgresql
    major: 18

  primaryUpdateStrategy: unsupervised   # (3)!

  storage:
    storageClass: longhorn-postgres   # (4)!
    size: 20Gi
  walStorage:                         # (5)!
    storageClass: longhorn-postgres
    size: 10Gi

  resources:                          # (6)!
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      memory: "1Gi"

  postgresql:
    synchronous:                      # (7)!
      method: any
      number: 1
      # dataDurability: required      # (8)!
    parameters:                       # (9)!
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      max_connections: "200"
      log_min_duration_statement: "1000"

  enableSuperuserAccess: true         # (10)!

  bootstrap:
    initdb:                           # (11)!
      database: app
      owner: app_user
      secret:
        name: pg-app-credentials

  plugins:                            # (12)!
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: pg-r2-store

  monitoring:                         # (13)!
    enablePodMonitor: false
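  1. Three instances = 1 primary + 2 standbys. The operator's default pod anti-affinity is a scheduling preference, not a hard rule, so the instances land one per node whenever the scheduler can manage it. Losing one node then still leaves a primary plus one standby (after a failover, if the lost node held the primary).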
  2. Resolve the image from the ClusterImageCatalog we created, major version 18.
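  3. On operator upgrades and other planned changes, the rolling update reaches the primary last. unsupervised = the operator updates the primary for you instead of waiting for a manual trigger; whether that is an in-place restart or a switchover is set by primaryUpdateMethod. (Single-instance clusters just restart.)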
  4. The disposable, 1-replica Longhorn class from Layer 1. Each instance gets its own PVC from it.
  5. Separate volume for WAL. A best practice: isolating WAL I/O from data I/O improves performance and prevents a WAL spike from filling the data volume. Optional but recommended.
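  6. Size from real measurement later; start modest. Note limits.memory equals requests.memory, so the memory allocation is fixed and cannot be overcommitted. The pod is still Burstable QoS, not Guaranteed, because there is no CPU limit; Guaranteed requires limits equal to requests for both CPU and memory.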
  7. Synchronous replication, quorum-based (method: any), requiring at least 1 standby to confirm each commit before it returns. This is the modern .spec.postgresql.synchronous API (replacing the old minSyncReplicas/maxSyncReplicas).
  8. dataDurability (1.25+) governs the trade-off when standbys are missing. required (default) pauses writes if the sync standby is unavailable (strict, RPO=0). preferred relaxes to self-healing (keeps accepting writes, may lose the last few on a bad failover). Given your frequent Kured reboots, decide consciously — see the note below.
  9. A few sane PostgreSQL settings. log_min_duration_statement: "1000" logs queries slower than 1s. Tune shared_buffers/effective_cache_size to your node memory.
  10. Explicitly enable the superuser secret. Since CNPG 1.21, enableSuperuserAccess defaults to false, so the pg-superuser secret does not exist unless you ask for it. (Older tutorials that tell you to read a superuser secret silently assume this is on.) Leave it false if you do not need superuser login.
  11. Bootstrap an application database app owned by app_user, with the password from the Secret we created. initdb = "initialize a brand-new empty database". (To instead restore from a backup, you use bootstrap.recovery — that is the DR chapter.)
  12. Wire the backup plugin. isWALArchiver: true makes this plugin the WAL archiver; barmanObjectName points at the ObjectStore from the previous chapter. This single block is what routes WAL to R2 continuously; base backups taken through the plugin land in the same store.
  13. We will turn metrics on in the Monitoring chapter; leaving it off now keeps the surface small.
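
The manifest refers to four objects created in earlier chapters, and a missing one is a common reason for a stalled first apply. A minimal pre-flight sketch follows. The helper name preflight is hypothetical, and it assumes the Secret and the ObjectStore live in the production namespace alongside the Cluster.

```shell
# Sketch: confirm the objects cluster.yaml refers to exist before applying it.
# preflight is a hypothetical helper name; the object names come from the manifest.
preflight() {
  missing=0
  # StorageClass and ClusterImageCatalog are cluster-scoped. The Secret and the
  # ObjectStore are assumed to live in the production namespace.
  for target in \
    "storageclass longhorn-postgres" \
    "clusterimagecatalog postgresql" \
    "secret pg-app-credentials -n production" \
    "objectstore pg-r2-store -n production"
  do
    # $target is intentionally unquoted so it splits into separate arguments.
    if ! kubectl get $target >/dev/null 2>&1; then
      echo "missing: $target"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: preflight && kubectl apply --dry-run=server -f cluster.yaml
```

The server-side dry run in the usage line asks the API server to validate the manifest without creating anything, which catches schema errors before any pod starts.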

Durability vs availability in your environment

With number: 1 and dataDurability: required, if both standbys are momentarily unavailable, writes block until one returns. With three nodes and PodDisruptionBudgets, a single Kured reboot leaves you primary + one standby, so the quorum of 1 is still satisfiable. But if you ever drop to a lone primary, strict mode halts writes. If keeping the app writable matters more than RPO=0, set dataDurability: preferred. There is no free lunch: choose, and document the choice.
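
If you decide to change the mode on a running cluster, one option is a merge patch against the same field the manifest sets. This is a sketch only: set_data_durability is a hypothetical helper name, and editing cluster.yaml and re-applying it is the declarative equivalent that keeps the file and the cluster in agreement.

```shell
# Sketch: switch the durability mode on the live Cluster with a merge patch.
# set_data_durability is a hypothetical helper, not part of kubectl or CNPG.
set_data_durability() {
  mode="$1"
  case "$mode" in
    required|preferred) ;;
    *) echo "usage: set_data_durability required|preferred" >&2; return 2 ;;
  esac
  # Touches only .spec.postgresql.synchronous.dataDurability; method and number stay as-is.
  kubectl patch cluster pg -n production --type merge \
    -p "{\"spec\":{\"postgresql\":{\"synchronous\":{\"dataDurability\":\"$mode\"}}}}"
}

# Usage: set_data_durability preferred
```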

Apply and watch it come up

kubectl apply -f cluster.yaml

# Watch the high-level status flip to healthy (2–3 min on first run)
kubectl get cluster pg -n production -w

The phase progresses through Setting up primary → Creating replica → Cluster in healthy state.
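
In a script, watching with -w is awkward; blocking until the cluster is ready is easier to automate. The sketch below assumes the Cluster resource exposes a Ready condition for kubectl wait to match. The helper name wait_for_cluster and the 300s timeout are illustrative choices, not values from this guide.

```shell
# Sketch: block until the Cluster reports Ready, then print its phase.
# wait_for_cluster is a hypothetical helper; the 300s timeout is an assumption.
wait_for_cluster() {
  kubectl wait --for=condition=Ready cluster/pg -n production --timeout=300s \
    >/dev/null || return 1
  # .status.phase holds the same text shown in the STATUS column above.
  kubectl get cluster pg -n production -o jsonpath='{.status.phase}'
  echo
}

# Usage: wait_for_cluster || echo "cluster did not become ready in time"
```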

Inspect with the plugin

kubectl cnpg status pg -n production

This shows the primary, each standby, replication lag, synchronous state, and — once a backup has run — continuous archiving status. Look for the instances spread across all three nodes:

kubectl get pods -n production -o wide -l cnpg.io/cluster=pg
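
To check the spread without reading the NODE column by eye, you can count the distinct nodes hosting the instances. A small sketch, using the same label selector as above; distinct_nodes is a hypothetical helper name.

```shell
# Sketch: count the distinct nodes hosting the cluster's pods.
# distinct_nodes is a hypothetical helper name.
distinct_nodes() {
  kubectl get pods -n production -l cnpg.io/cluster=pg \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | sort -u | grep -c .
}

# Usage: [ "$(distinct_nodes)" -eq 3 ] || echo "some instances share a node"
```

A result below 3 means at least two instances share a node, so losing that node takes out more than one instance at once.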

What the operator created for you

kubectl get pods,svc,pvc,secret -n production | grep pg

You will see three pods (pg-1, pg-2, pg-3), three Services (pg-rw, pg-ro, pg-r), six PVCs (one data volume and one WAL volume per instance), and operator-managed secrets (TLS certs, and pg-superuser because we enabled it). The next chapter explains how to connect through those Services.

What could go wrong

  • Pods Pending → usually storage: check kubectl describe pvc -n production and that longhorn-postgres exists and Longhorn is healthy.
  • Stuck Setting up primary → check kubectl cnpg status and operator logs (kubectl logs -n cnpg-system deploy/cnpg-controller-manager).
  • Writes hanging → likely synchronous replication with no available standby; see the durability note above.
  • No pg-superuser secret → you left enableSuperuserAccess at its default false.
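
For the hanging-writes case, it helps to see what the primary itself reports about its standbys. The sketch below reads the current primary from the Cluster status and queries pg_stat_replication on it. It assumes psql can connect over the local socket as the postgres user inside the postgres container; sync_states is a hypothetical helper name.

```shell
# Sketch: list each standby's replication state as seen from the primary.
# sync_states is a hypothetical helper; local psql access as the postgres
# user inside the instance container is an assumption.
sync_states() {
  primary=$(kubectl get cluster pg -n production -o jsonpath='{.status.currentPrimary}')
  [ -n "$primary" ] || { echo "no current primary reported" >&2; return 1; }
  kubectl exec -n production "$primary" -c postgres -- \
    psql -U postgres -Atc \
    "SELECT application_name, state, sync_state FROM pg_stat_replication"
}

# Usage: sync_states
```

With method: any, connected standbys that count toward the quorum show sync_state quorum. An empty result means no standby is streaming, which under dataDurability: required is exactly the situation in which commits wait.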

Where to go deeper

Next: Connecting & pooling.