6. The PostgreSQL Cluster

Goal: create the Cluster resource — the actual database. Everything so far was scaffolding; this is the centerpiece. We will read the manifest field by field, because each line maps to a concept from the Concepts section.

flowchart TB
    cl["Cluster: pg"] --> p["pg-1 (primary)"]
    cl --> s1["pg-2 (standby)"]
    cl --> s2["pg-3 (standby)"]
    cl -->|creates| rw["Service pg-rw → primary"]
    cl -->|creates| ro["Service pg-ro → standbys"]
    cl -->|creates| r["Service pg-r → any"]
    cl -->|plugin| os["ObjectStore pg-r2-store → R2"]
    cl -->|imageCatalogRef major 18| cat["ClusterImageCatalog"]
    p -. sync replication (number:1) .- s1

The full manifest

Save as cluster.yaml. Read the annotations — they are the lesson.

cluster.yaml

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg
  namespace: production
spec:
  instances: 3                       # (1)!

  imageCatalogRef:                   # (2)!
    apiGroup: postgresql.cnpg.io
    kind: ClusterImageCatalog
    name: postgresql
    major: 18

  primaryUpdateStrategy: unsupervised   # (3)!

  storage:
    storageClass: longhorn-postgres   # (4)!
    size: 20Gi
  walStorage:                         # (5)!
    storageClass: longhorn-postgres
    size: 10Gi

  resources:                          # (6)!
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      memory: "1Gi"

  postgresql:
    synchronous:                      # (7)!
      method: any
      number: 1
      # dataDurability: required      # (8)!
    parameters:                       # (9)!
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      max_connections: "200"
      log_min_duration_statement: "1000"

  enableSuperuserAccess: true         # (10)!

  bootstrap:
    initdb:                           # (11)!
      database: app
      owner: app_user
      secret:
        name: pg-app-credentials

  plugins:                            # (12)!
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: pg-r2-store

  monitoring:                         # (13)!
    enablePodMonitor: false

(1) Three instances = 1 primary + 2 standbys. The operator's default pod anti-affinity is a scheduling preference, so on three nodes the instances normally land one per node. Losing a node then leaves a primary plus one in-sync standby.
(2) Resolve the image from the ClusterImageCatalog we created, major version 18.
(3) On operator upgrades and other planned changes, the operator rolls the standbys first and then updates the primary on its own, without waiting for you. Whether that last step is an in-place restart or a switchover to a standby is set by primaryUpdateMethod, which this manifest leaves at its default. (Single-instance clusters just restart.)
(4) The disposable, 1-replica Longhorn class from Layer 1. Each instance gets its own PVC from it.
(5) Separate volume for WAL. A best practice: isolating WAL I/O from data I/O improves performance and prevents a WAL spike from filling the data volume. Optional but recommended.
(6) Size from real measurement later; start modest. limits.memory equals requests.memory, so the memory available to PostgreSQL is predictable. Because no CPU limit is set, Kubernetes classifies the pod as Burstable, not Guaranteed. Guaranteed QoS requires CPU and memory limits equal to the requests on every container, so add limits.cpu: "500m" if you want it.
(7) Synchronous replication, quorum-based (method: any), requiring at least 1 standby to confirm each commit before it returns. This is the modern .spec.postgresql.synchronous API (replacing the old minSyncReplicas/maxSyncReplicas).
(8) dataDurability (1.25+) governs the trade-off when standbys are missing. required (default) pauses writes if no synchronous standby is available (strict, RPO=0). preferred is self-healing: it keeps accepting writes and may lose the last few on a bad failover. Given your frequent Kured reboots, decide consciously; see the note below.
(9) A few sane PostgreSQL settings. log_min_duration_statement: "1000" logs queries slower than 1s. Tune shared_buffers/effective_cache_size to your node memory.
(10) Explicitly enable the superuser secret. Since CNPG 1.21, enableSuperuserAccess defaults to false, so the pg-superuser secret does not exist unless you ask for it. (Older tutorials that tell you to read a superuser secret silently assume this is on.) Leave it false if you do not need superuser login.
(11) Bootstrap an application database app owned by app_user, with the password from the Secret we created. initdb = "initialize a brand-new empty database". (To restore from a backup instead, you use bootstrap.recovery; that is the DR chapter.)
(12) Wire the backup plugin. isWALArchiver: true makes this plugin the WAL archiver; barmanObjectName points at the ObjectStore from the previous chapter. This single block is what routes WAL, and the base backups you request later, to R2.
(13) We will turn metrics on in the Monitoring chapter; leaving it off now keeps the surface small.
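
Before applying, it can help to confirm that the objects the manifest references already exist. The following is a minimal sketch, assuming kubectl points at the target cluster and cluster.yaml is in the current directory. preflight is a hypothetical helper name; the resource names are the ones used in the manifest above.

```shell
# preflight [NAMESPACE]: hypothetical helper, not part of CloudNativePG.
# Returns non-zero if any object referenced by cluster.yaml is missing.
preflight() {
  ns="${1:-production}"
  failed=0
  # Storage class used by both storage and walStorage
  kubectl get storageclass longhorn-postgres >/dev/null || failed=1
  # Image catalog named in imageCatalogRef (cluster-scoped)
  kubectl get clusterimagecatalog postgresql >/dev/null || failed=1
  # Credentials consumed by bootstrap.initdb
  kubectl get secret pg-app-credentials -n "$ns" >/dev/null || failed=1
  # ObjectStore referenced by the backup plugin
  kubectl get objectstore pg-r2-store -n "$ns" >/dev/null || failed=1
  # Server-side dry run: the API server checks the manifest without storing it
  kubectl apply --dry-run=server -f cluster.yaml >/dev/null || failed=1
  return "$failed"
}
```

Run it as preflight production && echo "ready to apply". Any missing object shows up as kubectl's own NotFound error on stderr.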

Durability vs availability in your environment

With number: 1 and dataDurability: required, if both standbys are momentarily unavailable, writes block until one returns. With three nodes and PodDisruptionBudgets, a single Kured reboot leaves you primary + one standby, so the quorum of 1 is still satisfiable. But if you ever drop to a lone primary, strict mode halts writes. If keeping the app writable matters more than RPO=0, set dataDurability: preferred. There is no free lunch: choose, and document the choice.
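
The trade-off above can be written down as a small decision table. This is a toy model of the rule as described in this chapter, not CloudNativePG code; write_outcome and its three result labels are names made up for illustration.

```shell
# write_outcome MODE NUMBER AVAILABLE_STANDBYS
# Toy model only: MODE is the dataDurability value, NUMBER is
# .spec.postgresql.synchronous.number, AVAILABLE_STANDBYS is how many
# standbys can currently confirm commits.
write_outcome() {
  mode="$1"; number="$2"; available="$3"
  if [ "$available" -ge "$number" ]; then
    echo "synchronous"   # enough standbys confirm each commit
  elif [ "$mode" = "preferred" ]; then
    echo "degraded"      # writes continue with fewer confirmations
  else
    echo "blocked"       # required: commits wait for a standby
  fi
}

write_outcome required 1 2    # all three nodes up     -> synchronous
write_outcome required 1 1    # one node rebooting     -> synchronous
write_outcome required 1 0    # lone primary, strict   -> blocked
write_outcome preferred 1 0   # lone primary, relaxed  -> degraded
```

The third and fourth lines are the whole decision: with a lone primary, required trades availability for RPO=0 and preferred trades RPO=0 for availability.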

Apply and watch it come up

kubectl apply -f cluster.yaml

# Watch the high-level status flip to healthy (2–3 min on first run)
kubectl get cluster pg -n production -w

The phase progresses through Setting up primary → Creating a new replica → Cluster in healthy state.
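
In a script, watching interactively is awkward; blocking until the cluster is ready is usually more convenient. A minimal sketch follows, assuming the Cluster resource exposes a Ready condition in its status (worth confirming with kubectl describe cluster pg -n production on your operator version). wait_for_cluster is a hypothetical helper name.

```shell
# wait_for_cluster [CLUSTER] [NAMESPACE] [TIMEOUT]
# Blocks until the Cluster reports condition Ready, or fails after TIMEOUT.
# Assumption: the operator sets a "Ready" condition on the Cluster status.
wait_for_cluster() {
  cluster="${1:-pg}"
  ns="${2:-production}"
  timeout="${3:-10m}"
  kubectl wait --for=condition=Ready "cluster/$cluster" -n "$ns" --timeout="$timeout"
}
```

Calling wait_for_cluster with no arguments waits up to ten minutes for pg in production; the exit status is non-zero on timeout, so it can gate later steps.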

Inspect with the plugin

kubectl cnpg status pg -n production

This shows the primary, each standby, replication lag, synchronous state, and — once a backup has run — continuous archiving status. Look for the instances spread across all three nodes:

kubectl get pods -n production -o wide -l cnpg.io/cluster=pg
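
To make the spread easier to read, the same query can show each instance's role and count the distinct nodes in use. This is a sketch under two assumptions: the helper names are hypothetical, and cnpg.io/instanceRole is taken to be the label the operator sets to primary or replica on each instance pod.

```shell
# Print each instance with its node, plus a column for the role label.
# -L adds one output column per listed label key.
show_instances() {
  kubectl get pods -n production -l cnpg.io/cluster=pg -o wide -L cnpg.io/instanceRole
}

# Count the distinct nodes hosting an instance; 3 means one instance per node.
# Pods that are still Pending have no node name and are not counted.
count_instance_nodes() {
  kubectl get pods -n production -l cnpg.io/cluster=pg \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u | grep -c .
}
```

If count_instance_nodes prints less than 3 on a three-node cluster, two instances share a node and a single node failure could take out more than one of them.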

What the operator created for you

kubectl get pods,svc,pvc,secret -n production | grep pg

You will see three pods (pg-1, pg-2, pg-3), three Services (pg-rw, pg-ro, pg-r), six PVCs (one data PVC and one WAL PVC per instance), and operator-managed secrets (TLS certs, and pg-superuser because we enabled it). The next chapter explains how to connect through those Services.

What could go wrong

Pods Pending → usually storage: check kubectl describe pvc -n production and that longhorn-postgres exists and Longhorn is healthy.
Stuck Setting up primary → check kubectl cnpg status and operator logs (kubectl logs -n cnpg-system deploy/cnpg-controller-manager).
Writes hanging → likely synchronous replication with no available standby; see the durability note above.
No pg-superuser secret → you left enableSuperuserAccess at its default false.
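
For the hanging-writes case, looking at replication from inside PostgreSQL shows whether any standby is currently eligible to confirm commits. The sketch below assumes the cnpg plugin's psql subcommand is available and passes everything after -- to psql; show_sync_state is a hypothetical helper name. It is meant for an interactive terminal.

```shell
# show_sync_state [CLUSTER] [NAMESPACE]
# Prints the configured synchronous standby list, then one row per
# connected standby with its replication state and sync_state.
# With method: any, PostgreSQL reports sync_state as "quorum".
show_sync_state() {
  cluster="${1:-pg}"
  ns="${2:-production}"
  kubectl cnpg psql "$cluster" -n "$ns" -- \
    -c "SHOW synchronous_standby_names;" \
    -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"
}
```

An empty second result while synchronous_standby_names is non-empty means no standby is connected, which matches the blocked-writes symptom under dataDurability: required.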

Where to go deeper

Next: Connecting & pooling.