Kubernetes Postgres Backups
Services running in kubernetes backed by a datastore also in kubernetes require backups just like everything else. This post describes one way to make automated backups for a PostgreSQL DB in kubernetes.
With a few modifications, the approach detailed here can be applied to other databases that are not Postgres.
Requirements
- Backups must be taken regularly, once a week is fine for this example
- Backups must be stored on a separate host, as part of a 3-2-1 strategy.
- A 3-2-1 backup strategy is having 3 copies of data, on at least two types of media, with at least one of those copies being offsite. For this post I’ll be using my NAS as a second copy. The NAS will have several backups at any given time. The offsite backup is out of scope for this post.
- Obviously no secrets should be stored in git
- Backups should be encrypted at rest
Out Of Scope
- N backups are kept at any given time. I have other existing automation that handles this.
- Creating kubernetes secrets used in the backup. This post assumes the secrets are either already created, or created as part of your deploy job.
The Plan
- Create a new image based on bitnami’s postgresql image containing a shell script to dump, encrypt, and compress the a database
- Create a kubernetes CronJob to periodically backup the data
- Manually test to verify data is backed up
The Backup Shell Script
This is a quick bash script I’ve put together for taking backups. The script assumes two environment variables have been set.
- PGPASSWORD - used to access PostgreSQL database.
- ENCRYPT_PASSWORD - used to encrypt the dump with a password. With minor alterations to the script a pubkey could be used instead of a password. I chose a password because laziness. I mean seriously, I skipped an entire openssl command here.
If you run this script on something that is not ephemeral, you’ll want to update the script to remove ${TMP_OUT}
since that’s an unencrypted copy of a supposedly important database.
#!/bin/bash
set -e
TMP_OUT=/tmp/dump.sql
Help() {
echo "Takes a backup of a postgresql database."
echo
echo "PGPASSWORD should be set via environment variable"
echo "ENCRYPT_PASSWORD should be set via environment variable"
echo
echo "syntax ./backup.sh -H postgres_host:port -d database_name -u database_user -o /path/to/output/directory/"
echo
echo args:
echo "h Displays this help message."
echo "H The host:port at which to reach the postgres server."
echo "d The database to backup."
echo "u The database user to user. ENCRYPT_PASSWORD must be set."
echo "o The directory to write the backup to. Backup file will be named in YYYY-MM-DD-database_name.sql.gz.enc"
}
Die() {
echo >&2 "$@"
echo
Help
echo
exit 1
}
ParseArgs() {
while getopts "h:H:d:u:o:" option; do
case $option in
h) # display Help
Help
exit;;
H)
DB_HOST=${OPTARG};;
d)
DB=${OPTARG};;
u)
DB_USER=${OPTARG};;
o)
OUT_DIR=${OPTARG};;
\?) # Invalid option
echo "Error: Invalid option"
Help
exit;;
esac
done
}
ValidateArgs() {
! [[ -z ${DB_HOST} ]] || Die "missing required arg -H"
! [[ -z ${DB} ]] || Die "missing required arg -d"
! [[ -z ${DB_USER} ]] || Die "missing required arg -u"
! [[ -z ${OUT_DIR} ]] || Die "missing required arg -o"
! [[ -z ${PGPASSWORD} ]] || Die "PGPASSWORD env var must be set"
! [[ -z ${ENCRYPT_PASSWORD} ]] || Die "ENCRYPT_PASSWORD env var must be set"
}
DumpDB() {
pg_dump -U ${DB_USER} -h ${DB_HOST} ${DB} > ${TMP_OUT}
}
EncryptDump() {
mkdir -p ${OUT_DIR}/${DB}
gzip -c ${TMP_OUT} \
| openssl enc -pbkdf2 \
-salt \
-out ${OUT_DIR}/${DB}/$(date +%F)-${DB}.sql.gz.enc \
-pass pass:${ENCRYPT_PASSWORD}
}
ParseArgs $@
ValidateArgs
DumpDB
EncryptDump
The Image
The Dockerfile here is very simple.
FROM docker.io/bitnami/postgresql:16.2.0-debian-12-r8
COPY backup.sh /backup.sh
The CronJob Definition
Below is a CronJob used with an exampled application named “important-application-here”.
Several small pieces are left as an exercise for the reader.
- The image is left as
image: <Where-You-Host-Your-Image>:latest
. You’ll want to update that with the location of the image generated from the Dockerfile above. - For my purposes the volume
important-application-here-postgres-backup-nfs-claim
is assumed to already exist. I’ve used NFS mounts to mount my DB backup share on my NAS. - Creating secrets will need to be done.
You may also want to update the schedule. Having many services run their backups at the same time could cause an issue, but really that’s a problem for another day.
apiVersion: batch/v1
kind: CronJob
metadata:
name: important-application-here-backup
namespace: important-application-here
spec:
schedule: "0 0 * * 0"
jobTemplate:
spec:
template:
spec:
imagePullSecrets:
- name: regcred
containers:
- name: postgresbackup
image: <Where-You-Host-Your-Image>:latest
imagePullPolicy: Always
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: important-application-here-postgresql
key: password
- name: ENCRYPT_PASSWORD
valueFrom:
secretKeyRef:
name: backup-encrypt
key: ENCRYPT_PASSWORD
command:
- /backup.sh
args:
- -H
- important-application-here-postgresql.important-application-namespace.svc.cluster.local
- -d important-application-here
- -u important-application-here
- -o
- /backups
volumeMounts:
- name: backup-dir
mountPath: /backups
restartPolicy: OnFailure
volumes:
- name: backup-dir
persistentVolumeClaim:
claimName: important-application-here-postgres-backup-nfs-claim
Verification
Once the CronJob has been deployed, you’ll want to verify it works. No one wants to wait up to a week to see if their yaml skills are up to par, so create a manual run.
# starts a job in the important-application-here namespace
kubectl create job --from=cronjob/important-application-here-backup manual-backup -n important-application-here
Verify that your backup shows up in the expected location, and that is has the expected contents! I mounted the backup NFS on /mnt/db_backups
cd /mnt/db_backups
openssl enc -d -pbkdf2 -in 2024-03-24-name-of-db-here.sql.gz.enc -k TopSecretPasswordHere | gzip -d
A smart person would also verify that the service can be restored from backup. That process depends on the application being backed up. Don’t wait until you need the backup to verify your recovery process works. If you don’t verify your recovery process, you do not have a recovery process.