Avoiding Hazards with Postgres on Kubernetes
Running a database on Kubernetes right now can be a bit hazardous. In the future this will be made better by PetSets, but even then we are still going to have to worry about backing up and restoring databases every once in a while.
Edit: There's a better way now using Patroni. I'll write that up soon.
I decided to play with using PGHoard with Init Containers to restore, and a helper container in the same pod to back up. PGHoard is nice as it supports a wide variety of cloud providers as backup targets, and it doesn't require you to use S3 compatibility mode for all of them.
Configs
To start I made the config that I wanted to run PGHoard with and saved that into a ConfigMap.
```yaml
apiVersion: v1
data:
  pghoard.json: |
    {
        "backup_location": "/tmp",
        "backup_sites": {
            "default": {
                "active_backup_mode": "pg_receivexlog",
                "pg_data_directory": "/var/lib/postgresql/data/pgdata",
                "nodes": [
                    {
                        "host": "127.0.0.1",
                        "password": "hoard_pass",
                        "port": "5432",
                        "user": "pghoard"
                    }
                ],
                "object_storage": {
                    "storage_type": "google",
                    "project_id": "alex-kerney",
                    "bucket_name": "pghoard-test",
                    "credential_file": "/google_key/pghoard-test"
                }
            }
        }
    }
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: pghoard-config
```
Here I am using Google Cloud Storage as my storage service, with a bucket named `pghoard-test` and an associated Google Cloud credential file. As PGHoard will run in the same pod as my Postgres container, it should be set to connect to localhost. We will mount this config into both our init container and our backup container.
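Incidentally, that manifest has kubectl's fingerprints on it (the `creationTimestamp: null`); if you keep the JSON in a local `pghoard.json` file, you can generate and apply the ConfigMap the same way rather than writing it by hand:

```
> kubectl create configmap pghoard-config --from-file=pghoard.json --dry-run -o yaml > pghoard-config.yaml
> kubectl apply -f pghoard-config.yaml
```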
Seeing that we've specified a Google credential file, let's make a secret for it. This assumes your key file is named `pghoard-test` and it is sitting in your local directory `google_key`.
> kubectl create secret generic pghoard-google --from-file=google_key
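A quick `kubectl describe` will confirm the key landed in the Secret (it shows key names and sizes, not the contents):

```
> kubectl describe secret pghoard-google
```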
Deployment
Now time to pull out the big guns and create our Deployment.
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: pghoard-test
  labels:
    service: pghoard
spec:
  replicas: 1
  template:
    metadata:
      name: pghoard-test
      labels:
        service: pghoard
      annotations:
        pod.beta.kubernetes.io/init-containers: '[
            {
                "name": "restore",
                "image": "abkfenris/postgis-pghoard:9.5-1.4.0",
                "command": ["/bin/sh", "-c"],
                "args": ["gosu postgres pghoard_restore get-basebackup --config /pghoard/pghoard.json --target-dir /var/lib/postgresql/data/pgdata --restore-to-master; sleep 1"],
                "volumeMounts": [
                    {
                        "name": "pg-data",
                        "mountPath": "/var/lib/postgresql/data/"
                    },
                    {
                        "name": "google-key",
                        "mountPath": "/google_key",
                        "readOnly": true
                    },
                    {
                        "name": "pghoard-config",
                        "mountPath": "/pghoard",
                        "readOnly": true
                    }
                ]
            }
        ]'
    spec:
      volumes:
      - name: pg-data
        emptyDir: {}
      - name: google-key
        secret:
          secretName: pghoard-google
      - name: pghoard-config
        configMap:
          name: pghoard-config
      - name: postgres-home
        emptyDir: {}
      containers:
      - image: "abkfenris/postgis-pghoard:9.5-1.4.0"
        name: postgres
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "gosu postgres psql -c 'SELECT pg_switch_xlog();' && gosu postgres pg_ctl -D /var/lib/postgresql/data/pgdata -m fast -w stop"]
        volumeMounts:
        - mountPath: "/var/lib/postgresql/data/"
          name: pg-data
        env:
        - name: PGDATA
          value: "/var/lib/postgresql/data/pgdata"
        - name: PGHOARD_USER
          value: pghoard
        - name: PGHOARD_PASS
          value: hoard_pass
      - image: "abkfenris/postgis-pghoard:9.5-1.4.0"
        name: backup
        volumeMounts:
        - name: google-key
          mountPath: /google_key
        - name: pghoard-config
          mountPath: /pghoard
        - name: pg-data
          mountPath: "/var/lib/postgresql/data/"
        - name: postgres-home
          mountPath: /home/postgres/
        command: ['gosu', 'postgres', 'pghoard', '--config', '/pghoard/pghoard.json']
```
OK, let's break this down.
First we have our usual Deployment preamble of API version, kind, name, metadata, and labels.
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: pghoard-test
  labels:
    service: pghoard
```
Init Container
Then we get into a section of metadata that isn't as common: annotations. Here is where we are going to store our Init Container info in the key `pod.beta.kubernetes.io/init-containers` as a string of JSON.
```yaml
annotations:
  pod.beta.kubernetes.io/init-containers: '[
      {
          "name": "restore",
          "image": "abkfenris/postgis-pghoard:9.5-1.4.0",
          "command": ["/bin/sh", "-c"],
          "args": ["gosu postgres pghoard_restore get-basebackup --config /pghoard/pghoard.json --target-dir /var/lib/postgresql/data/pgdata --restore-to-master; sleep 1"],
          "volumeMounts": [
              {
                  "name": "pg-data",
                  "mountPath": "/var/lib/postgresql/data/"
              },
              {
                  "name": "google-key",
                  "mountPath": "/google_key",
                  "readOnly": true
              },
              {
                  "name": "pghoard-config",
                  "mountPath": "/pghoard",
                  "readOnly": true
              }
          ]
      }
  ]'
```
While init containers are in a list, just like the normal containers in a pod, they are executed sequentially instead of being run at the same time. Here I'm using the image `abkfenris/postgis-pghoard`, tag `9.5-1.4.0`.
Then I give PGHoard its commands to run:
- We start with the command `/bin/sh -c` so that we can string together multiple commands as arguments. Kubernetes doesn't natively understand the normal shell operators for combining commands on its own.
- First we use `gosu` to work as the `postgres` user, as the data needs to be accessible to the Postgres server.
- The `pghoard_restore` command is going to get the most recent basebackup of the server to rebuild the data directory.
- `--config` is our PGHoard JSON that we stored in a ConfigMap.
- `--target-dir` unsurprisingly points to where we are storing our Postgres data. Postgres doesn't like its data folder to be the one that Docker or Kubernetes is directly mounting (see `PGDATA`, which we will set later).
- `--restore-to-master`: if we don't set this, it will bring up our system as a standby.
- After our restore command, we follow it with `sleep 1`. Using `;` means that `sleep` will run no matter what `pghoard_restore` exits with. This way an error won't cause Kubernetes to crash the pod, and it can instead progress to the usual Postgres initialization. There is probably a better way of doing this that doesn't catch all errors, continuing only when there is no base-backup available and failing when other errors occur; see the sketch after this list.
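Here's a minimal sketch of that idea: bake a small guard script into the image and run it as the init container's command instead of the `; sleep 1`. The empty-directory test is my assumption for "no basebackup existed yet"; anything else is treated as a real failure so Kubernetes retries rather than booting Postgres on a half-restored directory.

```sh
#!/bin/sh
# restore-guard.sh (hypothetical): wrap pghoard_restore so only a
# "nothing to restore" situation falls through to fresh initialization.
PGDATA=/var/lib/postgresql/data/pgdata

if gosu postgres pghoard_restore get-basebackup \
    --config /pghoard/pghoard.json \
    --target-dir "$PGDATA" \
    --restore-to-master; then
    echo "basebackup restored"
elif [ -z "$(ls -A "$PGDATA" 2>/dev/null)" ]; then
    # Nothing was restored and the directory is still empty: assume there is
    # no basebackup yet, and let the normal initdb path take over.
    echo "no basebackup restored; continuing to fresh initialization"
else
    # A partial restore is worse than none: fail the init container.
    echo "restore failed part-way through" >&2
    exit 1
fi
```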
Volumes
We have a handful of volume mounts here, so let's look at that section of our `spec`.
```yaml
volumes:
- name: pg-data
  emptyDir: {}
- name: google-key
  secret:
    secretName: pghoard-google
- name: pghoard-config
  configMap:
    name: pghoard-config
- name: postgres-home
  emptyDir: {}
```
Init containers will pull volume information from the regular spec.
- `pg-data` is going to be an on-host empty directory. This allows it to be changed by any container in the pod, and to last for the life of the pod, but as soon as the pod goes away, so does `pg-data`.
- `google-key` is our credentials for Google that we are pulling from the secret we created earlier.
- `pghoard-config` contains our PGHoard JSON file that we created as a ConfigMap.
- Finally we have `postgres-home`, as PGHoard likes to store some temporary files in the home directory of whatever user it is running as.
I'm mounting both the ConfigMap and Secret at the root level; it's only the `pg-data` directory that is being mounted elsewhere. In both the init container and our later serving container, we mount it at `/var/lib/postgresql/data/` but store the data one level deeper in `/var/lib/postgresql/data/pgdata/`, so that Postgres can have all the control it wants.
What happens in a restore?
When the `restore` init container runs, it will first find the available base-backups, then download the most recent one to the target directory.
```
Found 2 applicable basebackups

Basebackup                                Backup size  Orig size    Start time
----------------------------------------  -----------  -----------  --------------------
default/basebackup/2016-07-11_0           7 MB         36 MB        2016-07-11T01:06:53Z
    metadata: {'compression-algorithm': 'snappy', 'compression-level': '0', 'pg-version': '90503', 'start-wal-segment': '000000010000000000000003'}
default/basebackup/2016-10-20_0           7 MB         36 MB        2016-10-20T23:33:23Z
    metadata: {'compression-algorithm': 'snappy', 'compression-level': '0', 'pg-version': '90504', 'start-wal-segment': '000000030000000000000006'}

Selecting 'default/basebackup/2016-10-20_0' for restore
Download progress: 100.00%
Basebackup restoration complete.
You can start PostgreSQL by running pg_ctl -D /var/lib/postgresql/data/pgdata start
On systemd based systems you can run systemctl start postgresql
On SYSV Init based systems you can run /etc/init.d/postgresql start
```
First time? Let's get you settled in.
What happens when there are no base-backups available? `pghoard_restore` errors, but the error is squelched by the `; sleep 1` and the regular containers in the pod are allowed to boot.
When `postgres` boots and doesn't find a valid database, it will set up a new database in the specified `PGDATA` directory. Along the way it will run any shell or SQL files in `/docker-entrypoint-initdb.d/`. In there is a file that helps configure our new database so that PGHoard can connect to it.
A few changes need to occur to `postgresql.conf`.
```
wal_level = archive
max_wal_senders = 4
archive_timeout = 300
```
While PGHoard can be used with Postgres' archive command, it can also control its own destiny and connect to Postgres itself. It needs at least 2 WAL senders available to do so. We also set `archive_timeout` so we don't wait for WAL files to reach full size, ensuring our data gets backed up at least every 5 minutes.
The other changed file is `pg_hba.conf`, to allow the `PGHOARD_USER` to connect.
```
host replication '$PGHOARD_USER' '127.0.0.1/32' md5
```
Finally our startup script adds the `PGHOARD_USER` and gives it the privilege to replicate data.
```sql
CREATE USER "$PGHOARD_USER" WITH PASSWORD '$PGHOARD_PASS' REPLICATION;
```
Thankfully, as long as you've got `PGHOARD_USER` and `PGHOARD_PASS` defined (and they match the JSON config), none of that should really matter.
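For reference, those pieces could live together in a single `/docker-entrypoint-initdb.d/` script along these lines. This is a sketch, not the literal file from my image; the script name and the exact wiring are mine:

```sh
#!/bin/bash
# setup_pghoard.sh (hypothetical name): run by the Postgres entrypoint
# during first-time initialization.
set -e

# Let PGHoard stream WAL over a replication connection.
cat >> "$PGDATA/postgresql.conf" <<EOF
wal_level = archive
max_wal_senders = 4
archive_timeout = 300
EOF

# Allow the PGHoard user to make a local replication connection.
echo "host replication $PGHOARD_USER 127.0.0.1/32 md5" >> "$PGDATA/pg_hba.conf"

# Create the replication user PGHoard connects as.
psql -v ON_ERROR_STOP=1 --username postgres <<EOF
CREATE USER "$PGHOARD_USER" WITH PASSWORD '$PGHOARD_PASS' REPLICATION;
EOF
```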
Postgres
Now at this point our `postgres` container looks rather simple. We are just pulling the same image (it contains both Postgres and PGHoard), mounting the volume, and setting an environment variable to tell Postgres where to store its data, as we want it one level deeper than normal.
```yaml
containers:
- image: "abkfenris/postgis-pghoard:9.5-1.4.0"
  name: postgres
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "gosu postgres psql -c 'SELECT pg_switch_xlog();' && gosu postgres pg_ctl -D /var/lib/postgresql/data/pgdata -m fast -w stop"]
  volumeMounts:
  - mountPath: "/var/lib/postgresql/data/"
    name: pg-data
  env:
  - name: PGDATA
    value: "/var/lib/postgresql/data/pgdata"
  - name: PGHOARD_USER
    value: pghoard
  - name: PGHOARD_PASS
    value: hoard_pass
```
Restoring WAL
While it's tempting to use the standard Docker Postgres image as our server, it isn't quite enough. When the init container finishes, it will have restored a full base-backup (basically a compressed copy of almost the entire PGDATA directory), but none of the Write Ahead Log (WAL) will be included. The WAL is important because it records the changes since the base-backup was taken, allowing them to be replayed.
PGHoard adds a recovery.conf to the PGDATA directory that specifies a `restore_command` to load WAL chunks from the PGHoard server, which we will run in another container. Postgres will run the `restore_command` until PGHoard signals that there are no more WAL files to download and replay. If the server is then in a consistent state, it will start.
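That `restore_command` points back at PGHoard's own `pghoard_postgres_command` helper, which you can see complaining in the logs below. Roughly, and I'm hedging on the exact flags for version 1.4.0, the generated recovery.conf looks like:

```
# Sketch of the recovery.conf PGHoard writes (flags approximate):
restore_command = 'pghoard_postgres_command --mode restore --site default --output %p --xlog %f'
```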
The recovery itself will look a lot like:
```
LOG: database system was interrupted; last known up at 2016-10-20 23:33:23 UTC
FATAL: the database system is starting up
/usr/local/bin/pghoard_postgres_command: ERROR: '00000004.history' not found from archive
LOG: starting archive recovery
/usr/local/bin/pghoard_postgres_command: ERROR: '00000003.history' not found from archive
LOG: restored log file "000000030000000000000006" from archive
LOG: redo starts at 0/6000098
LOG: consistent recovery state reached at 0/60000C0
LOG: restored log file "000000030000000000000007" from archive
LOG: record with incorrect prev-link 0/6000108 at 0/7000028
LOG: redo done at 0/60000C0
LOG: restored log file "000000030000000000000006" from archive
/usr/local/bin/pghoard_postgres_command: ERROR: '00000004.history' not found from archive
LOG: selected new timeline ID: 4
FATAL: the database system is starting up
/usr/local/bin/pghoard_postgres_command: ERROR: '00000003.history' not found from archive
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
```
Notice there aren't any ports exposed here; you will probably need to expose Postgres' port for the application you are working with.
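A minimal sketch of that, added under the `postgres` container and paired with a Service for anything outside the pod:

```yaml
ports:
- name: postgres
  containerPort: 5432
```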
Hooking in
We are also adding a lifecycle hook command on pod shutdown:

```
command: ["/bin/sh", "-c", "gosu postgres psql -c 'SELECT pg_switch_xlog();' && gosu postgres pg_ctl -D /var/lib/postgresql/data/pgdata -m fast -w stop"]
```

The `pg_switch_xlog()` call immediately switches to a new WAL file, allowing PGHoard to back up the current one. Then `pg_ctl` tries to cleanly, but quickly, shut down the server, which generates a final WAL segment on the way out.
PGHoard
The final part is our backup container.
```yaml
- image: "abkfenris/postgis-pghoard:9.5-1.4.0"
  name: backup
  volumeMounts:
  - name: google-key
    mountPath: /google_key
  - name: pghoard-config
    mountPath: /pghoard
  - name: pg-data
    mountPath: "/var/lib/postgresql/data/"
  - name: postgres-home
    mountPath: /home/postgres/
  command: ['gosu', 'postgres', 'pghoard', '--config', '/pghoard/pghoard.json']
```
Our `backup` container runs at the same time as our `postgres` server container, sharing the pod's network space and the same mounted directories. Again we mount the same volumes as our init container, but we additionally mount the home directory for the `postgres` user, as PGHoard stores some temporary files there. Here we launch the base `pghoard` command as the `postgres` user.
Recover and start backing up
With the config we defined earlier, it will try to connect to the server to pull backups, and it will listen on port 16000 to serve WAL files as the server recovers.
```
2016-10-20 23:44:55,747 TransferAgent Thread-9  WARNING 'default/timeline/00000006.history' not found from storage
2016-10-20 23:44:55,748 TransferAgent Thread-9  INFO 'DOWNLOAD' FAILED transfer of key: 'default/timeline/00000006.history', size: 0, took 0.409s
INFO 'DOWNLOAD' transfer of key: 'default/xlog/000000030000000000000006', size: 797320, took 0.212s
2016-10-20 23:44:56,759 TransferAgent Thread-11 INFO 'DOWNLOAD' transfer of key: 'default/xlog/00000003000000000000000A', size: 797297, took 0.249s
2016-10-20 23:44:56,792 TransferAgent Thread-8  INFO 'DOWNLOAD' transfer of key: 'default/xlog/000000030000000000000008', size: 797296, took 0.284s
2016-10-20 23:44:56,820 TransferAgent Thread-10 INFO 'DOWNLOAD' transfer of key: 'default/xlog/000000030000000000000009', size: 797297, took 0.311s
2016-10-20 23:44:57,076 TransferAgent Thread-12 INFO 'DOWNLOAD' transfer of key: 'default/xlog/000000030000000000000007', size: 797296, took 0.568s
127.0.0.1 - - [20/Oct/2016 23:44:57] "GET /default/archive/000000030000000000000006 HTTP/1.1" 201 -
127.0.0.1 - - [20/Oct/2016 23:44:58] "GET /default/archive/000000030000000000000007 HTTP/1.1" 201 -
2016-10-20 23:44:59,081 TransferAgent Thread-11 INFO 'DOWNLOAD' transfer of key: 'default/xlog/000000030000000000000006', size: 797320, took 0.359s
2016-10-20 23:44:59,108 TransferAgent Thread-9  INFO 'DOWNLOAD' transfer of key: 'default/xlog/000000030000000000000007', size: 797296, took 0.386s
127.0.0.1 - - [20/Oct/2016 23:44:59] "GET /default/archive/000000030000000000000006 HTTP/1.1" 201 -
2016-10-20 23:44:59,970 TransferAgent Thread-12 WARNING 'default/timeline/00000004.history' not found from storage
...
```
In addition to the WAL files found in `default/xlog`, there are often warnings about `.history` files. From what I've seen, they aren't going to cause everything to fall apart. The big issue is if there are WAL segments missing.
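One way to eyeball that, assuming the GCS bucket from the config above and a working `gsutil`, is to list the stored segments and check that the sequence is contiguous:

```
> gsutil ls gs://pghoard-test/default/xlog/
```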
Conclusion
There are a few more things that you might want to think about before deploying this: what `PG_USER` and `PG_PASS` values you wish to use, and wiring up `pg_isready` as a readiness probe.
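A readiness probe on the `postgres` container might look something like this sketch; the timings are guesses you'd want to tune:

```yaml
readinessProbe:
  exec:
    command: ["pg_isready", "-h", "127.0.0.1", "-p", "5432"]
  initialDelaySeconds: 30
  periodSeconds: 10
```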
This isn't quite a high-availability deployment yet. Tools like stolon and Patroni will be ready to go soon, and pglogical and Postgres-BDR are both also under development.
You can find the Dockerfiles for my blend of PostgreSQL, PostGIS, and PGHoard on Github.
Update - Getting things into a container
If you've gone and `pg_dump`ed out an existing database, and you want to use `psql` to restore it into your newly set up and backing-up database, then you can use

```
> kubectl exec postgres-pghoard-pod -c postgres -i -- /bin/bash -c 'gosu postgres psql' < dump.sql
```

to get your local `dump.sql` in there.
You could also use `kubectl port-forward`, but what's the fun in that?
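If you do want that kind of fun, it looks something like this sketch (pod name matching the example above; user and database are whatever your image set up):

```
> kubectl port-forward postgres-pghoard-pod 5432:5432
> psql -h 127.0.0.1 -p 5432 -U postgres < dump.sql
```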