Debugging Kubernetes
Had my first big hiccup running Kubernetes on Google Container Engine (GKE) yesterday.
I have been using Deployments specifying a single replica as a pseudo-PetSet for a few databases, as PetSets are still in alpha.
PetSets are Kubernetes' solution for services where you care about state during cluster changes. While PetSets are available in alpha as of 1.4.0, hosts like GKE can choose not to support alpha features.
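For context, here is roughly what one of those single-replica pseudo-PetSets looks like. This is a sketch rather than my actual manifest: the image, labels, and mount path are stand-ins, and the names are borrowed from the describe output further down.

apiVersion: extensions/v1beta1   # Deployments lived under extensions in the 1.4 days
kind: Deployment
metadata:
  name: sentry-postgres
spec:
  replicas: 1                    # a single replica standing in for a PetSet
  template:
    metadata:
      labels:
        app: sentry-postgres
    spec:
      containers:
      - name: postgres
        image: postgres:9.5      # illustrative image and tag
        volumeMounts:
        - name: sentry-postgres-persistent-storage
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: sentry-postgres-persistent-storage
        gcePersistentDisk:
          pdName: sentry-postgres-disk   # a pre-created GCE persistent disk
          fsType: ext4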
Still, I have been keeping Kubernetes up to date as soon as possible, as GKE's upgrades are largely painless. This time around I got caught by a race condition when the nodes upgraded.
The first sign of an issue came when I checked whether everything was happy again after the nodes upgraded.
> kubectl get pods | grep -v 'Running'  # -v inverts the match
NAME                          READY     STATUS              RESTARTS   AGE
sentry-postgres-1257884...    0/1       ContainerCreating   0          16m
underground-postgres-16...    0/1       ContainerCreating   0          16m
Uh oh.
What does the deployment think about things?
> kubectl get deployment underground-postgres
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
underground-postgres   1         1         1            0           120d
At least the deployment knows that something is wrong too, but it doesn't seem to be taking care of the issue as it usually would.
Let's see what's up with one of those pods.
> kubectl describe pod sentry-postgres-1257884276-5u5tk
Name:         sentry-postgres-1257884276-5u5tk
Namespace:    default
Node:         gke-k8s-node
Start Time:   Fri, 30 Sep 2016 17:39:58 -0400
#...trimmed here and there...#
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  sentry-postgres-persistent-storage:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     sentry-postgres-disk
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:
  Seen  Type     Reason            Message
  ----  ----     ------            -------
  21m   Warning  FailedScheduling  no nodes available to schedule pods
  21m   Warning  FailedScheduling  pod (sentry-postgres-1257884276-5u5tk) failed to fit in any node
                                   fit failure on node (gke-k8s-node): PodToleratesNodeTaints
  20m   Normal   Scheduled         Successfully assigned sentry-postgres-1257884276-5u5tk to gke-k8s-node
  19m   Warning  FailedMount       Failed to attach volume "sentry-postgres-persistent-storage" on node "gke-k8s-default-pool" with: googleapi: Error 404: The resource 'projects/alex-kerney/...details../instances/gke-k8s-node' was not found
  19m   Warning  FailedMount       Failed to attach volume "sentry-postgres-persistent-storage" on node "gke-k8s-node" with: error getting instance "gke-k8s-node"
  13m   Warning  FailedMount       Unable to mount volumes for pod "sentry-postgres-1257884276-5u5tk_default(499364b1-8756-11e6-b0fe-42010af00052)": timeout expired waiting for volumes to attach/mount for pod "sentry-postgres-1257884276-5u5tk"/"default". list of unattached/unmounted volumes=[sentry-postgres-persistent-storage]
  13m   Warning  FailedSync        Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "sentry-postgres-1257884276-5u5tk"/"default". list of unattached/unmounted volumes=[sentry-postgres-persistent-storage]
Trimmed and reformatted some for viewing sanity.
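(If you'd rather watch the whole namespace's events scroll by than describe pods one at a time, kubectl can do that too; the sort flag may behave a little differently depending on your kubectl version.)

> kubectl get events --watch
> kubectl get events --sort-by='.lastTimestamp'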
The first couple of events happened while the nodes were being shuffled around during the upgrade, but then the pod was successfully scheduled onto the node gke-k8s-node. The FailedMount and FailedSync events are the real issue here.
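Since the errors are all about attaching the persistent disk, it's also worth asking GCE directly which instance the disk thinks it is still attached to. Something along these lines works; the zone here is a placeholder, and the users field in the output lists the instances the disk is attached to.

> gcloud compute disks describe sentry-postgres-disk --zone us-east1-b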
As Kubernetes marked nodes to be drained and shut their pods down, the ReplicaSet that the Deployment creates tried to schedule a new pod. In current versions, if a disk is mounted, it doesn't always get detached from the shut-down pod's old node before the new pod tries to mount it.
I found a couple of issues filed about this, and it looks like a fix is coming in 1.4.1, but things are broken for me right now.
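(To see which versions the master and the nodes actually landed on after the upgrade, and therefore whether a fix has reached you yet, the usual commands are enough; kubectl get nodes shows the kubelet version per node.)

> kubectl version
> kubectl get nodes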
My first attempt to take care of this was to delete the problematic pods.
> kubectl delete pod sentry-postgres-1257884276-5u5tk
pod "sentry-postgres-1257884276-5u5tk" deleted

> kubectl get pods | grep -v 'Running'
NAME                           READY     STATUS              RESTARTS   AGE
sentry-postgres-125788427...   0/1       Terminating         0          23m
sentry-postgres-125788427...   0/1       Pending             0          3s
underground-postgres-1659...   0/1       ContainerCreating   0          23m
Umm, wait a bit...
> kubectl get pods | grep -v 'Running'
NAME                           READY     STATUS              RESTARTS   AGE
sentry-postgres-12578842...    0/1       ContainerCreating   0          17s
underground-postgres-165...    0/1       ContainerCreating   0          23m
Bummer. The ReplicaSet would still try to create a new pod before the old one was gone and the disk released.
Since the problem is too many pods existing at the same time, there is a command to change the number of pods that a ReplicaSet keeps around.
Let's scale all the way down.
> kubectl scale --replicas=0 deployment/underground-postgres
deployment "underground-postgres" scaled
And then back up.
> kubectl scale --replicas=1 deployment/underground-postgres
deployment "underground-postgres" scaled
Then check what isn't running:
> kubectl get pods | grep -v 'Running'
NAME      READY     STATUS    RESTARTS   AGE
Nada! Zip, zero pods not running! Which means everything is running (other checks confirmed it). Giving one of the problematic pods a kick also took care of the other ones; I suspect shuffling one mount caused the others to get re-checked.
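(The other checks were nothing fancy: confirming the deployments report an available replica again, and a peek at the new pods' logs. The pod name below is just whatever the freshly created pod happens to be called.)

> kubectl get deployments
> kubectl logs underground-postgres-<new-pod-name>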
Kubernetes has been awesome to work with.
Each small project can live in its own self-contained pod ecosystem, and Kubernetes packs them onto as few hosts as their memory requirements allow. Most of the time this means I just have a single host running, but if things get busy, both the worker pods and the hosts can scale up. That way, even if several back episodes of Underground Garage get requested and have to be assembled at the same time, everything keeps running smoothly. Once the work is done, everything gets scaled back down.
My next step with Kubernetes is to get a Let's Encrypt setup deployed; kube-cert-manager looks like the most promising option right now.
After that comes the big one. Migrating Riverflo.ws to run on Kubernetes.