Autoscaling with Kubernetes

Discovering the autoscaling aspect of Kubernetes

Martin Joo

Nov 05, 2024

∙ Paid

Introduction

In an earlier article, I introduced Kubernetes. We deployed an application with:

Laravel API
nginx
Worker
Scheduler
Vue frontend
A migration job

You can read the article here.

In this one, we’ll discover the autoscaling features of Kubernetes using this example project.

Liveness and readiness probes

The next part is crucial. As you may know, Kubernetes has self-healing properties. Meaning it kills, starts, and restarts pods if they are unhealthy. But how does it know when a pod is unhealthy? There are obvious situations, for example, when your worker process exits with code 1. But we can also configure more advanced checks, for example, restart the API if it takes more than 1s to respond.

These are called "probes" in k8s. They are basically the same as health checks in docker-compose or Swarm but much easier to write down.

It's pretty easy to confuse them so here's a quick definition:

readinessProbe asks the container: "Are you ready to work?"
On the other hand, livenessProbe asks the container: "Are you still alive?"

So readinessProbe plays a role when the container is being started, while livenessProbe is important when the container is already running.

readinessProbe indicates whether a container is ready to serve requests. It ensures that it is fully initialized before starting to receive incoming traffic. If a container fails the readinessProbe, it is temporarily removed from load balancing until it passes the probe. This allows k8s to avoid sending traffic to containers that are not yet ready or are experiencing problems.

livenessProbe is used to determine whether a container is running as expected, and if it's not, Kubernetes takes action based on the probe's configuration. It helps in detecting and recovering from failures automatically. If a container fails the livenessProbe, Kubernetes will restart the container.

These probes together help ensure the high availability of your app by making sure containers are running correctly (livenessProbe) and ready to serve requests (readinessProbe).

Let's configure liveness and readiness probes for every container!

API probes

template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: martinjoo/posts-api:latest
          imagePullPolicy: Always         
          livenessProbe:
            tcpSocket:
              port: 9000
            initialDelaySeconds: 20
            periodSeconds: 30
            failureThreshold: 2
          readinessProbe:
            tcpSocket:
              port: 9000
            initialDelaySeconds: 10
            periodSeconds: 30
            failureThreshold: 1

There are 3 different types of probes:

tcpSocket
httpGet
exec

The API exposes port 9000 over TCP so in this case tcpSocket is the one we need. Both livenessProbe and readinessProbe has the same properties:

initialDelaySeconds tells k8s to wait 20 seconds before sending the request.
periodSeconds tells k8s to run the probe every 30 seconds.
failureThreshold instructs k8s to tolerate only 2 failed probes.

The following sentence comes from the Kubernetes documentation:

A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold. This ensures that the pod is observed as not-ready for some period of time before it is hard killed.

This is why I use the same config but failureThreshold is set to 2 in the livenessProbe. The initialDelaySeconds is also a bit higher.

nginx probes

livenessProbe:
  httpGet:
    path: /api/health-check
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 2
readinessProbe:
  httpGet:
    path: /api/health-check
    port: 80
  initialDelaySeconds: 20
  periodSeconds: 30
  failureThreshold: 1

Since nginx exposes an HTTP port we can use the httpGet type which is pretty straightforward to use. I'm using a bit higher initialDelaySeconds because nginx needs the API.

An httpProbe fails if the response is not 2xx or 3xx.

worker probes

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - php artisan queue:monitor default
  initialDelaySeconds: 20
  periodSeconds: 30
  failureThreshold: 2
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - artisan queue:monitor default
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 1

Workers don't expose any port, so I just run queue:monitor command with the exec probe type. You can also use an array-type notation:

command: ["sh", "-c", "php artisan queue:monitor default"]

frontend probes

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 40
  periodSeconds: 30
  failureThreshold: 2
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 1

timeoutSeconds

If something is wrong with some of your pods based on the livenessProbe you can see it in the events section in the output of kubectl describe pod <pod-name>

For example:

You can see that it started 21 minutes ago, but then the liveness probe failed 20 minutes ago so Kubernetes restarted the container but then the readiness probe failed.

We can check the logs of the nginx container:

The status code is 499 which is an nginx-specific code that means the client closed the connection before the server could send the response. This is because there's another option for probes called timeoutSeconds which defaults to 1. So if your container doesn't respond in 1 second the probe fails.

It's a good thing because your probe endpoint should be pretty low-cost and very fast. So if 1 second is not enough for your container to respond there's certainly a problem with it. In my case, I made this situation happen on purpose. I decreased the cluster size, then the server size, then I configured too high request limits (later) to the containers so the whole cluster is dying right now.

If something like that happens you can see it in kubectl get pods:

There is a high number of restarts everywhere. And READY 0/1 means that the container is not ready so the readiness probe failed. You can see that all of my workers failing the readiness probe.

Of course, on the DigitalOcean dashboard, you can always adjust the autoscaling settings:

Autoscaling pods

The cluster is already autoscaled in terms of nodes. However, we still have a fixed number of replicas. For example, this is the API deployment:

spec:
  replicas: 2
  containers:
    - name: api
      image: martinjoo/posts-api:latest

If you want your pods to run a dynamic number of replicas you should delete these replicas settings.

We need a new resource type called HorizontalPodAutoscaler. Each deployment you want to scale will have a dedicated HPA. This is what it looks like:

Keep reading with a 7-day free trial

Subscribe to Computer Science Simplified to keep reading this post and get 7 days of free access to the full post archives.

Computer Science Simplified