Smart healthchecks with Kubernetes and Spring Boot Actuator

SERIES

Smart healthchecks with Kubernetes and Spring Boot Actuator
Health based traffic control with Kubernetes

I’ve seen quite a few projects in the past that use various orchestration tools to deploy applications. Probably the most popular one nowadays is Kubernetes (K8S). Even though these tools offer a vast amount of functionality to help applications run in a scalable and resilient manner, I keep noticing that engineers don’t utilize the features available to them.

One example I often see is missing or misused healthchecks. The world would be a great place if the orchestration tool could just figure out whether the application is healthy and take the necessary actions if it’s not. Fortunately, it’s 2020 and such tools are already available (they have been for a long time :)).

Today I’m going to focus on Kubernetes and show you how to set up proper healthchecks to monitor a Spring Boot application that has Actuator set up.

Healthcheck in Kubernetes

Let’s begin with a short introduction to the healthcheck mechanism of Kubernetes.

The probe actions

All healthchecks are handled by so-called “probes” in the K8S ecosystem. Imagine a probe as a process that periodically does something to determine the health of the application. There are 3 actions a probe can perform.

Executing a command

I’ll cover this one only briefly. You can execute a command (with its arguments) inside the container. If the command exits with code 0, the application is considered healthy. Any other exit code means it’s unhealthy and needs action.
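To make the exec rule concrete, here is a minimal Java sketch of what such a probe boils down to. It only mimics the semantics and is not how the kubelet is implemented: the class name ExecProbe is hypothetical, it assumes a POSIX `sh` is available, and it needs Java 9+ for `Redirect.DISCARD`. Exit code 0 means healthy; any other exit code, a failure to start, or a timeout means unhealthy.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Sketch of the exec probe rule: exit code 0 = healthy, anything else = unhealthy.
// Assumes a POSIX shell ("sh"); ExecProbe is a hypothetical name, not kubelet code.
public class ExecProbe {

    public static boolean probe(String command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // only the exit code matters
                    .start();
            // A command that doesn't finish in time counts as a failure, like timeoutSeconds
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            return false; // the command couldn't even be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // e.g. a "touch a file to signal health" style check
        System.out.println(probe("test -f /tmp/healthy", 1));
    }
}
```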

Opening a TCP socket

With this type of probe, Kubernetes attempts to open a TCP connection to a specified port. If the connection is established, the container is considered healthy. If it can’t be established, the container is unhealthy.
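The TCP rule fits in a few lines of plain Java as well. Again, this only mimics the semantics and TcpProbe is a hypothetical name: a successful connect within the timeout means healthy, and any IOException (refused, unreachable, timed out) means unhealthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of the tcpSocket probe rule: healthy if a TCP connection can be opened in time.
// TcpProbe is a hypothetical name, not kubelet code.
public class TcpProbe {

    public static boolean probe(String host, int port, int timeoutMillis) {
        // try-with-resources closes the socket right away; only the handshake matters
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out: all count as unhealthy
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("localhost", 8080, 1000));
    }
}
```

Keep in mind that this only proves something is listening on the port: an application that still accepts connections but no longer serves requests passes this check.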

Executing an HTTP GET

This is the most sophisticated one. Kubernetes sends an HTTP GET request to a specific path and port. If the response status code is between 200 and 399, the container is considered healthy. Any other status code, or a request that can’t be completed at all, marks the container unhealthy. If your setup needs it, you can also specify custom headers to send with the healthcheck request.
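Here is a sketch of the HTTP GET rule using the JDK’s built-in HttpClient (Java 11+). HttpProbe, its method names and the header map are assumptions for illustration; the part that mirrors Kubernetes is the status check, where anything from 200 up to and including 399 counts as healthy.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;

// Sketch of the httpGet probe rule (not kubelet's implementation): 200-399 = healthy,
// any other status code or any request error = unhealthy. HttpProbe is hypothetical.
public class HttpProbe {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .followRedirects(HttpClient.Redirect.NEVER) // a 3xx is judged by the status rule
            .build();

    /** The success rule: status code in [200, 400). */
    public static boolean isHealthyStatus(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    public static boolean probe(String url, Map<String, String> headers, Duration timeout) {
        HttpRequest.Builder request = HttpRequest.newBuilder(URI.create(url)).timeout(timeout).GET();
        headers.forEach(request::header); // optional custom headers sent with the check
        try {
            HttpResponse<Void> response =
                    CLIENT.send(request.build(), HttpResponse.BodyHandlers.discarding());
            return isHealthyStatus(response.statusCode());
        } catch (IOException e) {
            return false; // connection refused, timeout, ...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("http://localhost:8080/actuator/health", Map.of(), Duration.ofSeconds(1)));
    }
}
```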

Liveness probe

Kubernetes provides 3 types of probes, each suited to a different use case:

Liveness probe
Readiness probe
Startup probe

In this article, I’m only going to cover the first one, the liveness probe.

The purpose of this type is to detect when an application gets into a state it cannot recover from. Imagine a container that has been running for days or weeks and suddenly stops serving requests. The only way to resolve the problem is to restart it. Of course, there must be an underlying issue in the application that needs fixing, but let’s not go in that direction for now.

As soon as the liveness probe detects that the application is failing its healthcheck, Kubernetes restarts the container. Note that the pod itself is not restarted, only the unhealthy container inside it.

A probe’s behavior is tuned with the following 5 parameters:

initialDelaySeconds
- The number of seconds to wait after the container has started before the probe is initiated. If you know your app takes at least 10 seconds to start, set this to 10 so the liveness probe won’t count the startup as a failure.
periodSeconds
- Defines how often the probe performs the healthcheck, in seconds.
timeoutSeconds
- The number of seconds after which the probe times out (1 by default). For an HTTP GET probe, if the response doesn’t arrive within the configured timeout, for example because the application is slow, the probe counts it as a failure.
successThreshold
- The minimum number of consecutive healthcheck successes before the container is considered healthy again after having failed. For liveness probes this must be 1.
failureThreshold
- The number of consecutive healthcheck failures after which the container is considered unhealthy and, for a liveness probe, restarted.
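To see how the two thresholds interact, here is a small, hypothetical Java model (ProbeThresholds is not a Kubernetes type). It only counts consecutive results; for a liveness probe, the moment the failure count reaches failureThreshold is when the container gets restarted. It treats the container as healthy until the first failures arrive.

```java
// Hypothetical model of how successThreshold and failureThreshold turn individual probe
// results into a health state. ProbeThresholds is not a Kubernetes type.
public class ProbeThresholds {

    private final int successThreshold;
    private final int failureThreshold;
    private int consecutiveSuccesses = 0;
    private int consecutiveFailures = 0;
    private boolean healthy = true; // a liveness probe starts out in the "success" state

    public ProbeThresholds(int successThreshold, int failureThreshold) {
        this.successThreshold = successThreshold;
        this.failureThreshold = failureThreshold;
    }

    /** Records one probe result and returns whether the container is considered healthy. */
    public boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0;
            consecutiveSuccesses++;
            if (!healthy && consecutiveSuccesses >= successThreshold) {
                healthy = true;
            }
        } else {
            consecutiveSuccesses = 0;
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                healthy = false; // for a liveness probe: this is where the restart happens
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        ProbeThresholds liveness = new ProbeThresholds(1, 2);
        System.out.println(liveness.record(false)); // true: one failure isn't enough
        System.out.println(liveness.record(false)); // false: two in a row reach the threshold
    }
}
```

With failureThreshold set to 2, a single failed check between successes never triggers anything; only two failures in a row do.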

Here is an example of a TCP socket based liveness probe configuration, just to give you a feel for it:

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Spring Boot Actuator health API

Alright, we’ve covered the basic idea of the probes in Kubernetes, let’s look at Spring Boot Actuator.

Spring Boot Actuator is an extension module for Spring Boot to monitor and manage the application through JMX or HTTP. The marketing slogan is: enhancing the application with production-ready features. The module offers lots and lots of features; I’m not going to cover all of them in this article, but if you are interested, you can find more info here.

One feature is particularly interesting, though: the health API. It’s a single HTTP endpoint, /actuator/health. The default behavior is simple: if the application is healthy, it responds with HTTP 200 and the following JSON:

{
    "status": "UP"
}

If the application is unhealthy, it will respond with HTTP 503 and the following JSON:

{
    "status": "DOWN"
}
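The HTTP status code is derived from the health status. By default, Actuator maps DOWN and OUT_OF_SERVICE to 503 and every other status to 200. The plain-Java sketch below just spells that rule out; HealthStatusMapping is a hypothetical name, and this is a simplification rather than Spring’s code.

```java
import java.util.Map;

// Plain-Java illustration of Actuator's default status-to-HTTP mapping (not Spring code):
// DOWN and OUT_OF_SERVICE -> 503, everything else (UP, UNKNOWN, ...) -> 200.
public class HealthStatusMapping {

    private static final Map<String, Integer> DEFAULTS = Map.of(
            "DOWN", 503,
            "OUT_OF_SERVICE", 503);

    public static int httpStatusFor(String status) {
        return DEFAULTS.getOrDefault(status, 200);
    }

    public static void main(String[] args) {
        System.out.println(httpStatusFor("UP"));   // 200
        System.out.println(httpStatusFor("DOWN")); // 503
    }
}
```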

Customizing the health indicator

I don’t want to go into the details of how Actuator works under the hood, but the key piece is an interface called HealthIndicator: each implementation contributes to the overall system health. Spring Boot ships more than a dozen of them, auto-configured for you when the related dependency is present. An example is DiskSpaceHealthIndicator, which checks for low disk space.
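How do several indicators become one status? By default, Actuator picks the most severe status reported by any indicator, using the order DOWN, OUT_OF_SERVICE, UP, UNKNOWN. The plain-Java sketch below imitates that. StatusAggregation is a hypothetical name (Spring’s own implementation is SimpleStatusAggregator), and ranking custom statuses last is a simplification of this sketch.

```java
import java.util.Comparator;
import java.util.List;

// Illustration of the default aggregation: the most severe status reported by any
// indicator wins. StatusAggregation is a hypothetical name, not Spring's class.
public class StatusAggregation {

    // Default severity order: earlier entries win over later ones
    private static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    private static int rank(String status) {
        int index = ORDER.indexOf(status);
        return index >= 0 ? index : ORDER.size(); // custom statuses rank last in this sketch
    }

    public static String aggregate(List<String> indicatorStatuses) {
        return indicatorStatuses.stream()
                .min(Comparator.comparingInt(StatusAggregation::rank))
                .orElse("UNKNOWN"); // no indicators at all
    }

    public static void main(String[] args) {
        // three indicators, one of them DOWN -> overall DOWN
        System.out.println(aggregate(List.of("UP", "DOWN", "UP")));
    }
}
```

This is why a single DOWN indicator is enough to turn the whole /actuator/health response into DOWN.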

Writing a custom HealthIndicator is quite easy. Simply create a new class that implements the HealthIndicator interface and mark it as a @Component. Spring will pick it up automatically.

For testing purposes, I’ll show you a very simple HealthIndicator that can be switched UP/DOWN with a simple HTTP call.

I’m starting from a Gradle project generated on start.spring.io with the Actuator and Web dependencies. As a first step, let’s create a Spring bean that will hold the health state (healthy/unhealthy):

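import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.stereotype.Component;

// Holds the health flag that the /switch endpoint flips; the application starts out healthy
@Component
public class ManualHealthHolder {
    private final AtomicBoolean healthy = new AtomicBoolean(true);

    public void switchHealth() {
        // compareAndSet loop so two concurrent switches can't both flip from the same old value
        boolean current;
        do {
            current = healthy.get();
        } while (!healthy.compareAndSet(current, !current));
    }

    public boolean isHealthy() {
        return healthy.get();
    }
}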

Nothing special, just a state holder class with a single boolean value that represents the health of the system.

The HealthIndicator is also very simple:

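import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Reports UP or DOWN depending on the flag held by ManualHealthHolder
@Component
public class ManualHealthIndicator implements HealthIndicator {
    @Autowired
    private ManualHealthHolder manualHealthHolder;

    @Override
    public Health health() {
        boolean healthy = manualHealthHolder.isHealthy();
        if (healthy) {
            return Health.up().build();
        }
        return Health.down().build();
    }
}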

There is a single method on the HealthIndicator interface that needs to be implemented: HealthIndicator#health. You can do quite complicated things here, like introducing new health states to the system, but we’ll stick with the existing UP and DOWN states. In this example, the health decision is based on the ManualHealthHolder bean: if it says healthy, the state is UP; if it says unhealthy, the state is DOWN.

The next and last step is to create an HTTP endpoint for changing the state.

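import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Test-only endpoint: GET /switch flips the health flag
@RestController
public class ManualHealthRestController {
    @Autowired
    private ManualHealthHolder manualHealthHolder;

    @GetMapping("/switch")
    public ResponseEntity<?> switchHealth() {
        manualHealthHolder.switchHealth();
        return new ResponseEntity<>(HttpStatus.OK);
    }
}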

Very minimal again: a single HTTP GET mapping, /switch, toggles the status.

Testing time: start the application with ./gradlew clean build bootRun.

If you cURL localhost:8080/actuator/health, you’ll get the UP response (I’ve used jq here to format the response nicely).

$ curl localhost:8080/actuator/health | jq
{
  "status": "UP"
}

To simulate an application failure, call the localhost:8080/switch API. It flips the healthy flag internally, so querying the /actuator/health endpoint now returns the DOWN state (with HTTP 503).

$ curl localhost:8080/actuator/health | jq
{
  "status": "DOWN"
}
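If you’d rather script this check than type curl commands, a small Java 11+ client can do the same round trip. SwitchAndCheck is a hypothetical name, and the base URL http://localhost:8080 assumes the locally started bootRun instance. It calls /switch, then prints and returns the status code of /actuator/health, which should be 503 right after switching to DOWN.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Scripted version of the manual curl test: flip the flag, then read the health endpoint.
// SwitchAndCheck is a hypothetical name; requires Java 11+.
public class SwitchAndCheck {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();

    private static HttpResponse<String> get(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
    }

    /** Calls /switch, then returns the HTTP status code of /actuator/health. */
    public static int switchAndReadHealthStatus(String baseUrl) throws IOException, InterruptedException {
        get(baseUrl + "/switch");
        HttpResponse<String> health = get(baseUrl + "/actuator/health");
        System.out.println(health.statusCode() + " " + health.body());
        return health.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Assumed base URL of the locally running application
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        switchAndReadHealthStatus(baseUrl);
    }
}
```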

Liveness probe with Actuator

Now that we know the building blocks, let’s integrate Actuator and Kubernetes. Kubernetes runs containers, so we need to package the Spring application as a Docker image. A simple Dockerfile looks like this:

FROM openjdk:8-jdk-alpine
RUN apk add --no-cache --upgrade bash
RUN apk add --no-cache --upgrade curl
COPY build/libs/actuator-healtcheck-example-0.0.1-SNAPSHOT.jar app.jar
ENTRYPOINT ["java","-jar","/app.jar"]

Alright, now after executing

$ ./gradlew clean build

we can also execute the

$ docker build . -t actuator-healthcheck-example

command to build the docker image.

I’m using minikube for testing, so there are a few extra steps to build the image in a way that lets us deploy it to the cluster.

Before building the image, execute the following command to point your Docker CLI at the Docker daemon running inside minikube, so the image ends up where the cluster can find it.

$ eval $(minikube docker-env)

With the Docker environment pointing at minikube, run the docker build command from above. You can verify with the docker images command that the image was created successfully.

$ docker images
REPOSITORY                                TAG                 IMAGE ID            CREATED             SIZE
actuator-healthcheck-example              latest              6db0841a7102        5 seconds ago       124MB

Next up is the deployment to the cluster. The initial deployment descriptor looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: actuator-healthcheck-example
  labels:
    app: actuator-healthcheck-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actuator-healthcheck-example
  template:
    metadata:
      labels:
        app: actuator-healthcheck-example
    spec:
      containers:
        - name: actuator-healthcheck-example
          image: actuator-healthcheck-example:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080

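It’s a very basic deployment descriptor. One important detail is setting imagePullPolicy to IfNotPresent or Never, so Kubernetes doesn’t try to pull the image from a registry: the image only exists in minikube’s local Docker daemon. This matters because if imagePullPolicy is omitted and the image tag is :latest, Kubernetes defaults the policy to Always.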
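Why does the pull policy matter here? If imagePullPolicy is omitted, Kubernetes picks a default based on the image tag: no tag or the :latest tag defaults to Always, any other tag to IfNotPresent. Here is that rule as a small Java sketch; PullPolicyDefault is a hypothetical name, and the sketch ignores images referenced by digest.

```java
// Sketch of the documented default for an omitted imagePullPolicy: no tag or ":latest" -> Always,
// any other tag -> IfNotPresent. PullPolicyDefault is hypothetical; digests are not handled.
public class PullPolicyDefault {

    public static String defaultPullPolicy(String image) {
        int lastSlash = image.lastIndexOf('/');
        int lastColon = image.lastIndexOf(':');
        // A colon before the last slash belongs to a registry host:port, not to a tag
        boolean hasTag = lastColon > lastSlash;
        if (!hasTag || image.substring(lastColon + 1).equals("latest")) {
            return "Always";
        }
        return "IfNotPresent";
    }

    public static void main(String[] args) {
        System.out.println(defaultPullPolicy("actuator-healthcheck-example:latest")); // Always
        System.out.println(defaultPullPolicy("k8s.gcr.io/goproxy:0.1"));              // IfNotPresent
    }
}
```

For actuator-healthcheck-example:latest the default would be Always, which is why the descriptor sets imagePullPolicy explicitly.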

Adding the liveness probe:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: actuator-healthcheck-example
  labels:
    app: actuator-healthcheck-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actuator-healthcheck-example
  template:
    metadata:
      labels:
        app: actuator-healthcheck-example
    spec:
      containers:
        - name: actuator-healthcheck-example
          image: actuator-healthcheck-example:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2

So the trick here is to use the httpGet action on the probe and point it at the /actuator/health endpoint. As mentioned earlier, with the httpGet action the probe considers the application healthy when the status code is between 200 and 399. Guess what, the /actuator/health API fulfills that contract: it responds with 200 when the application reports a healthy state and with 503 when it’s down.

The rest of the configuration tells Kubernetes to wait 5 seconds before the first probe, to send the GET request to the endpoint every 10 seconds, and to consider the container unhealthy once 2 consecutive healthchecks have failed.
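With these numbers you can estimate how long a broken container keeps running before it’s restarted. This is back-of-the-envelope arithmetic, not a Kubernetes API (LivenessReactionTime is a hypothetical name): the first failing probe happens within one period after the application goes DOWN, and the restart is triggered failureThreshold - 1 periods later. responseSeconds is how long a failing probe takes to return: roughly 0 for an immediate 503, up to timeoutSeconds for a hanging application.

```java
// Back-of-the-envelope estimate of how long a DOWN container keeps running before the
// liveness probe restarts it. LivenessReactionTime is a hypothetical name, not a K8s API.
public class LivenessReactionTime {

    /** Best case: the application breaks right before a probe runs. */
    public static int minSeconds(int periodSeconds, int failureThreshold, int responseSeconds) {
        return (failureThreshold - 1) * periodSeconds + responseSeconds;
    }

    /** Worst case (upper bound): it breaks right after a successful probe. */
    public static int maxSeconds(int periodSeconds, int failureThreshold, int responseSeconds) {
        return failureThreshold * periodSeconds + responseSeconds;
    }

    public static void main(String[] args) {
        // Values from the deployment above; an immediate 503 means responseSeconds = 0
        System.out.println(minSeconds(10, 2, 0) + "s to " + maxSeconds(10, 2, 0) + "s");
    }
}
```

For periodSeconds: 10 and failureThreshold: 2 this prints "10s to 20s": an application answering 503 is restarted somewhere in that window after it goes DOWN.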

Putting it all together

That’s it. Testing time. If you’ve read this far, I assume the Docker image is already built, so we’ll go from there.

The only thing left is to deploy the application, which takes a single kubectl command:

$ kubectl apply -f k8s-deployment.yaml

The output should be:

deployment.apps/actuator-healthcheck-example created

Now let’s take a look at the pods:

$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
actuator-healthcheck-example-6bf74bd94c-4xmvb   1/1     Running   0          17s

Everything looks good. The next phase of the testing is to flip the healthy flag in the application so we can watch the liveness probe bring the container back to a healthy state.

Accessing the switch API for testing

There are 2 options for this. One is to open a shell inside the container and trigger the /switch endpoint locally. The other is to forward traffic from the local machine to the pod.

To get a shell inside the container, execute the following command (of course change the pod name to yours):

$ kubectl exec -it actuator-healthcheck-example-6bf74bd94c-t5h2f -- bash

From then on, executing

bash-4.4# curl localhost:8080/switch

will switch the flag. If you query the /actuator/health API the same way, it’s going to say DOWN.

The other option to trigger the /switch API is to forward requests from your local machine to the pod with kubectl port-forward. Forwarding straight to a pod works without any extra setup, but exposing the pod’s port as a Service gives us a stable name that survives redeploys. To do that, let’s extend the descriptor we created:

apiVersion: v1
kind: Service
metadata:
  name: actuator-healthcheck-example-svc
  labels:
    app: actuator-healthcheck-example
spec:
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: actuator-healthcheck-example

So the full k8s-deployment.yaml file looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: actuator-healthcheck-example
  labels:
    app: actuator-healthcheck-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actuator-healthcheck-example
  template:
    metadata:
      labels:
        app: actuator-healthcheck-example
    spec:
      containers:
        - name: actuator-healthcheck-example
          image: actuator-healthcheck-example:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: actuator-healthcheck-example-svc
  labels:
    app: actuator-healthcheck-example
spec:
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: actuator-healthcheck-example

Now that everything is in place, redeploy the stack with

$ kubectl apply -f k8s-deployment.yaml

To access the API, only a port-forward is needed:

$ kubectl port-forward svc/actuator-healthcheck-example-svc 9876:8080

The command binds local port 9876 to port 8080 of a pod selected by the Service. From now on, you can access the API from your local machine through localhost:9876.

$ curl localhost:9876/switch

Observing the liveness probe

The application is deployed. We can access the API. Everything is ready to see the liveness probe in action. First of all, let’s verify that the pod is alive and the /actuator/health API is returning the UP status.

$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
actuator-healthcheck-example-6bf74bd94c-l6b9j   1/1     Running   0          15m

$ curl localhost:9876/actuator/health
{"status":"UP"}

Looks good so far. Switching the health with /switch.

$ curl localhost:9876/switch

$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
actuator-healthcheck-example-6bf74bd94c-l6b9j   1/1     Running   0          16m

$ curl localhost:9876/actuator/health
{"status":"DOWN"}

Right after the switch, the pod still looks fine from Kubernetes’ perspective even though Actuator already reports DOWN: the probe needs 2 consecutive failures first. A few seconds later, the pod events clearly show that the container was in fact restarted because of it.

$ kubectl describe pod actuator-healthcheck-example-7dcdd4dd48-97qxm

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  65s                default-scheduler  Successfully assigned default/actuator-healthcheck-example-7dcdd4dd48-97qxm to minikube
  Normal   Pulled     24s (x2 over 64s)  kubelet, minikube  Container image "actuator-healthcheck-example:latest" already present on machine
  Warning  Unhealthy  24s (x2 over 34s)  kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 503
  Normal   Killing    24s                kubelet, minikube  Container actuator-healthcheck-example failed liveness probe, will be restarted
  Normal   Created    23s (x2 over 64s)  kubelet, minikube  Created container actuator-healthcheck-example
  Normal   Started    23s (x2 over 64s)  kubelet, minikube  Started container actuator-healthcheck-example

You can see the message Liveness probe failed: HTTP probe failed with statuscode: 503. It happened 2 times in a row, reaching the failureThreshold, so the container was considered unhealthy and was restarted.

Once the container restart is done, querying the Actuator health endpoint returns the UP status again: the health flag only lives in memory, so the fresh container starts out healthy.

$ curl localhost:9876/actuator/health
{"status":"UP"}
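Instead of retrying curl by hand while the container restarts, you could poll until the endpoint returns 200 again. Here is a minimal Java 11+ sketch, assuming the kubectl port-forward to localhost:9876 from above is still running; WaitForHealthy and its parameters are hypothetical names. Connection errors are expected while the container is restarting, so they’re simply retried until the deadline.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

// Polls a health endpoint until it answers 200 or the deadline passes.
// WaitForHealthy is a hypothetical name; requires Java 11+.
public class WaitForHealthy {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(1))
            .build();

    public static boolean waitUntilUp(String healthUrl, Duration deadline, Duration pollInterval)
            throws InterruptedException {
        Instant giveUpAt = Instant.now().plus(deadline);
        HttpRequest request = HttpRequest.newBuilder(URI.create(healthUrl))
                .timeout(Duration.ofSeconds(1))
                .GET()
                .build();
        while (Instant.now().isBefore(giveUpAt)) {
            try {
                int status = CLIENT.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
                if (status == 200) {
                    return true; // Actuator reports UP again
                }
            } catch (IOException e) {
                // Expected while the container restarts (connection refused or reset): retry
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumes the kubectl port-forward to local port 9876 is still running
        boolean up = waitUntilUp("http://localhost:9876/actuator/health",
                Duration.ofSeconds(60), Duration.ofSeconds(2));
        System.out.println(up ? "UP again" : "still not UP after 60 seconds");
    }
}
```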

Conclusion

I hope you can see how easy it is to set up a proper healthcheck (at least a liveness probe) with Kubernetes and Spring Boot. I definitely recommend doing it to build a more resilient system that reacts to problems automatically.

The code can be found on GitHub. If you liked the article, give it a thumbs up and share it. If you are interested in more, make sure you follow me on Twitter.