Smart healthchecks with Kubernetes and Spring Boot Actuator
I’ve seen quite a few projects use various orchestration tools to deploy applications. Probably the most popular one nowadays is Kubernetes (K8S). Even though these tools offer a vast amount of functionality to help applications run in a scalable and resilient manner, I keep noticing that engineers don’t use the features available to them.
One example I often see is missing or misused healthchecks. The world would be a great place if the orchestration tool could just figure out whether the application is healthy and take the necessary action if it’s not. Fortunately, it’s 2020 and such tools are already available (and have been for a long time :)).
Today I’m going to focus on Kubernetes and show you how to set up proper healthchecks to monitor a Spring Boot application that has Actuator set up.
Healthcheck in Kubernetes
Let’s begin with a little bit of introduction into the healthcheck mechanism of Kubernetes.
The probe actions
All the healthchecks are managed by so-called “probes” in the K8S ecosystem. Think of a probe as a process that periodically does something to determine the health of the application. There are 3 actions a probe can perform.
Executing a command
Only briefly: you can execute a command inside the container. If the command exits with status code 0, the application is considered healthy. Any other exit code means it’s unhealthy and needs action.
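To make the exit code rule concrete, here is a minimal sketch in plain Java. It is not how Kubernetes implements exec probes; ExecProbeSketch and isHealthy are hypothetical names I made up for illustration, and the demo assumes a POSIX shell is available as sh.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Kubernetes): mimics how an exec-style
// probe interprets the exit code of a command.
class ExecProbeSketch {

    // Healthy only when the command finishes within the timeout with exit code 0.
    static boolean isHealthy(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly(); // a check that times out counts as a failure
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            return false; // the command could not be started at all
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumes a POSIX shell is available as "sh".
        System.out.println("exit 0 -> healthy: " + isHealthy(List.of("sh", "-c", "exit 0"), 5));
        System.out.println("exit 3 -> healthy: " + isHealthy(List.of("sh", "-c", "exit 3"), 5));
    }
}
```

The sketch treats a timeout or a command that cannot be started as a failure, which is my own simplification.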
Opening a TCP socket
With this type of probe, Kubernetes will attempt to open a TCP socket on a specified port. If the connection is established successfully, the container is considered healthy. If it cannot be established, the container is considered unhealthy.
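The same idea can be sketched in a few lines of plain Java. This is only an illustration of "can I open a socket on that port", not Kubernetes code; TcpProbeSketch, canConnect and probeThrowawayListener are hypothetical names, and the throwaway listener exists only so the demo has something to connect to.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper (not part of Kubernetes): a TCP check in the spirit of tcpSocket.
class TcpProbeSketch {

    // Healthy when a TCP connection can be established within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        }
    }

    // Demo helper: starts a throwaway listener on a free loopback port and probes it,
    // either while it is still open or after it has been closed.
    static boolean probeThrowawayListener(boolean closeBeforeProbing) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            int port = listener.getLocalPort();
            if (closeBeforeProbing) {
                listener.close();
            }
            return canConnect(loopback.getHostAddress(), port, 1000);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("open listener:   " + probeThrowawayListener(false));
        System.out.println("closed listener: " + probeThrowawayListener(true));
    }
}
```

Notice that a successful connection says nothing about whether the application behind the port can actually serve requests, which is why the HTTP based action is usually the more meaningful one.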
Executing an HTTP GET
This one is the most sophisticated one. The system will execute an HTTP GET request against a specific endpoint. If the API returns a status code between 200 and 399, the container is considered healthy. If it returns any other status code, or the request could not be executed, the container is unhealthy. You can also provide custom headers to be passed with the healthcheck request in case you have a special case.
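As a rough sketch of that rule, the following plain Java snippet sends a GET request with the JDK’s HttpClient and applies the 200 to 399 range. HttpProbeSketch, isSuccessStatus and probe are hypothetical names, the X-Probe header is just an example of a custom header, and the URL in main assumes an application listening on localhost:8080.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not part of Kubernetes): an HTTP GET check in the spirit of httpGet.
class HttpProbeSketch {

    // The rule described above: 200 to 399 is a success, anything else is a failure.
    static boolean isSuccessStatus(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    // Sends a single GET request; any error or timeout counts as a failure.
    static boolean probe(String url, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .header("X-Probe", "sketch") // example custom header, purely illustrative
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return isSuccessStatus(response.statusCode());
        } catch (IOException e) {
            return false; // connection problems and timeouts end up here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumes something is listening on localhost:8080; prints false otherwise.
        System.out.println(probe("http://localhost:8080/actuator/health", Duration.ofSeconds(1)));
    }
}
```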
Liveness probe
Kubernetes provides 3 different types of probes. Each one is suitable for a different use case.
- Liveness probe
- Readiness probe
- Startup probe
In this article, I’m going to cover only the first one – liveness probe.
The purpose of this type is to detect when an application gets into a state it cannot recover from. Imagine a container that has been running for days or weeks and suddenly stops serving requests. The only way to resolve the problem is to restart it. Of course, there must be an underlying issue in the application that needs fixing, but let’s not go in that direction for now.
As soon as the liveness probe detects that the application is failing the healthcheck, Kubernetes restarts the container. Note that the pod itself is not recreated; only the unhealthy container inside it is restarted.
There are 5 configuration parameters for a probe:
- initialDelaySeconds
- The number of seconds to wait after the container starts before the probe is initiated. Useful if you know your app takes at least 10 seconds to start: simply set this to 10 so the liveness probe won’t count the startup as a failure.
- periodSeconds
- Defines how often the probe performs the healthcheck, in seconds.
- timeoutSeconds
- Determines after how much time the probe times out, in seconds. Think of an HTTP GET request: if the timeout is configured to 1 second and the response is not received within that time (the application is slow), the probe considers it a failure.
- successThreshold
- The minimum number of consecutive healthcheck successes before the container is considered healthy again after being unhealthy. For liveness probes this must be 1.
- failureThreshold
- The number of consecutive healthcheck failures after which the container is considered unhealthy and is restarted.
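To illustrate what “consecutive” means for failureThreshold, here is a tiny model in plain Java. It is a simplification I wrote for this article, not the kubelet’s actual logic; ProbeThresholdSketch and record are hypothetical names.

```java
// Hypothetical model (not Kubernetes code): tracks consecutive probe failures
// and reports when a restart would be triggered.
class ProbeThresholdSketch {

    private final int failureThreshold;
    private int consecutiveFailures;

    ProbeThresholdSketch(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    // Records one probe result; returns true when the threshold is reached.
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0; // a single success resets the streak
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            consecutiveFailures = 0; // the restarted container starts with a clean slate
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProbeThresholdSketch tracker = new ProbeThresholdSketch(2);
        boolean[] results = {true, false, true, false, false};
        for (boolean result : results) {
            System.out.println("probe ok: " + result + " -> restart: " + tracker.record(result));
        }
    }
}
```

With a threshold of 2, an isolated failure between two successes does not trigger anything; only two failures in a row do.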
An example TCP socket based liveness probe configuration looks like the following, just to give you a feel:
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
Spring Boot Actuator health API
Alright, we’ve covered the basic idea of the probes in Kubernetes. Now let’s look at Spring Boot Actuator.
Spring Boot Actuator is an extension module for Spring Boot to monitor and manage the application through JMX or HTTP. The marketing slogan is: enhancing the application with production-ready features. There are lots and lots of features available in the module. I’m not going to cover all of them in this article, but if you are interested, you can find more info here.
There is one interesting feature though: the health API. It’s a single HTTP endpoint you can call, /actuator/health. The default behavior is simple: if the application is healthy, it responds with HTTP 200 and the following JSON:
{ "status": "UP" }
If the application is unhealthy, it responds with HTTP 503 and the following JSON:
{ "status": "DOWN" }
Customizing the health indicator
I don’t want to go into deep details of how Actuator works under the hood, but there is an interface called HealthIndicator that contributes to the overall system health. There are more than a dozen of them already auto-configured for you. An example is DiskSpaceHealthIndicator, which checks for low disk space.
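To give a feel for how several indicators end up as one overall status, here is a simplified model in plain Java. It is not Actuator’s code: HealthAggregationSketch is a hypothetical name, the Status enum is a stand-in for Actuator’s own status type, and I’m assuming the default severity order of DOWN, OUT_OF_SERVICE, UP, UNKNOWN, where the most severe status wins.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical model (not Actuator code): the overall health is the most
// severe status reported by any of the contributing indicators.
class HealthAggregationSketch {

    // Stand-in for Actuator's statuses, declared from most to least severe
    // (assumed default order).
    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    static Status aggregate(List<Status> statuses) {
        // Enums compare by declaration order, so min() picks the most severe one.
        return statuses.stream()
                .min(Comparator.naturalOrder())
                .orElse(Status.UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of(Status.UP, Status.UP)));
        System.out.println(aggregate(List.of(Status.UP, Status.DOWN, Status.UP)));
    }
}
```

The practical consequence is that a single indicator reporting DOWN is enough to make the whole health endpoint report DOWN.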
Writing a custom HealthIndicator is quite easy. Simply create a new class that implements the HealthIndicator interface and mark it as a @Component. Spring will pick it up automatically.
For the sake of testing, I’ll show you a very simple HealthIndicator that can be switched UP/DOWN with a simple HTTP call.
I’m starting off from a project generated on start.spring.io: a Gradle project with the Actuator and Web dependencies. So, as a first step, let’s create a Spring bean that will hold the health state (healthy/unhealthy):
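import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.stereotype.Component;

@Component
public class ManualHealthHolder {

    // Current health flag; the application starts out healthy.
    private final AtomicBoolean healthy = new AtomicBoolean(true);

    // Flips the flag atomically; a plain set(!get()) could lose an update under concurrent calls.
    public void switchHealth() {
        boolean current;
        do {
            current = healthy.get();
        } while (!healthy.compareAndSet(current, !current));
    }

    public boolean isHealthy() {
        return healthy.get();
    }
}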
Nothing special, just a state holder class with a single boolean value that represents the health of the system.
The HealthIndicator is also very simple:
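import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class ManualHealthIndicator implements HealthIndicator {

    @Autowired
    private ManualHealthHolder manualHealthHolder;

    // Reports UP or DOWN based on the manually controlled flag.
    @Override
    public Health health() {
        boolean healthy = manualHealthHolder.isHealthy();
        if (healthy) {
            return Health.up().build();
        }
        return Health.down().build();
    }
}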
There is a single method on the HealthIndicator interface that needs to be implemented: HealthIndicator#health. You can do very complicated things like introducing new health states to the system, but we’ll go with the existing UP and DOWN states. In this particular example, the health is decided by the ManualHealthHolder bean. If it says healthy, the state will be UP. If it says unhealthy, the state will be DOWN.
The next and last step is to create an HTTP endpoint for changing the state.
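import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ManualHealthRestController {

    @Autowired
    private ManualHealthHolder manualHealthHolder;

    // Test-only endpoint: every call flips the health flag.
    @GetMapping("/switch")
    public ResponseEntity<?> switchHealth() {
        manualHealthHolder.switchHealth();
        return new ResponseEntity<>(HttpStatus.OK);
    }
}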
Very minimal again. There is a single HTTP GET mapping for switching the status: /switch.
Testing time. Start up the application with ./gradlew clean build bootRun.
If you cURL localhost:8080/actuator/health, you’ll get the UP response (I’ve used jq here to format the response nicely).
$ curl localhost:8080/actuator/health | jq
{
  "status": "UP"
}
To simulate downtime of the application, we can call the localhost:8080/switch API. It switches the healthy flag internally, so if you query the /actuator/health endpoint now, you’ll get the DOWN state.
$ curl localhost:8080/actuator/health | jq
{
  "status": "DOWN"
}
Liveness probe with Actuator
Now that we know the building blocks, let’s integrate Actuator and Kubernetes. Kubernetes runs containers, so we need to build a Docker image from the Spring application. A simple Dockerfile looks like the following:
FROM openjdk:8-jdk-alpine
RUN apk add --no-cache --upgrade bash
RUN apk add --no-cache --upgrade curl
COPY build/libs/actuator-healtcheck-example-0.0.1-SNAPSHOT.jar app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
Alright, now after executing
$ ./gradlew clean build
we can also execute the
$ docker build . -t actuator-healthcheck-example
command to build the docker image.
I’m using minikube here for testing so let me add a few more steps to properly create the image so we are able to deploy it to the actual Kubernetes cluster.
Before creating the image, you should execute the following command to change your docker context to the Kubernetes cluster.
$ eval $(minikube docker-env)
Now that you have the docker context set up, execute the command to build the docker image. You can verify with the docker images command whether the image was successfully created.
$ docker images
REPOSITORY                     TAG      IMAGE ID       CREATED         SIZE
actuator-healthcheck-example   latest   6db0841a7102   5 seconds ago   124MB
The next step is the deployment to the cluster. The initial deployment file looks like the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: actuator-healthcheck-example
  labels:
    app: actuator-healthcheck-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actuator-healthcheck-example
  template:
    metadata:
      labels:
        app: actuator-healthcheck-example
    spec:
      containers:
      - name: actuator-healthcheck-example
        image: actuator-healthcheck-example:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
It’s a very basic deployment descriptor. One important point is to set the imagePullPolicy to IfNotPresent or Never so Kubernetes uses the locally built image and does not try to pull it from a registry.
Adding the liveness probe:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: actuator-healthcheck-example
  labels:
    app: actuator-healthcheck-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actuator-healthcheck-example
  template:
    metadata:
      labels:
        app: actuator-healthcheck-example
    spec:
      containers:
      - name: actuator-healthcheck-example
        image: actuator-healthcheck-example:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 2
So the trick here is to use the httpGet action on the probe and bind it to the /actuator/health endpoint. As I said earlier, in case of the httpGet action, the probe considers the application healthy when the status code is between 200 and 399. Guess what, the /actuator/health API fulfills that contract: it responds with 200 when the application reports a healthy state and with 503 when it’s down.
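The contract between the two sides can be written down in a few lines of plain Java. This is only a model, not Actuator or Kubernetes code: ActuatorProbeContractSketch is a hypothetical name, and I’m assuming Actuator’s default status mapping, where DOWN and OUT_OF_SERVICE become 503 and everything else becomes 200.

```java
import java.util.Set;

// Hypothetical model: how an Actuator health status turns into a probe result.
class ActuatorProbeContractSketch {

    // Assumption: default Actuator mapping, DOWN and OUT_OF_SERVICE -> 503, the rest -> 200.
    private static final Set<String> UNAVAILABLE = Set.of("DOWN", "OUT_OF_SERVICE");

    static int httpStatusFor(String healthStatus) {
        return UNAVAILABLE.contains(healthStatus) ? 503 : 200;
    }

    // The httpGet rule: status codes from 200 to 399 pass the probe.
    static boolean passesHttpProbe(String healthStatus) {
        int code = httpStatusFor(healthStatus);
        return code >= 200 && code < 400;
    }

    public static void main(String[] args) {
        for (String status : new String[] {"UP", "DOWN"}) {
            System.out.println(status + " -> HTTP " + httpStatusFor(status)
                    + " -> probe passes: " + passesHttpProbe(status));
        }
    }
}
```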
The rest of the configuration tells Kubernetes to wait 5 seconds after the container starts before initiating the probe, to execute the GET request against the endpoint every 10 seconds, and to consider the container unhealthy after 2 consecutive failed healthchecks.
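As a back-of-the-envelope estimate, these numbers also tell you roughly how long a broken container can stay around before the restart kicks in. The sketch below is my own simplification: it ignores timeoutSeconds and any scheduling jitter, and RestartTimingSketch with its two methods is a hypothetical name.

```java
// Hypothetical estimate (not Kubernetes code): time between the application
// becoming unhealthy and the failure threshold being reached.
class RestartTimingSketch {

    // Best case: the app breaks right before a probe, so the first failure is
    // seen immediately and the remaining ones follow one period apart.
    static int bestCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * (failureThreshold - 1);
    }

    // Worst case: the app breaks right after a successful probe, so a full
    // period passes before the first failure is even noticed.
    static int worstCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    public static void main(String[] args) {
        int periodSeconds = 10;   // from the deployment descriptor
        int failureThreshold = 2; // from the deployment descriptor
        System.out.println("best case:  " + bestCaseSeconds(periodSeconds, failureThreshold) + "s");
        System.out.println("worst case: " + worstCaseSeconds(periodSeconds, failureThreshold) + "s");
    }
}
```

For the configuration above that works out to somewhere between 10 and 20 seconds, so keep that in mind when picking the period and the threshold.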
Putting it all together
That’s it. Testing time. If you’ve read this far, I assume the docker image is already built, so we’ll go from there.
The only thing we need to do is to deploy the application. With a little kubectl command you can do it:
$ kubectl apply -f k8s-deployment.yaml
The output should be:
deployment.apps/actuator-healthcheck-example created
So now if we take a look at the pods, we have:
$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
actuator-healthcheck-example-6bf74bd94c-4xmvb   1/1     Running   0          17s
Everything looks good. The next phase of the testing is to flip the healthy flag in the application so we can see the liveness probe keeping the container in a healthy state.
Accessing the switch API for testing
There are 2 options for this. One is to open a terminal inside the container so we can trigger the /switch endpoint locally. The other one is to proxy the pod traffic to the local machine.
To get a shell inside the container, execute the following command (of course change the pod name to yours):
$ kubectl exec -it actuator-healthcheck-example-6bf74bd94c-t5h2f -- bash
From then on, executing
bash-4.4# curl localhost:8080/switch
will switch the flag. If you query the /actuator/health API the same way, it’s going to say DOWN.
The other option to trigger the /switch API is to forward requests from your local machine directly to the pod with kubectl. Strictly speaking, kubectl port-forward can target a pod directly without any extra resources, but let’s also expose the pod’s port as a Service so the application is reachable inside the cluster in the usual way. To do that, let’s extend the descriptor we created:
apiVersion: v1
kind: Service
metadata:
  name: actuator-healthcheck-example-svc
  labels:
    app: actuator-healthcheck-example
spec:
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: actuator-healthcheck-example
So the full k8s-deployment.yaml file looks like the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: actuator-healthcheck-example
  labels:
    app: actuator-healthcheck-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: actuator-healthcheck-example
  template:
    metadata:
      labels:
        app: actuator-healthcheck-example
    spec:
      containers:
      - name: actuator-healthcheck-example
        image: actuator-healthcheck-example:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: actuator-healthcheck-example-svc
  labels:
    app: actuator-healthcheck-example
spec:
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: actuator-healthcheck-example
Now that everything is in place, redeploy the stack with
$ kubectl apply -f k8s-deployment.yaml
To access the API, only a port-forward is needed:
$ kubectl port-forward actuator-healthcheck-example-6bf74bd94c-t5h2f 9876:8080
The command binds the local 9876 port to the pod’s 8080 port. So from now on, you can access the API from your local machine through localhost:9876.
$ curl localhost:9876/switch
Observing the liveness probe
The application is deployed. We can access the API. Everything is ready to see the liveness probe in action. First of all, let’s verify that the pod is alive and the /actuator/health API is returning the UP status.
$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
actuator-healthcheck-example-6bf74bd94c-l6b9j   1/1     Running   0          15m
$ curl localhost:9876/actuator/health
{"status":"UP"}
Looks good so far. Switching the health with /switch.
$ curl localhost:9876/switch
$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
actuator-healthcheck-example-6bf74bd94c-l6b9j   1/1     Running   0          16m
$ curl localhost:9876/actuator/health
{"status":"DOWN"}
From the pod’s perspective, everything still looks good, even though Actuator is reporting the service as DOWN. A little later, the pod events clearly show that there was in fact a container restart because of it.
$ kubectl describe pod actuator-healthcheck-example-7dcdd4dd48-97qxm
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  65s                 default-scheduler  Successfully assigned default/actuator-healthcheck-example-7dcdd4dd48-97qxm to minikube
  Normal   Pulled     24s (x2 over 64s)   kubelet, minikube  Container image "actuator-healthcheck-example:latest" already present on machine
  Warning  Unhealthy  24s (x2 over 34s)   kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 503
  Normal   Killing    24s                 kubelet, minikube  Container actuator-healthcheck-example failed liveness probe, will be restarted
  Normal   Created    23s (x2 over 64s)   kubelet, minikube  Created container actuator-healthcheck-example
  Normal   Started    23s (x2 over 64s)   kubelet, minikube  Started container actuator-healthcheck-example
You can see the message Liveness probe failed: HTTP probe failed with statuscode: 503. It happened 2 times, so the container was considered unhealthy and was restarted.
Once the container has restarted, you can query the Actuator health endpoint again and it will respond with the UP status, since the fresh container starts out healthy.
$ curl localhost:9876/actuator/health
{"status":"UP"}
Conclusion
I hope you see how easy it is to set up a proper healthcheck (at least a liveness one) with Kubernetes and Spring Boot. It’s definitely something I recommend doing to create a more resilient system that reacts to problems automatically.
The code can be found on GitHub.