In software systems, components can become unhealthy due to transient issues such as temporary connectivity loss, configuration errors, or problems with external dependencies. OpenShift Container Platform applications have a number of options to detect and handle unhealthy containers.

Understanding health checks

A probe is a Kubernetes action that periodically performs diagnostics on a running container. Currently, two types of probes exist, each serving a different purpose.

Readiness Probe

A Readiness check determines whether the container in which it is scheduled is ready to service requests. If the readiness probe fails for a container, the endpoints controller removes the container's IP address from the endpoints of all services. A readiness probe can signal to the endpoints controller that, even though a container is running, it should not receive any traffic from a proxy.

For example, a Readiness check can control which Pods receive traffic. When a Pod is not ready, it is removed from the service endpoints.

Liveness Probe

A Liveness check determines if the container in which it is scheduled is still running. If the liveness probe fails due to a condition such as a deadlock, the kubelet kills the container. The container then responds based on its restart policy.

For example, a liveness probe on a pod with a restartPolicy of Always or OnFailure kills and restarts the container.
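
The restart policy is set once for the whole pod, next to the container list, rather than inside the probe. The following is a minimal sketch of where the field sits; the pod name, container name, image, and path are hypothetical placeholders, not values from this document.

```yaml
# Sketch only: all names, the image, and the path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-example
spec:
  # Pod-level field. With Always or OnFailure, a container killed after a
  # failed liveness probe is restarted. With Never, it is not restarted.
  restartPolicy: OnFailure
  containers:
  - name: app
    image: registry.example.com/app:latest
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
```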

Sample Liveness Check
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness-http
    image: k8s.gcr.io/liveness (1)
    args:
    - /server
    livenessProbe: (2)
      httpGet:   (3)
        # host: my-host
        # scheme: HTTPS
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15  (4)
      timeoutSeconds: 1   (5)
1 Specifies the image to use for the container.
2 Specifies the type of health check.
3 Specifies the type of Liveness check:
  • HTTP Checks. Specify httpGet.

  • Container Execution Checks. Specify exec.

  • TCP Socket Check. Specify tcpSocket.

4 Specifies the number of seconds before performing the first probe after the container starts.
5 Specifies the number of seconds after which the probe times out.
Sample Liveness check output with unhealthy container
$ oc describe pod pod1

....

FirstSeen LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
37s       37s     1   {default-scheduler }                            Normal      Scheduled   Successfully assigned liveness-exec to worker0
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "k8s.gcr.io/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "k8s.gcr.io/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e
2s        2s      1   {kubelet worker0}   spec.containers{liveness}   Warning     Unhealthy   Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory

Understanding the types of health checks

Liveness checks and Readiness checks can be configured in three ways:

HTTP Checks

The kubelet uses a web hook to determine the healthiness of the container. The check is deemed successful if the HTTP response code is between 200 and 399.

An HTTP check is ideal for applications that return HTTP status codes when completely initialized.

Container Execution Checks

The kubelet executes a command inside the container. Exiting the check with status 0 is considered a success.

TCP Socket Checks

The kubelet attempts to open a socket to the container. The container is only considered healthy if the check can establish a connection. A TCP socket check is ideal for applications that do not start listening until initialization is complete.
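
A single container can carry a Readiness probe and a Liveness probe at the same time, and each probe can use a different check type. The fragment below is a sketch under assumed names: the container name, image, and the /ready path are hypothetical.

```yaml
# Sketch only: container name, image, and endpoint path are hypothetical.
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:                 # HTTP check: success is a 200-399 response
        path: /ready
        port: 8080
      periodSeconds: 5         # seconds between successive probes
    livenessProbe:
      tcpSocket:               # TCP socket check: success is an open connection
        port: 8080
      initialDelaySeconds: 15  # wait after container start before first probe
      periodSeconds: 10
      failureThreshold: 3      # consecutive failures before the probe is failed
```

Each probe accepts exactly one check type (httpGet, exec, or tcpSocket).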

Configuring health checks using the CLI

To configure health checks, create a pod for each type of check you want.

Procedure

To create health checks:

  1. Create a Liveness Container Execution Check:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: liveness
        name: liveness-exec
      spec:
        containers:
        - args:
          image: k8s.gcr.io/liveness
          livenessProbe:
            exec:  (1)
              command: (2)
              - cat
              - /tmp/health
            initialDelaySeconds: 15 (3)
      ...
      1 Specify a Liveness check and the type of Liveness check.
      2 Specify the commands to use in the container.
      3 Specify the number of seconds before performing the first probe after the container starts.
    2. Verify the state of the health check pod:

      $ oc describe pod liveness-exec
      
      Events:
        Type    Reason     Age   From                                  Message
        ----    ------     ----  ----                                  -------
        Normal  Scheduled  9s    default-scheduler                     Successfully assigned openshift-logging/liveness-exec to ip-10-0-143-40.ec2.internal
        Normal  Pulling    2s    kubelet, ip-10-0-143-40.ec2.internal  pulling image "k8s.gcr.io/liveness"
        Normal  Pulled     1s    kubelet, ip-10-0-143-40.ec2.internal  Successfully pulled image "k8s.gcr.io/liveness"
        Normal  Created    1s    kubelet, ip-10-0-143-40.ec2.internal  Created container
        Normal  Started    1s    kubelet, ip-10-0-143-40.ec2.internal  Started container

      The timeoutSeconds parameter has no effect on the Readiness and Liveness probes for Container Execution Checks. You can implement a timeout inside the probe itself, as OpenShift Container Platform cannot time out on an exec call into the container. One way to implement a timeout in a probe is to run your liveness or readiness probe through the timeout command:

      spec:
        containers:
        - livenessProbe:
            exec:
              command:
                - /bin/bash
                - '-c'
                - timeout 60 /opt/eap/bin/livenessProbe.sh (1)
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
      1 Timeout value and path to the probe script.
    3. Create the check:

      $ oc create -f <file-name>.yaml
  2. Create a Liveness TCP Socket Check:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: liveness
        name: liveness-tcp
      spec:
        containers:
        - name: container1 (1)
          image: k8s.gcr.io/liveness
          ports:
          - containerPort: 8080 (1)
          livenessProbe:  (2)
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15 (3)
            timeoutSeconds: 1  (4)
      1 Specify the container name and port for the check to connect to.
      2 Specify the Liveness health check and the type of Liveness check.
      3 Specify the number of seconds before performing the first probe after the container starts.
      4 Specify the number of seconds after which the probe times out.
    2. Create the check:

      $ oc create -f <file-name>.yaml
  3. Create a Readiness HTTP Check:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: readiness
        name: readiness-http
      spec:
        containers:
        - args:
          image: k8s.gcr.io/readiness (1)
          readinessProbe: (2)
            httpGet:
            # host: my-host (3)
            # scheme: HTTPS (4)
              path: /healthz
              port: 8080
            initialDelaySeconds: 15  (5)
            timeoutSeconds: 1  (6)
      1 Specify the image to use for the container.
      2 Specify the Readiness health check and the type of Readiness check.
      3 Specify a host IP address. When host is not defined, the PodIP is used.
      4 Specify HTTP or HTTPS. When scheme is not defined, the HTTP scheme is used.
      5 Specify the number of seconds before performing the first probe after the container starts.
      6 Specify the number of seconds after which the probe times out.
    2. Create the check:

      $ oc create -f <file-name>.yaml

Monitoring application health using the Developer perspective

You can use the Developer perspective to add three types of health probes to your container to ensure that your application is healthy:

  • Use the Readiness probe to check if the container is ready to handle requests.

  • Use the Liveness probe to check if the container is running.

  • Use the Startup probe to check if the application within the container has started.

You can add health checks either while creating and deploying an application, or after you have deployed an application.
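
For reference, a Startup probe is declared in the container spec in the same way as the other two probes. The fragment below is a minimal sketch, not output from the web console: the container name, image, path, and threshold values are hypothetical.

```yaml
# Sketch only: container name, image, path, and thresholds are hypothetical.
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30   # up to 30 attempts...
      periodSeconds: 10      # ...10 seconds apart, so about 300 seconds to start
    livenessProbe:           # not run until the startup probe has succeeded
      httpGet:
        path: /healthz
        port: 8080
```

Pairing the two probes this way gives a slow-starting application time to initialize without loosening the Liveness check that applies once it is running.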

Adding health checks using the Developer perspective

You can use the Topology view to add health checks to your deployed application.

Prerequisites:
  • You have switched to the Developer perspective in the web console.

  • You have created and deployed an application on OpenShift Container Platform using the Developer perspective.

Procedure
  1. In the Topology view, click on the application node to see the side panel. If the container does not have health checks added to ensure the smooth running of your application, a Health Checks notification is displayed with a link to add health checks.

  2. In the displayed notification, click the Add Health Checks link.

  3. Alternatively, you can also click the Actions drop-down list and select Add Health Checks. Note that if the container already has health checks, you will see the Edit Health Checks option instead of the add option.

  4. In the Add Health Checks form, if you have deployed multiple containers, use the Container drop-down list to ensure that the appropriate container is selected.

  5. Click the required health probe links to add them to the container. Default data for the health checks is prepopulated. You can add the probes with the default data or further customize the values and then add them. For example, to add a Readiness probe that checks if your container is ready to handle requests:

    1. Click Add Readiness Probe to see a form containing the parameters for the probe.

    2. Click the Type drop-down list to select the request type you want to add. For example, in this case, select Container Command to select the command that will be executed inside the container.

    3. In the Command field, add the argument cat. You can add multiple arguments for the check in the same way; for example, add another argument, /tmp/healthy.

    4. Retain or modify the default values for the other parameters as required, and click the check mark at the bottom of the form. The Readiness Probe Added message is displayed.

  6. Click Add to add the health check. You are redirected to the Topology view and the container is restarted.

  7. In the side panel, verify that the probes have been added by clicking on the deployed Pod under the Pods section.

  8. In the Pod Details page, click the listed container in the Containers section.

  9. In the Container Details page, verify that the Readiness probe - Exec Command cat /tmp/healthy has been added to the container.
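
The Readiness probe added in this procedure corresponds roughly to the following container spec fragment. This is a sketch of the equivalent YAML rather than the exact object the console writes: the container name and image are hypothetical, and any timing fields the form prepopulates are omitted.

```yaml
# Sketch only: container name and image are hypothetical.
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    readinessProbe:
      exec:
        command:        # each argument entered in the Command field is one item
        - cat
        - /tmp/healthy
```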

Editing health checks using the Developer perspective

You can use the Topology view to edit the health checks added to your application or to add more health checks.

Prerequisites:
  • You have switched to the Developer perspective in the web console.

  • You have created and deployed an application on OpenShift Container Platform using the Developer perspective.

  • You have added health checks to your application.

Procedure
  1. In the Topology view, right-click your application and select Edit Health Checks. Alternatively, in the side panel, click the Actions drop-down list and select Edit Health Checks.

  2. In the Edit Health Checks page:

    • To remove a previously added health probe, click the minus sign adjoining it.

    • To edit the parameters of an existing probe:

      1. Click the Edit Probe link next to a previously added probe to see the parameters for the probe.

      2. Modify the parameters as required, and click the check mark to save your changes.

    • To add a new health probe, in addition to existing health checks, click the add probe links. For example, to add a Liveness probe that checks if your container is running:

      1. Click Add Liveness Probe to see a form containing the parameters for the probe.

      2. Edit the probe parameters as required, and click the check mark at the bottom of the form. The Liveness Probe Added message is displayed.

  3. Click Save to save your modifications and add the additional probes to your container. You are redirected to the Topology view.

  4. In the side panel, verify that the probes have been added by clicking on the deployed Pod under the Pods section.

  5. In the Pod Details page, click the listed container in the Containers section.

  6. In the Container Details page, verify that the Liveness probe - HTTP Get 10.129.4.65:8080/ has been added to the container, in addition to the earlier existing probes.
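
A Liveness probe displayed as HTTP Get on port 8080 with path / would look roughly like the fragment below in the container spec. This is a sketch under assumptions: the container name and image are hypothetical, and the address shown in the console is assumed to be the Pod IP, which is used when host is not defined.

```yaml
# Sketch only: container name and image are hypothetical.
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    livenessProbe:
      httpGet:
        path: /
        port: 8080      # host is omitted, so the Pod IP is used
```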

Monitoring health check failures using the Developer perspective

If an application health check fails, you can use the Topology view to monitor these health check violations.

Prerequisites:
  • You have switched to the Developer perspective in the web console.

  • You have created and deployed an application on OpenShift Container Platform using the Developer perspective.

  • You have added health checks to your application.

Procedure
  1. In the Topology view, click on the application node to see the side panel.

  2. Click the Monitoring tab to see the health check failures in the Events (Warning) section.

  3. Click the down arrow adjoining Events (Warning) to see the details of the health check failure.

Additional Resources