In software systems, components can become unhealthy due to transient issues such as temporary connectivity loss, configuration errors, or problems with external dependencies. OpenShift Container Platform applications have a number of options to detect and handle unhealthy containers.

Understanding health checks

A probe is a Kubernetes action that periodically performs diagnostics on a running container. Currently, two types of probes exist, each serving a different purpose.

Readiness Probe

A Readiness check determines whether the container in which it is scheduled is ready to service requests. If the readiness probe fails for a container, the endpoints controller removes the IP address of the container's pod from the endpoints of all services. A readiness probe can be used to signal to the endpoints controller that even though a container is running, it should not receive any traffic from a proxy.

For example, a Readiness check can control which Pods receive traffic. When a Pod is not ready, it is removed from the service endpoints until the check succeeds again.
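
The effect of a Readiness check on traffic can be pictured with a small toy model. This is only a sketch of the idea, not the endpoints controller itself; the `Pod` class, the `service_endpoints` function, and the pod names and IP addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    ready: bool  # result of the most recent readiness probe


def service_endpoints(pods):
    """Return the IPs that should receive traffic: ready pods only."""
    return [pod.ip for pod in pods if pod.ready]


# Hypothetical pods backing one service.
pods = [
    Pod("web-1", "10.128.0.11", ready=True),
    Pod("web-2", "10.128.0.12", ready=False),  # running, but failing its probe
    Pod("web-3", "10.128.0.13", ready=True),
]
print(service_endpoints(pods))
```

In this model the second pod keeps running but is left out of the endpoint list until its probe passes again.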

Liveness Probe

A Liveness check determines if the container in which it is scheduled is still running. If the liveness probe fails due to a condition such as a deadlock, the kubelet kills the container. The container then responds based on its restart policy.

For example, when a liveness probe fails on a pod with a restartPolicy of Always or OnFailure, the kubelet kills the container and restarts it on the same node.
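
The interaction between a failing liveness probe and the restart policy can be sketched as a small decision function. This is a simplified model under stated assumptions, not kubelet code: `container_action` is a hypothetical helper, and the default of three consecutive failures mirrors the `failureThreshold: 3` setting that appears later in this document.

```python
def container_action(probe_results, restart_policy, failure_threshold=3):
    """Decide what happens to a container after a series of liveness probes.

    probe_results: iterable of booleans, oldest first (True = probe succeeded).
    """
    consecutive_failures = 0
    for ok in probe_results:
        # A success resets the counter; a failure extends the current streak.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            # The container is killed; the restart policy decides the rest.
            if restart_policy in ("Always", "OnFailure"):
                return "kill and restart"
            return "kill, no restart"
    return "keep running"


print(container_action([True, False, False, False], "Always"))
```

A single success between failures resets the streak, so an occasional slow response does not by itself trigger a restart.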

Sample Liveness Check
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness-http
    image: k8s.gcr.io/liveness (1)
    args:
    - /server
    livenessProbe: (2)
      httpGet:   (3)
        # host: my-host
        # scheme: HTTPS
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15  (4)
      timeoutSeconds: 1   (5)
1 Specifies the image to use for the liveness probe.
2 Specifies the type of health check.
3 Specifies the type of Liveness check:
  • HTTP Checks. Specify httpGet.

  • Container Execution Checks. Specify exec.

  • TCP Socket Check. Specify tcpSocket.

4 Specifies the number of seconds before performing the first probe after the container starts.
5 Specifies the number of seconds after which the probe times out. The interval between probes is set separately with periodSeconds.
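
The timing fields are easy to confuse, so the following sketch shows how they relate. It is an idealized model, not the kubelet's scheduler (which may add jitter): `probe_schedule` and `probe_timed_out` are hypothetical helpers. The `period_seconds` argument stands for the periodSeconds field, which sets the interval between probes and defaults to 10 seconds in Kubernetes.

```python
def probe_schedule(initial_delay_seconds, period_seconds, count):
    """Idealized start times (seconds after container start) of the first `count` probes."""
    return [initial_delay_seconds + i * period_seconds for i in range(count)]


def probe_timed_out(response_seconds, timeout_seconds):
    """A probe that takes longer than timeoutSeconds counts as a failure."""
    return response_seconds > timeout_seconds


# initialDelaySeconds: 15 and timeoutSeconds: 1 as in the sample above,
# with the default periodSeconds of 10.
print(probe_schedule(15, 10, 3))
print(probe_timed_out(1.5, 1))
```
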
Sample Liveness check output with unhealthy container
$ oc describe pod pod1

....

FirstSeen LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
37s       37s     1   {default-scheduler }                            Normal      Scheduled   Successfully assigned liveness-exec to worker0
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "k8s.gcr.io/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "k8s.gcr.io/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e
2s        2s      1   {kubelet worker0}   spec.containers{liveness}   Warning     Unhealthy   Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory

Understanding the types of health checks

Liveness checks and Readiness checks can be configured in three ways:

HTTP Checks

The kubelet uses a web hook to determine the healthiness of the container. The check is deemed successful if the HTTP response code is between 200 and 399.

An HTTP check is ideal for applications that return HTTP status codes when completely initialized.
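
To make the success rule concrete, here is a minimal sketch of an application health endpoint and a probe against it, using only the Python standard library. It illustrates the 200 to 399 rule and is not how the kubelet is implemented; `HealthHandler` and `http_probe` are hypothetical names, and the `/healthz` path is borrowed from the samples in this document.

```python
import http.client
import http.server
import threading


class HealthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical application handler: /healthz answers 200, anything else 404."""

    def do_GET(self):
        status = 200 if self.path == "/healthz" else 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def http_probe(host, port, path, timeout_seconds=1.0):
    """Return True if a GET on the path answers with a code from 200 to 399."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status <= 399
    except (OSError, http.client.HTTPException):
        return False  # refused connection, timeout, malformed response
    finally:
        conn.close()


# Start the hypothetical application on a free local port.
server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

healthy = http_probe("127.0.0.1", port, "/healthz")
unhealthy = http_probe("127.0.0.1", port, "/missing")
server.shutdown()
server.server_close()
print(healthy, unhealthy)
```
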

Container Execution Checks

The kubelet executes a command inside the container. Exiting the check with status 0 is considered a success.
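
The exit-status rule can be sketched in a few lines. This is an illustration of the rule only, run on the local machine rather than inside a container; `exec_probe` is a hypothetical helper, and the two Python one-liners stand in for a real probe command such as `cat /tmp/health`.

```python
import subprocess
import sys


def exec_probe(command):
    """Run `command` and report success only when it exits with status 0."""
    try:
        completed = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return False  # command not found or not executable
    return completed.returncode == 0


# Stand-ins for a probe command that succeeds and one that fails.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(exec_probe(passing), exec_probe(failing))
```
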

TCP Socket Checks

The kubelet attempts to open a socket to the container. The container is only considered healthy if the check can establish a connection. A TCP socket check is ideal for applications that do not start listening until initialization is complete.
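
A TCP socket check amounts to a connection attempt with a timeout. The sketch below shows that idea with the Python standard library; `tcp_probe` is a hypothetical helper, and the local listener stands in for an application that has finished initializing.

```python
import socket


def tcp_probe(host, port, timeout_seconds=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


# A hypothetical application that is listening for connections.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

while_listening = tcp_probe("127.0.0.1", port)
listener.close()  # the application stops listening
after_close = tcp_probe("127.0.0.1", port)
print(while_listening, after_close)
```
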

Configuring health checks

To configure health checks, create a pod for each type of check you want.

Procedure

To create health checks:

  1. Create a Liveness Container Execution Check:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: liveness
        name: liveness-exec
      spec:
        containers:
        - name: liveness-exec
          image: k8s.gcr.io/liveness
          livenessProbe:
            exec:  (1)
              command: (2)
              - cat
              - /tmp/health
            initialDelaySeconds: 15 (3)
      ...
      1 Specify a Liveness check and the type of Liveness check.
      2 Specify the commands to use in the container.
      3 Specify the number of seconds before performing the first probe after the container starts.
    2. Create the check:

      $ oc create -f <file-name>.yaml
    3. Verify the state of the health check pod:

      $ oc describe pod liveness-exec

      Events:
        Type    Reason     Age   From                                  Message
        ----    ------     ----  ----                                  -------
        Normal  Scheduled  9s    default-scheduler                     Successfully assigned openshift-logging/liveness-exec to ip-10-0-143-40.ec2.internal
        Normal  Pulling    2s    kubelet, ip-10-0-143-40.ec2.internal  pulling image "k8s.gcr.io/liveness"
        Normal  Pulled     1s    kubelet, ip-10-0-143-40.ec2.internal  Successfully pulled image "k8s.gcr.io/liveness"
        Normal  Created    1s    kubelet, ip-10-0-143-40.ec2.internal  Created container
        Normal  Started    1s    kubelet, ip-10-0-143-40.ec2.internal  Started container

      The timeoutSeconds parameter has no effect on the Readiness and Liveness probes for Container Execution Checks, because OpenShift Container Platform cannot time out on an exec call into the container. Instead, implement the timeout inside the probe itself. One way to do this is to run your liveness or readiness probe script through the timeout command:

      spec:
        containers:
        - livenessProbe:
            exec:
              command:
                - /bin/bash
                - '-c'
                - timeout 60 /opt/eap/bin/livenessProbe.sh (1)
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
      1 Timeout value and path to the probe script.
  2. Create a Liveness TCP Socket Check:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: liveness
        name: liveness-tcp
      spec:
        containers:
        - name: container1 (1)
          image: k8s.gcr.io/liveness
          ports:
          - containerPort: 8080 (1)
          livenessProbe:  (2)
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15 (3)
            timeoutSeconds: 1  (4)
      1 Specify the container name and port for the check to connect to.
      2 Specify the Liveness health check and the type of Liveness check.
      3 Specify the number of seconds before performing the first probe after the container starts.
      4 Specify the number of seconds after which the probe times out.
    2. Create the check:

      $ oc create -f <file-name>.yaml
  3. Create a Readiness HTTP Check:

    1. Create a YAML file similar to the following:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          test: readiness
        name: readiness-http
      spec:
        containers:
        - name: readiness-http
          image: k8s.gcr.io/readiness (1)
          readinessProbe: (2)
            httpGet:
              # host: my-host (3)
              # scheme: HTTPS (4)
              path: /healthz
              port: 8080
            initialDelaySeconds: 15  (5)
            timeoutSeconds: 1  (6)
      1 Specify the image to use for the readiness probe.
      2 Specify the Readiness health check and the type of Readiness check.
      3 Specify a host IP address. When host is not defined, the PodIP is used.
      4 Specify HTTP or HTTPS. When scheme is not defined, the HTTP scheme is used.
      5 Specify the number of seconds before performing the first probe after the container starts.
      6 Specify the number of seconds after which the probe times out.
    2. Create the check:

      $ oc create -f <file-name>.yaml
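
The Container Execution Check step above recommends building the timeout into the probe command. The same idea can be sketched in Python for a probe written as a script instead of a shell one-liner. This is only an illustration: `exec_probe_with_timeout` is a hypothetical wrapper, and the two commands stand in for a fast probe script and a hung one.

```python
import subprocess
import sys


def exec_probe_with_timeout(command, timeout_seconds):
    """Run a probe command, treating an overrun as a failed probe."""
    try:
        completed = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False  # subprocess.run kills the child before raising
    except OSError:
        return False  # command not found or not executable
    return completed.returncode == 0


# Stand-ins for a probe script that returns promptly and one that hangs.
quick = [sys.executable, "-c", "pass"]
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(exec_probe_with_timeout(quick, 10), exec_probe_with_timeout(hung, 0.5))
```

Enforcing the limit in the probe itself means a hung check is reported as a failure instead of blocking indefinitely.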