Operators are a method of packaging, deploying, and managing an OpenShift Container Platform application. They act like an extension of the software vendor’s engineering team, watching over an OpenShift Container Platform environment and using its current state to make decisions in real time. Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, like skipping a software backup process to save time.

OpenShift Container Platform 4.5 includes a default set of Operators that are required for proper functioning of the cluster. These default Operators are managed by the Cluster Version Operator (CVO).

As a cluster administrator, you can install application Operators from the OperatorHub using the OpenShift Container Platform web console or the CLI. You can then subscribe the Operator to one or more namespaces to make it available for developers on your cluster. Application Operators are managed by Operator Lifecycle Manager (OLM).

If you experience Operator issues, verify Operator Subscription status. Check Operator Pod health across the cluster and gather Operator logs for diagnosis.

Operator Subscription condition types

Subscriptions can report the following condition types:

Table 1. Subscription condition types
Condition Description

CatalogSourcesUnhealthy

Some or all of the Catalog Sources to be used in resolution are unhealthy.

InstallPlanMissing

A Subscription’s InstallPlan is missing.

InstallPlanPending

A Subscription’s InstallPlan is pending installation.

InstallPlanFailed

A Subscription’s InstallPlan has failed.

Default OpenShift Container Platform cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription object. Application Operators are managed by Operator Lifecycle Manager (OLM) and they have a Subscription object.

Viewing Operator Subscription status using the CLI

You can view Operator Subscription status using the CLI.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. List Operator Subscriptions:

    $ oc get subs -n <operator_namespace>
  2. Use the oc describe command to inspect a Subscription resource:

    $ oc describe sub <subscription_name> -n <operator_namespace>
  3. In the command output, find the Conditions section for the status of Operator subscription condition types. In the following example, the CatalogSourcesUnhealthy condition type has a status of false because all available catalog sources are healthy:

    Example output
    Conditions:
       Last Transition Time:  2019-07-29T13:42:57Z
       Message:               all available catalogsources are healthy
       Reason:                AllCatalogSourcesHealthy
       Status:                False
       Type:                  CatalogSourcesUnhealthy

Default OpenShift Container Platform cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription object. Application Operators are managed by Operator Lifecycle Manager (OLM) and they have a Subscription object.

Querying Operator Pod status

You can list Operator Pods within a cluster and their status. You can also collect a detailed Operator Pod summary.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • Your API service is still functional.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. List Operators running in the cluster. The output includes Operator version, availability, and up-time information:

    $ oc get clusteroperators
  2. List Operator Pods running in the Operator’s namespace, plus Pod status, restarts, and age:

    $ oc get pod -n <operator_namespace>
  3. Output a detailed Operator Pod summary:

    $ oc describe pod <operator_pod_name> -n <operator_namespace>
  4. If an Operator issue is node-specific, query Operator container status on that node.

    1. Start a debug Pod for the node:

      $ oc debug node/my-node
    2. Set /host as the root directory within the debug shell. The debug Pod mounts the host’s root file system in /host within the Pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

      # chroot /host

      OpenShift Container Platform 4.5 cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes using SSH is not recommended and nodes will be tainted as accessed. However, if the OpenShift Container Platform API is not available, or the kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain> instead.

    3. List details about the node’s containers, including state and associated Pod IDs:

      # crictl ps
    4. List information about a specific Operator container on the node. The following example lists information about the network-operator container:

      # crictl ps --name network-operator
    5. Exit from the debug shell.

Gathering Operator logs

If you experience Operator issues, you can gather detailed diagnostic information from Operator Pod logs.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • Your API service is still functional.

  • You have installed the OpenShift CLI (oc).

  • You have the fully qualified domain names of the control plane, or master machines.

Procedure
  1. List the Operator Pods that are running in the Operator’s namespace, plus the Pod status, restarts, and age:

    $ oc get pods -n <operator_namespace>
  2. Review logs for an Operator Pod:

    $ oc logs pod/<pod_name> -n <operator_namespace>

    If an Operator Pod has multiple containers, the preceding command will produce an error that includes the name of each container. Query logs from an individual container:

    $ oc logs pod/<operator_pod_name> -c <container_name> -n <operator_namespace>
  3. If the API is not functional, review Operator Pod and container logs on each master node by using SSH instead. Replace <master-node>.<cluster_name>.<base_domain> with appropriate values.

    1. List Pods on each master node:

      $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl pods
    2. For any Operator Pods not showing a Ready status, inspect the Pod’s status in detail. Replace <operator_pod_id> with the Operator Pod’s ID listed in the output of the preceding command:

      $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl inspectp <operator_pod_id>
    3. List containers related to an Operator Pod:

      $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl ps --pod=<operator_pod_id>
    4. For any Operator container not showing a Ready status, inspect the container’s status in detail. Replace <container_id> with a container ID listed in the output of the preceding command:

      $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl inspect <container_id>
    5. Review the logs for any Operator containers not showing a Ready status. Replace <container_id> with a container ID listed in the output of the preceding command:

      $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl logs -f <container_id>

      OpenShift Container Platform 4.5 cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes using SSH is not recommended and nodes will be tainted as accessed. Before attempting to collect diagnostic data over SSH, review whether the data collected by running oc adm must gather and other oc commands is sufficient instead. However, if the OpenShift Container Platform API is not available, or the kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain>.