Operators are a method of packaging, deploying, and managing a Red Hat OpenShift Service on AWS application. They act like an extension of the software vendor’s engineering team, watching over a Red Hat OpenShift Service on AWS environment and using its current state to make decisions in real time. Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, such as skipping a software backup process to save time.
Red Hat OpenShift Service on AWS includes a default set of Operators that are required for proper functioning of the cluster. These default Operators are managed by the Cluster Version Operator (CVO).
As a cluster administrator, you can install application Operators from the OperatorHub using the Red Hat OpenShift Service on AWS web console or the CLI. You can then subscribe the Operator to one or more namespaces to make it available for developers on your cluster. Application Operators are managed by Operator Lifecycle Manager (OLM).
If you experience Operator issues, verify Operator subscription status. Check Operator pod health across the cluster and gather Operator logs for diagnosis.
Subscriptions can report the following condition types:
Condition | Description |
---|---|
CatalogSourcesUnhealthy | Some or all of the catalog sources to be used in resolution are unhealthy. |
InstallPlanMissing | An install plan for a subscription is missing. |
InstallPlanPending | An install plan for a subscription is pending installation. |
InstallPlanFailed | An install plan for a subscription has failed. |
ResolutionFailed | The dependency resolution for a subscription has failed. |
Default Red Hat OpenShift Service on AWS cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription object.
You can view Operator subscription status by using the CLI.
You have access to the cluster as a user with the dedicated-admin role.
You have installed the OpenShift CLI (oc).
List Operator subscriptions:
$ oc get subs -n <operator_namespace>
Use the oc describe command to inspect a Subscription resource:
$ oc describe sub <subscription_name> -n <operator_namespace>
In the command output, find the Conditions section for the status of Operator subscription condition types. In the following example, the CatalogSourcesUnhealthy condition type has a status of false because all available catalog sources are healthy:
Name: cluster-logging
Namespace: openshift-logging
Labels: operators.coreos.com/cluster-logging.openshift-logging=
Annotations: <none>
API Version: operators.coreos.com/v1alpha1
Kind: Subscription
# ...
Conditions:
Last Transition Time: 2019-07-29T13:42:57Z
Message: all available catalogsources are healthy
Reason: AllCatalogSourcesHealthy
Status: False
Type: CatalogSourcesUnhealthy
# ...
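When a subscription reports several conditions, it can help to print only the ones that are currently true. The following is a minimal sketch, not part of the product: true_conditions is a hypothetical helper that reads oc describe output on standard input and assumes the field order shown in the preceding example, where Status appears before Type within each condition.

```shell
# true_conditions (hypothetical helper): read `oc describe sub` output on
# stdin and print every condition type whose Status is True.
# Assumes Status precedes Type within each condition, as in the example.
true_conditions() {
  awk '
    $1 == "Conditions:" { in_conditions = 1; next }
    in_conditions && $1 == "Status:" { status = $2 }
    in_conditions && $1 == "Type:" && status == "True" { print $2 }
  '
}
```

For example, oc describe sub <subscription_name> -n <operator_namespace> | true_conditions prints nothing when no condition is true.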
You can view the status of an Operator catalog source by using the CLI.
You have access to the cluster as a user with the dedicated-admin role.
You have installed the OpenShift CLI (oc).
List the catalog sources in a namespace. For example, you can check the openshift-marketplace namespace, which is used for cluster-wide catalog sources:
$ oc get catalogsources -n openshift-marketplace
NAME DISPLAY TYPE PUBLISHER AGE
certified-operators Certified Operators grpc Red Hat 55m
community-operators Community Operators grpc Red Hat 55m
example-catalog Example Catalog grpc Example Org 2m25s
redhat-marketplace Red Hat Marketplace grpc Red Hat 55m
redhat-operators Red Hat Operators grpc Red Hat 55m
Use the oc describe command to get more details and status about a catalog source:
$ oc describe catalogsource example-catalog -n openshift-marketplace
Name: example-catalog
Namespace: openshift-marketplace
Labels: <none>
Annotations: operatorframework.io/managed-by: marketplace-operator
target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
API Version: operators.coreos.com/v1alpha1
Kind: CatalogSource
# ...
Status:
Connection State:
Address: example-catalog.openshift-marketplace.svc:50051
Last Connect: 2021-09-09T17:07:35Z
Last Observed State: TRANSIENT_FAILURE
Registry Service:
Created At: 2021-09-09T17:05:45Z
Port: 50051
Protocol: grpc
Service Name: example-catalog
Service Namespace: openshift-marketplace
# ...
In the preceding example output, the last observed state is TRANSIENT_FAILURE. This state indicates that there is a problem establishing a connection for the catalog source.
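The connection state check can be scripted. The following sketch uses two hypothetical helpers, catalog_state and catalog_ready, that read oc describe output on standard input. It assumes that a healthy catalog source reports the gRPC connectivity state READY.

```shell
# catalog_state (hypothetical helper): print the value of the
# "Last Observed State" field from `oc describe catalogsource` output.
catalog_state() {
  awk -F': *' '/Last Observed State:/ { print $2; exit }'
}

# catalog_ready (hypothetical helper): succeed only when the state is READY.
catalog_ready() {
  [ "$(catalog_state)" = "READY" ]
}
```

For example, oc describe catalogsource example-catalog -n openshift-marketplace | catalog_ready || echo "catalog source is not ready" reports a problem for the preceding output.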
List the pods in the namespace where your catalog source was created:
$ oc get pods -n openshift-marketplace
NAME READY STATUS RESTARTS AGE
certified-operators-cv9nn 1/1 Running 0 36m
community-operators-6v8lp 1/1 Running 0 36m
marketplace-operator-86bfc75f9b-jkgbc 1/1 Running 0 42m
example-catalog-bwt8z 0/1 ImagePullBackOff 0 3m55s
redhat-marketplace-57p8c 1/1 Running 0 36m
redhat-operators-smxx8 1/1 Running 0 36m
When a catalog source is created in a namespace, a pod for the catalog source is created in that namespace. In the preceding example output, the status for the example-catalog-bwt8z pod is ImagePullBackOff. This status indicates that there is an issue pulling the catalog source’s index image.
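In a namespace with many pods, a small filter can pick out the ones that need attention. The following sketch defines a hypothetical helper, unhealthy_pods, that reads oc get pods output on standard input. It assumes the default column layout shown in the preceding example (NAME, READY, STATUS).

```shell
# unhealthy_pods (hypothetical helper): print the name and status of pods
# that are not Running, or that are Running with containers not ready.
# Completed pods are skipped because they are expected to show 0/N ready.
unhealthy_pods() {
  awk '
    NR == 1 || $3 == "Completed" { next }
    { split($2, ready, "/") }
    $3 != "Running" || ready[1] != ready[2] { print $1, $3 }
  '
}
```

For example, oc get pods -n openshift-marketplace | unhealthy_pods prints example-catalog-bwt8z ImagePullBackOff for the preceding output.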
Use the oc describe command to inspect a pod for more detailed information:
$ oc describe pod example-catalog-bwt8z -n openshift-marketplace
Name: example-catalog-bwt8z
Namespace: openshift-marketplace
Priority: 0
Node: ci-ln-jyryyg2-f76d1-ggdbq-worker-b-vsxjd/10.0.128.2
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 48s default-scheduler Successfully assigned openshift-marketplace/example-catalog-bwt8z to ci-ln-jyryyf2-f76d1-fgdbq-worker-b-vsxjd
Normal AddedInterface 47s multus Add eth0 [10.131.0.40/23] from openshift-sdn
Normal BackOff 20s (x2 over 46s) kubelet Back-off pulling image "quay.io/example-org/example-catalog:v1"
Warning Failed 20s (x2 over 46s) kubelet Error: ImagePullBackOff
Normal Pulling 8s (x3 over 47s) kubelet Pulling image "quay.io/example-org/example-catalog:v1"
Warning Failed 8s (x3 over 47s) kubelet Failed to pull image "quay.io/example-org/example-catalog:v1": rpc error: code = Unknown desc = reading manifest v1 in quay.io/example-org/example-catalog: unauthorized: access to the requested resource is not authorized
Warning Failed 8s (x3 over 47s) kubelet Error: ErrImagePull
In the preceding example output, the error messages indicate that the catalog source’s index image is failing to pull successfully because of an authorization issue. For example, the index image might be stored in a registry that requires login credentials.
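Because the Events table mixes normal and warning entries, it can be useful to show only the warnings. The following sketch defines a hypothetical helper, warning_events, that reads oc describe pod output on standard input and assumes the Events layout shown in the preceding example.

```shell
# warning_events (hypothetical helper): print only the Warning rows from the
# Events section of `oc describe pod` output.
warning_events() {
  awk '
    $1 == "Events:" { in_events = 1; next }
    in_events && $1 == "Warning" { print }
  '
}
```

For example, oc describe pod example-catalog-bwt8z -n openshift-marketplace | warning_events leaves only the Failed entries from the preceding output.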
gRPC documentation: States of Connectivity
You can list Operator pods within a cluster and their status. You can also collect a detailed Operator pod summary.
You have access to the cluster as a user with the dedicated-admin role.
Your API service is still functional.
You have installed the OpenShift CLI (oc).
List Operators running in the cluster. The output includes Operator version, availability, and up-time information:
$ oc get clusteroperators
List Operator pods running in the Operator’s namespace, plus pod status, restarts, and age:
$ oc get pod -n <operator_namespace>
Output a detailed Operator pod summary:
$ oc describe pod <operator_pod_name> -n <operator_namespace>
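To narrow the cluster Operator list to entries that need attention, the output can be filtered by its status columns. The following sketch defines a hypothetical helper, degraded_operators. It assumes that the oc get clusteroperators header includes AVAILABLE and DEGRADED columns and that every preceding column has a value in each row. The source does not show this output, so verify the layout on your cluster.

```shell
# degraded_operators (hypothetical helper): read `oc get clusteroperators`
# output on stdin and print Operators that are unavailable or degraded.
# Column positions are looked up from the header row (an assumed layout).
degraded_operators() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["AVAILABLE"]) != "True" || $(col["DEGRADED"]) == "True" { print $1 }
  '
}
```

For example, oc get clusteroperators | degraded_operators prints nothing when every Operator is available and not degraded.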
If you experience Operator issues, you can gather detailed diagnostic information from Operator pod logs.
You have access to the cluster as a user with the dedicated-admin role.
Your API service is still functional.
You have installed the OpenShift CLI (oc).
You have the fully qualified domain names of the control plane or control plane machines.
List the Operator pods that are running in the Operator’s namespace, plus the pod status, restarts, and age:
$ oc get pods -n <operator_namespace>
Review logs for an Operator pod:
$ oc logs pod/<pod_name> -n <operator_namespace>
If an Operator pod has multiple containers, the preceding command will produce an error that includes the name of each container. Query logs from an individual container:
$ oc logs pod/<operator_pod_name> -c <container_name> -n <operator_namespace>
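For a pod with several containers, the per-container command can be generated rather than typed repeatedly. The following sketch defines a hypothetical helper, container_log_commands, that reads container names on standard input and prints one oc logs command for each. It only prints the commands so that they can be reviewed before running.

```shell
# container_log_commands NAMESPACE POD (hypothetical helper): read
# whitespace-separated container names on stdin and print one `oc logs`
# command per container. Nothing is executed.
container_log_commands() {
  namespace=$1
  pod=$2
  tr -s ' \t' '\n' | while IFS= read -r container || [ -n "$container" ]; do
    [ -n "$container" ] || continue
    printf 'oc logs pod/%s -c %s -n %s\n' "$pod" "$container" "$namespace"
  done
}
```

The container names can be supplied with oc get pod <operator_pod_name> -n <operator_namespace> -o jsonpath='{.spec.containers[*].name}' and piped into container_log_commands <operator_namespace> <operator_pod_name>.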
If the API is not functional, review Operator pod and container logs on each control plane node by using SSH instead. Replace <master-node>.<cluster_name>.<base_domain> with appropriate values.
List pods on each control plane node:
$ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl pods
For any Operator pods not showing a Ready status, inspect the pod’s status in detail. Replace <operator_pod_id> with the Operator pod’s ID listed in the output of the preceding command:
$ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl inspectp <operator_pod_id>
List containers related to an Operator pod:
$ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl ps --pod=<operator_pod_id>
For any Operator container not showing a Ready status, inspect the container’s status in detail. Replace <container_id> with a container ID listed in the output of the preceding command:
$ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl inspect <container_id>
Review the logs for any Operator containers not showing a Ready status. Replace <container_id> with a container ID listed in the output of the preceding command:
$ ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl logs -f <container_id>
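When a node runs many pods, the IDs of pods that are not ready can be extracted from the crictl pods listing before inspecting them. The following sketch defines a hypothetical helper, not_ready_ids. It assumes that the first column of the listing is the pod ID and that the STATE column reports either Ready or NotReady.

```shell
# not_ready_ids (hypothetical helper): read `crictl pods` output on stdin
# and print the ID (first column) of every pod whose state is NotReady.
# The header row is skipped.
not_ready_ids() {
  awk 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "NotReady") { print $1; break } }'
}
```

For example, ssh core@<master-node>.<cluster_name>.<base_domain> sudo crictl pods | not_ready_ids prints one pod ID per line, which can then be passed to crictl inspectp.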
Red Hat OpenShift Service on AWS cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. Before attempting to collect diagnostic data over SSH, review whether the data collected by running |