Troubleshooting Service Mesh - Service Mesh 2.x | Service Mesh

Understanding Service Mesh versions
Troubleshooting Operator installation
- Validating Operator installation
- Troubleshooting service mesh Operators
Troubleshooting the control plane
- Validating the Service Mesh control plane installation
- Troubleshooting the Service Mesh control plane
Troubleshooting the data plane
- Troubleshooting sidecar injection
- Troubleshooting Envoy proxy
Getting support

This section describes how to identify and resolve common problems in Red Hat OpenShift Service Mesh. Use the following sections to help troubleshoot and debug problems when deploying Red Hat OpenShift Service Mesh on OpenShift Container Platform.

Understanding Service Mesh versions

In order to understand what version of Red Hat OpenShift Service Mesh you have deployed on your system, you need to understand how each of the component versions is managed.

Operator version - The most current Operator version is 2.2.3. The Operator version number only indicates the version of the currently installed Operator. Because the Red Hat OpenShift Service Mesh Operator supports multiple versions of the Service Mesh control plane, the version of the Operator does not determine the version of your deployed ServiceMeshControlPlane resources.

Upgrading to the latest Operator version automatically applies patch updates, but does not automatically upgrade your Service Mesh control plane to the latest minor version.
ServiceMeshControlPlane version - The ServiceMeshControlPlane version determines what version of Red Hat OpenShift Service Mesh you are using. The value of the spec.version field in the ServiceMeshControlPlane resource controls the architecture and configuration settings that are used to install and deploy Red Hat OpenShift Service Mesh. When you create the Service Mesh control plane you can set the version in one of two ways:
- To configure in the Form View, select the version from the Control Plane Version menu.
- To configure in the YAML View, set the value for spec.version in the YAML file.

Operator Lifecycle Manager (OLM) does not manage Service Mesh control plane upgrades, so the version number for your Operator and ServiceMeshControlPlane (SMCP) may not match, unless you have manually upgraded your SMCP.
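
Because the two version numbers can drift apart, it can help to compare them at the minor-version level. The following is a minimal sketch, not part of the product: the helper names major_minor and same_minor are hypothetical, and it assumes that spec.version uses a form such as v2.1 while the Operator reports a form such as 2.2.3.

```shell
# Hypothetical helper: reduce a version string such as "2.2.3" or "v2.1" to "major.minor".
major_minor() {
  # Strip an optional leading "v", then keep the first two dot-separated fields.
  printf '%s\n' "${1#v}" | cut -d. -f1,2
}

# Hypothetical helper: report whether two versions share the same minor version.
same_minor() {
  if [ "$(major_minor "$1")" = "$(major_minor "$2")" ]; then
    echo "match"
  else
    echo "differ"
  fi
}

# Example usage against a live cluster (requires oc; "basic" and "istio-system" are example names):
#   smcp_version=$(oc get smcp basic -n istio-system -o jsonpath='{.spec.version}')
#   same_minor "2.2.3" "$smcp_version"
```

A result of differ is not an error in itself; it only indicates that the SMCP has not been manually upgraded to the Operator's minor version.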

Troubleshooting Operator installation

Validating Operator installation

When you install the Red Hat OpenShift Service Mesh Operators, OpenShift automatically creates the following objects as part of a successful Operator installation:

config maps
custom resource definitions
deployments
pods
replica sets
roles
role bindings
secrets
service accounts
services

From the OpenShift Container Platform console

You can verify that the Operator pods are available and running by using the OpenShift Container Platform console.

Navigate to Workloads → Pods.
Select the openshift-operators namespace.
Verify that the following pods exist and have a status of running:
- istio-operator
- jaeger-operator
- kiali-operator
Select the openshift-operators-redhat namespace.
Verify that the elasticsearch-operator pod exists and has a status of running.

From the command line

Verify the Operator pods are available and running in the openshift-operators namespace with the following command:

$ oc get pods -n openshift-operators

Example output

NAME                               READY   STATUS    RESTARTS   AGE
istio-operator-bb49787db-zgr87     1/1     Running   0          15s
jaeger-operator-7d5c4f57d8-9xphf   1/1     Running   0          2m42s
kiali-operator-f9c8d84f4-7xh2v     1/1     Running   0          64s

Verify the Elasticsearch operator with the following command:

$ oc get pods -n openshift-operators-redhat

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-operator-d4f59b968-796vq     1/1     Running   0          15s
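
In a namespace with many pods, it can be quicker to filter the output for pods that are not running. The following is a minimal sketch: the function name not_running is hypothetical, and it assumes the default column layout shown in the example output above, with STATUS in the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column is not "Running".
# Assumes the default "oc get pods" table layout (NAME READY STATUS RESTARTS AGE).
not_running() {
  # Skip the header row (NR > 1) and compare the third column.
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example usage against a live cluster (requires oc and a logged-in session):
#   oc get pods -n openshift-operators | not_running
```

Empty output means every listed pod reports Running. The filter looks only at the STATUS column, so a pod that is Running but not yet ready (for example 0/1 in the READY column) is not reported.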

Troubleshooting service mesh Operators

If you experience Operator issues:

Verify your Operator subscription status.
Verify that you did not install a community version of the Operator, instead of the supported Red Hat version.
Verify that you have the cluster-admin role to install Red Hat OpenShift Service Mesh.
Check for any errors in the Operator pod logs if the issue is related to installation of Operators.

You can install Operators only through the OpenShift console; the OperatorHub is not accessible from the command line.

Viewing Operator pod logs

You can view Operator logs by using the oc logs command. Red Hat may request logs to help resolve support cases.

Procedure

To view Operator pod logs, enter the command:

$ oc logs -n openshift-operators <podName>

For example,

$ oc logs -n openshift-operators istio-operator-bb49787db-zgr87
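
Operator logs can be long, so a first pass that keeps only lines mentioning an error or failure may help. This is a rough sketch; the function name errors_only is hypothetical, and the keywords are an assumption about what is worth looking at, not a list of messages the Operator is known to emit.

```shell
# Hypothetical helper: keep only log lines that mention an error or failure (case-insensitive).
errors_only() {
  # "|| true" keeps a zero exit status when no line matches.
  grep -iE 'error|fail' || true
}

# Example usage against a live cluster (requires oc and a logged-in session):
#   oc logs -n openshift-operators <podName> | errors_only
```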

Troubleshooting the control plane

The Service Mesh control plane is composed of Istiod, which consolidates several previous control plane components (Citadel, Galley, Pilot) into a single binary. Deploying the ServiceMeshControlPlane also creates the other components that make up Red Hat OpenShift Service Mesh as described in the architecture topic.

Validating the Service Mesh control plane installation

When you create the Service Mesh control plane, the Service Mesh Operator uses the parameters that you have specified in the ServiceMeshControlPlane resource file to do the following:

Creates the Istio components and deploys the following pods:
- istiod
- istio-ingressgateway
- istio-egressgateway
- grafana
- prometheus
- wasm-cacher
Calls the Kiali Operator to create the Kiali deployment based on configuration in either the SMCP or the Kiali custom resource.

You view the Kiali components under the Kiali Operator, not the Service Mesh Operator.

Calls the Red Hat OpenShift distributed tracing platform Operator to create distributed tracing platform components based on configuration in either the SMCP or the Jaeger custom resource.

You view the Jaeger components under the Red Hat OpenShift distributed tracing platform Operator and the Elasticsearch components under the Red Hat Elasticsearch Operator, not the Service Mesh Operator.

From the OpenShift Container Platform console

You can verify the Service Mesh control plane installation in the OpenShift Container Platform web console.

Navigate to Operators → Installed Operators.
Select the <istio-system> namespace.
Select the Red Hat OpenShift Service Mesh Operator.
1. Click the Istio Service Mesh Control Plane tab.
2. Click the name of your control plane, for example basic.
3. To view the resources created by the deployment, click the Resources tab. You can use the filter to narrow your view, for example, to check that all the Pods have a status of running.
4. If the SMCP status indicates any problems, check the status: output in the YAML file for more information.
5. Navigate back to Operators → Installed Operators.
Select the OpenShift Elasticsearch Operator.
1. Click the Elasticsearch tab.
2. Click the name of the deployment, for example elasticsearch.
3. To view the resources created by the deployment, click the Resources tab.
4. If the Status column indicates any problems, check the status: output on the YAML tab for more information.
5. Navigate back to Operators → Installed Operators.
Select the Red Hat OpenShift distributed tracing platform Operator.
1. Click the Jaeger tab.
2. Click the name of your deployment, for example jaeger.
3. To view the resources created by the deployment, click the Resources tab.
4. If the Status column indicates any problems, check the status: output on the YAML tab for more information.
5. Navigate to Operators → Installed Operators.
Select the Kiali Operator.
1. Click the Kiali tab.
2. Click the name of your deployment, for example kiali.
3. To view the resources created by the deployment, click the Resources tab.
4. If the Status column indicates any problems, check the status: output on the YAML tab for more information.

From the command line

Run the following command to see if the Service Mesh control plane pods are available and running, where istio-system is the namespace where you installed the SMCP.

$ oc get pods -n istio-system

Example output

NAME                                   READY   STATUS    RESTARTS   AGE
grafana-6776785cfc-6fz7t               2/2     Running   0          102s
istio-egressgateway-5f49dd99-l9ppq     1/1     Running   0          103s
istio-ingressgateway-6dc885c48-jjd8r   1/1     Running   0          103s
istiod-basic-6c9cc55998-wg4zq          1/1     Running   0          2m14s
jaeger-6865d5d8bf-zrfss                2/2     Running   0          100s
kiali-579799fbb7-8mwc8                 1/1     Running   0          46s
prometheus-5c579dfb-6qhjk              2/2     Running   0          115s
wasm-cacher-basic-5b99bfcddb-m775l     1/1     Running   0          86s

Check the status of the Service Mesh control plane deployment by using the following command. Replace istio-system with the namespace where you deployed the SMCP.
```
$ oc get smcp -n <istio-system>
```
The installation has finished successfully when the STATUS column is ComponentsReady.
Example output
```
NAME    READY   STATUS            PROFILES      VERSION   AGE
basic   10/10   ComponentsReady   ["default"]   2.1.3     4m2s
```
If you have modified and redeployed your Service Mesh control plane, the status should read UpdateSuccessful.
Example output
```
NAME            READY     STATUS             TEMPLATE   VERSION   AGE
basic-install   10/10     UpdateSuccessful   default     v1.1     3d16h
```
If the SMCP status indicates anything other than ComponentsReady, check the status: output in the SMCP resource for more information.
```
$ oc describe smcp <smcp-name> -n <controlplane-namespace>
```
For example:
```
$ oc describe smcp basic -n istio-system
```
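The status check above can also be scripted. The following is a minimal sketch: the function name smcp_ready is hypothetical, and it assumes the default table layout shown in the example output, with STATUS in the third column. It treats both ComponentsReady and UpdateSuccessful as healthy, as described above.
```shell
# Hypothetical helper: succeed only if every SMCP row reports a healthy status.
# Assumes the default "oc get smcp" table layout, with STATUS in the third column.
smcp_ready() {
  awk 'NR > 1 {
         seen = 1
         if ($3 != "ComponentsReady" && $3 != "UpdateSuccessful") bad = 1
       }
       # Fail when no SMCP row was seen or when any row was unhealthy.
       END { if (!seen || bad) exit 1 }'
}

# Example usage against a live cluster (requires oc and a logged-in session):
#   oc get smcp -n istio-system | smcp_ready && echo "control plane ready"
```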
Check the status of the Jaeger deployment with the following command, where istio-system is the namespace where you deployed the SMCP.
```
$ oc get jaeger -n <istio-system>
```
Example output
```
NAME     STATUS    VERSION   STRATEGY   STORAGE   AGE
jaeger   Running   1.30.0    allinone   memory    15m
```
Check the status of the Kiali deployment with the following command, where istio-system is the namespace where you deployed the SMCP.
```
$ oc get kiali -n <istio-system>
```
Example output
```
NAME    AGE
kiali   15m
```

Accessing the Kiali console

You can view your application’s topology, health, and metrics in the Kiali console. If your service is experiencing problems, the Kiali console lets you view the data flow through your service. You can view insights about the mesh components at different levels, including abstract applications, services, and workloads. Kiali also provides an interactive graph view of your namespace in real time.

To access the Kiali console, you must have Red Hat OpenShift Service Mesh installed, and Kiali installed and configured.

The installation process creates a route to access the Kiali console.

If you know the URL for the Kiali console, you can access it directly. If you do not know the URL, use the following directions.

Procedure for administrators

Log in to the OpenShift Container Platform web console with an administrator role.
Click Home → Projects.
On the Projects page, if necessary, use the filter to find the name of your project.
Click the name of your project, for example, bookinfo.
On the Project details page, in the Launcher section, click the Kiali link.
Log in to the Kiali console with the same user name and password that you use to access the OpenShift Container Platform console.

When you first log in to the Kiali Console, you see the Overview page which displays all the namespaces in your service mesh that you have permission to view.

If you are validating the console installation and namespaces have not yet been added to the mesh, there might not be any data to display other than istio-system.

Procedure for developers

Log in to the OpenShift Container Platform web console with a developer role.
Click Project.
On the Project Details page, if necessary, use the filter to find the name of your project.
Click the name of your project, for example, bookinfo.
On the Project page, in the Launcher section, click the Kiali link.
Click Log In With OpenShift.

Accessing the Jaeger console

To access the Jaeger console, you must have Red Hat OpenShift Service Mesh installed, and Red Hat OpenShift distributed tracing platform installed and configured.

The installation process creates a route to access the Jaeger console.

If you know the URL for the Jaeger console, you can access it directly. If you do not know the URL, use the following directions.

Procedure from OpenShift console

Log in to the OpenShift Container Platform web console as a user with cluster-admin rights. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.
Navigate to Networking → Routes.
On the Routes page, select the Service Mesh control plane project, for example istio-system, from the Namespace menu.

The Location column displays the linked address for each route.
If necessary, use the filter to find the jaeger route. Click the route Location to launch the console.
Click Log In With OpenShift.

Procedure from Kiali console

Launch the Kiali console.
Click Distributed Tracing in the left navigation pane.
Click Log In With OpenShift.

Procedure from the CLI

Log in to the OpenShift Container Platform CLI as a user with the cluster-admin role. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.
```
$ oc login --username=<NAMEOFUSER> https://<HOSTNAME>:6443
```
To query for details of the route using the command line, enter the following command. In this example, istio-system is the Service Mesh control plane namespace.
```
$ export JAEGER_URL=$(oc get route -n istio-system jaeger -o jsonpath='{.spec.host}')
```
Launch a browser and navigate to https://<JAEGER_URL>, where <JAEGER_URL> is the route that you discovered in the previous step.
Log in using the same user name and password that you use to access the OpenShift Container Platform console.
If you have added services to the service mesh and have generated traces, you can use the filters and Find Traces button to search your trace data.

If you are validating the console installation, there is no trace data to display.
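
When scripting the CLI procedure, it is worth checking that the route lookup actually returned a host before building the URL. The following is a minimal sketch; the function name console_url is hypothetical, and it assumes that the route is served over HTTPS, as in the procedure above.

```shell
# Hypothetical helper: turn a route host name into a console URL.
# Fails with a message if the host is empty, for example when the route lookup found nothing.
console_url() {
  [ -n "$1" ] || { echo "route host is empty; check the route name and namespace" >&2; return 1; }
  printf 'https://%s\n' "$1"
}

# Example usage, after exporting JAEGER_URL as shown in the procedure:
#   console_url "$JAEGER_URL"
```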

Troubleshooting the Service Mesh control plane

If you are experiencing issues while deploying the Service Mesh control plane, check the following:

Ensure that the ServiceMeshControlPlane resource is installed in a project that is separate from your services and Operators. This documentation uses the istio-system project as an example, but you can deploy your control plane in any project as long as it is separate from the project that contains your Operators and services.
Ensure that the ServiceMeshControlPlane and Jaeger custom resources are deployed in the same project. For example, use the istio-system project for both.

Troubleshooting the data plane

The data plane is a set of intelligent proxies that intercept and control all inbound and outbound network communications between services in the service mesh.

Red Hat OpenShift Service Mesh relies on a proxy sidecar within the application’s pod to provide service mesh capabilities to the application.

Troubleshooting sidecar injection

Red Hat OpenShift Service Mesh does not automatically inject proxy sidecars to pods. You must opt in to sidecar injection.

Troubleshooting Istio sidecar injection

Check to see if automatic injection is enabled in the Deployment for your application. If automatic injection for the Envoy proxy is enabled, there should be a sidecar.istio.io/inject: "true" annotation in the Deployment resource under spec.template.metadata.annotations.

Troubleshooting Jaeger agent sidecar injection

Check to see if automatic injection is enabled in the Deployment for your application. If automatic injection for the Jaeger agent is enabled, there should be a sidecar.jaegertracing.io/inject: "true" annotation in the Deployment resource.

For more information about sidecar injection, see Enabling automatic injection.
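
Either annotation check can be done from the command line by searching the Deployment YAML. The following is a minimal sketch: the function name has_inject_annotation is hypothetical, and it is a plain text match on the YAML rather than a structural check, so it does not verify where in the resource the annotation appears.

```shell
# Hypothetical helper: succeed if the YAML on stdin has the given annotation key set to true.
# $1 is the annotation key, for example sidecar.istio.io/inject
has_inject_annotation() {
  # First keep lines that contain the key literally, then require a value of true or "true".
  grep -F -- "$1" | grep -Eq ':[[:space:]]*"?true"?[[:space:]]*$'
}

# Example usage against a live cluster (requires oc; <name> and <namespace> are placeholders):
#   oc get deployment <name> -n <namespace> -o yaml | has_inject_annotation sidecar.istio.io/inject
#   oc get deployment <name> -n <namespace> -o yaml | has_inject_annotation sidecar.jaegertracing.io/inject
```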

Troubleshooting Envoy proxy

The Envoy proxy intercepts all inbound and outbound traffic for all services in the service mesh. Envoy also collects and reports telemetry on the service mesh. Envoy is deployed as a sidecar to the relevant service in the same pod.

Enabling Envoy access logs

Envoy access logs are useful in diagnosing traffic failures and flows, and help with end-to-end traffic flow analysis.

To enable access logging for all istio-proxy containers, edit the ServiceMeshControlPlane (SMCP) object to add a file name for the logging output.

Procedure

Log in to the OpenShift Container Platform CLI as a user with the cluster-admin role. Enter the following command. Then, enter your username and password when prompted.
```
$ oc login --username=<NAMEOFUSER> https://<HOSTNAME>:6443
```
Change to the project where you installed the Service Mesh control plane, for example istio-system.
```
$ oc project istio-system
```
Edit the ServiceMeshControlPlane file.
```
$ oc edit smcp <smcp_name>
```
As shown in the following example, use name to specify the file name for the proxy log. If you do not specify a value for name, no log entries will be written.
```
spec:
  proxy:
    accessLogging:
      file:
        name: /dev/stdout     #file name
```

For more information about troubleshooting pod issues, see Investigating pod issues.

Getting support

If you experience difficulty with a procedure described in this documentation, or with OpenShift Container Platform in general, visit the Red Hat Customer Portal. From the Customer Portal, you can:

Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.
Submit a support case to Red Hat Support.
Access other product documentation.

To identify issues with your cluster, you can use Insights in OpenShift Cluster Manager. Insights provides details about issues and, if available, information on how to solve a problem.

If you have a suggestion for improving this documentation or have found an error, submit a Jira issue for the most relevant documentation component. Please provide specific details, such as the section name and OpenShift Container Platform version.

About the Red Hat Knowledgebase

The Red Hat Knowledgebase provides rich content aimed at helping you make the most of Red Hat’s products and technologies. The Red Hat Knowledgebase consists of articles, product documentation, and videos outlining best practices on installing, configuring, and using Red Hat products. In addition, you can search for solutions to known issues, each providing concise root cause descriptions and remedial steps.

Searching the Red Hat Knowledgebase

In the event of an OpenShift Container Platform issue, you can perform an initial search to determine if a solution already exists within the Red Hat Knowledgebase.

Prerequisites

You have a Red Hat Customer Portal account.

Procedure

Log in to the Red Hat Customer Portal.
In the main Red Hat Customer Portal search field, input keywords and strings relating to the problem, including:
- OpenShift Container Platform components (such as etcd)
- Related procedure (such as installation)
- Warnings, error messages, and other outputs related to explicit failures
Click Search.
Select the OpenShift Container Platform product filter.
Select the Knowledgebase content type filter.

About the must-gather tool

The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, including:

Resource definitions
Service logs

By default, the oc adm must-gather command uses the default plug-in image and writes into ./must-gather.local.

Alternatively, you can collect specific information by running the command with the appropriate arguments as described in the following sections:

To collect data related to one or more specific features, use the --image argument with an image, as listed in a following section.

For example:
```
$ oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.9.0
```
To collect the audit logs, use the -- /usr/bin/gather_audit_logs argument, as described in a following section.

For example:
```
$ oc adm must-gather -- /usr/bin/gather_audit_logs
```
Audit logs are not collected as part of the default set of information to reduce the size of the files.

When you run oc adm must-gather, a new pod with a random name is created in a new project on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.

For example:

NAMESPACE                      NAME                 READY   STATUS      RESTARTS      AGE
...
openshift-must-gather-5drcj    must-gather-bklx4    2/2     Running     0             72s
openshift-must-gather-5drcj    must-gather-s8sdh    2/2     Running     0             72s
...
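
After the collection finishes, the output directory is usually compressed before it is attached to a support case. The following is a minimal sketch: the function name archive_must_gather is hypothetical, and it assumes that the directory was created in the current working directory, as described above.

```shell
# Hypothetical helper: compress the newest must-gather.local* directory in the current directory.
archive_must_gather() {
  # List matching directories, newest first, and keep the first one.
  dir=$(ls -1dt must-gather.local*/ 2>/dev/null | head -n 1)
  [ -n "$dir" ] || { echo "no must-gather.local* directory found" >&2; return 1; }
  dir=${dir%/}   # drop the trailing slash added by the glob
  # Create <directory>.tar.gz and print the archive name on success.
  tar -czf "${dir}.tar.gz" "$dir" && echo "${dir}.tar.gz"
}

# Example usage, run from the directory where oc adm must-gather was executed:
#   archive_must_gather
```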

About collecting service mesh data

You can use the oc adm must-gather CLI command to collect information about your cluster, including features and objects associated with Red Hat OpenShift Service Mesh.

Prerequisites

Access to the cluster as a user with the cluster-admin role.
The OpenShift Container Platform CLI (oc) installed.

Procedure

To collect Red Hat OpenShift Service Mesh data with must-gather, you must specify the Red Hat OpenShift Service Mesh image.
```
$ oc adm must-gather --image=registry.redhat.io/openshift-service-mesh/istio-must-gather-rhel8
```
To collect Red Hat OpenShift Service Mesh data for a specific Service Mesh control plane namespace with must-gather, you must specify the Red Hat OpenShift Service Mesh image and namespace. In this example, replace <namespace> with your Service Mesh control plane namespace, such as istio-system.
```
$ oc adm must-gather --image=registry.redhat.io/openshift-service-mesh/istio-must-gather-rhel8 gather <namespace>
```

For prompt support, supply diagnostic information for both OpenShift Container Platform and Red Hat OpenShift Service Mesh.

Submitting a support case

Prerequisites

You have installed the OpenShift CLI (oc).
You have a Red Hat Customer Portal account.
You have access to OpenShift Cluster Manager.

Procedure

Log in to the Red Hat Customer Portal and select SUPPORT CASES → Open a case.
Select the appropriate category for your issue (such as Defect / Bug), product (OpenShift Container Platform), and product version (4.7, if this is not already autofilled).
Review the list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. If the suggested articles do not address the issue, click Continue.
Enter a concise but descriptive problem summary and further details about the symptoms being experienced, as well as your expectations.
Review the updated list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. The list is refined as you provide more information during the case creation process. If the suggested articles do not address the issue, click Continue.
Ensure that the account information presented is as expected, and if not, amend accordingly.
Check that the autofilled OpenShift Container Platform Cluster ID is correct. If it is not, manually obtain your cluster ID.
- To manually obtain your cluster ID using the OpenShift Container Platform web console:
  1. Navigate to Home → Dashboards → Overview.
  2. Find the value in the Cluster ID field of the Details section.
- Alternatively, it is possible to open a new support case through the OpenShift Container Platform web console and have your cluster ID autofilled.
  1. From the toolbar, navigate to (?) Help → Open Support Case.
  2. The Cluster ID value is autofilled.
- To obtain your cluster ID using the OpenShift CLI (oc), run the following command:
  $ oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
Complete the following questions where prompted and then click Continue:
- Where are you experiencing the behavior? What environment?
- When does the behavior occur? Frequency? Repeatedly? At certain times?
- What information can you provide around time-frames and the business impact?
Upload relevant diagnostic data files and click Continue. It is recommended to include data gathered using the oc adm must-gather command as a starting point, plus any issue specific data that is not collected by that command.
Input relevant case management details and click Continue.
Preview the case details and click Submit.
