Troubleshooting Service Mesh - Service Mesh 2.x | Service Mesh

Understanding Service Mesh versions
Troubleshooting Operator installation
- Validating Operator installation
- Troubleshooting service mesh Operators
Troubleshooting the control plane
- Validating the Service Mesh control plane installation
- Troubleshooting the Service Mesh control plane
Troubleshooting the data plane
- Troubleshooting sidecar injection
- Troubleshooting Envoy proxy
Getting support

This section describes how to identify and resolve common problems in Red Hat OpenShift Service Mesh. Use the following sections to help troubleshoot and debug problems when deploying Red Hat OpenShift Service Mesh on Red Hat OpenShift Service on AWS.

Understanding Service Mesh versions

To determine which version of Red Hat OpenShift Service Mesh is deployed on your system, you need to understand how each component version is managed.

Operator version - The most current Operator version is 2.6.5. The Operator version number only indicates the version of the currently installed Operator. Because the Red Hat OpenShift Service Mesh Operator supports multiple versions of the Service Mesh control plane, the version of the Operator does not determine the version of your deployed ServiceMeshControlPlane resources.

Upgrading to the latest Operator version automatically applies patch updates, but does not automatically upgrade your Service Mesh control plane to the latest minor version.
ServiceMeshControlPlane version - The ServiceMeshControlPlane version determines what version of Red Hat OpenShift Service Mesh you are using. The value of the spec.version field in the ServiceMeshControlPlane resource controls the architecture and configuration settings that are used to install and deploy Red Hat OpenShift Service Mesh. When you create the Service Mesh control plane you can set the version in one of two ways:
- To configure in the Form View, select the version from the Control Plane Version menu.
- To configure in the YAML View, set the value for spec.version in the YAML file.

Operator Lifecycle Manager (OLM) does not manage Service Mesh control plane upgrades, so the version number for your Operator and ServiceMeshControlPlane (SMCP) may not match, unless you have manually upgraded your SMCP.
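
Because the Operator and the SMCP are versioned independently, a quick comparison can show whether a manual SMCP upgrade might still be pending. The following is a minimal sketch rather than a supported tool: the helper name smcp_matches_operator is hypothetical, and it assumes that spec.version has a form such as v2.5 and that the Operator version has a form such as 2.6.5.

```shell
# Hypothetical helper: succeed when the SMCP minor version matches the Operator's.
# $1 = Operator version (assumed form: 2.6.5)
# $2 = SMCP spec.version value (assumed form: v2.5)
smcp_matches_operator() {
  operator_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
  smcp_minor=$(printf '%s\n' "${2#v}" | cut -d. -f1,2)
  [ "$operator_minor" = "$smcp_minor" ]
}

# Example use against a cluster, assuming an SMCP named basic in istio-system:
#   smcp_version=$(oc get smcp basic -n istio-system -o jsonpath='{.spec.version}')
#   smcp_matches_operator 2.6.5 "$smcp_version" || echo "SMCP is not on the Operator's minor version"
```

A mismatch is not an error in itself; it only indicates that the control plane has not been manually upgraded to the latest minor version.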

Troubleshooting Operator installation

Validating Operator installation

When you install the Red Hat OpenShift Service Mesh Operators, OpenShift automatically creates the following objects as part of a successful Operator installation:

config maps
custom resource definitions
deployments
pods
replica sets
roles
role bindings
secrets
service accounts
services

From the Red Hat OpenShift Service on AWS console

You can verify that the Operator pods are available and running by using the Red Hat OpenShift Service on AWS console.

Navigate to Workloads → Pods.
Select the openshift-operators namespace.
Verify that the following pods exist and have a status of running:
- istio-operator
- jaeger-operator
- kiali-operator
Select the openshift-operators-redhat namespace.
Verify that the elasticsearch-operator pod exists and has a status of running.

From the command line

Verify the Operator pods are available and running in the openshift-operators namespace with the following command:

$ oc get pods -n openshift-operators

Example output

NAME                               READY   STATUS    RESTARTS   AGE
istio-operator-bb49787db-zgr87     1/1     Running   0          15s
jaeger-operator-7d5c4f57d8-9xphf   1/1     Running   0          2m42s
kiali-operator-f9c8d84f4-7xh2v     1/1     Running   0          64s

Verify the Elasticsearch operator with the following command:

$ oc get pods -n openshift-operators-redhat

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-operator-d4f59b968-796vq     1/1     Running   0          15s
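
When a namespace contains many pods, it can be quicker to list only the pods that are not running. The following sketch filters the table that oc get pods prints; the helper name pods_not_running is hypothetical, and it assumes the default column layout shown in the example output, where STATUS is the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS is not "Running".
# Reads the default `oc get pods` table on standard input and skips the header row.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example use:
#   oc get pods -n openshift-operators | pods_not_running
#   oc get pods -n openshift-operators-redhat | pods_not_running
```

Empty output suggests that every listed pod reports a Running status. Pods that finished normally, for example with a Completed status, are also printed by this filter.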

Troubleshooting service mesh Operators

If you experience Operator issues:

Verify your Operator subscription status.
Verify that you did not install a community version of the Operator, instead of the supported Red Hat version.
Verify that you have the cluster-admin role to install Red Hat OpenShift Service Mesh.
Check for any errors in the Operator pod logs if the issue is related to installation of Operators.

You can install Operators only through the OpenShift console; the OperatorHub is not accessible from the command line.
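
Although installation happens in the console, the state of an installed Operator can be inspected from the command line. The following sketch reports cluster service versions (CSVs) that have not reached the Succeeded phase. The helper name csv_not_succeeded is hypothetical, and it assumes that PHASE is the last column of the default oc get csv table.

```shell
# Hypothetical helper: print the name and phase of every CSV that is not "Succeeded".
# Reads the default `oc get csv` table on standard input and skips the header row.
# The DISPLAY column can contain spaces, so the phase is read from the last field.
csv_not_succeeded() {
  awk 'NR > 1 && $NF != "Succeeded" { print $1 " " $NF }'
}

# Example use:
#   oc get csv -n openshift-operators | csv_not_succeeded
# Related checks, run directly:
#   oc get subscription -n openshift-operators
#   oc auth can-i '*' '*' --all-namespaces   # rough indication of cluster-admin rights
```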

Viewing Operator pod logs

You can view Operator logs by using the oc logs command. Red Hat may request logs to help resolve support cases.

Procedure

To view Operator pod logs, enter the command:

$ oc logs -n openshift-operators <podName>

For example,

$ oc logs -n openshift-operators istio-operator-bb49787db-zgr87
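
Operator logs can be long, so it can help to narrow them to likely problem lines before reading them or attaching them to a support case. The following sketch is one way to do that; the helper name recent_log_problems is hypothetical, and matching on the words error and warn is an assumption about how problems appear in the log text.

```shell
# Hypothetical helper: keep only log lines that mention "error" or "warn"
# (case-insensitive) and print the most recent ones.
# $1 = maximum number of lines to print (defaults to 20)
recent_log_problems() {
  grep -i -E 'error|warn' | tail -n "${1:-20}"
}

# Example use, with <podName> replaced by the name of an Operator pod:
#   oc logs -n openshift-operators <podName> | recent_log_problems 50
```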

Troubleshooting the control plane

The Service Mesh control plane is composed of Istiod, which consolidates several previous control plane components (Citadel, Galley, Pilot) into a single binary. Deploying the ServiceMeshControlPlane also creates the other components that make up Red Hat OpenShift Service Mesh as described in the architecture topic.

Validating the Service Mesh control plane installation

When you create the Service Mesh control plane, the Service Mesh Operator uses the parameters that you have specified in the ServiceMeshControlPlane resource file to do the following:

Creates the Istio components and deploys the following pods:
- istiod
- istio-ingressgateway
- istio-egressgateway
- grafana
- prometheus
Calls the Kiali Operator to create the Kiali deployment based on the configuration in either the SMCP or the Kiali custom resource.

You view the Kiali components under the Kiali Operator, not the Service Mesh Operator.

Calls the Red Hat OpenShift distributed tracing platform (Jaeger) Operator to create distributed tracing platform (Jaeger) components based on configuration in either the SMCP or the Jaeger custom resource.

You view the Jaeger components under the Red Hat OpenShift distributed tracing platform (Jaeger) Operator and the Elasticsearch components under the Red Hat Elasticsearch Operator, not the Service Mesh Operator.

From the Red Hat OpenShift Service on AWS console

You can verify the Service Mesh control plane installation in the Red Hat OpenShift Service on AWS web console.

Navigate to Operators → Installed Operators.
Select the istio-system namespace.
Select the Red Hat OpenShift Service Mesh Operator.
1. Click the Istio Service Mesh Control Plane tab.
2. Click the name of your control plane, for example basic.
3. To view the resources created by the deployment, click the Resources tab. You can use the filter to narrow your view, for example, to check that all the Pods have a status of running.
4. If the SMCP status indicates any problems, check the status: output in the YAML file for more information.
Navigate back to Operators → Installed Operators.
Select the OpenShift Elasticsearch Operator.
1. Click the Elasticsearch tab.
2. Click the name of the deployment, for example elasticsearch.
3. To view the resources created by the deployment, click the Resources tab.
4. If the Status column indicates any problems, check the status: output on the YAML tab for more information.
Navigate back to Operators → Installed Operators.
Select the Red Hat OpenShift distributed tracing platform (Jaeger) Operator.
1. Click the Jaeger tab.
2. Click the name of your deployment, for example jaeger.
3. To view the resources created by the deployment, click the Resources tab.
4. If the Status column indicates any problems, check the status: output on the YAML tab for more information.
Navigate to Operators → Installed Operators.
Select the Kiali Operator.
1. Click the Kiali tab.
2. Click the name of your deployment, for example kiali.
3. To view the resources created by the deployment, click the Resources tab.
4. If the Status column indicates any problems, check the status: output on the YAML tab for more information.

From the command line

Run the following command to see if the Service Mesh control plane pods are available and running, where istio-system is the namespace where you installed the SMCP.

$ oc get pods -n istio-system

Example output

NAME                                   READY   STATUS    RESTARTS   AGE
grafana-6776785cfc-6fz7t               2/2     Running   0          102s
istio-egressgateway-5f49dd99-l9ppq     1/1     Running   0          103s
istio-ingressgateway-6dc885c48-jjd8r   1/1     Running   0          103s
istiod-basic-6c9cc55998-wg4zq          1/1     Running   0          2m14s
jaeger-6865d5d8bf-zrfss                2/2     Running   0          100s
kiali-579799fbb7-8mwc8                 1/1     Running   0          46s
prometheus-5c579dfb-6qhjk              2/2     Running   0          115s

Check the status of the Service Mesh control plane deployment by using the following command. Replace istio-system with the namespace where you deployed the SMCP.
```
$ oc get smcp -n istio-system
```
The installation has finished successfully when the STATUS column is ComponentsReady.
Example output
```
NAME    READY   STATUS            PROFILES      VERSION   AGE
basic   10/10   ComponentsReady   ["default"]   2.1.3     4m2s
```
If you have modified and redeployed your Service Mesh control plane, the status should read UpdateSuccessful.
Example output
```
NAME            READY   STATUS             TEMPLATE   VERSION   AGE
basic-install   10/10   UpdateSuccessful   default    v1.1      3d16h
```
If the SMCP status indicates anything other than ComponentsReady, check the status: output in the SMCP resource for more information.
```
$ oc describe smcp <smcp-name> -n <controlplane-namespace>
```
For example,
```
$ oc describe smcp basic -n istio-system
```
Check the status of the Jaeger deployment with the following command, where istio-system is the namespace where you deployed the SMCP.
```
$ oc get jaeger -n istio-system
```
Example output
```
NAME     STATUS    VERSION   STRATEGY   STORAGE   AGE
jaeger   Running   1.30.0    allinone   memory    15m
```
Check the status of the Kiali deployment with the following command, where istio-system is the namespace where you deployed the SMCP.
```
$ oc get kiali -n istio-system
```
Example output
```
NAME    AGE
kiali   15m
```
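The SMCP status check above can also be scripted, for example in a post-installation verification step. The following sketch treats ComponentsReady and UpdateSuccessful as healthy, as described in this section. The helper name smcp_not_ready is hypothetical, and it assumes the column layout shown in the example output, where STATUS is the third column.

```shell
# Hypothetical helper: print every SMCP whose STATUS is neither ComponentsReady
# nor UpdateSuccessful, and exit with a non-zero status if any are found.
# Reads the default `oc get smcp` table on standard input and skips the header row.
smcp_not_ready() {
  awk 'NR > 1 && $3 != "ComponentsReady" && $3 != "UpdateSuccessful" {
         print $1 ": " $3
         bad = 1
       }
       END { exit (bad ? 1 : 0) }'
}

# Example use:
#   oc get smcp -n istio-system | smcp_not_ready || oc describe smcp basic -n istio-system
```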

Accessing the Kiali console

You can view your application’s topology, health, and metrics in the Kiali console. If your service is experiencing problems, the Kiali console lets you view the data flow through your service. You can view insights about the mesh components at different levels, including abstract applications, services, and workloads. Kiali also provides an interactive graph view of your namespace in real time.

To access the Kiali console, you must have Red Hat OpenShift Service Mesh installed and Kiali installed and configured.

The installation process creates a route to access the Kiali console.

If you know the URL for the Kiali console, you can access it directly. If you do not know the URL, use the following directions.

Procedure for administrators

Log in to the Red Hat OpenShift Service on AWS web console with an administrator role.
Click Home → Projects.
On the Projects page, if necessary, use the filter to find the name of your project.
Click the name of your project, for example, bookinfo.
On the Project details page, in the Launcher section, click the Kiali link.
Log in to the Kiali console with the same user name and password that you use to access the Red Hat OpenShift Service on AWS console.

When you first log in to the Kiali Console, you see the Overview page which displays all the namespaces in your service mesh that you have permission to view.

If you are validating the console installation and namespaces have not yet been added to the mesh, there might not be any data to display other than istio-system.

Procedure for developers

Log in to the Red Hat OpenShift Service on AWS web console with a developer role.
Click Project.
On the Project Details page, if necessary, use the filter to find the name of your project.
Click the name of your project, for example, bookinfo.
On the Project page, in the Launcher section, click the Kiali link.
Click Log In With OpenShift.

Accessing the Jaeger console

To access the Jaeger console, you must have Red Hat OpenShift Service Mesh installed and Red Hat OpenShift distributed tracing platform (Jaeger) installed and configured.

The installation process creates a route to access the Jaeger console.

If you know the URL for the Jaeger console, you can access it directly. If you do not know the URL, use the following directions.

Starting with Red Hat OpenShift Service Mesh 2.5, Red Hat OpenShift distributed tracing platform (Jaeger) and OpenShift Elasticsearch Operator have been deprecated and will be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to Red Hat OpenShift distributed tracing platform (Jaeger), you can use Red Hat OpenShift distributed tracing platform (Tempo) instead.

Procedure from OpenShift console

Log in to the Red Hat OpenShift Service on AWS web console as a user with cluster-admin rights. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.
Navigate to Networking → Routes.
On the Routes page, select the Service Mesh control plane project, for example istio-system, from the Namespace menu.

The Location column displays the linked address for each route.
If necessary, use the filter to find the jaeger route. Click the route Location to launch the console.
Click Log In With OpenShift.

Procedure from Kiali console

Launch the Kiali console.
Click Distributed Tracing in the left navigation pane.
Click Log In With OpenShift.

Procedure from the CLI

Log in to the Red Hat OpenShift Service on AWS CLI as a user with the cluster-admin role. If you use Red Hat OpenShift Dedicated, you must have an account with the dedicated-admin role.
```
$ oc login --username=<NAMEOFUSER> https://<HOSTNAME>:6443
```
To query for details of the route using the command line, enter the following command. In this example, istio-system is the Service Mesh control plane namespace.
```
$ oc get route -n istio-system jaeger -o jsonpath='{.spec.host}'
```
Launch a browser and navigate to https://<JAEGER_URL>, where <JAEGER_URL> is the route that you discovered in the previous step.
Log in using the same user name and password that you use to access the Red Hat OpenShift Service on AWS console.
If you have added services to the service mesh and have generated traces, you can use the filters and Find Traces button to search your trace data.

If you are validating the console installation, there is no trace data to display.
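
The route lookup and URL construction from the CLI procedure can be combined into one step. The following sketch wraps the oc get route command shown above; the helper name route_url is hypothetical, and the default project istio-system is an assumption taken from the examples in this document.

```shell
# Hypothetical helper: print the HTTPS URL for a route in the control plane project.
# $1 = route name (for example jaeger)
# $2 = project that contains the route (defaults to istio-system)
route_url() {
  host=$(oc get route -n "${2:-istio-system}" "$1" -o jsonpath='{.spec.host}') || return 1
  # An empty host means the route exists but has no host assigned.
  [ -n "$host" ] || return 1
  printf 'https://%s\n' "$host"
}

# Example use:
#   route_url jaeger
```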

Troubleshooting the Service Mesh control plane

If you are experiencing issues while deploying the Service Mesh control plane:

Ensure that the ServiceMeshControlPlane resource is installed in a project that is separate from your services and Operators. This documentation uses the istio-system project as an example, but you can deploy your control plane in any project as long as it is separate from the project that contains your Operators and services.
Ensure that the ServiceMeshControlPlane and Jaeger custom resources are deployed in the same project. For example, use the istio-system project for both.
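
The second check can be approximated from the command line by confirming that one project contains both resource types. The following is a minimal sketch; the helper name check_same_project is hypothetical, and it assumes that the smcp and jaeger resource names used elsewhere in this document are available on your cluster.

```shell
# Hypothetical helper: confirm that a project contains both an SMCP and a Jaeger resource.
# $1 = project to check (defaults to istio-system)
check_same_project() {
  project=${1:-istio-system}
  for kind in smcp jaeger; do
    # With -o name, oc prints one line per resource and nothing when none exist.
    found=$(oc get "$kind" -n "$project" -o name) || return 1
    if [ -z "$found" ]; then
      echo "No $kind resource found in project $project"
      return 1
    fi
  done
  echo "SMCP and Jaeger resources are both present in project $project"
}

# Example use:
#   check_same_project istio-system
```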

Troubleshooting the data plane

The data plane is a set of intelligent proxies that intercept and control all inbound and outbound network communications between services in the service mesh.

Red Hat OpenShift Service Mesh relies on a proxy sidecar within the application’s pod to provide service mesh capabilities to the application.

Troubleshooting sidecar injection

Red Hat OpenShift Service Mesh does not automatically inject proxy sidecars to pods. You must opt in to sidecar injection.

Troubleshooting Istio sidecar injection

Check to see if automatic injection is enabled in the Deployment for your application. If automatic injection for the Envoy proxy is enabled, there should be a sidecar.istio.io/inject:"true" annotation in the Deployment resource under spec.template.metadata.annotations.
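
The annotation can be read directly from the Deployment instead of inspecting the full YAML. The following sketch is one way to do so; the helper name istio_injection_enabled is hypothetical. The dots inside the annotation key are escaped with backslashes so that JSONPath treats the key as a single field.

```shell
# Hypothetical helper: succeed when the pod template of a Deployment carries
# the sidecar.istio.io/inject annotation with the value "true".
# $1 = Deployment name, $2 = project that contains the Deployment
istio_injection_enabled() {
  value=$(oc get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.template.metadata.annotations.sidecar\.istio\.io/inject}') || return 1
  [ "$value" = "true" ]
}

# Example use, with placeholder names:
#   istio_injection_enabled <deployment_name> <project> || echo "sidecar injection is not enabled"
```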

Troubleshooting Jaeger agent sidecar injection

Check to see if automatic injection is enabled in the Deployment for your application. If automatic injection for the Jaeger agent is enabled, there should be a sidecar.jaegertracing.io/inject:"true" annotation in the Deployment resource.

For more information about sidecar injection, see Enabling automatic injection.

Troubleshooting Envoy proxy

The Envoy proxy intercepts all inbound and outbound traffic for all services in the service mesh. Envoy also collects and reports telemetry on the service mesh. Envoy is deployed as a sidecar to the relevant service in the same pod.

Enabling Envoy access logs

Envoy access logs are useful in diagnosing traffic failures and flows, and help with end-to-end traffic flow analysis.

To enable access logging for all istio-proxy containers, edit the ServiceMeshControlPlane (SMCP) object to add a file name for the logging output.

Procedure

Log in to the OpenShift Container Platform CLI as a user with the cluster-admin role. Enter the following command. Then, enter your username and password when prompted.
```
$ oc login --username=<NAMEOFUSER> https://<HOSTNAME>:6443
```
Change to the project where you installed the Service Mesh control plane, for example istio-system.
```
$ oc project istio-system
```
Edit the ServiceMeshControlPlane file.
```
$ oc edit smcp <smcp_name>
```

As shown in the following example, use name to specify the file name for the proxy log. If you do not specify a value for name, no log entries will be written.

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: basic
  namespace: istio-system
spec:
  proxy:
    accessLogging:
      file:
        name: /dev/stdout     #file name
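
As a non-interactive alternative to oc edit, the same field can be set with a merge patch, which can be convenient in scripts. This is a sketch under stated assumptions rather than the documented procedure: the helper name enable_access_log is hypothetical, and the patch body mirrors the spec.proxy.accessLogging.file.name setting from the example above.

```shell
# Hypothetical helper: set the Envoy access log file name to /dev/stdout on an SMCP.
# $1 = SMCP name (for example basic)
# $2 = project that contains the SMCP (for example istio-system)
enable_access_log() {
  oc patch smcp "$1" -n "$2" --type merge \
    -p '{"spec":{"proxy":{"accessLogging":{"file":{"name":"/dev/stdout"}}}}}'
}

# Example use:
#   enable_access_log basic istio-system
# After the proxies pick up the change, read the access log from the sidecar container:
#   oc logs <pod_name> -c istio-proxy -n <project>
```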

Getting support

If you experience difficulty with a procedure described in this documentation, or with Red Hat OpenShift Service on AWS in general, visit the Red Hat Customer Portal.

From the Customer Portal, you can:

Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.
Submit a support case to Red Hat Support.
Access other product documentation.

To identify issues with your cluster, you can use Insights in OpenShift Cluster Manager. Insights provides details about issues and, if available, information on how to solve a problem.

If you have a suggestion for improving this documentation or have found an error, submit a Jira issue for the most relevant documentation component. Please provide specific details, such as the section name and Red Hat OpenShift Service on AWS version.

About the Red Hat Knowledgebase

The Red Hat Knowledgebase provides rich content aimed at helping you make the most of Red Hat’s products and technologies. The Red Hat Knowledgebase consists of articles, product documentation, and videos outlining best practices on installing, configuring, and using Red Hat products. In addition, you can search for solutions to known issues, each providing concise root cause descriptions and remedial steps.

Searching the Red Hat Knowledgebase

In the event of a Red Hat OpenShift Service on AWS issue, you can perform an initial search to determine if a solution already exists within the Red Hat Knowledgebase.

Prerequisites

You have a Red Hat Customer Portal account.

Procedure

Log in to the Red Hat Customer Portal.
Click Search.
In the search field, input keywords and strings relating to the problem, including:
- Red Hat OpenShift Service on AWS components (such as etcd)
- Related procedure (such as installation)
- Warnings, error messages, and other outputs related to explicit failures
Press the Enter key.
Optional: Select the Red Hat OpenShift Service on AWS product filter.
Optional: Select the Documentation content type filter.

Submitting a support case

Prerequisites

You have access to the cluster as a user with the dedicated-admin role.
You have installed the OpenShift CLI (oc).
You have access to the Red Hat OpenShift Cluster Manager.

Procedure

Log in to the Customer Support page of the Red Hat Customer Portal.
Click Get support.
On the Cases tab of the Customer Support page:
1. Optional: Change the pre-filled account and owner details if needed.
2. Select the appropriate category for your issue, such as Bug or Defect, and click Continue.
Enter the following information:
1. In the Summary field, enter a concise but descriptive problem summary and further details about the symptoms being experienced, as well as your expectations.
2. Select Red Hat OpenShift Service on AWS from the Product drop-down menu.
Review the list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. If the suggested articles do not address the issue, click Continue.
Review the updated list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. The list is refined as you provide more information during the case creation process. If the suggested articles do not address the issue, click Continue.
Ensure that the account information presented is as expected, and if not, amend accordingly.
Check that the autofilled Red Hat OpenShift Service on AWS Cluster ID is correct. If it is not, manually obtain your cluster ID.
- To manually obtain your cluster ID using the Red Hat OpenShift Service on AWS web console:
  1. Navigate to Home → Overview.
  2. Find the value in the Cluster ID field of the Details section.
- Alternatively, it is possible to open a new support case through the Red Hat OpenShift Service on AWS web console and have your cluster ID autofilled.
  1. From the toolbar, navigate to (?) Help → Open Support Case.
  2. The Cluster ID value is autofilled.
- To obtain your cluster ID using the OpenShift CLI (oc), run the following command:
  $ oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
Complete the following questions where prompted and then click Continue:
- What are you experiencing? What are you expecting to happen?
- Define the value or impact to you or the business.
- Where are you experiencing this behavior? What environment?
- When does this behavior occur? Frequency? Repeatedly? At certain times?
Upload relevant diagnostic data files and click Continue.
Input relevant case management details and click Continue.
Preview the case details and click Submit.
