This section describes how to identify and resolve common problems in Red Hat OpenShift Service Mesh. Use the following sections to help troubleshoot and debug problems when deploying Red Hat OpenShift Service Mesh on OpenShift Container Platform.

Understanding Service Mesh versioning

The Red Hat OpenShift Service Mesh 2.0 Operator supports both v1 and v2 service meshes.

  • Operator version - The current Operator version is 2.0.8. This version number only indicates the version of the currently installed Operator. This version number is controlled by the intersection of the Update Channel and Approval Strategy specified in your Operator subscription. The version of the Operator does not determine which version of the ServiceMeshControlPlane resource is deployed. Upgrading to the latest Operator does not automatically upgrade your service mesh control plane to the latest version.

  • ServiceMeshControlPlane version - The same Operator supports multiple versions of the service mesh control plane. The service mesh control plane version controls the architecture and configuration settings that are used to install and deploy Red Hat OpenShift Service Mesh. To set or change the service mesh control plane version, you must deploy a new control plane. When you create the service mesh control plane, you can select the version in one of two ways:

    • To configure in the Form View, select the version from the Control Plane Version menu.

    • To configure in the YAML View, set the value for spec.version in the YAML file.

  • Control Plane version - The version parameter specified within the SMCP resource file as spec.version. Supported versions are v1.1 and v2.0.

The Operator Lifecycle Manager (OLM) does not manage upgrades from v1 to v2, so the version number for your Operator and ServiceMeshControlPlane (SMCP) may not match, unless you have manually upgraded your SMCP.
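
Because the Operator version and the control plane version can drift apart, it can help to list the spec.version of every control plane on the cluster. The following is a minimal sketch rather than part of the product tooling: the function names list_smcp_versions and list_v1_smcps are hypothetical, and the sketch assumes that you are logged in with permission to read ServiceMeshControlPlane resources in all namespaces.

```shell
# Hypothetical helper: print "<namespace>/<name> <spec.version>" for each
# ServiceMeshControlPlane that the current user is allowed to read.
list_smcp_versions() {
  oc get smcp --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.spec.version}{"\n"}{end}'
}

# Hypothetical helper: print only the control planes whose spec.version
# starts with "v1", that is, those that have not been moved to a v2 version.
list_v1_smcps() {
  list_smcp_versions | awk '$2 ~ /^v1/ { print $1 }'
}
```

Calling list_v1_smcps prints one line per v1 control plane, or nothing if every control plane reports a v2 version.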

Troubleshooting Operator installation

Validating Operator installation

When you install the Red Hat OpenShift Service Mesh Operators, OpenShift automatically creates the following objects as part of a successful Operator installation:

  • config maps

  • custom resource definitions

  • deployments

  • pods

  • replica sets

  • roles

  • role bindings

  • secrets

  • service accounts

  • services

From the OpenShift console

You can verify that the Operator pods are available and running by using the OpenShift Container Platform Console.

  1. Navigate to Workloads → Pods.

  2. Select the openshift-operators namespace.

  3. Verify that the following pods exist and have a status of Running:

    • istio-operator

    • jaeger-operator

    • kiali-operator

  4. Select the openshift-operators-redhat namespace.

  5. Verify that the elasticsearch-operator pod exists and has a status of Running.

From the command line
  1. Verify the Operator pods are available and running in the openshift-operators namespace with the following command:

    $ oc get pods -n openshift-operators
    Example output
    NAME                               READY   STATUS    RESTARTS   AGE
    istio-operator-bb49787db-zgr87     1/1     Running   0          15s
    jaeger-operator-7d5c4f57d8-9xphf   1/1     Running   0          2m42s
    kiali-operator-f9c8d84f4-7xh2v     1/1     Running   0          64s
  2. Verify the Elasticsearch operator with the following command:

    $ oc get pods -n openshift-operators-redhat
    Example output
    NAME                                      READY   STATUS    RESTARTS   AGE
    elasticsearch-operator-d4f59b968-796vq     1/1     Running   0          15s
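
The check in step 1 can be scripted so that it reports only what is missing. The following sketch reads the table printed by oc get pods and lists each expected Operator that has no pod in the Running state. The function name missing_operators is hypothetical; the three Operator names come from the list above.

```shell
# Hypothetical helper: read `oc get pods` table output on standard input and
# print each expected Operator that has no pod with STATUS "Running".
missing_operators() {
  awk '
    BEGIN { n = split("istio-operator jaeger-operator kiali-operator", want, " ") }
    # Column 1 is the pod name, column 3 is the STATUS column.
    $3 == "Running" {
      for (i = 1; i <= n; i++)
        if (index($1, want[i] "-") == 1) seen[want[i]] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) print want[i]
    }
  '
}

# Example (requires a logged-in oc session):
#   oc get pods -n openshift-operators | missing_operators
```

Empty output means that all three Operators have a running pod.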

Troubleshooting service mesh Operators

If you experience Operator issues:

  • Verify your Operator subscription status.

  • Verify that you did not install a community version of the Operator, instead of the supported Red Hat version.

  • Verify that you have the cluster-admin role to install Red Hat OpenShift Service Mesh.

  • Check for any errors in the Operator pod logs if the issue is related to installation of Operators.

You can install Operators only through the OpenShift console; the OperatorHub is not accessible from the command line.
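
Although installation happens in the console, the subscription status can be read from the command line. The following sketch assumes the Operator Lifecycle Manager Subscription resource and its status.state and status.installedCSV fields; the function names are hypothetical, and openshift-operators is used as the default namespace because it is the one named in this section.

```shell
# Hypothetical helper: print "<name><TAB><state><TAB><installedCSV>" for each
# Subscription in the given namespace (default: openshift-operators).
subscription_states() {
  oc get subscriptions.operators.coreos.com -n "${1:-openshift-operators}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\t"}{.status.installedCSV}{"\n"}{end}'
}

# Hypothetical helper: print the subscriptions whose state is not
# AtLatestKnown, for example UpgradePending while an install plan is
# waiting for manual approval.
subscriptions_needing_attention() {
  subscription_states "$1" | awk -F'\t' '$2 != "AtLatestKnown" { print $1, $2 }'
}
```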

Viewing Operator pod logs

You can view Operator logs by using the oc logs command. Red Hat may request logs to help resolve support cases.

Procedure
  • To view Operator pod logs, enter the command:

    $ oc logs -n openshift-operators <podName>

    For example:

    $ oc logs -n openshift-operators istio-operator-bb49787db-zgr87
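
Operator logs can be long, so a first pass often looks only at lines that mention a problem. The following sketch is one way to do that; the function name operator_log_errors and the search pattern are illustrative assumptions, not a fixed log format.

```shell
# Hypothetical helper: keep only log lines that mention an error or failure.
# Reads log text on standard input; matching is case-insensitive.
# "|| true" keeps the exit status zero when nothing matches.
operator_log_errors() {
  grep -iE 'error|fail' || true
}

# Example (requires a logged-in oc session):
#   oc logs -n openshift-operators istio-operator-bb49787db-zgr87 --since=1h | operator_log_errors
```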

Troubleshooting the control plane

The Service Mesh control plane is composed of Istiod, which consolidates several previous control plane components (Citadel, Galley, Pilot) into a single binary. Deploying the ServiceMeshControlPlane also creates the other components that make up Red Hat OpenShift Service Mesh as described in the architecture topic.

Validating the Service Mesh control plane installation

When you create the Service Mesh control plane, the Service Mesh Operator uses the parameters that you have specified in the ServiceMeshControlPlane resource file to do the following:

  • Creates the Istio components and deploys the following pods:

    • istiod

    • istio-ingressgateway

    • istio-egressgateway

    • grafana

    • prometheus

  • Calls the Kiali Operator to create the Kiali deployment based on configuration in either the SMCP or the Kiali custom resource.

    You view the Kiali components under the Kiali Operator, not the Service Mesh Operator.

  • Calls the Jaeger Operator to create Jaeger components based on configuration in either the SMCP or the Jaeger custom resource.

    You view the Jaeger components under the Jaeger Operator and the Elasticsearch components under the Elasticsearch Operator, not the Service Mesh Operator.

    From the OpenShift console

    You can verify the Service Mesh control plane installation in the OpenShift web console.

    1. Navigate to Operators → Installed Operators.

    2. Select the <istio-system> namespace.

    3. Select the Red Hat OpenShift Service Mesh Operator.

    4. Click the Istio Service Mesh Control Plane tab.

    5. Click the name of your control plane, for example basic.

    6. To view the resources created by the deployment, click the Resources tab. You can use the filter to narrow your view, for example, to check that all the Pods have a status of Running.

    7. If the SMCP status indicates any problems, check the status: output in the YAML file for more information.

From the command line
  1. Execute the following command to see if the control plane pods are available and running, where istio-system is the namespace where you installed the SMCP.

    $ oc get pods -n istio-system
    Example output
    NAME                                    READY   STATUS    RESTARTS   AGE
    grafana-6c47888749-dsztv                2/2     Running   0          37s
    istio-egressgateway-85fdc5b466-dgqgt    1/1     Running   0          36s
    istio-ingressgateway-844f785b79-pxbvb   1/1     Running   0          37s
    istiod-basic-c89b5b4bb-5jh8b            1/1     Running   0          104s
    jaeger-6ff889f874-rz2nm                 2/2     Running   0          34s
    prometheus-578df79589-p7p9k             3/3     Running   0          69s
  2. Check the status of the control plane deployment with the following command, where istio-system is the namespace where you deployed the SMCP.

    $ oc get smcp -n <istio-system>

    The installation has finished successfully when the STATUS column is ComponentsReady.

    Example output
    NAME    READY   STATUS            PROFILES      VERSION   AGE
    basic   9/9     ComponentsReady   ["default"]   2.0.1.1   19m

    If you have modified and redeployed your control plane, the status should read UpdateSuccessful.

    Example output
    NAME            READY   STATUS             TEMPLATE   VERSION   AGE
    basic-install   9/9     UpdateSuccessful   default    v1.1      3d16h
  3. If the SMCP status indicates anything other than ComponentsReady, check the status: output in the SMCP resource for more information.

    $ oc describe smcp <smcp-name> -n <controlplane-namespace>
    For example:
    $ oc describe smcp basic -n istio-system
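
The status check in step 2 can be automated by reading the table that oc get smcp prints. The following sketch treats a control plane as ready when the READY column shows equal counts and the STATUS column is ComponentsReady or UpdateSuccessful, the two values described above. The function name smcp_not_ready is hypothetical.

```shell
# Hypothetical helper: read `oc get smcp` table output on standard input and
# print every control plane that is not fully ready. A row passes when its
# READY column shows equal counts (for example 9/9) and its STATUS column is
# ComponentsReady or UpdateSuccessful.
smcp_not_ready() {
  awk '
    NF == 0 || $1 == "NAME" { next }   # skip blank lines and the header row
    {
      split($2, ready, "/")
      ok = (ready[1] == ready[2]) && ($3 == "ComponentsReady" || $3 == "UpdateSuccessful")
      if (!ok) print $1, $2, $3
    }
  '
}

# Example (requires a logged-in oc session):
#   oc get smcp -n istio-system | smcp_not_ready
```

Empty output means that every listed control plane passed both checks.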

Accessing the Kiali console

The installation process creates a route to access the Kiali console.

Procedure
  1. Log in to the OpenShift Container Platform console.

  2. Use the perspective switcher to switch to the Administrator perspective.

  3. Click Home → Projects.

  4. Click the name of your project. For example, click bookinfo.

  5. In the Launcher section, click Kiali.

  6. Log in to the Kiali console with the same user name and password that you use to access the OpenShift Container Platform console.

When you first log in to the Kiali Console, you see the Overview page which displays all the namespaces in your service mesh that you have permission to view.

If you are validating the console installation, there might not be any data to display.

Accessing the Jaeger console

The installation process creates a route to access the Jaeger console.

Procedure
  1. Log in to the OpenShift Container Platform console.

  2. Navigate to Networking → Routes and search for the Jaeger route, which is the URL listed under Location.

  3. To query for details of the route using the command line, enter the following command. In this example, istio-system is the control plane namespace.

    $ export JAEGER_URL=$(oc get route -n istio-system jaeger -o jsonpath='{.spec.host}')
  4. Launch a browser and navigate to https://<JAEGER_URL>, where <JAEGER_URL> is the route that you discovered in the previous step.

  5. Log in using the same user name and password that you use to access the OpenShift Container Platform console.

  6. If you have added services to the service mesh and have generated traces, you can use the filters and Find Traces button to search your trace data.

    If you are validating the console installation, there is no trace data to display.
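
The route lookup in step 3 can be wrapped in a small function that prints a ready-to-open URL. This is a sketch under stated assumptions: the function name route_url is hypothetical, istio-system is used as the default namespace as in the example above, and the assumption that the Kiali route is named kiali is not confirmed by this section.

```shell
# Hypothetical helper: print the HTTPS URL for a route.
#   $1 = route name (for example jaeger)
#   $2 = namespace (default: istio-system)
route_url() {
  host=$(oc get route "$1" -n "${2:-istio-system}" -o jsonpath='{.spec.host}') || return 1
  # Fail instead of printing a bare "https://" when the route has no host.
  [ -n "$host" ] || return 1
  printf 'https://%s\n' "$host"
}

# Example (requires a logged-in oc session):
#   route_url jaeger
#   route_url kiali    # assumes the Kiali route is named "kiali"
```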

Troubleshooting the Service Mesh control plane

If you are experiencing issues while deploying the Service Mesh control plane:

  • Ensure that the ServiceMeshControlPlane resource is installed in a project that is separate from your services and Operators. This documentation uses the istio-system project as an example, but you can deploy your control plane in any project as long as it is separate from the project that contains your Operators and services.

  • Ensure that the ServiceMeshControlPlane and Jaeger custom resources are deployed in the same project. For example, use the istio-system project for both.
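
To check the second point, you can compare the namespaces that contain Jaeger custom resources with the namespaces that contain a ServiceMeshControlPlane. The following sketch lists Jaeger namespaces that have no control plane. The function names are hypothetical, and a namespace in the output is only a candidate for review, because a Jaeger instance can also be deployed on its own for other purposes.

```shell
# Hypothetical helper: print the distinct namespaces that contain at least
# one resource of the given kind (for example smcp or jaeger).
namespaces_of() {
  oc get "$1" --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u
}

# Hypothetical helper: print each namespace that has a Jaeger custom
# resource but no ServiceMeshControlPlane.
jaeger_outside_control_plane() {
  smcp_ns=$(namespaces_of smcp)
  for ns in $(namespaces_of jaeger); do
    printf '%s\n' "$smcp_ns" | grep -qx -- "$ns" || printf '%s\n' "$ns"
  done
}
```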

Troubleshooting the data plane

The data plane is a set of intelligent proxies that intercept and control all inbound and outbound network communications between services in the service mesh.

Red Hat OpenShift Service Mesh relies on a proxy sidecar within the application’s pod to provide service mesh capabilities to the application.

Troubleshooting sidecar injection

Red Hat OpenShift Service Mesh does not automatically inject proxy sidecars to pods. You must opt in to sidecar injection.

Troubleshooting Istio sidecar injection

Check to see if automatic injection is enabled in the Deployment for your application. If automatic injection for the Envoy proxy is enabled, there should be a sidecar.istio.io/inject: "true" annotation in the Deployment resource under spec.template.metadata.annotations.

Troubleshooting Jaeger agent sidecar injection

Check to see if automatic injection is enabled in the Deployment for your application. If automatic injection for the Jaeger agent is enabled, there should be a sidecar.jaegertracing.io/inject: "true" annotation in the Deployment resource.

For more information about sidecar injection, see Enabling automatic injection.
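
Both halves of the Envoy check can be done from the command line: reading the annotation from the Deployment, and confirming that a running pod actually contains the proxy container. The following is a sketch with hypothetical function names; it assumes the proxy container is named istio-proxy, the name used for the proxy containers later in this document.

```shell
# Hypothetical helper: print the value of the sidecar.istio.io/inject
# annotation from a Deployment's pod template (empty if it is not set).
#   $1 = deployment name, $2 = namespace
injection_annotation() {
  oc get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.template.metadata.annotations.sidecar\.istio\.io/inject}'
}

# Hypothetical helper: succeed if the pod has a container named istio-proxy.
#   $1 = pod name, $2 = namespace
has_istio_proxy() {
  oc get pod "$1" -n "$2" -o jsonpath='{.spec.containers[*].name}' |
    tr ' ' '\n' | grep -qx 'istio-proxy'
}
```

If injection_annotation prints true but has_istio_proxy fails for a pod of that Deployment, the pod may have been created before the annotation was added.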

Troubleshooting Envoy proxy

The Envoy proxy intercepts all inbound and outbound traffic for all services in the service mesh. Envoy also collects and reports telemetry on the service mesh. Envoy is deployed as a sidecar to the relevant service in the same pod.

Enabling Envoy access logs

Envoy access logs are useful in diagnosing traffic failures and flows, and help with end-to-end traffic flow analysis.

To enable access logging for all istio-proxy containers, edit the ServiceMeshControlPlane (SMCP) object to add a file name for the logging output.

Procedure
  1. Log in to the OpenShift Container Platform CLI as a user with the cluster-admin role. Enter the following command. Then, enter your username and password when prompted.

    $ oc login https://{HOSTNAME}:6443
  2. Change to the project where you installed the control plane, for example istio-system.

    $ oc project istio-system
  3. Edit the ServiceMeshControlPlane file.

    $ oc edit smcp <smcp_name>
  4. As shown in the following example, use name to specify the file name for the proxy log. If you do not specify a value for name, no log entries will be written.

    spec:
      proxy:
        accessLogging:
          file:
            name: /dev/stdout     #file name

For more information about troubleshooting pod issues, see Investigating pod issues.
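
If you prefer not to edit the resource interactively, the same field can be set with a merge patch. The following is a minimal sketch of that alternative, not a documented procedure: the function name enable_access_log is hypothetical, and the field path is taken from the YAML example above.

```shell
# Hypothetical helper: set spec.proxy.accessLogging.file.name on an SMCP
# with a JSON merge patch, a non-interactive alternative to `oc edit`.
#   $1 = SMCP name, $2 = namespace, $3 = log file (default: /dev/stdout)
enable_access_log() {
  oc patch smcp "$1" -n "$2" --type merge \
    -p "{\"spec\":{\"proxy\":{\"accessLogging\":{\"file\":{\"name\":\"${3:-/dev/stdout}\"}}}}}"
}

# With /dev/stdout as the file name, access log entries should then show up
# in the logs of the istio-proxy container, for example:
#   oc logs <pod_name> -n <app_namespace> -c istio-proxy
```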

Getting support

If you experience difficulty with a procedure described in this documentation, or with OpenShift Container Platform in general, visit the Red Hat Customer Portal. From the Customer Portal, you can:

  • Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.

  • Submit a support case to Red Hat Support.

  • Access other product documentation.

To identify issues with your cluster, you can use Insights in Red Hat OpenShift Cluster Manager. Insights provides details about issues and, if available, information on how to solve a problem.

If you have a suggestion for improving this documentation or have found an error, please submit a Bugzilla report against the OpenShift Container Platform product for the Documentation component. Please provide specific details, such as the section name and OpenShift Container Platform version.

About the Red Hat Knowledgebase

The Red Hat Knowledgebase provides rich content aimed at helping you make the most of Red Hat’s products and technologies. The Red Hat Knowledgebase consists of articles, product documentation, and videos outlining best practices on installing, configuring, and using Red Hat products. In addition, you can search for solutions to known issues, each providing concise root cause descriptions and remedial steps.

Searching the Red Hat Knowledgebase

In the event of an OpenShift Container Platform issue, you can perform an initial search to determine if a solution already exists within the Red Hat Knowledgebase.

Prerequisites
  • You have a Red Hat Customer Portal account.

Procedure
  1. Log in to the Red Hat Customer Portal.

  2. In the main Red Hat Customer Portal search field, input keywords and strings relating to the problem, including:

    • OpenShift Container Platform components (such as etcd)

    • Related procedure (such as installation)

    • Warnings, error messages, and other outputs related to explicit failures

  3. Click Search.

  4. Select the OpenShift Container Platform product filter.

  5. Select the Knowledgebase content type filter.

About the must-gather tool

The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, such as:

  • Resource definitions

  • Audit logs

  • Service logs

You can specify one or more images when you run the command by including the --image argument. When you specify an image, the tool collects data related to that feature or product.

When you run oc adm must-gather, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.

About collecting service mesh data

You can use the oc adm must-gather CLI command to collect information about your cluster, including features and objects associated with Red Hat OpenShift Service Mesh.

Prerequisites
  • Access to the cluster as a user with the cluster-admin role.

  • The OpenShift Container Platform CLI (oc) installed.

Procedure
  1. To collect Red Hat OpenShift Service Mesh data with must-gather, you must specify the Red Hat OpenShift Service Mesh image.

    $ oc adm must-gather --image=registry.redhat.io/openshift-service-mesh/istio-must-gather-rhel8
  2. To collect Red Hat OpenShift Service Mesh data for a specific control plane namespace with must-gather, you must specify the Red Hat OpenShift Service Mesh image and namespace. In this example, replace <namespace> with your control plane namespace, such as istio-system.

    $ oc adm must-gather --image=registry.redhat.io/openshift-service-mesh/istio-must-gather-rhel8 gather <namespace>

For prompt support, supply diagnostic information for both OpenShift Container Platform and Red Hat OpenShift Service Mesh.
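
One way to gather both sets of data in a single run is to pass the default must-gather image stream together with the Service Mesh image, and then compress the result for upload. The following sketch assumes the --image-stream and --dest-dir options of oc adm must-gather; the function name collect_mesh_data and the default directory name are hypothetical.

```shell
# Hypothetical helper: collect default cluster data and Service Mesh data
# into one directory, then pack it into a .tar.gz archive.
#   $1 = destination directory (default: must-gather.local.servicemesh)
collect_mesh_data() {
  dest="${1:-must-gather.local.servicemesh}"
  oc adm must-gather \
    --image-stream=openshift/must-gather \
    --image=registry.redhat.io/openshift-service-mesh/istio-must-gather-rhel8 \
    --dest-dir="$dest" &&
  tar -czf "${dest}.tar.gz" "$dest"
}
```

The archive is written next to the destination directory, in the current working directory unless you pass a path.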

Submitting a support case

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

  • You have a Red Hat Customer Portal account.

  • You have a Red Hat standard or premium Subscription.

Procedure
  1. Log in to the Red Hat Customer Portal and select SUPPORT CASES → Open a case.

  2. Select the appropriate category for your issue (such as Defect / Bug), product (OpenShift Container Platform), and product version (4.6, if this is not already autofilled).

  3. Review the list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. If the suggested articles do not address the issue, click Continue.

  4. Enter a concise but descriptive problem summary and further details about the symptoms being experienced, as well as your expectations.

  5. Review the updated list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. The list is refined as you provide more information during the case creation process. If the suggested articles do not address the issue, click Continue.

  6. Ensure that the account information presented is as expected, and if not, amend accordingly.

  7. Check that the autofilled OpenShift Container Platform Cluster ID is correct. If it is not, manually obtain your cluster ID.

    • To manually obtain your cluster ID using the OpenShift Container Platform web console:

      1. Navigate to Home → Dashboards → Overview.

      2. Find the value in the Cluster ID field of the Details section.

    • Alternatively, it is possible to open a new support case through the OpenShift Container Platform web console and have your cluster ID autofilled.

      1. From the toolbar, navigate to (?) Help → Open Support Case.

      2. The Cluster ID value is autofilled.

    • To obtain your cluster ID using the OpenShift CLI (oc), run the following command:

      $ oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
  8. Complete the following questions where prompted and then click Continue:

    • Where are you experiencing the behavior? What environment?

    • When does the behavior occur? Frequency? Repeatedly? At certain times?

    • What information can you provide around time-frames and the business impact?

  9. Upload relevant diagnostic data files and click Continue. It is recommended to include data gathered using the oc adm must-gather command as a starting point, plus any issue-specific data that is not collected by that command.

  10. Input relevant case management details and click Continue.

  11. Preview the case details and click Submit.