×

OpenShift Virtualization has alerts that inform you when a problem occurs. Critical alerts require immediate attention.

Each alert has a corresponding description of the problem, a reason for why the alert is occurring, a troubleshooting process to diagnose the source of the problem, and steps for resolving the alert.

Network alerts

Network alerts provide information about problems for the OpenShift Virtualization Network Operator.

KubeMacPoolDown alert

Description

The KubeMacPool component allocates MAC addresses and prevents MAC address conflicts.

Reason

If the KubeMacPool-manager pod is down, then the creation of VirtualMachine objects fails.

Troubleshoot
  1. Determine the Kubemacpool-manager pod namespace and name.

    $ export KMP_NAMESPACE="$(oc get pod -A --no-headers -l control-plane=mac-controller-manager | awk '{print $1}')"
    $ export KMP_NAME="$(oc get pod -A --no-headers -l control-plane=mac-controller-manager | awk '{print $2}')"
  2. Check the Kubemacpool-manager pod description and logs to determine the source of the problem.

    $ oc describe pod -n $KMP_NAMESPACE $KMP_NAME
    $ oc logs -n $KMP_NAMESPACE $KMP_NAME
Resolution

Open a support issue and provide the information gathered in the troubleshooting process.

SSP alerts

SSP alerts provide information about problems for the OpenShift Virtualization SSP Operator.

SSPFailingToReconcile alert

Description

The SSP Operator’s pod is up, but the pod’s reconcile cycle consistently fails. This failure includes failure to update the resources for which it is responsible, failure to deploy the template validator, or failure to deploy or update the common templates.

Reason

If the SSP Operator fails to reconcile, then the deployment of dependent components fails, reconciliation of component changes fails, or both. Additionally, the updates to the common templates and template validator reset and fail.

Troubleshoot
  1. Check the ssp-operator pod’s logs for errors:

    $ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
    $ oc -n $NAMESPACE describe pods -l control-plane=ssp-operator
    $ oc -n $NAMESPACE logs --tail=-1 -l control-plane=ssp-operator
  2. Verify that the template validator is up. If the template validator is not up, then check the pod’s logs for errors.

    $ export NAMESPACE="$($ oc get deployment -A | grep ssp-operator | awk '{print $1}')"
    $ oc -n $NAMESPACE get pods -l name=virt-template-validator
    $ oc -n $NAMESPACE describe pods -l name=virt-template-validator
    $ oc -n $NAMESPACE logs --tail=-1 -l name=virt-template-validator
Resolution

Open a support issue and provide the information gathered in the troubleshooting process.

SSPOperatorDown alert

Description

The SSP Operator deploys and reconciles the common templates and the template validator.

Reason

If the SSP Operator is down, then the deployment of dependent components fails, reconciliation of component changes fails, or both. Additionally, the updates to the common template and template validator reset and fail.

Troubleshoot
  1. Check ssp-operator’s pod namespace:

    $ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
  2. Verify that the ssp-operator’s pod is currently down.

    $ oc -n $NAMESPACE get pods -l control-plane=ssp-operator
  3. Check the ssp-operator’s pod description and logs.

    $ oc -n $NAMESPACE describe pods -l control-plane=ssp-operator
    $ oc -n $NAMESPACE logs --tail=-1 -l control-plane=ssp-operator
Resolution

Open a support issue and provide the information gathered in the troubleshooting process.

SSPTemplateValidatorDown alert

Description

The template validator validates that virtual machines (VMs) do not violate their assigned templates.

Reason

If every template validator pod is down, then the template validator fails to validate VMs against their assigned templates.

Troubleshoot
  1. Check the namespaces of the ssp-operator pods and the virt-template-validator pods.

    $ export NAMESPACE_SSP="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
    $ export NAMESPACE="$(oc get deployment -A | grep virt-template-validator | awk '{print $1}')"
  2. Verify that the virt-template-validator’s pod is currently down.

    $ oc -n $NAMESPACE get pods -l name=virt-template-validator
  3. Check the pod description and logs of the ssp-operator and the virt-template-validator.

    $ oc -n $NAMESPACE_SSP describe pods -l name=ssp-operator
    $ oc -n $NAMESPACE_SSP logs --tail=-1 -l name=ssp-operator
    $ oc -n $NAMESPACE describe pods -l name=virt-template-validator
    $ oc -n $NAMESPACE logs --tail=-1 -l name=virt-template-validator
Resolution

Open a support issue and provide the information gathered in the troubleshooting process.

Virt alerts

Virt alerts provide information about problems for the OpenShift Virtualization Virt Operator.

NoLeadingVirtOperator alert

Description

In the past 10 minutes, no virt-operator pod holds the leader lease, despite one or more virt-operator pods being in Ready state. The alert suggests no operating virt-operator pod exists.

Reason

The virt-operator is the first Kubernetes Operator active in a OpenShift Container Platform cluster. Its primary responsibilities are:

  • Installation

  • Live-update

  • Live-upgrade of a cluster

  • Monitoring the lifecycle of top-level controllers such as virt-controller, virt-handler, and virt-launcher

  • Managing the reconciliation of top-level controllers

In addition, the virt-operator is responsible for cluster-wide tasks such as certificate rotation and some infrastructure management.

The virt-operator deployment has a default replica of two pods with one leader pod holding a leader lease, indicating an operating virt-operator pod.

This alert indicates a failure at the cluster level. Critical cluster-wide management functionalities such as certification rotation, upgrade, and reconciliation of controllers may be temporarily unavailable.

Troubleshoot

Determine a virt-operator pod’s leader status from the pod logs. The log messages containing Started leading and acquire leader indicate the leader status of a given virt-operator pod.

Additionally, always check if there are any running virt-operator pods and the pods' statuses with these commands:

$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
$ oc -n $NAMESPACE get pods -l kubevirt.io=virt-operator
$ oc -n $NAMESPACE logs <pod-name>
$ oc -n $NAMESPACE describe pod <pod-name>

Leader pod example:

$ oc -n $NAMESPACE logs <pod-name> |grep lead
Example output
{"component":"virt-operator","level":"info","msg":"Attempting to acquire leader status","pos":"application.go:400","timestamp":"2021-11-30T12:15:18.635387Z"}
I1130 12:15:18.635452       1 leaderelection.go:243] attempting to acquire leader lease <namespace>/virt-operator...
I1130 12:15:19.216582       1 leaderelection.go:253] successfully acquired lease <namespace>/virt-operator
{"component":"virt-operator","level":"info","msg":"Started leading","pos":"application.go:385","timestamp":"2021-11-30T12:15:19.216836Z"}

Non-leader pod example:

$ oc -n $NAMESPACE logs <pod-name> |grep lead
Example output
{"component":"virt-operator","level":"info","msg":"Attempting to acquire leader status","pos":"application.go:400","timestamp":"2021-11-30T12:15:20.533696Z"}
I1130 12:15:20.533792       1 leaderelection.go:243] attempting to acquire leader lease <namespace>/virt-operator...
Resolution

There are several reasons for no virt-operator pod holding the leader lease, despite one or more virt-operator pods being in Ready state. Identify the root cause and take appropriate action.

Otherwise, open a support issue and provide the information gathered in the troubleshooting process.

NoReadyVirtController alert

Description

The virt-controller monitors virtual machine instances (VMIs). The virt-controller also manages the associated pods by creating and managing the lifecycle of the pods associated with the VMI objects.

A VMI object always associates with a pod during its lifetime. However, the pod instance can change over time because of VMI migration.

This alert occurs when detection of no ready virt-controllers occurs for five minutes.

Reason

If the virt-controller fails, then VM lifecycle management completely fails. Lifecycle management tasks include launching a new VMI or shutting down an existing VMI.

Troubleshoot
  1. Check the vdeployment status of the virt-controller for available replicas and conditions.

    $ oc -n $NAMESPACE get deployment virt-controller -o yaml
  2. Check if the virt-controller pods exist and check their statuses.

    get pods -n $NAMESPACE |grep virt-controller
  3. Check the virt-controller pods' events.

    $ oc -n $NAMESPACE describe pods <virt-controller pod>
  4. Check the virt-controller pods' logs.

    <