×

Understanding low latency

The emergence of Edge computing in the area of Telco / 5G plays a key role in reducing latency and congestion problems and improving application performance.

Simply put, latency determines how fast data (packets) moves from the sender to receiver and returns to the sender after processing by the receiver. Obviously, maintaining a network architecture with the lowest possible delay of latency speeds is key for meeting the network performance requirements of 5G. Compared to 4G technology, with an average latency of 50ms, 5G is targeted to reach latency numbers of 1ms or less. This reduction in latency boosts wireless throughput by a factor of 10.

Many of the deployed applications in the Telco space require low latency that can only tolerate zero packet loss. Tuning for zero packet loss helps mitigate the inherent issues that degrade network performance. For more information, see Tuning for Zero Packet Loss in Red Hat OpenStack Platform (RHOSP).

The Edge computing initiative also comes in to play for reducing latency rates. Think of it as literally being on the edge of the cloud and closer to the user. This greatly reduces the distance between the user and distant data centers, resulting in reduced application response times and performance latency.

Administrators must be able to manage their many Edge sites and local services in a centralized way so that all of the deployments can run at the lowest possible management cost. They also need an easy way to deploy and configure certain nodes of their cluster for real-time low latency and high-performance purposes. Low latency nodes are useful for applications such as Cloud-native Network Functions (CNF) and Data Plane Development Kit (DPDK).

OpenShift Container Platform currently provides mechanisms to tune software on an OpenShift Container Platform cluster for real-time running and low latency (around <20 microseconds reaction time). This includes tuning the kernel and OpenShift Container Platform set values, installing a kernel, and reconfiguring the machine. But this method requires setting up four different Operators and performing many configurations that, when done manually, is complex and could be prone to mistakes.

OpenShift Container Platform provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance for OpenShift applications. The cluster administrator uses this performance profile configuration that makes it easier to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt, reserve CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolate CPUs for application containers to run the workloads.

About hyperthreading for low latency and real-time applications

Hyperthreading is an Intel processor technology that allows a physical CPU processor core to function as two logical cores, executing two independent threads simultaneously. Hyperthreading allows for better system throughput for certain workload types where parallel processing is beneficial. The default OpenShift Container Platform configuration expects hyperthreading to be enabled by default.

For telecommunications applications, it is important to design your application infrastructure to minimize latency as much as possible. Hyperthreading can slow performance times and negatively affect throughput for compute intensive workloads that require low latency. Disabling hyperthreading ensures predictable performance and can decrease processing times for these workloads.

Hyperthreading implementation and configuration differs depending on the hardware you are running OpenShift Container Platform on. Consult the relevant host hardware tuning information for more details of the hyperthreading implementation specific to that hardware. Disabling hyperthreading can increase the cost per core of the cluster.

Installing the Performance Addon Operator

Performance Addon Operator provides the ability to enable advanced node performance tunings on a set of nodes. As a cluster administrator, you can install Performance Addon Operator using the OpenShift Container Platform CLI or the web console.

Installing the Operator using the CLI

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites
  • A cluster installed on bare-metal hardware.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create a namespace for the Performance Addon Operator by completing the following actions:

    1. Create the following Namespace Custom Resource (CR) that defines the openshift-performance-addon-operator namespace, and then save the YAML in the pao-namespace.yaml file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-performance-addon-operator
        annotations:
          workload.openshift.io/allowed: management
    2. Create the namespace by running the following command:

      $ oc create -f pao-namespace.yaml
  2. Install the Performance Addon Operator in the namespace you created in the previous step by creating the following objects:

    1. Create the following OperatorGroup CR and save the YAML in the pao-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: openshift-performance-addon-operator
        namespace: openshift-performance-addon-operator
    2. Create the OperatorGroup CR by running the following command:

      $ oc create -f pao-operatorgroup.yaml
    3. Run the following command to get the channel value required for the next step.

      $ oc get packagemanifest performance-addon-operator -n openshift-marketplace -o jsonpath='{.status.defaultChannel}'
      Example output
      4.9
    4. Create the following Subscription CR and save the YAML in the pao-sub.yaml file:

      Example Subscription
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-performance-addon-operator-subscription
        namespace: openshift-performance-addon-operator
      spec:
        channel: "<channel>" (1)
        name: performance-addon-operator
        source: redhat-operators (2)
        sourceNamespace: openshift-marketplace
      1 Specify the value from you obtained in the previous step for the .status.defaultChannel parameter.
      2 You must specify the redhat-operators value.
    5. Create the Subscription object by running the following command:

      $ oc create -f pao-sub.yaml
    6. Change to the openshift-performance-addon-operator project:

      $ oc project openshift-performance-addon-operator

Installing the Performance Addon Operator using the web console

As a cluster administrator, you can install the Performance Addon Operator using the web console.

You must create the Namespace CR and OperatorGroup CR as mentioned in the previous section.

Procedure
  1. Install the Performance Addon Operator using the OpenShift Container Platform web console:

    1. In the OpenShift Container Platform web console, click OperatorsOperatorHub.

    2. Choose Performance Addon Operator from the list of available Operators, and then click Install.

    3. On the Install Operator page, select All namespaces on the cluster. Then, click Install.

  2. Optional: Verify that the performance-addon-operator installed successfully:

    1. Switch to the OperatorsInstalled Operators page.

    2. Ensure that Performance Addon Operator is listed in the openshift-performance-addon-operator project with a Status of InstallSucceeded.

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, to troubleshoot further:

      • Go to the OperatorsInstalled Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.

      • Go to the WorkloadsPods page and check the logs for pods in the performance-addon-operator project.

Upgrading Performance Addon Operator

You can manually upgrade to the next minor version of Performance Addon Operator and monitor the status of an update by using the web console.

About upgrading Performance Addon Operator

  • You can upgrade to the next minor version of Performance Addon Operator by using the OpenShift Container Platform web console to change the channel of your Operator subscription.

  • You can enable automatic z-stream updates during Performance Addon Operator installation.

  • Updates are delivered via the Marketplace Operator, which is deployed during OpenShift Container Platform installation.The Marketplace Operator makes external Operators available to your cluster.

  • The amount of time an update takes to complete depends on your network connection. Most automatic updates complete within fifteen minutes.

How Performance Addon Operator upgrades affect your cluster

  • Neither the low latency tuning nor huge pages are affected.

  • Updating the Operator should not cause any unexpected reboots.

Upgrading Performance Addon Operator to the next minor version

You can manually upgrade Performance Addon Operator to the next minor version by using the OpenShift Container Platform web console to change the channel of your Operator subscription.

Prerequisites
  • Access to the cluster as a user with the cluster-admin role.

Procedure
  1. Access the web console and navigate to OperatorsInstalled Operators.

  2. Click Performance Addon Operator to open the Operator details page.

  3. Click the Subscription tab to open the Subscription details page.

  4. In the Update channel pane, click the pencil icon on the right side of the version number to open the Change Subscription update channel window.

  5. Select the next minor version. For example, if you want to upgrade to Performance Addon Operator 4.9, select 4.9.

  6. Click Save.

  7. Check the status of the upgrade by navigating to Operators → Installed Operators. You can also check the status by running the following oc command:

    $ oc get csv -n openshift-performance-addon-operator

Upgrading Performance Addon Operator when previously installed to a specific namespace

If you previously installed the Performance Addon Operator to a specific namespace on the cluster, for example openshift-performance-addon-operator, modify the OperatorGroup object to remove the targetNamespaces entry before upgrading.

Prerequisites
  • Install the OpenShift Container Platform CLI (oc).

  • Log in to the OpenShift cluster as a user with cluster-admin privileges.

Procedure
  1. Edit the Performance Addon Operator OperatorGroup CR and remove the spec element that contains the targetNamespaces entry by running the following command:

    $ oc patch operatorgroup -n openshift-performance-addon-operator openshift-performance-addon-operator --type json -p '[{ "op": "remove", "path": "/spec" }]'
  2. Wait until the Operator Lifecycle Manager (OLM) processes the change.

  3. Verify that the OperatorGroup CR change has been successfully applied. Check that the OperatorGroup CR spec element has been removed:

    $ oc describe -n openshift-performance-addon-operator og openshift-performance-addon-operator
  4. Proceed with the Performance Addon Operator upgrade.

Monitoring upgrade status

The best way to monitor Performance Addon Operator upgrade status is to watch the ClusterServiceVersion (CSV) PHASE. You can also monitor the CSV conditions in the web console or by running the oc get csv command.

The PHASE and conditions values are approximations that are based on available information.

Prerequisites
  • Access to the cluster as a user with the cluster-admin role.

  • Install the OpenShift CLI (oc).

Procedure
  1. Run the following command:

    $ oc get csv
  2. Review the output, checking the PHASE field. For example:

    VERSION    REPLACES                                         PHASE
    4.9.0      performance-addon-operator.v4.9.0                Installing
    4.8.0                                                       Replacing
  3. Run get csv again to verify the output:

    # oc get csv
    Example output
    NAME                                DISPLAY                      VERSION   REPLACES                            PHASE
    performance-addon-operator.v4.9.0   Performance Addon Operator   4.9.0     performance-addon-operator.v4.8.0   Succeeded

Provisioning real-time and low latency workloads

Many industries and organizations need extremely high performance computing and might require low and predictable latency, especially in the financial and telecommunications industries. For these industries, with their unique requirements, OpenShift Container Platform provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance and consistent response time for OpenShift Container Platform applications.

The cluster administrator can use this performance profile configuration to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt (real-time), reserve CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolate CPUs for application containers to run the workloads.

The usage of execution probes in conjunction with applications that require guaranteed CPUs can cause latency spikes. It is recommended to use other probes, such as a properly configured set of network probes, as an alternative.

Known limitations for real-time

In most deployments, kernel-rt is supported only on worker nodes when you use a standard cluster with three control plane nodes and three worker nodes. There are exceptions for compact and single nodes on OpenShift Container Platform deployments. For installations on a single node, kernel-rt is supported on the single control plane node.

To fully utilize the real-time mode, the containers must run with elevated privileges. See Set capabilities for a Container for information on granting privileges.

OpenShift Container Platform restricts the allowed capabilities, so you might need to create a SecurityContext as well.

This procedure is fully supported with bare metal installations using Red Hat Enterprise Linux CoreOS (RHCOS) systems.

Establishing the right performance expectations refers to the fact that the real-time kernel is not a panacea. Its objective is consistent, low-latency determinism offering predictable response times. There is some additional kernel overhead associated with the real-time kernel. This is due primarily to handling hardware interruptions in separately scheduled threads. The increased overhead in some workloads results in some degradation in overall throughput. The exact amount of degradation is very workload dependent, ranging from 0% to 30%. However, it is the cost of determinism.

Provisioning a worker with real-time capabilities

  1. Install Performance Addon Operator to the cluster.

  2. Optional: Add a node to the OpenShift Container Platform cluster. See Setting BIOS parameters.

  3. Add the label worker-rt to the worker nodes that require the real-time capability by using the oc command.

  4. Create a new machine config pool for real-time nodes:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-rt
      labels:
        machineconfiguration.openshift.io/role: worker-rt
    spec:
      machineConfigSelector:
        matchExpressions:
          - {
               key: machineconfiguration.openshift.io/role,
               operator: In,
               values: [worker, worker-rt],
            }
      paused: false
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-rt: ""

    Note that a machine config pool worker-rt is created for group of nodes that have the label worker-rt.

  5. Add the node to the proper machine config pool by using node role labels.

    You must decide which nodes are configured with real-time workloads. You could configure all of the nodes in the cluster, or a subset of the nodes. The Performance Addon Operator that expects all of the nodes are part of a dedicated machine config pool. If you use all of the nodes, you must point the Performance Addon Operator to the worker node role label. If you use a subset, you must group the nodes into a new machine config pool.

  6. Create the PerformanceProfile with the proper set of housekeeping cores and realTimeKernel: enabled: true.

  7. You must set machineConfigPoolSelector in PerformanceProfile:

      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
       name: example-performanceprofile
      spec:
      ...
        realTimeKernel:
          enabled: true
        nodeSelector:
           node-role.kubernetes.io/worker-rt: ""
        machineConfigPoolSelector:
           machineconfiguration.openshift.io/role: worker-rt
  8. Verify that a matching machine config pool exists with a label:

    $ oc describe mcp/worker-rt
    Example output
    Name:         worker-rt
    Namespace:
    Labels:       machineconfiguration.openshift.io/role=worker-rt
  9. OpenShift Container Platform will start configuring the nodes, which might involve multiple reboots. Wait for the nodes to settle. This can take a long time depending on the specific hardware you use, but 20 minutes per node is expected.

  10. Verify everything is working as expected.

Verifying the real-time kernel installation

Use this command to verify that the real-time kernel is installed:

$ oc get node -o wide

Note the worker with the role worker-rt that contains the string 4.18.0-211.rt5.23.el8.x86_64:

NAME                               	STATUS   ROLES           	AGE 	VERSION                  	INTERNAL-IP
EXTERNAL-IP   OS-IMAGE                                       	KERNEL-VERSION
CONTAINER-RUNTIME
rt-worker-0.example.com	          Ready	 worker,worker-rt   5d17h   v1.22.1
128.66.135.107   <none>    	        Red Hat Enterprise Linux CoreOS 46.82.202008252340-0 (Ootpa)
4.18.0-211.rt5.23.el8.x86_64   cri-o://1.22.1-90.rhaos4.9.git4a0ac05.el8-rc.1
[...]

Creating a workload that works in real-time

Use the following procedures for preparing a workload that will use real-time capabilities.

Procedure
  1. Create a pod with a QoS class of Guaranteed.

  2. Optional: Disable CPU load balancing for DPDK.

  3. Assign a proper node selector.

When writing your applications, follow the general recommendations described in Application tuning and deployment.

Creating a pod with a QoS class of Guaranteed

Keep the following in mind when you create a pod that is given a QoS class of Guaranteed:

  • Every container in the pod must have a memory limit and a memory request, and they must be the same.

  • Every container in the pod must have a CPU limit and a CPU request, and they must be the same.

The following example shows the configuration file for a pod that has one container. The container has a memory limit and a memory request, both equal to 200 MiB. The container has a CPU limit and a CPU request, both equal to 1 CPU.

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-ctr
    image: <image-pull-spec>
    resources:
      limits:
        memory: "200Mi"
        cpu: "1"
      requests:
        memory: "200Mi"
        cpu: "1"
  1. Create the pod:

    $ oc  apply -f qos-pod.yaml --namespace=qos-example
  2. View detailed information about the pod:

    $ oc get pod qos-demo --namespace=qos-example --output=yaml
    Example output
    spec:
      containers:
        ...
    status:
      qosClass: Guaranteed

    If a container specifies its own memory limit, but does not specify a memory request, OpenShift Container Platform automatically assigns a memory request that matches the limit. Similarly, if a container specifies its own CPU limit, but does not specify a CPU request, OpenShift Container Platform automatically assigns a CPU request that matches the limit.

Optional: Disabling CPU load balancing for DPDK

Functionality to disable or enable CPU load balancing is implemented on the CRI-O level. The code under the CRI-O disables or enables CPU load balancing only when the following requirements are met.

  • The pod must use the performance-<profile-name> runtime class. You can get the proper name by looking at the status of the performance profile, as shown here:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    ...
    status:
      ...
      runtimeClass: performance-manual
  • The pod must have the cpu-load-balancing.crio.io: true annotation.

The Performance Addon Operator is responsible for the creation of the high-performance runtime handler config snippet under relevant nodes and for creation of the high-performance runtime class under the cluster. It will have the same content as default runtime handler except it enables the CPU load balancing configuration functionality.

To disable the CPU load balancing for the pod, the Pod specification must include the following fields:

apiVersion: v1
kind: Pod
metadata:
  ...
  annotations:
    ...
    cpu-load-balancing.crio.io: "disable"
    ...
  ...
spec:
  ...
  runtimeClassName: performance-<profile_name>
  ...

Only disable CPU load balancing when the CPU manager static policy is enabled and for pods with guaranteed QoS that use whole CPUs. Otherwise, disabling CPU load balancing can affect the performance of other containers in the cluster.

Assigning a proper node selector

The preferred way to assign a pod to nodes is to use the same node selector the performance profile used, as shown here:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  # ...
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""

Scheduling a workload onto a worker with real-time capabilities

Use label selectors that match the nodes attached to the machine config pool that was configured for low latency by the Performance Addon Operator. For more information, see Assigning pods to nodes.

Managing device interrupt processing for guaranteed pod isolated CPUs

The Performance Addon Operator can manage host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolated CPUs for application containers to run the workloads. This allows you to set CPUs for low latency workloads as isolated.

Device interrupts are load balanced between all isolated and reserved CPUs to avoid CPUs being overloaded, with the exception of CPUs where there is a guaranteed pod running. Guaranteed pod CPUs are prevented from processing device interrupts when the relevant annotations are set for the pod.

In the performance profile, globallyDisableIrqLoadBalancing is used to manage whether device interrupts are processed or not. For certain workloads the reserved CPUs are not always sufficient for dealing with device interrupts, and for this reason, device interrupts are not globally disabled on the isolated CPUs. By default, Performance Addon Operator does not disable device interrupts on isolated CPUs.

To achieve low latency for workloads, some (but not all) pods require the CPUs they are running on to not process device interrupts. A pod annotation, irq-load-balancing.crio.io, is used to define whether device interrupts are processed or not. When configured, CRI-O disables device interrupts only as long as the pod is running.

Disabling CPU CFS quota

To reduce CPU throttling for individual guaranteed pods, create a pod specification with the annotation cpu-quota.crio.io: "disable". This annotation disables the CPU completely fair scheduler (CFS) quota at the pod run time. The following pod specification contains this annotation:

apiVersion: performance.openshift.io/v2
kind: Pod
metadata:
  annotations:
      cpu-quota.crio.io: "disable"
spec:
    runtimeClassName: performance-<profile_name>
...

Only disable CPU CFS quota when the CPU manager static policy is enabled and for pods with guaranteed QoS that use whole CPUs. Otherwise, disabling CPU CFS quota can affect the performance of other containers in the cluster.

Disabling global device interrupts handling in Performance Addon Operator

To configure Performance Addon Operator to disable global device interrupts for the isolated CPU set, set the globallyDisableIrqLoadBalancing field in the performance profile to true. When true, conflicting pod annotations are ignored. When false, IRQ loads are balanced across all CPUs.

A performance profile snippet illustrates this setting:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  globallyDisableIrqLoadBalancing: true
...

Disabling interrupt processing for individual pods

To disable interrupt processing for individual pods, ensure that globallyDisableIrqLoadBalancing is set to false in the performance profile. Then, in the pod specification, set the irq-load-balancing.crio.io pod annotation to disable. The following pod specification contains this annotation:

apiVersion: performance.openshift.io/v2
kind: Pod
metadata:
  annotations:
      irq-load-balancing.crio.io: "disable"
spec:
    runtimeClassName: performance-<profile_name>
...

Upgrading the performance profile to use device interrupt processing

When you upgrade the Performance Addon Operator performance profile custom resource definition (CRD) from v1 or v1alpha1 to v2, globallyDisableIrqLoadBalancing is set to true on existing profiles.

globallyDisableIrqLoadBalancing toggles whether IRQ load balancing will be disabled for the Isolated CPU set. When the option is set to true it disables IRQ load balancing for the Isolated CPU set. Setting the option to false allows the IRQs to be balanced across all CPUs.

Supported API Versions

The Performance Addon Operator supports v2, v1, and v1alpha1 for the performance profile apiVersion field. The v1 and v1alpha1 APIs are identical. The v2 API includes an optional boolean field globallyDisableIrqLoadBalancing with a default value of false.

Upgrading Performance Addon Operator API from v1alpha1 to v1

When upgrading Performance Addon Operator API version from v1alpha1 to v1, the v1alpha1 performance profiles are converted on-the-fly using a "None" Conversion strategy and served to the Performance Addon Operator with API version v1.

Upgrading Performance Addon Operator API from v1alpha1 or v1 to v2

When upgrading from an older Performance Addon Operator API version, the existing v1 and v1alpha1 performance profiles are converted using a conversion webhook that injects the globallyDisableIrqLoadBalancing field with a value of true.

Configuring a node for IRQ dynamic load balancing

To configure a cluster node to handle IRQ dynamic load balancing, do the following:

  1. Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.

  2. Set the performance profile apiVersion to use performance.openshift.io/v2.

  3. Remove the globallyDisableIrqLoadBalancing field or set it to false.

  4. Set the appropriate isolated and reserved CPUs. The following snippet illustrates a profile that reserves 2 CPUs. IRQ load-balancing is enabled for pods running on the isolated CPU set:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: dynamic-irq-profile
    spec:
      cpu:
        isolated: 2-5
        reserved: 0-1
    ...

    When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.

  5. Create the pod that uses exclusive CPUs, and set irq-load-balancing.crio.io and cpu-quota.crio.io annotations to disable. For example:

    apiVersion: v1
    kind: Pod
    metadata:
      name: dynamic-irq-pod
      annotations:
         irq-load-balancing.crio.io: "disable"
         cpu-quota.crio.io: "disable"
    spec:
      containers:
      - name: dynamic-irq-pod
        image: "quay.io/openshift-kni/cnf-tests:4.9"
        command: ["sleep", "10h"]
        resources:
          requests:
            cpu: 2
            memory: "200M"
          limits:
            cpu: 2
            memory: "200M"
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
      runtimeClassName: performance-dynamic-irq-profile
    ...
  6. Enter the pod runtimeClassName in the form performance-<profile_name>, where <profile_name> is the name from the PerformanceProfile YAML, in this example, performance-dynamic-irq-profile.

  7. Set the node selector to target a cnf-worker.

  8. Ensure the pod is running correctly. Status should be running, and the correct cnf-worker node should be set:

    $ oc get pod -o wide
    Expected output
    NAME              READY   STATUS    RESTARTS   AGE     IP             NODE          NOMINATED NODE   READINESS GATES
    dynamic-irq-pod   1/1     Running   0          5h33m   <ip-address>   <node-name>   <none>           <none>
  9. Get the CPUs that the pod configured for IRQ dynamic load balancing runs on:

    $ oc exec -it dynamic-irq-pod -- /bin/bash -c "grep Cpus_allowed_list /proc/self/status | awk '{print $2}'"
    Expected output
    Cpus_allowed_list:  2-3
  10. Ensure the node configuration is applied correctly. SSH into the node to verify the configuration.

    $ oc debug node/<node-name>
    Expected output
    Starting pod/<node-name>-debug ...
    To use host binaries, run `chroot /host`
    
    Pod IP: <ip-address>
    If you don't see a command prompt, try pressing enter.
    
    sh-4.4#
  11. Verify that you can use the node file system:

    sh-4.4# chroot /host
    Expected output
    sh-4.4#
  12. Ensure the default system CPU affinity mask does not include the dynamic-irq-pod CPUs, for example, CPUs 2 and 3.

    $ cat /proc/irq/default_smp_affinity
    Example output
    33
  13. Ensure the system IRQs are not configured to run on the dynamic-irq-pod CPUs:

    find /proc/irq/ -name smp_affinity_list -exec sh -c 'i="$1"; mask=$(cat $i); file=$(echo $i); echo $file: $mask' _ {} \;
    Example output
    /proc/irq/0/smp_affinity_list: 0-5
    /proc/irq/1/smp_affinity_list: 5
    /proc/irq/2/smp_affinity_list: 0-5
    /proc/irq/3/smp_affinity_list: 0-5
    /proc/irq/4/smp_affinity_list: 0
    /proc/irq/5/smp_affinity_list: 0-5
    /proc/irq/6/smp_affinity_list: 0-5
    /proc/irq/7/smp_affinity_list: 0-5
    /proc/irq/8/smp_affinity_list: 4
    /proc/irq/9/smp_affinity_list: 4
    /proc/irq/10/smp_affinity_list: 0-5
    /proc/irq/11/smp_affinity_list: 0
    /proc/irq/12/smp_affinity_list: 1
    /proc/irq/13/smp_affinity_list: 0-5
    /proc/irq/14/smp_affinity_list: 1
    /proc/irq/15/smp_affinity_list: 0
    /proc/irq/24/smp_affinity_list: 1
    /proc/irq/25/smp_affinity_list: 1
    /proc/irq/26/smp_affinity_list: 1
    /proc/irq/27/smp_affinity_list: 5
    /proc/irq/28/smp_affinity_list: 1
    /proc/irq/29/smp_affinity_list: 0
    /proc/irq/30/smp_affinity_list: 0-5

Some IRQ controllers do not support IRQ re-balancing and will always expose all online CPUs as the IRQ mask. These IRQ controllers effectively run on CPU 0. For more information on the host configuration, SSH into the host and run the following, replacing <irq-num> with the CPU number that you want to query:

$ cat /proc/irq/<irq-num>/effective_affinity

Configuring hyperthreading for a cluster

To configure hyperthreading for an OpenShift Container Platform cluster, set the CPU threads in the performance profile to the same cores that are configured for the reserved or isolated CPU pools.

If you configure a performance profile, and subsequently change the hyperthreading configuration for the host, ensure that you update the CPU isolated and reserved fields in the PerformanceProfile YAML to match the new configuration.

Disabling a previously enabled host hyperthreading configuration can cause the CPU core IDs listed in the PerformanceProfile YAML to be incorrect. This incorrect configuration can cause the node to become unavailable because the listed CPUs can no longer be found.

Prerequisites
  • Access to the cluster as a user with the cluster-admin role.

  • Install the OpenShift CLI (oc).

Procedure
  1. Ascertain which threads are running on what CPUs for the host you want to configure.

    You can view which threads are running on the host CPUs by logging in to the cluster and running the following command:

    $ lscpu --all --extended
    Example output
    CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
    0   0    0      0    0:0:0:0       yes    4800.0000 400.0000
    1   0    0      1    1:1:1:0       yes    4800.0000 400.0000
    2   0    0      2    2:2:2:0       yes    4800.0000 400.0000
    3   0    0      3    3:3:3:0       yes    4800.0000 400.0000
    4   0    0      0    0:0:0:0       yes    4800.0000 400.0000
    5   0    0      1    1:1:1:0       yes    4800.0000 400.0000
    6   0    0      2    2:2:2:0       yes    4800.0000 400.0000
    7   0    0      3    3:3:3:0       yes    4800.0000 400.0000

    In this example, there are eight logical CPU cores running on four physical CPU cores. CPU0 and CPU4 are running on physical Core0, CPU1 and CPU5 are running on physical Core 1, and so on.

    Alternatively, to view the threads that are set for a particular physical CPU core (cpu0 in the example below), open a command prompt and run the following:

    $ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
    Example output
    0-4
  2. Apply the isolated and reserved CPUs in the PerformanceProfile YAML. For example, you can set logical cores CPU0 and CPU4 as isolated, and logical cores CPU1 to CPU3 and CPU5 to CPU7 as reserved. When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.

    ...
      cpu:
        isolated: 0,4
        reserved: 1-3,5-7
    ...

    The reserved and isolated CPU pools must not overlap and together must span all available cores in the worker node.

Hyperthreading is enabled by default on most Intel processors. If you enable hyperthreading, all threads processed by a particular core must be isolated or processed on the same core.

Disabling hyperthreading for low latency applications

When configuring clusters for low latency processing, consider whether you want to disable hyperthreading before you deploy the cluster. To disable hyperthreading, do the following:

  1. Create a performance profile that is appropriate for your hardware and topology.

  2. Set nosmt as an additional kernel argument. The following example performance profile illustrates this setting:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: example-performanceprofile
    spec:
      additionalKernelArgs:
        - nmi_watchdog=0
        - audit=0
        - mce=off
        - processor.max_cstate=1
        - idle=poll
        - intel_idle.max_cstate=0
        - nosmt
      cpu:
        isolated: 2-3
        reserved: 0-1
      hugepages:
        defaultHugepagesSize: 1G
        pages:
          - count: 2
            node: 0
            size: 1G
      nodeSelector:
        node-role.kubernetes.io/performance: ''
      realTimeKernel:
        enabled: true

    When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.

Tuning nodes for low latency with the performance profile

The performance profile lets you control latency tuning aspects of nodes that belong to a certain machine config pool. After you specify your settings, the PerformanceProfile object is compiled into multiple objects that perform the actual node level tuning:

  • A MachineConfig file that manipulates the nodes.

  • A KubeletConfig file that configures the Topology Manager, the CPU Manager, and the OpenShift Container Platform nodes.

  • The Tuned profile that configures the Node Tuning Operator.

You can use a performance profile to specify whether to update the kernel to kernel-rt, to allocate huge pages, and to partition the CPUs for performing housekeeping duties or running workloads.

You can manually create the PerformanceProfile object or use the Performance Profile Creator (PPC) to generate a performance profile. See the additional resources below for more information on the PPC.

Sample performance profile
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
 name: performance
spec:
 cpu:
  isolated: "5-15" (1)
  reserved: "0-4" (2)
 hugepages:
  defaultHugepagesSize: "1G"
  pages:
  - size: "1G"
    count: 16
    node: 0
 realTimeKernel:
  enabled: true  (3)
 numa:  (4)
  topologyPolicy: "best-effort"
 nodeSelector:
  node-role.kubernetes.io/worker-cnf: "" (5)
1 Use this field to isolate specific CPUs to use with application containers for workloads.
2 Use this field to reserve specific CPUs to use with infra containers for housekeeping.
3 Use this field to install the real-time kernel on the node. Valid values are true or false. Setting the true value installs the real-time kernel.
4 Use this field to configure the topology manager policy. Valid values are none (default), best-effort, restricted, and single-numa-node. For more information, see Topology Manager Policies.
5 Use this field to specify a node selector to apply the performance profile to specific nodes.
Additional resources

Configuring huge pages

Nodes must pre-allocate huge pages used in an OpenShift Container Platform cluster. Use the Performance Addon Operator to allocate huge pages on a specific node.

OpenShift Container Platform provides a method for creating and allocating huge pages. Performance Addon Operator provides an easier method for doing this using the performance profile.

For example, in the hugepages pages section of the performance profile, you can specify multiple blocks of size, count, and, optionally, node:

hugepages:
   defaultHugepagesSize: "1G"
   pages:
   - size:  "1G"
     count:  4
     node:  0 (1)
1 node is the NUMA node in which the huge pages are allocated. If you omit node, the pages are evenly spread across all NUMA nodes.

Wait for the relevant machine config pool status that indicates the update is finished.

These are the only configuration steps you need to do to allocate huge pages.

Verification
  • To verify the configuration, see the /proc/meminfo file on the node:

    $ oc debug node/ip-10-0-141-105.ec2.internal
    # grep -i huge /proc/meminfo
    Example output
    AnonHugePages:    ###### ##
    ShmemHugePages:        0 kB
    HugePages_Total:       2
    HugePages_Free:        2
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       #### ##
    Hugetlb:            #### ##
  • Use oc describe to report the new size:

    $ oc describe node worker-0.ocp4poc.example.com | grep -i huge
    Example output
                                       hugepages-1g=true
     hugepages-###:  ###
     hugepages-###:  ###

Allocating multiple huge page sizes

You can request huge pages with different sizes under the same container. This allows you to define more complicated pods consisting of containers with different huge page size needs.

For example, you can define sizes 1G and 2M and the Performance Addon Operator will configure both sizes on the node, as shown here:

spec:
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 1024
      node: 0
      size: 2M
    - count: 4
      node: 1
      size: 1G

Restricting CPUs for infra and application containers

Generic housekeeping and workload tasks use CPUs in a way that may impact latency-sensitive processes. By default, the container runtime uses all online CPUs to run all containers together, which can result in context switches and spikes in latency. Partitioning the CPUs prevents noisy processes from interfering with latency-sensitive processes by separating them from each other. The following table describes how processes run on a CPU after you have tuned the node using the Performance Add-On Operator:

Table 1. Process' CPU assignments
Process type Details

Burstable and BestEffort pods

Runs on any CPU except where low latency workload is running

Infrastructure pods

Runs on any CPU except where low latency workload is running

Interrupts

Redirects to reserved CPUs (optional in OpenShift Container Platform 4.7 and later)

Kernel processes

Pins to reserved CPUs

Latency-sensitive workload pods

Pins to a specific set of exclusive CPUs from the isolated pool

OS processes/systemd services

Pins to reserved CPUs

The allocatable capacity of cores on a node for pods of all QoS process types, Burstable, BestEffort, or Guaranteed, is equal to the capacity of the isolated pool. The capacity of the reserved pool is removed from the node’s total core capacity for use by the cluster and operating system housekeeping duties.

Example 1

A node features a capacity of 100 cores. Using a performance profile, the cluster administrator allocates 50 cores to the isolated pool and 50 cores to the reserved pool. The cluster administrator assigns 25 cores to QoS Guaranteed pods and 25 cores for BestEffort or Burstable pods. This matches the capacity of the isolated pool.

Example 2

A node features a capacity of 100 cores. Using a performance profile, the cluster administrator allocates 50 cores to the isolated pool and 50 cores to the reserved pool. The cluster administrator assigns 50 cores to QoS Guaranteed pods and one core for BestEffort or Burstable pods. This exceeds the capacity of the isolated pool by one core. Pod scheduling fails because of insufficient CPU capacity.

The exact partitioning pattern to use depends on many factors like hardware, workload characteristics and the expected system load. Some sample use cases are as follows:

  • If the latency-sensitive workload uses specific hardware, such as a network interface controller (NIC), ensure that the CPUs in the isolated pool are as close as possible to this hardware. At a minimum, you should place the workload in the same Non-Uniform Memory Access (NUMA) node.

  • The reserved pool is used for handling all interrupts. When depending on system networking, allocate a sufficiently-sized reserve pool to handle all the incoming packet interrupts. In 4.9 and later versions, workloads can optionally be labeled as sensitive.

The decision regarding which specific CPUs should be used for reserved and isolated partitions requires detailed analysis and measurements. Factors like NUMA affinity of devices and memory play a role. The selection also depends on the workload architecture and the specific use case.

The reserved and isolated CPU pools must not overlap and together must span all available cores in the worker node.

To ensure that housekeeping tasks and workloads do not interfere with each other, specify two groups of CPUs in the spec section of the performance profile.

  • isolated - Specifies the CPUs for the application container workloads. These CPUs have the lowest latency. Processes in this group have no interruptions and can, for example, reach much higher DPDK zero packet loss bandwidth.

  • reserved - Specifies the CPUs for the cluster and operating system housekeeping duties. Threads in the reserved group are often busy. Do not run latency-sensitive applications in the reserved group. Latency-sensitive applications run in the isolated group.

Procedure
  1. Create a performance profile appropriate for the environment’s hardware and topology.

  2. Add the reserved and isolated parameters with the CPUs you want reserved and isolated for the infra and application containers:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: infra-cpus
    spec:
      cpu:
        reserved: "0-4,9" (1)
        isolated: "5-8" (2)
      nodeSelector: (3)
        node-role.kubernetes.io/worker: ""
    1 Specify which CPUs are for infra containers to perform cluster and operating system housekeeping duties.
    2 Specify which CPUs are for application containers to run workloads.
    3 Optional: Specify a node selector to apply the performance profile to specific nodes.

Reducing NIC queues using the Performance Addon Operator

The Performance Addon Operator allows you to adjust the network interface controller (NIC) queue count for each network device by configuring the performance profile. Device network queues allows the distribution of packets among different physical queues and each queue gets a separate thread for packet processing.

In real-time or low latency systems, all the unnecessary interrupt request lines (IRQs) pinned to the isolated CPUs must be moved to reserved or housekeeping CPUs.

In deployments with applications that require system, OpenShift Container Platform networking or in mixed deployments with Data Plane Development Kit (DPDK) workloads, multiple queues are needed to achieve good throughput and the number of NIC queues should be adjusted or remain unchanged. For example, to achieve low latency the number of NIC queues for DPDK based workloads should be reduced to just the number of reserved or housekeeping CPUs.

Too many queues are created by default for each CPU and these do not fit into the interrupt tables for housekeeping CPUs when tuning for low latency. Reducing the number of queues makes proper tuning possible. Smaller number of queues means a smaller number of interrupts that then fit in the IRQ table.

Adjusting the NIC queues with the performance profile

The performance profile lets you adjust the queue count for each network device.

Supported network devices:

  • Non-virtual network devices

  • Network devices that support multiple queues (channels)

Unsupported network devices:

  • Pure software network interfaces

  • Block devices

  • Intel DPDK virtual functions

Prerequisites
  • Access to the cluster as a user with the cluster-admin role.

  • Install the OpenShift CLI (oc).

Procedure
  1. Log in to the OpenShift Container Platform cluster running the Performance Addon Operator as a user with cluster-admin privileges.

  2. Create and apply a performance profile appropriate for your hardware and topology. For guidance on creating a profile, see the "Creating a performance profile" section.

  3. Edit this created performance profile:

    $ oc edit -f <your_profile_name>.yaml
  4. Populate the spec field with the net object. The object list can contain two fields:

    • userLevelNetworking is a required field specified as a boolean flag. If userLevelNetworking is true, the queue count is set to the reserved CPU count for all supported devices. The default is false.

    • devices is an optional field specifying a list of devices that will have the queues set to the reserved CPU count. If the device list is empty, the configuration applies to all network devices. The configuration is as follows:

      • interfaceName: This field specifies the interface name, and it supports shell-style wildcards, which can be positive or negative.

        • Example wildcard syntax is as follows: <string> .*

        • Negative rules are prefixed with an exclamation mark. To apply the net queue changes to all devices other than the excluded list, use !<device>, for example, !eno1.

      • vendorID: The network device vendor ID represented as a 16-bit hexadecimal number with a 0x prefix.

      • deviceID: The network device ID (model) represented as a 16-bit hexadecimal number with a 0x prefix.

        When a deviceID is specified, the vendorID must also be defined. A device that matches all of the device identifiers specified in a device entry interfaceName, vendorID, or a pair of vendorID plus deviceID qualifies as a network device. This network device then has its net queues count set to the reserved CPU count.

        When two or more devices are specified, the net queues count is set to any net device that matches one of them.

  5. Set the queue count to the reserved CPU count for all devices by using this example performance profile:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: manual
    spec:
      cpu:
        isolated: 3-51,54-103
        reserved: 0-2,52-54
      net:
        userLevelNetworking: true
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
  6. Set the queue count to the reserved CPU count for all devices matching any of the defined device identifiers by using this example performance profile:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: manual
    spec:
      cpu:
        isolated: 3-51,54-103
        reserved: 0-2,52-54
      net:
        userLevelNetworking: true
        devices:
        - interfaceName: “eth0”
        - interfaceName: “eth1”
        - vendorID: “0x1af4”
        - deviceID: “0x1000”
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
  7. Set the queue count to the reserved CPU count for all devices starting with the interface name eth by using this example performance profile:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: manual
    spec:
      cpu:
        isolated: 3-51,54-103
        reserved: 0-2,52-54
      net:
        userLevelNetworking: true
        devices:
        - interfaceName: “eth*”
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
  8. Set the queue count to the reserved CPU count for all devices with an interface named anything other than eno1 by using this example performance profile:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: manual
    spec:
      cpu:
        isolated: 3-51,54-103
        reserved: 0-2,52-54
      net:
        userLevelNetworking: true
        devices:
        - interfaceName: “!eno1”
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
  9. Set the queue count to the reserved CPU count for all devices that have an interface name eth0, vendorID of 0x1af4, and deviceID of 0x1000 by using this example performance profile:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: manual
    spec:
      cpu:
        isolated: 3-51,54-103
        reserved: 0-2,52-54
      net:
        userLevelNetworking: true
        devices:
        - interfaceName: “eth0”
        - vendorID: “0x1af4”
        - deviceID: “0x1000”
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
  10. Apply the updated performance profile:

    $ oc apply -f <your_profile_name>.yaml
Additional resources

Verifying the queue status

In this section, a number of examples illustrate different performance profiles and how to verify the changes are applied.

Example 1

In this example, the net queue count is set to the reserved CPU count (2) for all supported devices.

The relevant section from the performance profile is:

apiVersion: performance.openshift.io/v2
metadata:
  name: performance
spec:
  kind: PerformanceProfile
  spec:
    cpu:
      reserved: 0-1  #total = 2
      isolated: 2-8
    net:
      userLevelNetworking: true
# ...
  • Display the status of the queues associated with a device using the following command:

    Run this command on the node where the performance profile was applied.

    $ ethtool -l <device>
  • Verify the queue status before the profile is applied:

    $ ethtool -l ens4
    Example output
    Channel parameters for ens4:
    Pre-set maximums:
    RX:         0
    TX:         0
    Other:      0
    Combined:   4
    Current hardware settings:
    RX:         0
    TX:         0
    Other:      0
    Combined:   4
  • Verify the queue status after the profile is applied:

    $ ethtool -l ens4
    Example output
    Channel parameters for ens4:
    Pre-set maximums:
    RX:         0
    TX:         0
    Other:      0
    Combined:   4
    Current hardware settings:
    RX:         0
    TX:         0
    Other:      0
    Combined:   2 (1)
    
1 The combined channel shows that the total count of reserved CPUs for all supported devices is 2. This matches what is configured in the performance profile.
Example 2

In this example, the net queue count is set to the reserved CPU count (2) for all supported network devices with a specific vendorID.

The relevant section from the performance profile is:

apiVersion: performance.openshift.io/v2
metadata:
  name: performance
spec:
  kind: PerformanceProfile
  spec:
    cpu:
      reserved: 0-1  #total = 2
      isolated: 2-8
    net:
      userLevelNetworking: true
      devices:
      - vendorID = 0x1af4
# ...
  • Display the status of the queues associated with a device using the following command:

    Run this command on the node where the performance profile was applied.

    $ ethtool -l <device>
  • Verify the queue status after the profile is applied:

    $ ethtool -l ens4
    Example output
    Channel parameters for ens4:
    Pre-set maximums:
    RX:         0
    TX:         0
    Other:      0
    Combined:   4
    Current hardware settings:
    RX:         0
    TX:         0
    Other:      0
    Combined:   2 (1)
    
1 The total count of reserved CPUs for all supported devices with vendorID=0x1af4 is 2. For example, if there is another network device ens2 with vendorID=0x1af4 it will also have total net queues of 2. This matches what is configured in the performance profile.
Example 3

In this example, the net queue count is set to the reserved CPU count (2) for all supported network devices that match any of the defined device identifiers.

The command udevadm info provides a detailed report on a device. In this example the devices are:

# udevadm info -p /sys/class/net/ens4
...
E: ID_MODEL_ID=0x1000
E: ID_VENDOR_ID=0x1af4
E: INTERFACE=ens4
...
# udevadm info -p /sys/class/net/eth0
...
E: ID_MODEL_ID=0x1002
E: ID_VENDOR_ID=0x1001
E: INTERFACE=eth0
...
  • Set the net queues to 2 for a device with interfaceName equal to eth0 and any devices that have a vendorID=0x1af4 with the following performance profile:

    apiVersion: performance.openshift.io/v2
    metadata:
      name: performance
    spec:
      kind: PerformanceProfile
        spec:
          cpu:
            reserved: 0-1  #total = 2
            isolated: 2-8
          net:
            userLevelNetworking: true
            devices:
            - interfaceName = eth0
            - vendorID = 0x1af4
    ...
  • Verify the queue status after the profile is applied:

    $ ethtool -l ens4
    Example output
    Channel parameters for ens4:
    Pre-set maximums:
    RX:         0
    TX:         0
    Other:      0
    Combined:   4
    Current hardware settings:
    RX:         0
    TX:         0
    Other:      0
    Combined:   2 (1)
    
    1 The total count of reserved CPUs for all supported devices with vendorID=0x1af4 is set to 2. For example, if there is another network device ens2 with vendorID=0x1af4, it will also have the total net queues set to 2. Similarly, a device with interfaceName equal to eth0 will have total net queues set to 2.

Logging associated with adjusting NIC queues

Log messages detailing the assigned devices are recorded in the respective Tuned daemon logs. The following messages might be recorded to the /var/log/tuned/tuned.log file:

  • An INFO message is recorded detailing the successfully assigned devices:

    INFO tuned.plugins.base: instance net_test (net): assigning devices ens1, ens2, ens3
  • A WARNING message is recorded if none of the devices can be assigned:

    WARNING  tuned.plugins.base: instance net_test: no matching devices available

Performing end-to-end tests for platform verification

The Cloud-native Network Functions (CNF) tests image is a containerized test suite that validates features required to run CNF payloads. You can use this image to validate a CNF-enabled OpenShift cluster where all the components required for running CNF workloads are installed.

The tests run by the image are split into three different phases:

  • Simple cluster validation

  • Setup

  • End to end tests

The validation phase checks that all the features required to be tested are deployed correctly on the cluster.

Validations include:

  • Targeting a machine config pool that belong to the machines to be tested

  • Enabling SCTP on the nodes

  • Enabling xt_u32 kernel module via machine config

  • Having the Performance Addon Operator installed

  • Having the SR-IOV Operator installed

  • Having the PTP Operator installed

  • Enabling the contain-mount-namespace mode via machine config

  • Using OVN-kubernetes as the cluster network provider

Latency tests, a part of the CNF-test container, also require the same validations. For more information about running a latency test, see the Running the latency tests section.

The tests need to perform an environment configuration every time they are executed. This involves items such as creating SR-IOV node policies, performance profiles, or PTP profiles. Allowing the tests to configure an already configured cluster might affect the functionality of the cluster. Also, changes to configuration items such as SR-IOV node policy might result in the environment being temporarily unavailable until the configuration change is processed.

Prerequisites

  • The test entrypoint is /usr/bin/test-run.sh. It runs both a setup test set and the real conformance test suite. The minimum requirement is to provide it with a kubeconfig file and its related $KUBECONFIG environment variable, mounted through a volume.

  • The tests assumes that a given feature is already available on the cluster in the form of an Operator, flags enabled on the cluster, or machine configs.

  • Some tests require a pre-existing machine config pool to append their changes to. This must be created on the cluster before running the tests.

    The default worker pool is worker-cnf and can be created with the following manifest:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-cnf
      labels:
        machineconfiguration.openshift.io/role: worker-cnf
    spec:
      machineConfigSelector:
        matchExpressions:
          - {
              key: machineconfiguration.openshift.io/role,
              operator: In,
              values: [worker-cnf, worker],
            }
      paused: false
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-cnf: ""

    You can use the ROLE_WORKER_CNF variable to override the worker pool name:

    $ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e
    ROLE_WORKER_CNF=custom-worker-pool registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 /usr/bin/test-run.sh

    Currently, not all tests run selectively on the nodes belonging to the pool.

Dry run

Use this command to run in dry-run mode. This is useful for checking what is in the test suite and provides output for all of the tests the image would run.

$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 /usr/bin/test-run.sh -ginkgo.dryRun -ginkgo.v

Disconnected mode

The CNF tests image support running tests in a disconnected cluster, meaning a cluster that is not able to reach outer registries. This is done in two steps:

  1. Performing the mirroring.

  2. Instructing the tests to consume the images from a custom registry.

Mirroring the images to a custom registry accessible from the cluster

A mirror executable is shipped in the image to provide the input required by oc to mirror the images needed to run the tests to a local registry.

Run this command from an intermediate machine that has access both to the cluster and to registry.redhat.io over the internet:

$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 /usr/bin/mirror -registry my.local.registry:5000/ |  oc image mirror -f -

Then, follow the instructions in the following section about overriding the registry used to fetch the images.

Instruct the tests to consume those images from a custom registry

This is done by setting the IMAGE_REGISTRY environment variable:

$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e IMAGE_REGISTRY="my.local.registry:5000/" -e CNF_TESTS_IMAGE="custom-cnf-tests-image:latests" registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 /usr/bin/test-run.sh

Mirroring to the cluster internal registry

OpenShift Container Platform provides a built-in container image registry, which runs as a standard workload on the cluster.

Procedure
  1. Gain external access to the registry by exposing it with a route:

    $ oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
  2. Fetch the registry endpoint:

    REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
  3. Create a namespace for exposing the images:

    $ oc create ns cnftests
  4. Make that image stream available to all the namespaces used for tests. This is required to allow the tests namespaces to fetch the images from the cnftests image stream.

    $ oc policy add-role-to-user system:image-puller system:serviceaccount:sctptest:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:cnf-features-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:performance-addon-operators-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:dpdk-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:sriov-conformance-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:xt-u32-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:vrf-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:gatekeeper-testing:default --namespace=cnftests
    $ oc policy add-role-to-user system:image-puller system:serviceaccount:ovs-qos-testing:default --namespace=cnftests
  5. Retrieve the docker secret name and auth token:

    SECRET=$(oc -n cnftests get secret | grep builder-docker | awk {'print $1'}
    TOKEN=$(oc -n cnftests get secret $SECRET -o jsonpath="{.data['\.dockercfg']}" | base64 --decode | jq '.["image-registry.openshift-image-registry.svc:5000"].auth')
  6. Write a dockerauth.json similar to this:

    echo "{\"auths\": { \"$REGISTRY\": { \"auth\": $TOKEN } }}" > dockerauth.json
  7. Do the mirroring:

    $ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 /usr/bin/mirror -registry $REGISTRY/cnftests |  oc image mirror --insecure=true -a=$(pwd)/dockerauth.json -f -
  8. Run the tests:

    $