Network Interface Card (NIC) SR-IOV hardware is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

About SR-IOV hardware on OpenShift Container Platform

OpenShift Container Platform includes the capability to use SR-IOV hardware on your nodes. You can attach SR-IOV virtual function (VF) interfaces to Pods on nodes with SR-IOV hardware.

You can use the OpenShift Container Platform console to install SR-IOV by deploying the SR-IOV Network Operator. The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. The Operator provides the following features:

  • Discover SR-IOV network devices in the cluster.

  • Initialize the supported SR-IOV NIC models on nodes.

  • Provision the SR-IOV network device plug-in on nodes.

  • Provision the SR-IOV CNI plug-in executable on nodes.

  • Provision the Network Resources Injector in the cluster.

  • Manage the configuration of the SR-IOV network device plug-in.

  • Generate NetworkAttachmentDefinition custom resources (CRs) for the SR-IOV CNI plug-in.

The function of each of these SR-IOV components is as follows:

  • The SR-IOV network device plug-in is a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function (VF) resources. Device plug-ins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plug-ins give the Kubernetes scheduler awareness of resource availability, so the scheduler can schedule Pods on nodes with sufficient resources.

  • The SR-IOV CNI plug-in plumbs VF interfaces allocated from the SR-IOV device plug-in directly into a Pod.

  • The Network Resources Injector is a Kubernetes Dynamic Admission Controller Webhook that provides functionality for patching Kubernetes Pod specifications with requests and limits for custom network resources such as SR-IOV VFs.

The Network Resources Injector is enabled by default. You can disable it by modifying the default SriovOperatorConfig CR, as described later in this document.
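
For illustration only, the following sketch shows the effect of the injector. Assume a SriovNetworkNodePolicy that sets resourceName to intelnics, so that the extended resource is named openshift.io/intelnics, and a NetworkAttachmentDefinition named sriov-net1; both names are hypothetical. A Pod that requests the network only through its annotation:

apiVersion: v1
kind: Pod
metadata:
  name: sample-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
spec:
  containers:
  - name: sample-app
    image: <app_image>

is mutated at admission time so that the container also carries a matching resource request and limit, similar to the following:

    resources:
      requests:
        openshift.io/intelnics: "1"
      limits:
        openshift.io/intelnics: "1"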

Supported devices

The following Network Interface Card (NIC) models are supported in OpenShift Container Platform:

  • Intel XXV710-DA2 25G card with vendor ID 0x8086 and device ID 0x158b

  • Mellanox MT27710 Family [ConnectX-4 Lx] 25G card with vendor ID 0x15b3 and device ID 0x1015

  • Mellanox MT27800 Family [ConnectX-5] 100G card with vendor ID 0x15b3 and device ID 0x1017

Automated discovery of SR-IOV network devices

The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState Custom Resource (CR) for each worker node that provides a compatible SR-IOV network device.

One CR is created for each worker node and shares the same name as the node. The status.interfaces list provides information about the network devices on a node.

Do not modify a SriovNetworkNodeState CR. The Operator creates and manages these resources automatically.

The following is an example of a SriovNetworkNodeState CR created by the SR-IOV Network Operator:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  name: node-25 (1)
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
spec:
  dpConfigVersion: d41d8cd98f00b204e9800998ecf8427e
status:
  interfaces: (2)
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f0
    pciAddress: "0000:18:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f1
    pciAddress: "0000:18:00.1"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f0
    pciAddress: 0000:81:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f1
    pciAddress: 0000:81:00.1
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens803f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  syncStatus: Succeeded
1 The value for the name parameter is the same as the name of the worker node.
2 The interfaces collection includes a list of all of the SR-IOV devices discovered by the Operator on the worker node.
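
For example, you can list the SriovNetworkNodeState CRs and inspect the state reported for a particular worker node by running commands similar to the following. Replace <node_name> with the name of the node:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator

$ oc get sriovnetworknodestates <node_name> -n openshift-sriov-network-operator -o yaml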

Example use of a virtual function (VF) in a Pod

You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a Pod with an SR-IOV VF attached.

This example shows a Pod using a VF in RDMA mode:

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
  containers:
  - name: testpmd
    image: <RDMA_image>
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: ["sleep", "infinity"]

The following example shows a Pod with a VF in DPDK mode:

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
  containers:
  - name: testpmd
    image: <DPDK_image>
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
      requests:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

An optional library, app-netutil, is available to help an application running in a container gather network information associated with a Pod. See the library’s source code in the app-netutil GitHub repository.

The library is intended to ease the integration of SR-IOV VFs in DPDK mode into the container. It provides both a Go API and a C API, as well as examples of using both languages.

There is also a sample Docker image, dpdk-app-centos, which can run one of the following DPDK sample applications based on an environment variable in the Pod spec: l2fwd, l3fwd, or testpmd. This Docker image provides an example of integrating app-netutil into the container image itself. The library can also be integrated into an init container that collects the required data and passes it to an existing DPDK workload.

Installing SR-IOV Network Operator

As a cluster administrator, you can install the SR-IOV Network Operator using the OpenShift Container Platform CLI or the web console.

Installing the Operator using the CLI

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites
  • A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.

  • Install the OpenShift Container Platform Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create a namespace for the SR-IOV Network Operator by completing the following actions:

    1. Create the following Namespace Custom Resource (CR) that defines the openshift-sriov-network-operator namespace, and then save the YAML in the sriov-namespace.yaml file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-sriov-network-operator
        labels:
          openshift.io/run-level: "1"
    2. Create the namespace by running the following command:

      $ oc create -f sriov-namespace.yaml
  2. Install the SR-IOV Network Operator in the namespace you created in the previous step by creating the following objects:

    1. Create the following OperatorGroup CR and save the YAML in the sriov-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: sriov-network-operators
        namespace: openshift-sriov-network-operator
      spec:
        targetNamespaces:
        - openshift-sriov-network-operator
    2. Create the OperatorGroup CR by running the following command:

      $ oc create -f sriov-operatorgroup.yaml
    3. Run the following command to get the channel value required for the next step.

      $ oc get packagemanifest sriov-network-operator -n openshift-marketplace -o jsonpath='{.status.channels[].name}'
      
      alpha
    4. Create the following Subscription CR and save the YAML in the sriov-sub.yaml file:

      Example Subscription
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: sriov-network-operator-subscription
        namespace: openshift-sriov-network-operator
      spec:
        channel: <channel> (1)
        name: sriov-network-operator
        source: redhat-operators (2)
        sourceNamespace: openshift-marketplace
      1 Specify the value you obtained in the previous step for the .status.channels[].name parameter.
      2 You must specify the redhat-operators value.
    5. Create the Subscription object by running the following command:

      $ oc create -f sriov-sub.yaml
    6. Change to the openshift-sriov-network-operator project:

      $ oc project openshift-sriov-network-operator
      
      Now using project "openshift-sriov-network-operator"
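
  3. Optional: Verify that the Operator deployment completed by checking the ClusterServiceVersion and the Operator Pods, for example:

    $ oc get csv -n openshift-sriov-network-operator

    $ oc get pods -n openshift-sriov-network-operator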

Installing the Operator using the web console

As a cluster administrator, you can install the Operator using the web console.

You must create the Namespace CR and OperatorGroup CR as described in the previous section.

Procedure
  1. Install the SR-IOV Network Operator using the OpenShift Container Platform web console:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.

    2. Choose SR-IOV Network Operator from the list of available Operators, and then click Install.

    3. On the Create Operator Subscription page, under A specific namespace on the cluster, select openshift-sriov-network-operator, and then click Subscribe.

  2. Optional: Verify that the SR-IOV Network Operator installed successfully:

    1. Switch to the Operators → Installed Operators page.

    2. Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, troubleshoot further:

      • Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.

      • Go to the Workloads → Pods page and check the logs for Pods in the openshift-sriov-network-operator project.

Configuring the SR-IOV Network Operator

The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io Custom Resource Definition (CRD) to OpenShift Container Platform. You can configure the SR-IOV software components by modifying the default SriovOperatorConfig Custom Resource (CR).

A single SriovOperatorConfig Custom Resource is created when the Operator starts. Change the Operator configuration by modifying this pre-created Custom Resource.
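
For example, you can review the pre-created configuration by running the following command:

$ oc get sriovoperatorconfig default -n openshift-sriov-network-operator -o yaml

The spec stanza contains the fields that the following subsections modify, such as enableInjector, enableOperatorWebhook, and configDaemonNodeSelector.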

Disabling the Network Resources Injector

The Network Resources Injector is a Kubernetes Dynamic Admission Controller application that patches Pod specifications with requests and limits for custom network resources managed by the SR-IOV network device plug-in.

By default, the Operator enables the Network Resources Injector. Follow this procedure to disable it.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

  • You must have installed the SR-IOV Operator.

Procedure
  1. Update the default SriovOperatorConfig Custom Resource:

    $ oc patch sriovoperatorconfig default --patch '{ "spec": { "enableInjector": false}}' --type=merge
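
    You can confirm the change by checking the field value, for example:

    $ oc get sriovoperatorconfig default -o jsonpath='{.spec.enableInjector}'

    false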

Disabling the SR-IOV Operator Admission Control Webhook

The SR-IOV Operator Admission Control Webhook is a Kubernetes Dynamic Admission Controller application that:

  • Validates the SriovNetworkNodePolicy Custom Resource when it is created or updated.

  • Mutates the SriovNetworkNodePolicy Custom Resource by setting the default values for the priority and deviceType fields when it is created or updated.

By default, the Operator enables the SR-IOV Operator Admission Control Webhook. Follow this procedure to disable it.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

  • You must have installed the SR-IOV Operator.

Procedure
  1. Update the default SriovOperatorConfig Custom Resource:

    $ oc patch sriovoperatorconfig default --patch '{ "spec": { "enableOperatorWebhook": false}}' --type=merge

Configuring a custom NodeSelector for the SR-IOV Network Config Daemon

The SR-IOV Network Config Daemon discovers and configures the SR-IOV network devices on the nodes.

By default, it is deployed to all worker nodes in the cluster. Follow this procedure to use node labels to select the nodes where the SR-IOV Network Config Daemon is deployed.

Procedure
  1. Update the default SriovOperatorConfig Custom Resource:

    $ oc patch sriovoperatorconfig default --patch '[{"op": "replace", "path": "/spec/configDaemonNodeSelector", "value": {<node-label>}}]' --type=json
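
    For example, to deploy the daemon only to nodes that carry the standard worker role label, node-role.kubernetes.io/worker, the patch might look like the following. Substitute any label that you apply to your target nodes:

    $ oc patch sriovoperatorconfig default --patch '[{"op": "replace", "path": "/spec/configDaemonNodeSelector", "value": {"node-role.kubernetes.io/worker": ""}}]' --type=json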

Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io Custom Resource Definition (CRD) to OpenShift Container Platform. You can configure the SR-IOV network device by creating a SriovNetworkNodePolicy Custom Resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

  • You must have installed the SR-IOV Operator.

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: <name> (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: <sriov_resource_name> (3)
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true" (4)
  priority: <priority> (5)
  mtu: <mtu> (6)
  numVfs: <num> (7)
  nicSelector: (8)
    vendor: "<vendor_code>" (9)
    deviceID: "<device_id>" (10)
    pfName: ["<pf_name>", ...] (11)
    rootDevices: ["<pci_bus_id>", "..."] (12)
  deviceType: <device_type> (13)
  isRdma: false (14)
1 Specify a name for the CR.
2 Specify the namespace where the SR-IOV Operator is installed.
3 Specify the resource name of the SR-IOV device plug-in. The prefix openshift.io/ is added when the resource is referenced in a Pod spec. You can create multiple SriovNetworkNodePolicy CRs for a resource name.
4 Specify the node selector that determines which nodes are configured. You can label the nodes manually or with tools such as Kubernetes Node Feature Discovery. Only SR-IOV network devices on selected nodes are configured, and the SR-IOV CNI plug-in and device plug-in are deployed only on selected nodes.
5 Optional. Specify an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10. The default value is 99.
6 Optional. Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
7 Specify the number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel Network Interface Card (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
8 The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. It is recommended to identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfName. If you specify both pfName and rootDevices at the same time, ensure that they point to an identical device.
9 Optional. Specify the vendor hex code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
10 Optional. Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
11 Optional. The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
12 The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
13 Optional. Specify the driver type for the virtual functions. You can specify one of the following values: netdevice or vfio-pci. The default value is netdevice.

For a Mellanox card to work in DPDK mode, use the netdevice driver type.

14 Optional. Specify whether to enable RDMA mode. The default value is false. Only RDMA over Converged Ethernet (RoCE) mode is supported on Mellanox Ethernet adapters.

If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

  2. Create the CR by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
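
The following is a completed policy, shown for illustration only. The policy name, the resource name intelnics, and the VF count are arbitrary choices, and the nicSelector values correspond to the supported Intel XXV710 card listed earlier in this document:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-intel-example
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intelnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  nicSelector:
    vendor: "8086"
    deviceID: "158b"
    pfName: ["ens803f0"]
  deviceType: netdevice
  isRdma: false

You can monitor the progress of the configuration, for example by checking the syncStatus field reported in the SriovNetworkNodeState CR for each affected node:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status.syncStatus}'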

Configuring SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating a SriovNetwork Custom Resource (CR). When you create a SriovNetwork CR, the SR-IOV Operator automatically creates a NetworkAttachmentDefinition CR.

Do not modify or delete a SriovNetwork Custom Resource (CR) if it is attached to any Pods in the running state.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: <name> (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  networkNamespace: <target_namespace> (3)
  ipam: <ipam> (4)
  vlan: <vlan> (5)
  resourceName: <sriov_resource_name> (6)
  linkState: <link_state> (7)
  maxTxRate: <max_tx_rate> (8)
  minTxRate: <min_tx_rate> (9)
  vlanQoS: <vlan_qos> (10)
  spoofChk: "<spoof_check>" (11)
  trust: "<trust_vf>" (12)
  capabilities: <capabilities> (13)
1 Replace <name> with a name for the CR. The Operator creates a NetworkAttachmentDefinition CR with the same name.
2 Specify the namespace where the SR-IOV Operator is installed.
3 Optional. Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR is created. The default value is openshift-sriov-network-operator.
4 Optional. Replace <ipam> with a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
5 Optional. Replace <vlan> with a Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
6 Replace <sriov_resource_name> with the value for the .spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
7 Optional. Replace <link_state> with the link state of the virtual function (VF). Allowed values are enable, disable, and auto.
8 Optional. Replace <max_tx_rate> with a maximum transmission rate, in Mbps, for the VF.
9 Optional. Replace <min_tx_rate> with a minimum transmission rate, in Mbps, for the VF. This value must always be less than or equal to the maximum transmission rate.

Intel NICs do not support the minTxRate parameter. For more information, see BZ#1772847.

10 Optional. Replace <vlan_qos> with an IEEE 802.1p priority level for the VF. The default value is 0.
11 Optional. Replace <spoof_check> with the spoof check mode of the VF. The allowed values are the strings "on" and "off".

You must enclose the value you specify in quotes or the CR will be rejected by the SR-IOV Network Operator.

12 Optional. Replace <trust_vf> with the trust mode of the VF. The allowed values are the strings "on" and "off".

You must enclose the value you specify in quotes or the CR will be rejected by the SR-IOV Network Operator.

13 Optional. Replace <capabilities> with the capabilities to configure for this network. You can specify '{ "ips": true }' to enable IP address support or '{ "mac": true }' to enable MAC address support.
  2. Create the CR by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
  3. Optional: Confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc get net-attach-def -n <namespace>
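
For illustration only, a completed SriovNetwork that exposes the hypothetical intelnics resource from the previous section to workloads in a hypothetical apps namespace, with static IP address management, might look like the following:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: example-sriov-net
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: apps
  resourceName: intelnics
  vlan: 0
  ipam: |
    {
      "type": "static"
    }

After the Operator processes the CR, oc get net-attach-def -n apps lists a NetworkAttachmentDefinition named example-sriov-net.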

Configuration for ipam CNI plug-in

The IP address management (IPAM) CNI plug-in manages IP address assignment for other CNI plug-ins. You can configure ipam for either static IP address assignment or dynamic IP address assignment by using DHCP. The DHCP server you specify must be reachable from the additional network.

The following JSON configuration object describes the parameters that you can set.

If you set the type parameter to the DHCP value, you cannot set any other parameters.
ipam CNI plug-in JSON configuration object
{
  "ipam": {
    "type": "<type>", (1)
    "addresses": [ (2)
      {
        "address": "<address>", (3)
        "gateway": "<gateway>" (4)
      }
    ],
    "routes": [ (5)
      {
        "dst": "<dst>" (6)
        "gw": "<gw>" (7)
      }
    ],
    "dns": { (8)
      "nameservers": ["<nameserver>"], (9)
      "domain": "<domain>", (10)
      "search": ["<search_domain>"] (11)
    }
  }
}
1 Specify static to configure the plug-in to manage IP address assignment. Specify DHCP to allow a DHCP server to manage IP address assignment. You cannot specify any additional parameters if you specify a value of DHCP.
2 An array describing IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.
3 A block of IP addresses that you specify in CIDR format to assign to Pods on a worker node, such as 10.1.1.0/24.
4 The default gateway to route egress network traffic to.
5 An array describing routes to configure inside the Pod.
6 The IP address range in CIDR format.
7 The gateway to use to route network traffic to.
8 The DNS configuration. Optional.
9 An array of one or more IP addresses to send DNS queries to.
10 The default domain to append to a host name. For example, if the domain is set to example.com, a DNS lookup query for example-host will be rewritten as example-host.example.com.
11 An array of domain names to append to an unqualified host name, such as example-host, during a DNS lookup query.

Static IP address assignment configuration example

You can configure ipam for static IP address assignment:

{
  "ipam": {
    "type": "static",
      "addresses": [
        {
          "address": "191.168.1.1/24"
        }
      ]
  }
}
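
A fuller static configuration that combines the addresses, routes, and dns parameters described above might look like the following. The gateway, route, and DNS values are placeholders for illustration:

{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "191.168.1.1/24",
        "gateway": "191.168.1.254"
      }
    ],
    "routes": [
      {
        "dst": "0.0.0.0/0",
        "gw": "191.168.1.254"
      }
    ],
    "dns": {
      "nameservers": ["191.168.1.254"],
      "domain": "example.com",
      "search": ["example.com"]
    }
  }
}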

Dynamic IP address assignment configuration example

You can configure ipam for DHCP:

{
  "ipam": {
    "type": "DHCP"
  }
}

Adding a Pod to an additional network

You can add a Pod to an additional network. The Pod continues to send normal cluster-related network traffic over the default network.

The Network Resources Injector automatically injects the resource parameter into the Pod CR if a NetworkAttachmentDefinition CR associated with the SR-IOV CNI plug-in is specified.

Prerequisites
  • The Pod must be in the same namespace as the additional network.

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • You must log in to the cluster.

  • You must have the SR-IOV Operator installed and a SriovNetwork CR defined.

Procedure

To add a Pod with additional networks, complete the following steps:

  1. Create the Pod resource definition and add the k8s.v1.cni.cncf.io/networks parameter to the Pod metadata mapping. The k8s.v1.cni.cncf.io/networks parameter accepts a comma-separated string of one or more NetworkAttachmentDefinition Custom Resource (CR) names:

    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] (1)
    1 Replace <network> with the name of the additional network to associate with the Pod. To specify more than one additional network, separate each network with a comma. Do not include whitespace around the commas. If you specify the same additional network multiple times, that Pod will have multiple network interfaces attached to that network.

    In the following example, two additional networks are attached to the Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: net1,net2
    spec:
      containers:
      - name: example-pod
        command: ["/bin/bash", "-c", "sleep 2000000000000"]
        image: centos/tools
  2. Create the Pod by running the following command:

    $ oc create -f pod.yaml
  3. Optional: Confirm that the annotation exists in the Pod CR by running the following command. Replace <name> with the name of the Pod.

    $ oc get pod <name> -o yaml

    In the following example, the example-pod Pod is attached to the net1 additional network:

    $ oc get pod example-pod -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-bridge
        k8s.v1.cni.cncf.io/networks-status: |- (1)
          [{
              "name": "openshift-sdn",
              "interface": "eth0",
              "ips": [
                  "10.128.2.14"
              ],
              "default": true,
              "dns": {}
          },{
              "name": "macvlan-bridge",
              "interface": "net1",
              "ips": [
                  "20.2.2.100"
              ],
              "mac": "22:2f:60:a5:f8:00",
              "dns": {}
          }]
      name: example-pod
      namespace: default
    spec:
      ...
    status:
      ...
    1 The k8s.v1.cni.cncf.io/networks-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the Pod. The annotation value is stored as a plain text value.
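
    To view only the network status annotation, you can use a command similar to the following:

    $ oc get pod example-pod -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/networks-status}'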

Configuring static MAC and IP addresses on additional SR-IOV networks

You can configure static MAC and IP addresses on an additional SR-IOV network by specifying CNI runtimeConfig data in a Pod annotation.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges when creating the SriovNetwork CR.

Procedure
  1. Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: <name> (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      networkNamespace: <target_namespace> (3)
      ipam: '{"type": "static"}' (4)
      capabilities: '{"mac": true, "ips": true}' (5)
      resourceName: <sriov_resource_name> (6)
    1 Replace <name> with a name for the CR. The Operator creates a NetworkAttachmentDefinition CR with the same name.
    2 Specify the namespace where the SR-IOV Operator is installed.
    3 Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR is created.
    4 Specify the static type for the ipam CNI plug-in.
    5 Set the mac and ips capabilities to true.
    6 Replace <sriov_resource_name> with the value for the .spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
  2. Create the CR by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
  3. Optional: Confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc get net-attach-def -n <namespace>

Do not modify or delete a SriovNetwork Custom Resource (CR) if it is attached to any Pods in the running state.

  4. Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
            {
                    "name": "<name>", (1)
                    "mac": "20:04:0f:f1:88:01", (2)
                    "ips": ["192.168.10.1/24", "2001::1/64"] (3)
            }
    ]'
    spec:
      containers:
      - name: sample-container
        image: <image>
        imagePullPolicy: IfNotPresent
        command: ["sleep", "infinity"]
    1 Replace <name> with the name of the SR-IOV network attachment definition CR.
    2 Specify the MAC address for the SR-IOV device, which is allocated from the resource type defined in the SR-IOV network attachment definition CR.
    3 Specify the IPv4 address, the IPv6 address, or both for the SR-IOV device, which is allocated from the resource type defined in the SR-IOV network attachment definition CR.
  5. Create the sample SR-IOV pod by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
  6. Optional: Confirm that the MAC and IP addresses are applied to the SR-IOV device by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc exec sample-pod -n <namespace> -- ip addr show

Creating a non-uniform memory access (NUMA) aligned SR-IOV pod

You can create a NUMA-aligned SR-IOV Pod by restricting the SR-IOV and CPU resources allocated to the Pod to the same NUMA node with the restricted or single-numa-node Topology Manager policies.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Enable a LatencySensitive profile and configure the CPU Manager policy to static. One possible KubeletConfig for these settings is shown below.
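
    The following KubeletConfig is one possible sketch of these settings, not a definitive configuration: the object name, the custom-kubelet label, and the machine config pool that you target are assumptions that depend on your cluster, and enabling the Topology Manager might require additional feature configuration.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: cpumanager-enabled
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: cpumanager-enabled
      kubeletConfig:
        cpuManagerPolicy: static
        cpuManagerReconcilePeriod: 5s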

Procedure
  1. Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

    The following example shows a SR-IOV pod spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: <name> (1)
    spec:
      containers:
      - name: sample-container
        image: <image> (2)
        command: ["sleep", "infinity"]
        resources:
          limits:
            memory: "1Gi" (3)
            cpu: "2" (4)
          requests:
            memory: "1Gi"
            cpu: "2"
    1 Replace <name> with the name of the SR-IOV network attachment definition CR.
    2 Replace <image> with the name of the sample-pod image.
    3 To create the SR-IOV pod with guaranteed QoS, set memory limits equal to memory requests.
    4 To create the SR-IOV pod with guaranteed QoS, set the cpu limits equal to the cpu requests.
  2. Create the sample SR-IOV pod by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
  3. Optional: Confirm that the sample-pod is configured with guaranteed QoS.

    $ oc describe pod sample-pod
  4. Optional: Confirm that the sample-pod is allocated exclusive CPUs.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
  5. Optional: Confirm that the SR-IOV device and CPUs that are allocated for the sample-pod are on the same NUMA node.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus

Configuring performance multicast

OpenShift SDN supports multicast between Pods on the default network. This is best used for low-bandwidth coordination or service discovery, not as a high-bandwidth solution. For streaming-media applications, such as IPTV and multipoint videoconferencing, you can use SR-IOV hardware to provide near-native performance.

When using additional SR-IOV interfaces for multicast:

  • Multicast packets must be sent or received by a Pod through the additional SR-IOV interface.

  • The physical network that connects the SR-IOV interfaces determines the multicast routing and topology, which is not controlled by OpenShift Container Platform.

Using an SR-IOV interface for multicast

The following procedure creates an example SR-IOV interface for multicast.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure
  1. Create a SriovNetworkNodePolicy CR:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policy-example
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: example
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      numVfs: 4
      nicSelector:
        vendor: "8086"
        pfName: ['ens803f0']
        rootDevices: ['0000:86:00.0']
  2. Create a SriovNetwork CR:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: net-example
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: default
      ipam: |
        {
          "type": "host-local", (1)
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "routes": [
            {"dst": "224.0.0.0/5"},
            {"dst": "232.0.0.0/5"}
          ],
          "gateway": "10.56.217.1"
        }
      resourceName: example
    1 If you choose to use DHCP as IPAM, make sure to provision the following default routes through your DHCP server: 224.0.0.0/5 and 232.0.0.0/5. This overrides the static multicast route set by OpenShift SDN.
  3. Create a Pod with a multicast application:

    apiVersion: v1
    kind: Pod
    metadata:
      name: testpmd
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/networks: net-example
    spec:
      containers:
      - name: example
        image: rhel7:latest
        securityContext:
          capabilities:
            add: ["NET_ADMIN"] (1)
        command: [ "sleep", "infinity"]
    1 The NET_ADMIN capability is only required if your application needs to assign the multicast IP address to the SR-IOV interface. Otherwise, it can be omitted.

Examples of using virtual functions in DPDK and RDMA modes

The Data Plane Development Kit (DPDK) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

  • You must have installed the SR-IOV Operator.

Example use of a virtual function (VF) in DPDK mode with Intel NICs

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the intel-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: intel-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: intelnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "8086"
        deviceID: "158b"
        pfName: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: vfio-pci (1)
1 Set the driver type for the virtual functions to vfio-pci.

See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f intel-dpdk-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the intel-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: intel-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: "{}" (1)
      vlan: <vlan>
      resourceName: intelnics
1 Specify an empty object "{}" for the ipam CNI plug-in. DPDK works in userspace mode and does not require an IP address.

See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f intel-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the intel-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> (1)
      annotations:
        k8s.v1.cni.cncf.io/networks: intel-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> (2)
        securityContext:
          capabilities:
            add: ["IPC_LOCK"] (3)
        volumeMounts:
        - mountPath: /dev/hugepages (4)
          name: hugepage
        resources:
          limits:
            openshift.io/intelnics: "1" (5)
            memory: "1Gi"
            cpu: "4" (6)
            hugepages-1Gi: "4Gi" (7)
          requests:
            openshift.io/intelnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
1 Specify the same target_namespace where the SriovNetwork CR intel-dpdk-network is created. If you would like to create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
2 Specify the DPDK image that includes your application and the DPDK library used by the application.
3 Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
4 Mount a hugepage volume to the DPDK Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
5 Optional. Specify the number of DPDK devices allocated to the DPDK Pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
6 Specify the number of CPUs. The DPDK Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
7 Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes. For example, adding the kernel arguments default_hugepagesz=1GB, hugepagesz=1G, and hugepages=16 results in 16 1Gi hugepages being allocated during system boot.
  6. Create the DPDK Pod by running the following command:

    $ oc create -f intel-dpdk-pod.yaml
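
As noted in the hugepage callout above, allocating 1Gi hugepages requires kernel arguments on the nodes. One possible way to add them on OpenShift Container Platform is a MachineConfig similar to the following sketch. The object name, the targeted role label, and the hugepage count are assumptions that you must adapt to your cluster:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-hugepages
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
  - default_hugepagesz=1GB
  - hugepagesz=1G
  - hugepages=16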

Example use of a virtual function in DPDK mode with Mellanox NICs

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" (1)
        pfName: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice (2)
      isRdma: true (3)
1 Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
2 Set the driver type for the virtual functions to netdevice. A Mellanox SR-IOV VF can work in DPDK mode without using the vfio-pci device type. The VF device appears as a kernel network interface inside a container.
3 Enable RDMA mode. This is required by Mellanox cards to work in DPDK mode.

See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace will change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f mlx-dpdk-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the mlx-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- (1)
        ...
      vlan: <vlan>
      resourceName: mlxnics
1 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.

See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f mlx-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> (1)
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> (2)
        securityContext:
          capabilities:
            add: ["IPC_LOCK"] (3)
        volumeMounts:
        - mountPath: /dev/hugepages (4)
          name: hugepage
        resources:
          limits:
            openshift.io/mlxnics: "1" (5)
            memory: "1Gi"
            cpu: "4" (6)
            hugepages-1Gi: "4Gi" (7)
          requests:
            openshift.io/mlxnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
1 Specify the same target_namespace where the SriovNetwork CR mlx-dpdk-network is created. If you would like to create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
2 Specify the DPDK image that includes your application and the DPDK library used by the application.
3 Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
4 Mount the hugepage volume to the DPDK Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
5 Optional. Specify the number of DPDK devices allocated to the DPDK Pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
6 Specify the number of CPUs. The DPDK Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
7 Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.
  6. Create the DPDK Pod by running the following command:

    $ oc create -f mlx-dpdk-pod.yaml

Example of a virtual function in RDMA mode with Mellanox NICs

RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OpenShift Container Platform.

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-rdma-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-rdma-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" (1)
        pfName: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice (2)
      isRdma: true (3)
1 Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
2 Set the driver type for the virtual functions to netdevice.
3 Enable RDMA mode.

See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace will change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f mlx-rdma-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the mlx-rdma-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-rdma-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- (1)
        ...
      vlan: <vlan>
      resourceName: mlxnics
1 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.

See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f mlx-rdma-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-rdma-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: rdma-app
      namespace: <target_namespace> (1)
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-rdma-network
    spec:
      containers:
      - name: testpmd
        image: <RDMA_image> (2)
        securityContext:
          capabilities:
            add: ["IPC_LOCK"] (3)
        volumeMounts:
        - mountPath: /dev/hugepages (4)
          name: hugepage
        resources:
          limits:
            memory: "1Gi"
            cpu: "4" (5)
            hugepages-1Gi: "4Gi" (6)
          requests:
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
1 Specify the same target_namespace where the SriovNetwork CR mlx-rdma-network is created. If you would like to create the Pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
2 Specify the RDMA image that includes your application and the RDMA library used by the application.
3 Specify the IPC_LOCK capability, which is required by the application to allocate hugepage memory inside the container.
4 Mount the hugepage volume to RDMA Pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
5 Specify the number of CPUs. The RDMA Pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a Pod with Guaranteed QoS.
6 Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the RDMA Pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.
  6. Create the RDMA Pod by running the following command:

    $ oc create -f mlx-rdma-pod.yaml