Network Interface Card (NIC) SR-IOV hardware is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

About SR-IOV hardware on OpenShift Container Platform

OpenShift Container Platform includes the capability to use SR-IOV hardware on your nodes. You can attach SR-IOV virtual function (VF) interfaces to Pods on nodes with SR-IOV hardware.

You can use the OpenShift Container Platform console to install SR-IOV by deploying the SR-IOV Network Operator. The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. The Operator provides the following features:

  • Discover the SR-IOV network device in the cluster.

  • Initialize the supported SR-IOV NIC models on nodes.

  • Provision the SR-IOV network device plug-in on nodes.

  • Provision the SR-IOV CNI plug-in executable on nodes.

  • Provision the Network Resources Injector in the cluster.

  • Manage the configuration of the SR-IOV network device plug-in.

  • Generate NetworkAttachmentDefinition custom resources (CR) for the SR-IOV CNI plug-in.

The following list describes the function of each SR-IOV component mentioned above:

  • The SR-IOV network device plug-in is a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function (VF) resources. Device plug-ins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plug-ins give the Kubernetes scheduler awareness of resource availability, so the scheduler can schedule Pods on nodes with sufficient resources.

  • The SR-IOV CNI plug-in plumbs VF interfaces allocated from the SR-IOV device plug-in directly into a Pod.

  • The Network Resources Injector is a Kubernetes Dynamic Admission Controller Webhook that provides functionality for patching Kubernetes Pod specifications with requests and limits for custom network resources such as SR-IOV VFs.

The Network Resources Injector is enabled by default and cannot be disabled.
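
As a rough illustration of why the scheduler needs the resource counts that a device plug-in advertises, the following Python sketch filters nodes by the number of VFs a Pod requests. It is a toy model, not Kubernetes scheduler code. The node names, the resource name openshift.io/example_vfs, and the nodes_that_fit function are hypothetical.

```python
def nodes_that_fit(allocatable, resource_name, requested):
    """Return the sorted names of nodes advertising at least `requested` units."""
    return sorted(
        node
        for node, resources in allocatable.items()
        if resources.get(resource_name, 0) >= requested
    )


# Hypothetical per-node counts of free VFs, as a device plug-in might advertise them.
allocatable = {
    "node-a": {"openshift.io/example_vfs": 8},
    "node-b": {"openshift.io/example_vfs": 1},
    "node-c": {},  # no SR-IOV hardware on this node
}

print(nodes_that_fit(allocatable, "openshift.io/example_vfs", 2))  # ['node-a']
```

Without the advertised counts, the scheduler in this model could not tell node-a apart from node-c.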

Supported devices

The following Network Interface Card (NIC) models are supported in OpenShift Container Platform:

  • Intel XXV710-DA2 25G card with vendor ID 0x8086 and device ID 0x158b

  • Mellanox MT27710 Family [ConnectX-4 Lx] 25G card with vendor ID 0x15b3 and device ID 0x1015

  • Mellanox MT27800 Family [ConnectX-5] 100G card with vendor ID 0x15b3 and device ID 0x1017
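
The list above can be expressed as a small lookup table, for example to check the vendor and device IDs that a tool such as lspci reports. This is a sketch only; the function names are hypothetical and are not part of the Operator.

```python
# Supported (vendor ID, device ID) pairs, copied from the list above.
SUPPORTED_NICS = {
    ("8086", "158b"): "Intel XXV710-DA2 25G",
    ("15b3", "1015"): "Mellanox MT27710 Family [ConnectX-4 Lx] 25G",
    ("15b3", "1017"): "Mellanox MT27800 Family [ConnectX-5] 100G",
}


def normalize_hex_id(value):
    """Lowercase a PCI ID and drop an optional 0x prefix."""
    value = value.strip().lower()
    return value[2:] if value.startswith("0x") else value


def supported_model(vendor, device_id):
    """Return the model name for a supported NIC, or None otherwise."""
    key = (normalize_hex_id(vendor), normalize_hex_id(device_id))
    return SUPPORTED_NICS.get(key)


print(supported_model("0x8086", "0x158b"))
```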

Automated discovery of SR-IOV network devices

The SR-IOV Network Operator searches your cluster for SR-IOV-capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState Custom Resource (CR) for each worker node that provides a compatible SR-IOV network device.

One CR is created for each worker node, and the CR shares the same name as the node. The .status.interfaces list provides information about the network devices on a node.

Do not modify a SriovNetworkNodeState CR. The Operator creates and manages these resources automatically.

The following is an example of a SriovNetworkNodeState CR created by the SR-IOV Network Operator:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  name: node-25 (1)
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
spec:
  dpConfigVersion: d41d8cd98f00b204e9800998ecf8427e
status:
  interfaces: (2)
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f0
    pciAddress: "0000:18:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f1
    pciAddress: "0000:18:00.1"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f0
    pciAddress: 0000:81:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f1
    pciAddress: 0000:81:00.1
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens803f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  syncStatus: Succeeded
1 The value for the name parameter is the same as the name of the worker node.
2 The interfaces collection includes a list of all of the SR-IOV devices discovered by the Operator on the worker node.
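
Because the interfaces list is structured data, it is easy to summarize. The following Python sketch assumes the list has already been loaded into dictionaries and uses a trimmed copy of the example values above. The total_vfs_by_vendor function is a hypothetical helper.

```python
from collections import defaultdict

# A trimmed copy of the interfaces list from the example CR above.
interfaces = [
    {"name": "ens785f0", "vendor": "15b3", "deviceID": "1017", "totalvfs": 8},
    {"name": "ens785f1", "vendor": "15b3", "deviceID": "1017", "totalvfs": 8},
    {"name": "ens817f0", "vendor": "8086", "deviceID": "158b", "totalvfs": 64},
    {"name": "ens817f1", "vendor": "8086", "deviceID": "158b", "totalvfs": 64},
    {"name": "ens803f0", "vendor": "8086", "deviceID": "158b", "totalvfs": 64},
]


def total_vfs_by_vendor(interfaces):
    """Sum the totalvfs field of every interface, grouped by vendor ID."""
    totals = defaultdict(int)
    for interface in interfaces:
        totals[interface["vendor"]] += interface["totalvfs"]
    return dict(totals)


print(total_vfs_by_vendor(interfaces))  # {'15b3': 16, '8086': 192}
```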

Example use of virtual function (VF) in a Pod

You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a Pod with an SR-IOV VF attached. In the following example, a Pod is using a VF in RDMA mode:

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
  containers:
  - name: testpmd
    image: <RDMA_image>
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: ["sleep", "infinity"]

The following example shows a Pod with a VF in DPDK mode:

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
  containers:
  - name: testpmd
    image: <DPDK_image>
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
      requests:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

Installing SR-IOV Network Operator

As a cluster administrator, you can install the SR-IOV Network Operator using the OpenShift Container Platform CLI or the web console.

Installing the Operator using the CLI

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites
  • A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.

  • Install the OpenShift Container Platform Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create a namespace for the SR-IOV Network Operator by completing the following actions:

    1. Create the following Namespace Custom Resource (CR) that defines the sriov-network-operator namespace, and then save the YAML in the sriov-namespace.yaml file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: sriov-network-operator
        labels:
          openshift.io/run-level: "1"
    2. Create the namespace by running the following command:

      $ oc create -f sriov-namespace.yaml
  2. Install the SR-IOV Network Operator in the namespace you created in the previous step by creating the following objects:

    1. Create the following OperatorGroup CR and save the YAML in the sriov-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: sriov-network-operators
        namespace: sriov-network-operator
      spec:
        targetNamespaces:
        - sriov-network-operator
    2. Create the OperatorGroup CR by running the following command:

      $ oc create -f sriov-operatorgroup.yaml
    3. Run the following command to get the channel value required for the next step:

      $ oc get packagemanifest sriov-network-operator -n openshift-marketplace -o jsonpath='{.status.channels[].name}'
      
      alpha
    4. Create the following Subscription CR and save the YAML in the sriov-sub.yaml file:

      Example Subscription
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: sriov-network-operator-subscription
        namespace: sriov-network-operator
      spec:
        channel: <channel> (1)
        name: sriov-network-operator
        source: redhat-operators (2)
        sourceNamespace: openshift-marketplace
      1 Specify the value that you obtained in the previous step for the .status.channels[].name parameter.
      2 You must specify the redhat-operators value.
    5. Create the Subscription object by running the following command:

      $ oc create -f sriov-sub.yaml
    6. Change to the sriov-network-operator project:

      $ oc project sriov-network-operator
      
      Now using project "sriov-network-operator"
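
If you script the installation, the Subscription object can be generated from the channel value instead of being edited by hand. The following Python sketch builds the object shown above and prints it as JSON, which can be saved to a file in place of the YAML. The subscription_manifest function is a hypothetical helper, and the metadata name is an arbitrary label.

```python
import json


def subscription_manifest(channel):
    """Build the Subscription object from the procedure above for a given channel."""
    if not channel:
        raise ValueError("channel must not be empty")
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {
            "name": "sriov-network-operator-subscription",
            "namespace": "sriov-network-operator",
        },
        "spec": {
            "channel": channel,
            "name": "sriov-network-operator",
            "source": "redhat-operators",
            "sourceNamespace": "openshift-marketplace",
        },
    }


# "alpha" is the channel value returned in the example output above.
print(json.dumps(subscription_manifest("alpha"), indent=2))
```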

Installing the Operator using the web console

As a cluster administrator, you can install the Operator using the web console.

Before you begin, create the Namespace CR and OperatorGroup CR as described in the previous section.

Procedure
  1. Install the SR-IOV Network Operator using the OpenShift Container Platform web console:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.

    2. Choose SR-IOV Network Operator from the list of available Operators, and then click Install.

    3. On the Create Operator Subscription page, under A specific namespace on the cluster select sriov-network-operator. Then, click Subscribe.

  2. Optional: Verify that the SR-IOV Network Operator installed successfully:

    1. Switch to the Operators → Installed Operators page.

    2. Ensure that SR-IOV Network Operator is listed in the sriov-network-operator project with a Status of InstallSucceeded.

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, troubleshoot further:

      • Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.

      • Go to the Workloads → Pods page and check the logs for Pods in the sriov-network-operator project.

Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io Custom Resource Definition (CRD) to OpenShift Container Platform. You can configure the SR-IOV network device by creating a SriovNetworkNodePolicy Custom Resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator may drain the nodes and, in some cases, reboot them. It may take several minutes for a configuration change to apply. Before you apply the configuration, ensure that there are enough available nodes in your cluster to handle the evicted workload.

After the configuration update is applied, all the Pods in the sriov-network-operator namespace change to a Running status.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

  • You must have installed the SR-IOV Operator.

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: <name> (1)
      namespace: sriov-network-operator (2)
    spec:
      resourceName: <sriov_resource_name> (3)
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true" (4)
      priority: <priority> (5)
      mtu: <mtu> (6)
      numVfs: <num> (7)
      nicSelector: (8)
        vendor: "<vendor_code>" (9)
        deviceID: "<device_id>" (10)
        pfName: ["<pf_name>", ...] (11)
        rootDevices: ["<pci_bus_id>", "..."] (12)
      deviceType: <device_type> (13)
      isRdma: false (14)
    1 Specify a name for the CR.
    2 Specify the namespace where the SR-IOV Operator is installed.
    3 Specify the resource name of the SR-IOV device plug-in. The prefix openshift.io/ is added when the resource is referred to in a Pod spec. You can create multiple SriovNetworkNodePolicy CRs for a resource name.
    4 Specify the node selector to select which nodes are configured. You can label the nodes manually or with tools such as Kubernetes Node Feature Discovery. Only SR-IOV network devices on selected nodes are configured. The SR-IOV CNI plug-in and device plug-in are deployed only on selected nodes.
    5 Specify an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10.
    6 Specify a value for the maximum transmission unit (MTU) of the virtual function. The value for MTU must be in the range from 1 to 9000. If you do not need to specify the MTU, specify a value of ''.
    7 Specify the number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel Network Interface Card (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
    8 The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. It is recommended to identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfName. If you specify both pfName and rootDevices at the same time, ensure that they point to an identical device.
    9 Specify the vendor hex code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
    10 Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
    11 The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
    12 The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
    13 Specify the driver type for the virtual functions. You can specify one of the following values: netdevice or vfio-pci. The default value is netdevice.
    14 Specify whether to enable RDMA mode. The default value is false. Only RDMA over Converged Ethernet (RoCE) mode is supported on Mellanox Ethernet adapters.

    If the isRdma flag is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

  2. Create the CR by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
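
The value constraints listed in the callouts above can be checked before you create the CR. The following Python sketch is one possible pre-flight check, assuming the spec has already been loaded into a dictionary. It is not the Operator's own validation, and the validate_policy_spec function is a hypothetical helper.

```python
ALLOWED_VENDORS = {"8086", "15b3"}
ALLOWED_DEVICE_IDS = {"158b", "1015", "1017"}
ALLOWED_DEVICE_TYPES = {"netdevice", "vfio-pci"}
MELLANOX_VENDOR = "15b3"


def validate_policy_spec(spec):
    """Return a list of problems found in a SriovNetworkNodePolicy spec."""
    problems = []
    selector = spec.get("nicSelector", {})

    priority = spec.get("priority")
    if priority is not None and not 0 <= priority <= 99:
        problems.append("priority must be an integer from 0 to 99")

    # An empty string means that no MTU is specified.
    mtu = spec.get("mtu")
    if mtu not in (None, "") and not 1 <= mtu <= 9000:
        problems.append("mtu must be in the range from 1 to 9000")

    vendor = selector.get("vendor")
    if vendor is not None and vendor not in ALLOWED_VENDORS:
        problems.append("vendor must be 8086 or 15b3")

    device_id = selector.get("deviceID")
    if device_id is not None and device_id not in ALLOWED_DEVICE_IDS:
        problems.append("deviceID must be 158b, 1015, or 1017")

    # rootDevices alone is not enough to select a device.
    if selector.get("rootDevices") and not (
        vendor or device_id or selector.get("pfName")
    ):
        problems.append("rootDevices requires vendor, deviceID, or pfName")

    if vendor == MELLANOX_VENDOR and spec.get("numVfs", 0) > 128:
        problems.append("numVfs cannot be larger than 128 on a Mellanox NIC")

    # deviceType defaults to netdevice when it is omitted.
    if spec.get("deviceType", "netdevice") not in ALLOWED_DEVICE_TYPES:
        problems.append("deviceType must be netdevice or vfio-pci")

    return problems
```

An empty list means that none of these checks failed. The sketch does not check the Intel limit on numVfs, because that limit depends on the total VFs reported for the specific device.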

Configuring SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating a SriovNetwork Custom Resource (CR). When you create a SriovNetwork CR, the SR-IOV Operator automatically creates a NetworkAttachmentDefinition CR.

Do not modify or delete a SriovNetwork Custom Resource (CR) if it is attached to any Pods in the running state.

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create the following SriovNetwork CR, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: <name> (1)
      namespace: sriov-network-operator (2)
    spec:
      networkNamespace: <target_namespace> (3)
      ipam: |- (4)
        ...
      vlan: <vlan> (5)
      resourceName: <sriov_resource_name> (6)
    1 Replace <name> with a name for the CR. The Operator will create a NetworkAttachmentDefinition CR with the same name.
    2 Specify the namespace where the SR-IOV Operator is installed.
    3 Replace <target_namespace> with the namespace where the NetworkAttachmentDefinition CR will be created.
    4 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
    5 Replace <vlan> with a Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
    6 Replace <sriov_resource_name> with the value for the .spec.resourceName parameter from the SriovNetworkNodePolicy CR that defines the SR-IOV hardware for this additional network.
  2. Create the CR by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
  3. Optional: Confirm that the NetworkAttachmentDefinition CR associated with the SriovNetwork CR that you created in the previous step exists by running the following command. Replace <namespace> with the namespace you specified in the SriovNetwork CR.

    $ oc get net-attach-def -n <namespace>
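
The following Python sketch assembles a SriovNetwork object and checks the two values that are easiest to get wrong: the VLAN range and the ipam string, which must be valid JSON. The sriov_network function and the example names are hypothetical. The ipam string is passed through unchanged, so its exact contents remain your responsibility.

```python
import json


def sriov_network(name, target_namespace, resource_name, ipam_json, vlan=0):
    """Build a SriovNetwork object after checking the vlan and ipam values."""
    if not 0 <= vlan <= 4095:
        raise ValueError("vlan must be from 0 to 4095")
    json.loads(ipam_json)  # raises ValueError if the string is not valid JSON
    return {
        "apiVersion": "sriovnetwork.openshift.io/v1",
        "kind": "SriovNetwork",
        "metadata": {"name": name, "namespace": "sriov-network-operator"},
        "spec": {
            "networkNamespace": target_namespace,
            "ipam": ipam_json,  # kept as a string, like the YAML block scalar
            "vlan": vlan,
            "resourceName": resource_name,
        },
    }


# Placeholder ipam value, copied from the DHCP example in this document.
EXAMPLE_IPAM = '{"ipam": {"type": "DHCP"}}'

print(json.dumps(
    sriov_network("example-net", "default", "example_vfs", EXAMPLE_IPAM, vlan=100),
    indent=2,
))
```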

Configuration for ipam CNI plug-in

The IP address management (IPAM) CNI plug-in manages IP address assignment for other CNI plug-ins. You can configure ipam for either static IP address assignment or dynamic IP address assignment by using DHCP. The DHCP server you specify must be reachable from the additional network.

In OpenShift Container Platform 4.2.0, if you attach a Pod to an additional network that uses DHCP for IP address management, the Pod will fail to start. This is fixed in OpenShift Container Platform 4.2.1. For more information, see BZ#1754686.

The following JSON configuration object describes the parameters that you can set.

If you set the type parameter to the DHCP value, you cannot set any other parameters.
ipam CNI plug-in JSON configuration object
{
  "ipam": {
    "type": "<type>", (1)
    "addresses": [ (2)
      {
        "address": "<address>", (3)
        "gateway": "<gateway>" (4)
      }
    ],
    "routes": [ (5)
      {
        "dst": "<dst>", (6)
        "gw": "<gw>" (7)
      }
    ],
    "dns": { (8)
      "nameservers": ["<nameserver>"], (9)
      "domain": "<domain>", (10)
      "search": ["<search_domain>"] (11)
    }
  }
}
1 Specify static to configure the plug-in to manage IP address assignment. Specify DHCP to allow a DHCP server to manage IP address assignment. You cannot specify any additional parameters if you specify a value of DHCP.
2 An array describing IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.
3 A block of IP addresses that you specify in CIDR format to assign to Pods on a worker node, such as 10.1.1.0/24.
4 The default gateway to route egress network traffic to.
5 An array describing routes to configure inside the Pod.
6 The IP address range in CIDR format.
7 The gateway to use to route network traffic to.
8 The DNS configuration. Optional.
9 An array of one or more IP addresses to send DNS queries to.
10 The default domain to append to a host name. For example, if the domain is set to example.com, a DNS lookup query for example-host will be rewritten as example-host.example.com.
11 An array of domain names to append to an unqualified host name, such as example-host, during a DNS lookup query.
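
The rules in the callouts above can be captured in a small builder that refuses invalid combinations. The following Python sketch is an illustration only: the ipam_config function is hypothetical, and the address checks rely on the standard ipaddress module rather than on anything the CNI plug-in does.

```python
import ipaddress
import json


def ipam_config(ipam_type, addresses=None, routes=None, dns=None):
    """Build the ipam configuration object described by the callouts above."""
    if ipam_type not in ("static", "DHCP"):
        raise ValueError("type must be static or DHCP")

    optional = {"addresses": addresses, "routes": routes, "dns": dns}
    optional = {key: value for key, value in optional.items() if value is not None}
    if ipam_type == "DHCP" and optional:
        raise ValueError("no other parameters are allowed with DHCP")

    # Reject malformed addresses early; both IPv4 and IPv6 are accepted.
    for entry in optional.get("addresses", []):
        ipaddress.ip_interface(entry["address"])
    for route in optional.get("routes", []):
        ipaddress.ip_network(route["dst"], strict=False)

    return {"ipam": {"type": ipam_type, **optional}}


print(json.dumps(ipam_config("static", addresses=[{"address": "10.1.1.5/24"}]), indent=2))
```

Building the object in code and serializing it with json.dumps also avoids hand-written JSON errors such as a missing comma between keys.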

Static IP address assignment configuration example

You can configure ipam for static IP address assignment:

{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "191.168.1.1/24"
      }
    ]
  }
}

Dynamic IP address assignment configuration example

You can configure ipam for DHCP:

{
  "ipam": {
    "type": "DHCP"
  }
}

Adding a Pod to an additional network

You can add a Pod to an additional network. The Pod continues to send normal cluster-related network traffic over the default network.

The Network Resources Injector will inject the resource parameter into the Pod CR automatically if a NetworkAttachmentDefinition CR associated with the SR-IOV CNI plug-in is specified.
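
The following Python sketch is a simplified model of that injection, not the Network Resources Injector's actual code. It assumes that the mapping from network names to resource names is supplied by hand, that the resource name openshift.io/example_vfs is hypothetical, and that only the first container is patched.

```python
import copy
from collections import Counter

NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def parse_networks(annotation):
    """Split the comma-separated annotation value into network names."""
    return [name for name in annotation.split(",") if name]


def inject_resources(pod, network_resources):
    """Return a copy of the Pod with one VF requested per attached SR-IOV network."""
    patched = copy.deepcopy(pod)
    annotation = patched["metadata"].get("annotations", {}).get(NETWORKS_ANNOTATION, "")
    # A network listed twice needs two VFs, so count each resource name.
    counts = Counter(
        network_resources[name]
        for name in parse_networks(annotation)
        if name in network_resources
    )
    if not counts:
        return patched

    resources = patched["spec"]["containers"][0].setdefault("resources", {})
    for resource_name, count in counts.items():
        for section in ("requests", "limits"):
            resources.setdefault(section, {})[resource_name] = str(count)
    return patched


pod = {
    "metadata": {"annotations": {NETWORKS_ANNOTATION: "sriov-rdma-mlnx,sriov-rdma-mlnx"}},
    "spec": {"containers": [{"name": "app"}]},
}
patched = inject_resources(pod, {"sriov-rdma-mlnx": "openshift.io/example_vfs"})
print(patched["spec"]["containers"][0]["resources"])
```

In this model, the Pod that lists the same network twice ends up with a request and a limit of 2 for the resource.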

Prerequisites
  • The Pod must be in the same namespace as the additional network.

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • You must log in to the cluster.

  • You must have the SR-IOV Operator installed and a SriovNetwork CR defined.

Procedure

To add a Pod to an additional network, complete the following steps:

  1. Edit the Pod resource definition. If you are editing an existing Pod, run the following command to edit its definition in the default editor. Replace <name> with the name of the Pod to edit.

    $ oc edit pod <name>
  2. In the Pod resource definition, add the k8s.v1.cni.cncf.io/networks parameter to the Pod metadata mapping. The k8s.v1.cni.cncf.io/networks parameter accepts a comma-separated string of one or more NetworkAttachmentDefinition Custom Resource (CR) names:

    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] (1)
    1 Replace <network> with the name of the additional network to associate with the Pod. To specify more than one additional network, separate each network with a comma. Do not include whitespace around the commas. If you specify the same additional network multiple times, that Pod will have multiple network interfaces attached to that network.

    In the following example, two additional networks are attached to the Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: net1,net2
    spec:
      containers:
      - name: example-pod
        command: ["/bin/bash", "-c", "sleep 2000000000000"]
        image: centos/tools
  3. Optional: Confirm that the annotation exists in the Pod CR by running the following command. Replace <name> with the name of the Pod.

    $ oc get pod <name> -o yaml

    In the following example, the example-pod Pod is attached to the macvlan-bridge additional network through the net1 interface:

    $ oc get pod example-pod -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-bridge
        k8s.v1.cni.cncf.io/networks-status: |- (1)
          [{
              "name": "openshift-sdn",
              "interface": "eth0",
              "ips": [
                  "10.128.2.14"
              ],
              "default": true,
              "dns": {}
          },{
              "name": "macvlan-bridge",
              "interface": "net1",
              "ips": [
                  "20.2.2.100"
              ],
              "mac": "22:2f:60:a5:f8:00",
              "dns": {}
          }]
      name: example-pod
      namespace: default
    spec:
      ...
    status:
      ...
    1 The k8s.v1.cni.cncf.io/networks-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the Pod. The annotation value is stored as a plain text value.
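
Because the networks-status value is stored as plain text, a script must parse it as JSON before using it. The following Python sketch does that for a shortened copy of the example value above. The attached_interfaces function is a hypothetical helper.

```python
import json

# Shortened copy of the networks-status value from the example above.
NETWORKS_STATUS = """
[{
    "name": "openshift-sdn",
    "interface": "eth0",
    "ips": ["10.128.2.14"],
    "default": true,
    "dns": {}
},{
    "name": "macvlan-bridge",
    "interface": "net1",
    "ips": ["20.2.2.100"],
    "mac": "22:2f:60:a5:f8:00",
    "dns": {}
}]
"""


def attached_interfaces(networks_status):
    """Map each interface name to its network name and IP addresses."""
    return {
        entry["interface"]: {"network": entry["name"], "ips": entry.get("ips", [])}
        for entry in json.loads(networks_status)
    }


for interface, details in attached_interfaces(NETWORKS_STATUS).items():
    print(interface, details["network"], details["ips"])
```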