You can configure a Single Root I/O Virtualization (SR-IOV) device in your cluster.

Automated discovery of SR-IOV network devices

The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState Custom Resource (CR) for each worker node that provides a compatible SR-IOV network device.

Each CR has the same name as the worker node that it describes. The .status.interfaces list provides information about the network devices on that node.

Do not modify a SriovNetworkNodeState CR. The Operator creates and manages these resources automatically.

The following is an example of a SriovNetworkNodeState CR created by the SR-IOV Network Operator:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  name: node-25 (1)
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
spec:
  dpConfigVersion: "39824"
status:
  interfaces: (2)
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f0
    pciAddress: "0000:18:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f1
    pciAddress: "0000:18:00.1"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f0
    pciAddress: 0000:81:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f1
    pciAddress: 0000:81:00.1
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens803f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  syncStatus: Succeeded
1 The value for the name parameter is the same as the name of the worker node.
2 The interfaces list contains all of the SR-IOV devices that the Operator discovered on the worker node.
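
To illustrate how the .status.interfaces list can be read, the following Python sketch summarizes the devices in a SriovNetworkNodeState CR. It assumes that the CR was exported as JSON, for example with oc get sriovnetworknodestates <node_name> -n openshift-sriov-network-operator -o json. The function name list_sriov_devices and the trimmed sample data are illustrative and are not part of the Operator.

```python
import json

# Trimmed sample in the shape of the SriovNetworkNodeState CR shown above.
# Only two of the discovered interfaces are included to keep the sketch short.
SAMPLE_STATE = json.loads("""
{
  "metadata": {"name": "node-25"},
  "status": {
    "interfaces": [
      {"name": "ens785f0", "vendor": "15b3", "deviceID": "1017",
       "pciAddress": "0000:18:00.0", "totalvfs": 8},
      {"name": "ens817f0", "vendor": "8086", "deviceID": "158b",
       "pciAddress": "0000:81:00.0", "totalvfs": 64}
    ],
    "syncStatus": "Succeeded"
  }
}
""")


def list_sriov_devices(state):
    """Return one summary dict per entry in .status.interfaces.

    Hypothetical helper: it only reads the CR and never modifies it.
    """
    interfaces = state.get("status", {}).get("interfaces", [])
    return [
        {
            "node": state["metadata"]["name"],
            "name": iface["name"],
            "vendor": iface["vendor"],
            "deviceID": iface["deviceID"],
            "pciAddress": iface["pciAddress"],
            "totalvfs": iface.get("totalvfs", 0),
        }
        for iface in interfaces
    ]


if __name__ == "__main__":
    for dev in list_sriov_devices(SAMPLE_STATE):
        print(dev["node"], dev["name"], dev["vendor"], dev["deviceID"], dev["totalvfs"])
```

The vendor, deviceID, name, and pciAddress values reported here are the values that a nicSelector in a SriovNetworkNodePolicy CR can match against.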

Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io Custom Resource Definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy Custom Resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Network Operator might drain the nodes and, in some cases, reboot them. It can take several minutes for a configuration change to apply. Before you apply the policy, ensure that your cluster has enough available nodes to handle the evicted workload.

After the configuration update is applied, all the Pods in the openshift-sriov-network-operator namespace change to a Running status.
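
One way to confirm this state is to inspect the phase of each Pod. The following Python sketch assumes that the Pod list was exported as JSON, for example with oc get pods -n openshift-sriov-network-operator -o json. The function name pods_not_running and the Pod names in the sample data are hypothetical.

```python
# Minimal sample in the shape of a Kubernetes Pod list.
# The Pod names below are made up for illustration.
SAMPLE_POD_LIST = {
    "items": [
        {"metadata": {"name": "example-operator-pod"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "example-config-daemon-pod"},
         "status": {"phase": "Pending"}},
    ]
}


def pods_not_running(pod_list):
    """Return the names of Pods whose .status.phase is not 'Running'."""
    return [
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") != "Running"
    ]


if __name__ == "__main__":
    pending = pods_not_running(SAMPLE_POD_LIST)
    if pending:
        print("Still waiting for:", ", ".join(pending))
    else:
        print("All Pods are Running.")
```

An empty result means that every Pod in the exported list reports the Running phase.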

Prerequisites
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.

  • Log in as a user with cluster-admin privileges.

  • Install the SR-IOV Network Operator.

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: <name> (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: <sriov_resource_name> (3)
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true" (4)
  priority: <priority> (5)
  mtu: <mtu> (6)
  numVfs: <num> (7)
  nicSelector: (8)
    vendor: "<vendor_code>" (9)
    deviceID: "<device_id>" (10)
    pfNames: ["<pf_name>", ...] (11)
    rootDevices: ["<pci_bus_id>", "..."] (12)
  deviceType: <device_type> (13)
  isRdma: false (14)
1 Specify a name for the CR.
2 Specify the namespace where the SR-IOV Network Operator is installed.
3 Specify the resource name of the SR-IOV device plug-in. The prefix openshift.io/ is added when the resource is referenced in a Pod spec. You can create multiple SriovNetworkNodePolicy CRs for a resource name.
4 Specify the node selector that determines which nodes are configured. You can label the nodes manually or with tools such as Kubernetes Node Feature Discovery. Only SR-IOV network devices on selected nodes are configured. The SR-IOV CNI plug-in and device plug-in are deployed only on selected nodes.
5 Optional. Specify an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10. The default value is 99.
6 Optional. Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
7 Specify the number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel Network Interface Card (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
8 The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. Identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices, ensure that they point to the same device.
9 Optional. Specify the vendor hex code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
10 Optional. Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
11 Optional. The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
12 The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
13 Optional. Specify the driver type for the virtual functions. You can specify one of the following values: netdevice or vfio-pci. The default value is netdevice.

For a Mellanox card to work in DPDK mode, use the netdevice driver type.

14 Optional. Specify whether to enable RDMA mode. The default value is false. Only RDMA over Converged Ethernet (RoCE) mode is supported on Mellanox Ethernet adapters.

If the RDMA flag is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

  2. Create the CR by running the following command:

    $ oc create -f <filename> (1)
    1 Replace <filename> with the name of the file you created in the previous step.
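
The constraints described in the callouts for the SriovNetworkNodePolicy CR can be checked before you run oc create. The following Python sketch is not part of the Operator; it encodes only the rules stated in this section and assumes that the spec section of the CR has already been loaded into a dictionary. The names validate_policy_spec and pod_resource_name, the total_vfs argument, and the resource name used in the example are hypothetical.

```python
ALLOWED_VENDORS = {"8086", "15b3"}            # callout 9
ALLOWED_DEVICE_IDS = {"158b", "1015", "1017"}  # callout 10
ALLOWED_DEVICE_TYPES = {"netdevice", "vfio-pci"}  # callout 13
INTEL_VENDOR = "8086"
MELLANOX_VENDOR = "15b3"
MELLANOX_MAX_VFS = 128                         # callout 7


def validate_policy_spec(spec, total_vfs=None):
    """Return a list of problems found in a SriovNetworkNodePolicy spec dict.

    total_vfs is the totalvfs value reported for the device in the
    SriovNetworkNodeState CR, if known. It is used only for the Intel check.
    """
    problems = []

    priority = spec.get("priority", 99)  # documented default is 99
    if not isinstance(priority, int) or not 0 <= priority <= 99:
        problems.append("priority must be an integer between 0 and 99")

    selector = spec.get("nicSelector", {})
    vendor = selector.get("vendor")
    device_id = selector.get("deviceID")
    if vendor is not None and vendor not in ALLOWED_VENDORS:
        problems.append("vendor must be 8086 or 15b3")
    if device_id is not None and device_id not in ALLOWED_DEVICE_IDS:
        problems.append("deviceID must be 158b, 1015, or 1017")
    if selector.get("rootDevices") and not (
        vendor or device_id or selector.get("pfNames")
    ):
        problems.append("rootDevices requires vendor, deviceID, or pfNames")

    device_type = spec.get("deviceType", "netdevice")  # documented default
    if device_type not in ALLOWED_DEVICE_TYPES:
        problems.append("deviceType must be netdevice or vfio-pci")

    # The text does not state a lower bound for numVfs, so only the type
    # and the documented upper bounds are checked here.
    num_vfs = spec.get("numVfs")
    if not isinstance(num_vfs, int):
        problems.append("numVfs must be set to an integer")
    elif vendor == MELLANOX_VENDOR and num_vfs > MELLANOX_MAX_VFS:
        problems.append("numVfs cannot be larger than 128 for a Mellanox NIC")
    elif vendor == INTEL_VENDOR and total_vfs is not None and num_vfs > total_vfs:
        problems.append("numVfs cannot be larger than totalvfs for an Intel NIC")

    return problems


def pod_resource_name(resource_name):
    """Return the name used in a Pod spec: the openshift.io/ prefix is added."""
    return "openshift.io/" + resource_name


if __name__ == "__main__":
    example = {
        "resourceName": "examplenics",  # hypothetical resource name
        "priority": 10,
        "numVfs": 4,
        "nicSelector": {"vendor": "15b3", "deviceID": "1017",
                        "pfNames": ["ens785f0"]},
        "deviceType": "netdevice",
    }
    print(validate_policy_spec(example))
    print(pod_resource_name(example["resourceName"]))
```

An empty list means that the spec does not violate any of the rules encoded in the sketch. The Operator can still reject the CR or fail to apply it for reasons that are not covered here.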