Labeling nodes with an SR-IOV enabled NIC

If you want to enable SR-IOV on only SR-IOV capable nodes there are a couple of ways to do this:

  1. Install the Node Feature Discovery (NFD) Operator. NFD detects the presence of SR-IOV enabled NICs and labels the nodes with node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = true.

  2. Examine the SriovNetworkNodeState CR for each node. The interfaces stanza includes a list of all of the SR-IOV devices discovered by the SR-IOV Network Operator on the worker node. Label each node with feature.node.kubernetes.io/network-sriov.capable: "true" by using the following command:

    $ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"

    You can label the nodes with whatever name you want.

Setting one sysctl flag

You can set interface-level network sysctl settings for a pod connected to a SR-IOV network device.

In this example, net.ipv4.conf.IFNAME.accept_redirects is set to 1 on the created virtual interfaces.

The sysctl-tuning-test is a namespace used in this example.

  • Use the following command to create the sysctl-tuning-test namespace:

    $ oc create namespace sysctl-tuning-test

Setting one sysctl flag on nodes with SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain and reboot the nodes.

It can take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).

  1. Create an SriovNetworkNodePolicy custom resource (CR). For example, save the following YAML as the file policyoneflag-sriov-node-network.yaml:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
      name: policyoneflag (1)
      namespace: openshift-sriov-network-operator (2)
      resourceName: policyoneflag (3)
      nodeSelector: (4)
      priority: 10 (5)
      numVfs: 5 (6)
      nicSelector: (7)
        pfNames: ["ens5"] (8)
      deviceType: "netdevice" (9)
      isRdma: false (10)
    1 The name for the custom resource object.
    2 The namespace where the SR-IOV Network Operator is installed.
    3 The resource name of the SR-IOV network device plug-in. You can create multiple SR-IOV network node policies for a resource name.
    4 The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plug-in and device plug-in are deployed on selected nodes only.
    5 Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
    6 The number of the virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
    7 The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique.
    8 Optional: An array of one or more physical function (PF) names for the device.
    9 Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
    10 Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.

    The vfio-pci driver type is not supported.

  2. Create the SriovNetworkNodePolicy object:

    $ oc create -f policyoneflag-sriov-node-network.yaml

    After applying the configuration update, all the pods in sriov-network-operator namespace change to the Running status.

  3. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
    Example output

Configuring sysctl on a SR-IOV network

You can set interface specific sysctl settings on virtual interfaces created by SR-IOV by adding the tuning configuration to the optional metaPlugins parameter of the SriovNetwork resource.

The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition custom resource (CR) automatically.

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.

To change the interface-level network net.ipv4.conf.IFNAME.accept_redirects sysctl settings, create an additional SR-IOV network with the Container Network Interface (CNI) tuning plug-in.

  • Install the OpenShift Container Platform CLI (oc).

  • Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.

  1. Create the SriovNetwork custom resource (CR) for the additional SR-IOV network attachment and insert the metaPlugins configuration, as in the following example CR. Save the YAML as the file sriov-network-interface-sysctl.yaml.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
      name: onevalidflag (1)
      namespace: openshift-sriov-network-operator (2)
      resourceName: policyoneflag (3)
      networkNamespace: sysctl-tuning-test (4)
      ipam: '{ "type": "static" }' (5)
      capabilities: '{ "mac": true, "ips": true }' (6)
      metaPlugins : | (7)
          "type": "tuning",
             "net.ipv4.conf.IFNAME.accept_redirects": "1"
    1 A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with same name.
    2 The namespace where the SR-IOV Network Operator is installed.
    3 The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
    4 The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
    5 A configuration object for the IPAM CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
    6 Optional: Set capabilities for the additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
    7 Optional: The metaPlugins parameter is used to add additional capabilities to the device. In this use case set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field.
  2. Create the SriovNetwork resource:

    $ oc create -f sriov-network-interface-sysctl.yaml
Verifying that the NetworkAttachmentDefinition CR is successfully created
  • Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

    $ oc get network-attachment-definitions -n <namespace> (1)
    1 Replace <namespace> with the value for networkNamespace that you specified in the SriovNetwork object. For example, sysctl-tuning-test.
    Example output
    NAME                                  AGE
    onevalidflag                          14m

    There might be a delay before the SR-IOV Network Operator creates the CR.

Verifying that the additional SR-IOV network attachment is successful

To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:

  1. Create a Pod CR. Save the following YAML as the file examplepod.yaml:

    apiVersion: v1
    kind: Pod
      name: tunepod
      namespace: sysctl-tuning-test
        k8s.v1.cni.cncf.io/networks: |-
              "name": "onevalidflag",  (1)
              "mac": "0a:56:0a:83:04:0c", (2)
              "ips": [""] (3)
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
            drop: ["ALL"]
        runAsNonRoot: true
          type: RuntimeDefault
    1 The name of the SR-IOV network attachment definition CR.
    2 Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
    3 Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.
  2. Create the Pod CR:

    $ oc apply -f examplepod.yaml
  3. Verify that the pod is created by running the following command:

    $ oc get pod -n sysctl-tuning-test
    Example output
    tunepod   1/1     Running   0          47s
  4. Log in to the pod by running the following command:

    $ oc rsh -n sysctl-tuning-test tunepod
  5. Verify the values of the configured sysctl flag. Find the value net.ipv4.conf.IFNAME.accept_redirects by running the following command::

    $ sysctl net.ipv4.conf.net1.accept_redirects
    Example output
    net.ipv4.conf.net1.accept_redirects = 1

Configuring sysctl settings for pods associated with bonded SR-IOV interface flag

You can set interface-level network sysctl settings for a pod connected to a bonded SR