Examples of using virtual functions in DPDK and RDMA modes

The Data Plane Development Kit (DPDK) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Remote Direct Memory Access (RDMA) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.


Prerequisites

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Install the SR-IOV Network Operator.

Example use of a virtual function (VF) in DPDK mode with Intel NICs

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the intel-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: intel-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: intelnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "8086"
        deviceID: "158b"
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: vfio-pci (1)
    1 Set the driver type for the virtual functions to vfio-pci.

    See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator might drain the nodes and, in some cases, reboot them. A configuration change might take several minutes to apply. Before you apply the change, ensure that there are enough available nodes in your cluster to handle the evicted workload.

    After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f intel-dpdk-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the intel-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: intel-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: "{}" (1)
      vlan: <vlan>
      resourceName: intelnics
    1 Specify an empty object "{}" for the ipam CNI plug-in. DPDK works in userspace mode and does not require an IP address.

    See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f intel-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the intel-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> (1)
      annotations:
        k8s.v1.cni.cncf.io/networks: intel-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> (2)
        securityContext:
          capabilities:
            add: ["IPC_LOCK"] (3)
        volumeMounts:
        - mountPath: /dev/hugepages (4)
          name: hugepage
        resources:
          limits:
            openshift.io/intelnics: "1" (5)
            memory: "1Gi"
            cpu: "4" (6)
            hugepages-1Gi: "4Gi" (7)
          requests:
            openshift.io/intelnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1 Specify the same target_namespace where the SriovNetwork CR intel-dpdk-network is created. To create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
    2 Specify the DPDK image that includes your application and the DPDK library that the application uses.
    3 Specify the IPC_LOCK capability, which the application requires to allocate hugepage memory inside the container.
    4 Mount a hugepage volume to the DPDK pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium set to HugePages.
    5 Optional: Specify the number of DPDK devices allocated to the DPDK pod. If you do not specify this resource request and limit, the SR-IOV network resource injector adds it automatically. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
    6 Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. To achieve this, set the CPU Manager policy to static and create a pod with Guaranteed QoS.
    7 Specify the hugepage size, hugepages-1Gi or hugepages-2Mi, and the quantity of hugepages to allocate to the DPDK pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to the nodes. For example, adding the kernel arguments default_hugepagesz=1GB, hugepagesz=1G, and hugepages=16 results in 16 hugepages of 1Gi each being allocated during system boot.
  6. Create the DPDK pod by running the following command:

    $ oc create -f intel-dpdk-pod.yaml
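
The hugepage callout above relates a pod request such as hugepages-1Gi: "4Gi" to the pages that the node reserves at boot. The following is a minimal sketch of that arithmetic, not part of the product tooling. The helper names to_mib and hugepage_count are hypothetical, and the sketch handles only quantities with a Mi or Gi suffix.

```shell
# Convert a quantity with a Mi or Gi suffix (for example 4Gi) to MiB.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} )) ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Print how many pages of size $2 are needed to back a request of size $1.
# Usage: hugepage_count <request> <page_size>
hugepage_count() {
  request_mib=$(to_mib "$1") || return 1
  page_mib=$(to_mib "$2") || return 1
  # A hugepage request must be a whole number of pages.
  if [ $(( request_mib % page_mib )) -ne 0 ]; then
    echo "request $1 is not a multiple of page size $2" >&2
    return 1
  fi
  echo $(( request_mib / page_mib ))
}
```

For example, hugepage_count 4Gi 1Gi prints 4, so the pod in this procedure consumes 4 of the 16 pages that hugepages=16 reserves on a node.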

Example use of a virtual function in DPDK mode with Mellanox NICs

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" (1)
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice (2)
      isRdma: true (3)
    1 Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
    2 Set the driver type for the virtual functions to netdevice. A Mellanox SR-IOV VF can work in DPDK mode without using the vfio-pci device type. The VF device appears as a kernel network interface inside a container.
    3 Enable RDMA mode. Mellanox cards require this setting to work in DPDK mode.

    See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator might drain the nodes and, in some cases, reboot them. A configuration change might take several minutes to apply. Before you apply the change, ensure that there are enough available nodes in your cluster to handle the evicted workload.

    After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f mlx-dpdk-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the mlx-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- (1)
        ...
      vlan: <vlan>
      resourceName: mlxnics
    1 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.

    See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f mlx-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> (1)
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> (2)
        securityContext:
          capabilities:
            add: ["IPC_LOCK"] (3)
        volumeMounts:
        - mountPath: /dev/hugepages (4)
          name: hugepage
        resources:
          limits:
            openshift.io/mlxnics: "1" (5)
            memory: "1Gi"
            cpu: "4" (6)
            hugepages-1Gi: "4Gi" (7)
          requests:
            openshift.io/mlxnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1 Specify the same target_namespace where the SriovNetwork CR mlx-dpdk-network is created. To create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
    2 Specify the DPDK image that includes your application and the DPDK library that the application uses.
    3 Specify the IPC_LOCK capability, which the application requires to allocate hugepage memory inside the container.
    4 Mount the hugepage volume to the DPDK pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium set to HugePages.
    5 Optional: Specify the number of DPDK devices allocated to the DPDK pod. If you do not specify this resource request and limit, the SR-IOV network resource injector adds it automatically. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
    6 Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. To achieve this, set the CPU Manager policy to static and create a pod with Guaranteed QoS.
    7 Specify the hugepage size, hugepages-1Gi or hugepages-2Mi, and the quantity of hugepages to allocate to the DPDK pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to the nodes.
  6. Create the DPDK pod by running the following command:

    $ oc create -f mlx-dpdk-pod.yaml
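
Because a node policy change can take several minutes to apply, it can help to wait for the Operator to finish before creating the pod. The following is a minimal sketch under the assumption that the SriovNetworkNodeState object for a node reports its progress in .status.syncStatus and that Succeeded marks completion. The function names and the default polling values are hypothetical.

```shell
# Namespace where the SR-IOV Network Operator runs, as used in this procedure.
SRIOV_NS=openshift-sriov-network-operator

# Print the sync status that the Operator reports for one node.
# Assumption: the SriovNetworkNodeState object exposes .status.syncStatus.
sriov_sync_status() {
  oc get sriovnetworknodestates -n "$SRIOV_NS" "$1" \
    -o jsonpath='{.status.syncStatus}'
}

# Poll until the node reports Succeeded, or give up after a number of attempts.
# Usage: wait_for_sriov_sync <node_name> [attempts] [seconds_between_attempts]
wait_for_sriov_sync() {
  node=$1
  attempts=${2:-60}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(sriov_sync_status "$node")" = "Succeeded" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "node $node did not reach Succeeded" >&2
  return 1
}
```

A possible use is to run wait_for_sriov_sync <node_name> after creating the SriovNetworkNodePolicy CR and to continue with the SriovNetwork CR only when the function returns successfully.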

Example use of a virtual function in RDMA mode with Mellanox NICs

RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OpenShift Container Platform.

Procedure
  1. Create the following SriovNetworkNodePolicy CR, and then save the YAML in the mlx-rdma-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-rdma-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" (1)
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice (2)
      isRdma: true (3)
    1 Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.
    2 Set the driver type for the virtual functions to netdevice.
    3 Enable RDMA mode.

    See the Configuring SR-IOV network devices section for a detailed explanation of each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy CR, the SR-IOV Operator might drain the nodes and, in some cases, reboot them. A configuration change might take several minutes to apply. Before you apply the change, ensure that there are enough available nodes in your cluster to handle the evicted workload.

    After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy CR by running the following command:

    $ oc create -f mlx-rdma-node-policy.yaml
  3. Create the following SriovNetwork CR, and then save the YAML in the mlx-rdma-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-rdma-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- (1)
        ...
      vlan: <vlan>
      resourceName: mlxnics
    1 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.

    See the Configuring SR-IOV additional network section for a detailed explanation of each option in SriovNetwork.

  4. Create the SriovNetwork CR by running the following command:

    $ oc create -f mlx-rdma-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-rdma-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: rdma-app
      namespace: <target_namespace> (1)
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-rdma-network
    spec:
      containers:
      - name: testpmd
        image: <RDMA_image> (2)
        securityContext:
          capabilities:
            add: ["IPC_LOCK"] (3)
        volumeMounts:
        - mountPath: /dev/hugepages (4)
          name: hugepage
        resources:
          limits:
            memory: "1Gi"
            cpu: "4" (5)
            hugepages-1Gi: "4Gi" (6)
          requests:
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1 Specify the same target_namespace where the SriovNetwork CR mlx-rdma-network is created. To create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork CR.
    2 Specify the RDMA image that includes your application and the RDMA library that the application uses.
    3 Specify the IPC_LOCK capability, which the application requires to allocate hugepage memory inside the container.
    4 Mount the hugepage volume to the RDMA pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium set to HugePages.
    5 Specify the number of CPUs. The RDMA pod usually requires exclusive CPUs to be allocated from the kubelet. To achieve this, set the CPU Manager policy to static and create a pod with Guaranteed QoS.
    6 Specify the hugepage size, hugepages-1Gi or hugepages-2Mi, and the quantity of hugepages to allocate to the RDMA pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to the nodes.
  6. Create the RDMA pod by running the following command:

    $ oc create -f mlx-rdma-pod.yaml
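
The CPU callout above depends on the pod being placed in the Guaranteed QoS class, which Kubernetes records in the pod status as .status.qosClass. The following is a minimal sketch of a check for that condition. The function names pod_qos_class and require_guaranteed_qos are hypothetical and not part of the oc CLI.

```shell
# Print the QoS class that Kubernetes assigned to a pod.
# Usage: pod_qos_class <namespace> <pod_name>
pod_qos_class() {
  oc get pod "$2" -n "$1" -o jsonpath='{.status.qosClass}'
}

# Succeed only if the pod is in the Guaranteed QoS class.
# Usage: require_guaranteed_qos <namespace> <pod_name>
require_guaranteed_qos() {
  qos=$(pod_qos_class "$1" "$2") || return 1
  if [ "$qos" != "Guaranteed" ]; then
    echo "pod $2 has QoS class '$qos', expected Guaranteed" >&2
    return 1
  fi
}
```

For the pod in this procedure, a call such as require_guaranteed_qos <target_namespace> rdma-app is expected to succeed because the requests and limits in the Pod spec are equal. A failure suggests that the requests and limits differ for some resource.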