The Data Plane Development Kit (DPDK) provides a set of libraries and drivers for fast packet processing.
You can configure clusters and virtual machines (VMs) to run DPDK workloads over SR-IOV networks.
You can configure an OpenShift Container Platform cluster to run Data Plane Development Kit (DPDK) workloads for improved network performance.
You have access to the cluster as a user with cluster-admin permissions.
You have installed the OpenShift CLI (oc).
You have installed the SR-IOV Network Operator.
You have installed the Node Tuning Operator.
Map your compute node topology to determine which Non-Uniform Memory Access (NUMA) CPUs are isolated for DPDK applications and which ones are reserved for the operating system (OS), as shown in the example below.
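For example, a quick way to inspect the NUMA layout of a candidate node is to run lscpu through a debug pod; <node_name> is a placeholder:
$ oc debug node/<node_name> -- chroot /host lscpu | grep -i numa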
If your OpenShift Container Platform cluster uses separate control plane and compute nodes for high availability:
Label a subset of the compute nodes with a custom role; for example, worker-dpdk:
$ oc label node <node_name> node-role.kubernetes.io/worker-dpdk=""
Create a new MachineConfigPool manifest that contains the worker-dpdk label in the spec.machineConfigSelector object:
MachineConfigPool manifest
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-dpdk
  labels:
    machineconfiguration.openshift.io/role: worker-dpdk
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - worker-dpdk
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-dpdk: ""
Create a PerformanceProfile manifest that applies to the labeled nodes and the machine config pool that you created in the previous steps. The performance profile specifies the CPUs that are isolated for DPDK applications and the CPUs that are reserved for housekeeping.
PerformanceProfile manifest
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: profile-1
spec:
  cpu:
    isolated: 4-39,44-79
    reserved: 0-3,40-43
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 8
        node: 0
        size: 1G
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/worker-dpdk: ""
  numa:
    topologyPolicy: single-numa-node
The compute nodes automatically restart after you apply the MachineConfigPool and PerformanceProfile manifests.
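As a sketch of the rollout, you can apply both manifests and watch the custom pool update; the file names are placeholders for wherever you saved the manifests:
$ oc apply -f mcp-worker-dpdk.yaml
$ oc apply -f performance-profile.yaml
$ oc get mcp worker-dpdk -w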
Retrieve the name of the generated RuntimeClass resource from the status.runtimeClass field of the PerformanceProfile object:
$ oc get performanceprofiles.performance.openshift.io profile-1 -o=jsonpath='{.status.runtimeClass}{"\n"}'
Set the previously obtained RuntimeClass name as the default container runtime class for the virt-launcher pods by editing the HyperConverged custom resource (CR):
$ oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
--type='json' -p='[{"op": "add", "path": "/spec/defaultRuntimeClass", "value":"<runtimeclass-name>"}]'
Editing the HyperConverged CR changes the default container runtime class for all virt-launcher pods in the cluster.
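Optionally, you can read the value back to confirm that the patch was applied; the field matches the patch path used above:
$ oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv -o jsonpath='{.spec.defaultRuntimeClass}{"\n"}'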
If your DPDK-enabled compute nodes use simultaneous multithreading (SMT), enable the AlignCPUs feature gate by editing the HyperConverged CR:
$ oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
--type='json' -p='[{"op": "replace", "path": "/spec/featureGates/alignCPUs", "value": true}]'
Enabling AlignCPUs allows OpenShift Virtualization to request up to two additional dedicated CPUs to bring the total CPU count to an even parity when using emulator thread isolation.
Create an SriovNetworkNodePolicy object with the spec.deviceType field set to vfio-pci:
SriovNetworkNodePolicy manifest
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-1
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intel_nics_dpdk
  deviceType: vfio-pci
  mtu: 9000
  numVfs: 4
  priority: 99
  nicSelector:
    vendor: "8086"
    deviceID: "1572"
    pfNames:
      - eno3
    rootDevices:
      - "0000:19:00.2"
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
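As a hedged example, you can apply the policy and check that the SR-IOV Network Operator has synced the virtual functions on the selected nodes; the file name is a placeholder:
$ oc apply -f <file_name>.yaml
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator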
You can delete a custom machine config pool that you previously created for your high-availability cluster.
You have access to the cluster as a user with cluster-admin permissions.
You have installed the OpenShift CLI (oc).
You have created a custom machine config pool by labeling a subset of the compute nodes with a custom role and creating a MachineConfigPool manifest with that label.
Remove the worker-dpdk label from the compute nodes by running the following command:
$ oc label node <node_name> node-role.kubernetes.io/worker-dpdk-
Delete the MachineConfigPool manifest that contains the worker-dpdk label by entering the following command:
$ oc delete mcp worker-dpdk
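Optionally, confirm that the unlabeled nodes have rejoined the default worker machine config pool and that the pool reports as updated:
$ oc get mcp worker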
You can configure the project to run DPDK workloads on SR-IOV hardware.
Your cluster is configured to run DPDK workloads.
Create a namespace for your DPDK applications:
$ oc create ns dpdk-checkup-ns
Create an SriovNetwork object that references the SriovNetworkNodePolicy object. When you create an SriovNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.
SriovNetwork manifest
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-sriovnetwork
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.56.217.1"
    }
  networkNamespace: dpdk-checkup-ns (1)
  resourceName: intel_nics_dpdk (2)
  spoofChk: "off"
  trust: "on"
  vlan: 1019
(1) The namespace where the NetworkAttachmentDefinition object is deployed.
(2) The value of the spec.resourceName attribute of the SriovNetworkNodePolicy object that was created when configuring the cluster for DPDK workloads.
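For example, you can apply the manifest and confirm that the NetworkAttachmentDefinition object was created in the target namespace; the file name is a placeholder:
$ oc apply -f <file_name>.yaml
$ oc get network-attachment-definitions -n dpdk-checkup-ns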
Optional: Run the virtual machine latency checkup to verify that the network is properly configured.
Optional: Run the DPDK checkup to verify that the namespace is ready for DPDK workloads.
You can run Data Plane Development Kit (DPDK) workloads on virtual machines (VMs) to achieve lower latency and higher throughput for faster packet processing in the user space. DPDK uses the SR-IOV network for hardware-based I/O sharing.
Your cluster is configured to run DPDK workloads.
You have created and configured the project in which the VM will run.
Edit the VirtualMachine manifest to include information about the SR-IOV network interface, CPU topology, CRI-O annotations, and huge pages:
VirtualMachine manifest
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel-dpdk-vm
spec:
  running: true
  template:
    metadata:
      annotations:
        cpu-load-balancing.crio.io: disable (1)
        cpu-quota.crio.io: disable (2)
        irq-load-balancing.crio.io: disable (3)
    spec:
      domain:
        cpu:
          sockets: 1 (4)
          cores: 5 (5)
          threads: 2
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
        devices:
          interfaces:
            - masquerade: {}
              name: default
            - model: virtio
              name: nic-east
              pciAddress: '0000:07:00.0'
              sriov: {}
          networkInterfaceMultiqueue: true
          rng: {}
        memory:
          hugepages:
            pageSize: 1Gi (6)
          guest: 8Gi
      networks:
        - name: default
          pod: {}
        - multus:
            networkName: dpdk-net (7)
          name: nic-east
# ...
(1) This annotation specifies that load balancing is disabled for CPUs that are used by the container.
(2) This annotation specifies that the CPU quota is disabled for CPUs that are used by the container.
(3) This annotation specifies that Interrupt Request (IRQ) load balancing is disabled for CPUs that are used by the container.
(4) The number of sockets inside the VM. This field must be set to 1 for the CPUs to be scheduled from the same Non-Uniform Memory Access (NUMA) node.
(5) The number of cores inside the VM. This value must be greater than or equal to 1. In this example, the VM is scheduled with 5 hyper-threads, or 10 CPUs.
(6) The size of the huge pages. The possible values for the x86-64 architecture are 1Gi and 2Mi. In this example, the request is for 8 huge pages of size 1Gi.
(7) The name of the SR-IOV NetworkAttachmentDefinition object.
Save and exit the editor.
Apply the VirtualMachine manifest:
$ oc apply -f <file_name>.yaml
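Optionally, verify that the virtual machine instance is running; this assumes the rhel-dpdk-vm name from the example manifest and the current project:
$ oc get vmi rhel-dpdk-vm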
Configure the guest operating system. The following example shows the configuration steps for the RHEL 9 operating system:
Configure huge pages by using the GRUB bootloader command-line interface. In the following example, 8 1G huge pages are specified.
$ grubby --update-kernel=ALL --args="default_hugepagesz=1GB hugepagesz=1G hugepages=8"
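After the VM restarts at the end of this procedure, you can verify the allocation from inside the guest; the exact values depend on your configuration:
$ grep Huge /proc/meminfo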
To achieve low-latency tuning by using the cpu-partitioning profile in the TuneD application, run the following commands:
$ dnf install -y tuned-profiles-cpu-partitioning
$ echo isolated_cores=2-9 > /etc/tuned/cpu-partitioning-variables.conf
The first two CPUs (0 and 1) are set aside for housekeeping tasks, and the rest are isolated for the DPDK application.
$ tuned-adm profile cpu-partitioning
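Optionally, confirm that the profile is active; tuned-adm is provided by the tuned package:
$ tuned-adm active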
Override the SR-IOV NIC driver by using the driverctl device driver control utility:
$ dnf install -y driverctl
$ driverctl set-override 0000:07:00.0 vfio-pci
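To check that the override is registered, you can list the configured overrides; output varies by driverctl version:
$ driverctl list-overrides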
Restart the VM to apply the changes.
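For example, assuming the virtctl client is installed on your workstation and the VM name from the earlier manifest, you can restart the VM from outside the guest:
$ virtctl restart rhel-dpdk-vm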