As a cluster administrator, you can modify interface-level network sysctls using the tuning Container Network Interface (CNI) meta plugin for a pod connected to an SR-IOV network device.
If you want to enable SR-IOV only on SR-IOV capable nodes, you can do so in either of the following ways:
Install the Node Feature Discovery (NFD) Operator. NFD detects the presence of SR-IOV enabled NICs and labels the nodes with node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = true.
Examine the SriovNetworkNodeState CR for each node. The interfaces stanza includes a list of all of the SR-IOV devices discovered by the SR-IOV Network Operator on the worker node. Label each node with feature.node.kubernetes.io/network-sriov.capable: "true" by using the following command:
$ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"
You can label the nodes with whatever name you want.
You can set interface-level network sysctl settings for a pod connected to an SR-IOV network device.
In this example, net.ipv4.conf.IFNAME.accept_redirects is set to 1 on the created virtual interfaces.
This example uses the sysctl-tuning-test namespace.
Use the following command to create the sysctl-tuning-test namespace:
$ oc create namespace sysctl-tuning-test
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating an SriovNetworkNodePolicy custom resource (CR).
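When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Network Operator might drain the nodes, and in some cases, reboot nodes. It can take several minutes for a configuration change to apply.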
Follow this procedure to create an SriovNetworkNodePolicy custom resource (CR).
Create an SriovNetworkNodePolicy custom resource (CR). For example, save the following YAML as the file policyoneflag-sriov-node-network.yaml:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policyoneflag (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyoneflag (3)
  nodeSelector: (4)
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 10 (5)
  numVfs: 5 (6)
  nicSelector: (7)
    pfNames: ["ens5"] (8)
  deviceType: "netdevice" (9)
  isRdma: false (10)
1 | The name for the custom resource object. |
2 | The namespace where the SR-IOV Network Operator is installed. |
3 | The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name. |
4 | The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only. |
5 | Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99. |
6 | The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 127. |
7 | The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.
If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique. |
8 | Optional: An array of one or more physical function (PF) names for the device. |
9 | Optional: The driver type for the virtual functions. The only allowed value is netdevice.
For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true. |
10 | Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false.
If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.
Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications. |
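The constraints in the callouts above can be checked before you create the object. The following Python sketch is illustrative only: validate_policy_spec and its is_mellanox parameter are hypothetical names, not part of the SR-IOV Network Operator, and the sketch covers only the rules stated above (the priority range, the Mellanox VF limit, the rootDevices requirement, and the netdevice driver type).

```python
from typing import Any, Dict, List


def validate_policy_spec(spec: Dict[str, Any], is_mellanox: bool = False) -> List[str]:
    """Return human-readable problems found in a SriovNetworkNodePolicy spec.

    Hypothetical helper: it mirrors only the rules listed in the callouts,
    not the Operator's own checks.
    """
    problems: List[str] = []

    # priority is optional; the callouts give 99 as the default value.
    priority = spec.get("priority", 99)
    if not isinstance(priority, int) or not 0 <= priority <= 99:
        problems.append("priority must be an integer between 0 and 99")

    # The Intel limit depends on the device, so only the Mellanox cap is checked.
    num_vfs = spec.get("numVfs")
    if not isinstance(num_vfs, int):
        problems.append("numVfs must be an integer")
    elif is_mellanox and num_vfs > 127:
        problems.append("numVfs cannot be larger than 127 for a Mellanox NIC")

    # rootDevices needs at least one of vendor, deviceID, or pfNames.
    nic = spec.get("nicSelector", {})
    if "rootDevices" in nic and not any(k in nic for k in ("vendor", "deviceID", "pfNames")):
        problems.append("rootDevices requires vendor, deviceID, or pfNames")

    # netdevice is the only driver type allowed in this procedure.
    if spec.get("deviceType", "netdevice") != "netdevice":
        problems.append("deviceType must be netdevice")

    return problems


example = {
    "resourceName": "policyoneflag",
    "priority": 10,
    "numVfs": 5,
    "nicSelector": {"pfNames": ["ens5"]},
    "deviceType": "netdevice",
    "isRdma": False,
}
print(validate_policy_spec(example))  # an empty list means no problems were found
```

A check like this only catches mistakes in the manifest. It does not replace verifying the syncStatus of the node after the policy is applied.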
Create the SriovNetworkNodePolicy object:
$ oc create -f policyoneflag-sriov-node-network.yaml
After applying the configuration update, all the pods in the openshift-sriov-network-operator namespace change to the Running status.
To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Succeeded
You can set interface-specific sysctl settings on virtual interfaces created by SR-IOV by adding the tuning configuration to the optional metaPlugins parameter of the SriovNetwork resource.
The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition custom resource (CR) automatically.
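Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.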
To change the interface-level network net.ipv4.conf.IFNAME.accept_redirects sysctl setting, create an additional SR-IOV network with the Container Network Interface (CNI) tuning plugin.
Install the OpenShift Container Platform CLI (oc).
Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.
Create the SriovNetwork custom resource (CR) for the additional SR-IOV network attachment and insert the metaPlugins configuration, as in the following example CR. Save the YAML as the file sriov-network-interface-sysctl.yaml.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: onevalidflag (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyoneflag (3)
  networkNamespace: sysctl-tuning-test (4)
  ipam: '{ "type": "static" }' (5)
  capabilities: '{ "mac": true, "ips": true }' (6)
  metaPlugins: | (7)
    {
      "type": "tuning",
      "capabilities":{
        "mac":true
      },
      "sysctl":{
        "net.ipv4.conf.IFNAME.accept_redirects": "1"
      }
    }
1 | A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name. |
2 | The namespace where the SR-IOV Network Operator is installed. |
3 | The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network. |
4 | The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network. |
5 | A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition. |
6 | Optional: Set capabilities for the additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support. |
7 | Optional: The metaPlugins parameter is used to add additional capabilities to the device. In this use case, set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field. |
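Because the metaPlugins value is a JSON document embedded in YAML, it can be convenient to generate it rather than write it by hand. The following Python sketch assumes only the fields shown in the example CR. The build_tuning_meta_plugin function is a hypothetical helper name, not an API of the Operator or of the tuning plugin.

```python
import json
from typing import Dict, Union


def build_tuning_meta_plugin(sysctls: Dict[str, Union[str, int]],
                             mac_capability: bool = True) -> str:
    """Return a JSON string shaped like the metaPlugins value in the example CR."""
    plugin: Dict[str, object] = {"type": "tuning"}
    if mac_capability:
        plugin["capabilities"] = {"mac": True}
    # The example CR writes sysctl values as strings, so integers are converted.
    plugin["sysctl"] = {key: str(value) for key, value in sysctls.items()}
    return json.dumps(plugin, indent=2)


# IFNAME is kept as a literal placeholder, exactly as in the example CR.
print(build_tuning_meta_plugin({"net.ipv4.conf.IFNAME.accept_redirects": 1}))
```

Generating the JSON with json.dumps avoids quoting and trailing-comma mistakes that are easy to make inside a YAML block scalar.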
Create the SriovNetwork resource:
$ oc create -f sriov-network-interface-sysctl.yaml
Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:
$ oc get network-attachment-definitions -n <namespace> (1)
1 | Replace <namespace> with the value for networkNamespace that you specified in the SriovNetwork object. For example, sysctl-tuning-test. |
NAME AGE
onevalidflag 14m
There might be a delay before the SR-IOV Network Operator creates the CR.
To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create a Pod CR. Save the following YAML as the file examplepod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: tunepod
  namespace: sysctl-tuning-test
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {
          "name": "onevalidflag", (1)
          "mac": "0a:56:0a:83:04:0c", (2)
          "ips": ["10.100.100.200/24"] (3)
        }
      ]
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
1 | The name of the SR-IOV network attachment definition CR. |
2 | Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object. |
3 | Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object. |
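The annotation value described in the callouts above is a JSON array, so it can also be generated. The following Python sketch is a minimal illustration under stated assumptions: build_networks_annotation is a hypothetical helper, and the address check uses the standard ipaddress module only to reject malformed IPv4 or IPv6 values before they reach the manifest.

```python
import ipaddress
import json
from typing import Iterable, Optional


def build_networks_annotation(name: str,
                              mac: Optional[str] = None,
                              ips: Iterable[str] = ()) -> str:
    """Return JSON for the k8s.v1.cni.cncf.io/networks annotation (one attachment)."""
    entry = {"name": name}
    if mac is not None:
        # Requires { "mac": true } in the SriovNetwork capabilities.
        entry["mac"] = mac
    ip_list = list(ips)
    for ip in ip_list:
        # Raises ValueError if the value is not a valid IPv4 or IPv6 address or interface.
        ipaddress.ip_interface(ip)
    if ip_list:
        # Requires { "ips": true } in the SriovNetwork capabilities.
        entry["ips"] = ip_list
    return json.dumps([entry], indent=2)


print(build_networks_annotation("onevalidflag",
                                mac="0a:56:0a:83:04:0c",
                                ips=["10.100.100.200/24"]))
```

The mac and ips fields are left out entirely when they are not requested, which matches their optional status in the callouts.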
Create the Pod CR:
$ oc apply -f examplepod.yaml
Verify that the pod is created by running the following command:
$ oc get pod -n sysctl-tuning-test
NAME READY STATUS RESTARTS AGE
tunepod 1/1 Running 0 47s
Log in to the pod by running the following command:
$ oc rsh -n sysctl-tuning-test tunepod
Verify the value of the configured sysctl flag. Find the value of net.ipv4.conf.IFNAME.accept_redirects by running the following command:
$ sysctl net.ipv4.conf.net1.accept_redirects
net.ipv4.conf.net1.accept_redirects = 1
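The verification command uses net1 where the SriovNetwork CR uses IFNAME. Assuming, as that output suggests, that the IFNAME placeholder is replaced with the name of the interface inside the pod, the mapping can be sketched in Python as follows. Both function names are hypothetical. The /proc/sys path is the standard Linux location that backs a sysctl key.

```python
def resolve_sysctl_key(template: str, interface: str) -> str:
    """Replace the IFNAME component of a sysctl key with an interface name."""
    parts = [interface if part == "IFNAME" else part for part in template.split(".")]
    return ".".join(parts)


def sysctl_proc_path(template: str, interface: str) -> str:
    """Return the /proc/sys path that backs the resolved sysctl key."""
    # Split before substituting so that only the key separators become slashes.
    parts = [interface if part == "IFNAME" else part for part in template.split(".")]
    return "/proc/sys/" + "/".join(parts)


key = "net.ipv4.conf.IFNAME.accept_redirects"
print(resolve_sysctl_key(key, "net1"))  # net.ipv4.conf.net1.accept_redirects
print(sysctl_proc_path(key, "net1"))    # /proc/sys/net/ipv4/conf/net1/accept_redirects
```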
You can set interface-level network sysctl settings for a pod connected to a bonded SR-IOV network device.
In this example, the specific network interface-level sysctl settings that can be configured are set on the bonded interface.
This example uses the sysctl-tuning-test namespace.
Use the following command to create the sysctl-tuning-test namespace:
$ oc create namespace sysctl-tuning-test
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating an SriovNetworkNodePolicy custom resource (CR).
When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Network Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply.
Follow this procedure to create an SriovNetworkNodePolicy custom resource (CR).
Create an SriovNetworkNodePolicy custom resource (CR). Save the following YAML as the file policyallflags-sriov-node-network.yaml. Replace policyallflags with the name for the configuration.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policyallflags (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyallflags (3)
  nodeSelector: (4)
    node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable: "true"
  priority: 10 (5)
  numVfs: 5 (6)
  nicSelector: (7)
    pfNames: ["ens1f0"] (8)
  deviceType: "netdevice" (9)
  isRdma: false (10)
1 | The name for the custom resource object. |
2 | The namespace where the SR-IOV Network Operator is installed. |
3 | The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name. |
4 | The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only. |
5 | Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99. |
6 | The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 127. |
7 | The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.
If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique. |
8 | Optional: An array of one or more physical function (PF) names for the device. |
9 | Optional: The driver type for the virtual functions. The only allowed value is netdevice.
For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true. |
10 | Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false.
If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.
Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications. |
Create the SriovNetworkNodePolicy object:
$ oc create -f policyallflags-sriov-node-network.yaml
After applying the configuration update, all the pods in the openshift-sriov-network-operator namespace change to the Running status.
To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Succeeded
You can set interface-specific sysctl settings on a bonded interface created from two SR-IOV interfaces. Do this by adding the tuning configuration to the optional plugins parameter of the bond network attachment definition.
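Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.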
To change specific interface-level network sysctl settings, create the SriovNetwork custom resource (CR) with the Container Network Interface (CNI) tuning plugin by using the following procedure.
Install the OpenShift Container Platform CLI (oc).
Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.
Create the SriovNetwork custom resource (CR) for the bonded interface as in the following example CR. Save the YAML as the file sriov-network-attachment.yaml.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: allvalidflags (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyallflags (3)
  networkNamespace: sysctl-tuning-test (4)
  capabilities: '{ "mac": true, "ips": true }' (5)
1 | A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name. |
2 | The namespace where the SR-IOV Network Operator is installed. |
3 | The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network. |
4 | The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network. |
5 | Optional: The capabilities to configure for this additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support. |
Create the SriovNetwork resource:
$ oc create -f sriov-network-attachment.yaml
Create a bond network attachment definition as in the following example CR. Save the YAML as the file sriov-bond-network-interface.yaml.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-sysctl-network
  namespace: sysctl-tuning-test
spec:
  config: '{
    "cniVersion":"0.4.0",
    "name":"bound-net",
    "plugins":[
      {
        "type":"bond", (1)
        "mode": "active-backup", (2)
        "failOverMac": 1, (3)
        "linksInContainer": true, (4)
        "miimon": "100",
        "links": [ (5)
          {"name": "net1"},
          {"name": "net2"}
        ],
        "ipam":{ (6)
          "type":"static"
        }
      },
      {
        "type":"tuning", (7)
        "capabilities":{
          "mac":true
        },
        "sysctl":{
          "net.ipv4.conf.IFNAME.accept_redirects": "0",
          "net.ipv4.conf.IFNAME.accept_source_route": "0",
          "net.ipv4.conf.IFNAME.disable_policy": "1",
          "net.ipv4.conf.IFNAME.secure_redirects": "0",
          "net.ipv4.conf.IFNAME.send_redirects": "0",
          "net.ipv6.conf.IFNAME.accept_redirects": "0",
          "net.ipv6.conf.IFNAME.accept_source_route": "1",
          "net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000",
          "net.ipv6.neigh.IFNAME.retrans_time_ms": "2000"
        }
      }
    ]
    }'
1 | The type is bond. |
2 | The mode attribute specifies the bonding mode. This example uses active-backup. |
3 | The failover attribute is mandatory for active-backup mode. |
4 | The linksInContainer=true flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, Bond CNI looks for these interfaces on the host, which does not work for integration with SR-IOV and Multus. |
5 | The links section defines which interfaces will be used to create the bond. By default, Multus names the attached interfaces as: "net", plus a consecutive number, starting with one. |
6 | A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition. In this pod example, IP addresses are configured manually, so ipam is set to static. |
7 | Add additional capabilities to the device. For example, set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field. This example sets all interface-level network sysctl settings that can be set. |
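The bond configuration chains two plugins: bond creates the interface from the listed links, and tuning then applies the sysctl settings. The following Python sketch shows one way to assemble the same JSON, assuming only the fields that appear in the example. The build_bond_config and default_link_names names are hypothetical helpers. The link names follow the Multus naming described in callout 5.

```python
import json
from typing import Dict, List


def default_link_names(count: int) -> List[str]:
    """Return interface names as described in callout 5: net1, net2, and so on."""
    return [f"net{index}" for index in range(1, count + 1)]


def build_bond_config(links: List[str], sysctls: Dict[str, str],
                      name: str = "bound-net") -> str:
    """Return the JSON for spec.config with a bond plugin followed by a tuning plugin."""
    bond = {
        "type": "bond",
        "mode": "active-backup",
        "failOverMac": 1,            # set as in the example for active-backup mode
        "linksInContainer": True,    # look for the links inside the container
        "miimon": "100",
        "links": [{"name": link} for link in links],
        "ipam": {"type": "static"},  # addresses are supplied by the pod annotation
    }
    tuning = {
        "type": "tuning",
        "capabilities": {"mac": True},
        "sysctl": dict(sysctls),
    }
    config = {"cniVersion": "0.4.0", "name": name, "plugins": [bond, tuning]}
    return json.dumps(config, indent=2)


print(build_bond_config(
    default_link_names(2),
    {"net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000"},
))
```

The order of the plugins list matters in this sketch because the tuning entry is meant to act on the interface that the bond entry creates.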
Create the bond network attachment resource:
$ oc create -f sriov-bond-network-interface.yaml
Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:
$ oc get network-attachment-definitions -n <namespace> (1)
1 | Replace <namespace> with the networkNamespace that you specified when configuring the network attachment, for example, sysctl-tuning-test. |
NAME AGE
bond-sysctl-network 22m
allvalidflags 47m
There might be a delay before the SR-IOV Network Operator creates the CR.
To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create a Pod CR. For example, save the following YAML as the file examplepod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: tunepod
  namespace: sysctl-tuning-test
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {"name": "allvalidflags"}, (1)
        {"name": "allvalidflags"},
        {
          "name": "bond-sysctl-network",
          "interface": "bond0",
          "mac": "0a:56:0a:83:04:0c", (2)
          "ips": ["10.100.100.200/24"] (3)
        }
      ]
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
1 | The name of the SR-IOV network attachment definition CR. |
2 | Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object. |
3 | Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object. |
Apply the YAML:
$ oc apply -f examplepod.yaml
Verify that the pod is created by running the following command:
$ oc get pod -n sysctl-tuning-test
NAME READY STATUS RESTARTS AGE
tunepod 1/1 Running 0 47s
Log in to the pod by running the following command:
$ oc rsh -n sysctl-tuning-test tunepod
Verify the value of the configured sysctl flag. Find the value of net.ipv6.neigh.IFNAME.base_reachable_time_ms by running the following command:
$ sysctl net.ipv6.neigh.bond0.base_reachable_time_ms
net.ipv6.neigh.bond0.base_reachable_time_ms = 20000
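With several sysctl settings on the bonded interface, checking each one by hand becomes repetitive. The following Python sketch compares the expected settings with text in the key = value form that the sysctl command prints. It is a sketch under stated assumptions: parse_sysctl_output and find_mismatches are hypothetical helpers, and the output text is supplied by you, for example by pasting it from the pod shell.

```python
from typing import Dict, Optional, Tuple


def parse_sysctl_output(text: str) -> Dict[str, str]:
    """Parse lines of the form 'key = value' into a dictionary."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not a setting
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_mismatches(expected: Dict[str, str], interface: str,
                    actual: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {resolved_key: (expected, actual)} for every setting that differs."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for template, want in expected.items():
        # Assumption: IFNAME stands for the interface name, here bond0.
        key = template.replace("IFNAME", interface)
        got = actual.get(key)
        if got != str(want):
            mismatches[key] = (str(want), got)
    return mismatches


output = "net.ipv6.neigh.bond0.base_reachable_time_ms = 20000"
expected = {"net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000"}
print(find_mismatches(expected, "bond0", parse_sysctl_output(output)))  # {}
```

A setting that is missing from the output is reported with None as its actual value, which distinguishes it from a setting that is present but has a different value.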