Configuring PID limits

A process identifier (PID) is a unique identifier assigned by the Linux kernel to each process or thread currently running on a system. The number of processes that can run simultaneously on a system is limited to 4,194,304 by the Linux kernel. This number might also be affected by limited access to other system resources such as memory, CPU, and disk space.

In Red Hat OpenShift Service on AWS 4.11 and later, by default, a pod can have a maximum of 4,096 PIDs. If your workload requires more than that, you can increase the allowed maximum number of PIDs by configuring a KubeletConfig object.

Red Hat OpenShift Service on AWS clusters running versions earlier than 4.11 use a default PID limit of 1,024 per pod.

Understanding process ID limits

In Red Hat OpenShift Service on AWS, consider these two supported limits for process ID (PID) usage before you schedule work on your cluster:

- Maximum number of PIDs per pod.
  
  The default value is 4,096 in Red Hat OpenShift Service on AWS 4.11 and later. This value is controlled by the podPidsLimit parameter set on the node.
- Maximum number of PIDs per node.
  
  The default value depends on node resources. In Red Hat OpenShift Service on AWS, this value is controlled by the --system-reserved parameter, which reserves PIDs on each node based on the total resources of the node.

When a pod exceeds the allowed maximum number of PIDs per pod, the pod might stop functioning correctly and might be evicted from the node. See the Kubernetes documentation for eviction signals and thresholds for more information.

When a node exceeds the allowed maximum number of PIDs per node, the node can become unstable because new processes cannot be assigned PIDs. If existing processes cannot complete without creating additional processes, the entire node can become unusable and require a reboot. This situation can result in data loss, depending on the processes and applications being run. Customer administrators and Red Hat Site Reliability Engineering are notified when this threshold is reached, and a "Worker node is experiencing PIDPressure" warning appears in the cluster logs.

Risks of setting higher process ID limits for Red Hat OpenShift Service on AWS pods

The podPidsLimit parameter for a pod controls the maximum number of processes and threads that can run simultaneously in that pod.

You can increase the value for podPidsLimit from the default of 4,096 to a maximum of 16,384. Changing this value might incur downtime for applications, because changing the podPidsLimit requires rebooting the affected node.
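
Because the supported range is narrow, it can help to check a candidate value before passing it to the ROSA CLI. The following is a minimal sketch under stated assumptions: the function name is hypothetical and is not part of the rosa or oc CLIs, and the bounds are the 4,096 default and 16,384 maximum given above, treated as inclusive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the rosa or oc CLIs): checks that a
# candidate podPidsLimit value is a whole number in the documented range.
MIN_POD_PIDS_LIMIT=4096    # documented default
MAX_POD_PIDS_LIMIT=16384   # documented maximum

is_valid_pod_pids_limit() {
  # Reject empty input and anything containing a non-digit character.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Succeed only when the value lies within the inclusive range.
  [ "$1" -ge "$MIN_POD_PIDS_LIMIT" ] && [ "$1" -le "$MAX_POD_PIDS_LIMIT" ]
}
```

A script could call `is_valid_pod_pids_limit "$value"` and stop with an error message when the check fails, so that an out-of-range value never triggers a node reboot attempt.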

If you are running a large number of pods per node, and you have a high podPidsLimit value on your nodes, you risk exceeding the PID maximum for the node.

To find the maximum number of pods that you can run simultaneously on a single node without exceeding the PID maximum for the node, divide 3,650,000 by your podPidsLimit value. For example, if your podPidsLimit value is 16,384, and you expect the pods to use close to that number of process IDs, you can safely run 222 pods on a single node.
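
The division above can be sketched in shell arithmetic. The variable and function names are illustrative, and the 3,650,000 figure is taken directly from this section rather than read from a cluster.

```shell
#!/bin/sh
# Sketch of the arithmetic above. NODE_PID_BUDGET is the 3,650,000 figure
# from this section; the names used here are illustrative only.
NODE_PID_BUDGET=3650000

max_pods_for_pid_limit() {
  # Shell integer division truncates, which rounds down to whole pods.
  echo $(( NODE_PID_BUDGET / $1 ))
}

max_pods_for_pid_limit 16384   # prints 222
max_pods_for_pid_limit 4096    # prints 891
```

The result is an upper bound for the worst case in which every pod uses its full PID allowance; pods that use fewer PIDs leave more headroom.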

Memory, CPU, and available storage can also limit the maximum number of pods that can run simultaneously, even when the podPidsLimit value is set appropriately. For more information, see "Planning your environment" and "Limits and scalability".

Setting a higher process ID limit on an existing Red Hat OpenShift Service on AWS cluster

You can set a higher podPidsLimit on an existing Red Hat OpenShift Service on AWS (ROSA) cluster by creating or editing a KubeletConfig object that changes the --pod-pids-limit parameter.

Changing the podPidsLimit on an existing cluster triggers the non-control-plane nodes in the cluster to reboot one at a time. Make this change outside of peak usage hours for your cluster, and avoid upgrading or hibernating your cluster until all nodes have rebooted.

Prerequisites

- You have a Red Hat OpenShift Service on AWS cluster.
- You have installed the ROSA CLI (rosa).
- You have installed the OpenShift CLI (oc).
- You have logged in to your Red Hat account by using the ROSA CLI.

Procedure

Create or edit the KubeletConfig object to change the PID limit.
- If this is the first time you are changing the default PID limit, create the KubeletConfig object and set the --pod-pids-limit value by running the following command:
  ```
  $ rosa create kubeletconfig -c <cluster_name> --name <kubeletconfig_name> --pod-pids-limit=<value>
  ```
  The --name parameter is optional on ROSA Classic clusters, because only one KubeletConfig object is supported per ROSA Classic cluster.
  
  For example, the following command sets a maximum of 16,384 PIDs per pod for cluster my-cluster:
  ```
  $ rosa create kubeletconfig -c my-cluster --name set-high-pids --pod-pids-limit=16384
  ```
- If you previously created a KubeletConfig object, edit the existing KubeletConfig object and set the --pod-pids-limit value by running the following command:
  ```
  $ rosa edit kubeletconfig -c <cluster_name> --name <kubeletconfig_name> --pod-pids-limit=<value>
  ```

A cluster-wide rolling reboot of worker nodes is triggered.

Verify that all of the worker nodes rebooted by running the following command:
```
$ oc get machineconfigpool
```
Example output
```
NAME      CONFIG                    UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master    rendered-master-06c9c4…   True      False      False      3              3                   3                     0                      4h42m
worker    rendered-worker-f4b64…    True      False      False      4              4                   4                     0                      4h42m
```
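
If you want to script this check instead of reading the table by eye, one possible approach is to filter the command output with awk. This is a sketch only: the function name is hypothetical, and it assumes the column order shown in the example output above (UPDATED, UPDATING, and DEGRADED in the third, fourth, and fifth columns).

```shell
#!/bin/sh
# Sketch: reads `oc get machineconfigpool` output on standard input and
# succeeds only when every pool reports UPDATED=True, UPDATING=False, and
# DEGRADED=False. Column positions are assumed from the example output.
all_pools_updated() {
  awk 'NR > 1 {
         rows++
         if ($3 != "True" || $4 != "False" || $5 != "False") bad = 1
       }
       # Fail if any pool is not settled, or if no pool rows were read.
       END { if (bad || rows == 0) exit 1 }'
}

# Usage against a live cluster:
#   oc get machineconfigpool | all_pools_updated && echo "All pools are updated"
```

The function treats empty input as a failure so that a command error or a missing login is not mistaken for a finished rollout.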

Verification

When each node in the cluster has rebooted, you can verify that the new setting is in place.

Check the Pod Pids limit in the KubeletConfig object:
```
$ rosa describe kubeletconfig --cluster=<cluster_name>
```
The new PIDs limit appears in the output, as shown in the following example:
Example output
```
Pod Pids Limit:                       16384
```
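
To compare the reported limit with the value you requested in a script, you could extract the number from the command output. The sketch below uses a hypothetical function name and assumes that the output contains a line in the same "Pod Pids Limit:" format as the example above.

```shell
#!/bin/sh
# Sketch: reads `rosa describe kubeletconfig` output on standard input and
# prints the value from the "Pod Pids Limit" line. The line format is
# assumed to match the example output above.
pod_pids_limit_from_describe() {
  awk -F: '/^Pod Pids Limit/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage against a live cluster:
#   rosa describe kubeletconfig --cluster=<cluster_name> | pod_pids_limit_from_describe
```

If the function prints nothing, the expected line was not found, and the raw command output is worth inspecting directly.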

Removing custom configuration from a cluster

You can remove custom configuration from your cluster by removing the KubeletConfig object that contains the configuration details.

Prerequisites

- You have an existing Red Hat OpenShift Service on AWS cluster.
- You have installed the ROSA CLI (rosa).
- You have logged in to your Red Hat account by using the ROSA CLI.

Procedure

Remove custom configuration from the cluster by deleting the relevant custom KubeletConfig object:
```
$ rosa delete kubeletconfig --cluster <cluster_name> --name <kubeletconfig_name>
```

Verification

Confirm that the custom KubeletConfig object is not listed for the cluster:
```
$ rosa describe kubeletconfig --cluster <cluster_name>
```