This topic provides recommended host practices for OpenShift Container Platform.
The OpenShift Container Platform node configuration file contains important options. For
example, two parameters control the maximum number of pods that can be scheduled
to a node: podsPerCore and maxPods.
When both options are in use, the lower of the two values limits the number of pods on a node. Exceeding these values can result in:
Increased CPU utilization.
Slow pod scheduling.
Potential out-of-memory scenarios, depending on the amount of memory in the node.
Exhausting the pool of IP addresses.
Resource overcommitting, leading to poor user application performance.
In Kubernetes, a pod that is holding a single container actually uses two containers. The second container is used to set up networking prior to the actual container starting. Therefore, a system running 10 pods will actually have 20 containers running.
podsPerCore sets the number of pods the node can run based on the number of
processor cores on the node. For example, if podsPerCore is set to 10 on a
node with 4 processor cores, the maximum number of pods allowed on the node will
be 40. Setting podsPerCore to 0 disables this limit. The default is 0.
podsPerCore cannot exceed maxPods.
maxPods sets the number of pods the node can run to a fixed value, regardless
of the properties of the node.
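As a rough illustration of how the two limits interact, the following kubelet configuration fragment sets both. The values are assumptions chosen to match the 4-core example above. They are not recommendations.

```yaml
# Illustrative fragment of a kubelet configuration (values are assumptions).
kubeletConfig:
  podsPerCore: 10   # on a node with 4 cores: 10 x 4 = 40 pods
  maxPods: 250      # fixed cap, independent of the node's properties
# The lower of the two limits applies, so this 4-core node would be
# limited to 40 pods.
```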
The kubelet configuration is currently serialized as an Ignition configuration, so it can be edited directly. However, a kubelet-config-controller has also been added to the Machine Config Controller (MCC), which allows you to create a KubeletConfig custom resource (CR) to edit the kubelet parameters.
$ oc get machineconfig
This provides a list of the available machine configuration objects you can
select. By default, the two kubelet-related configs are 01-master-kubelet and
01-worker-kubelet.
To check the current maximum number of pods per node, run:
# oc describe node <node-ip> | grep Allocatable -A6
Look for the pods: <value> entry in the output. For example:
# oc describe node ip-172-31-128-158.us-east-2.compute.internal | grep Allocatable -A6
Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         3500m
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      15341844Ki
 pods:                        250
To set the maximum number of pods per node on the worker nodes, create a custom
resource file that contains the kubelet configuration, for example
change-maxPods-cr.yaml.
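A minimal sketch of such a file follows. It assumes the CR is named set-max-pods (the name queried later in this topic) and that it selects machine config pools by the custom-kubelet: large-pods label applied in the steps below. <pod_count> is a placeholder to replace with your own value.

```yaml
# change-maxPods-cr.yaml (sketch; the name and label follow the commands in this topic)
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods   # must match the label set on the worker pool
  kubeletConfig:
    maxPods: <pod_count>           # placeholder: desired maximum pods per node
```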
The rate at which the kubelet talks to the API server depends on queries per
second (QPS) and burst values. The default values, 50 for kubeAPIQPS and 100 for
kubeAPIBurst, are good enough if there are limited pods running on each
node. Updating the kubelet QPS and burst rates is recommended if there are
enough CPU and memory resources on the node.
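The QPS and burst rates can be set in the same kubeletConfig stanza as maxPods. The fragment below is a sketch only. All three values are placeholders, and suitable rates depend on the resources available on your nodes.

```yaml
# Fragment of the KubeletConfig spec (placeholders, not recommended values)
spec:
  kubeletConfig:
    maxPods: <pod_count>
    kubeAPIBurst: <burst_rate>   # placeholder: burst allowance for API requests
    kubeAPIQPS: <QPS>            # placeholder: sustained queries per second
```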
$ oc label machineconfigpool worker custom-kubelet=large-pods
$ oc create -f change-maxPods-cr.yaml
$ oc get kubeletconfig
This should return set-max-pods.
Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.
Check that maxPods has changed on the worker nodes:
$ oc describe node
Verify the change by running:
$ oc get kubeletconfigs set-max-pods -o yaml
This should show a status of True and type:Success.
By default, only one machine is allowed to be unavailable when applying the kubelet-related configuration to the available worker nodes. For a large cluster, it can take a long time for the configuration change to be reflected. At any time, you can adjust the number of machines that are updating to speed up the process.
$ oc edit machineconfigpool worker
Set maxUnavailable to the desired value.
spec:
  maxUnavailable: <node_count>
When setting the value, consider the number of worker nodes that can be unavailable without affecting the applications running on the cluster.
The master node resource requirements depend on the number of nodes in the cluster. The following master node size recommendations are based on the results of control plane density focused testing.
|Number of worker nodes
Because you cannot modify the master node size in a running OpenShift Container Platform 4.3 cluster, you must estimate your total node count and use the suggested master size during installation.
In OpenShift Container Platform 4.3, half of a CPU core (500 millicore) is reserved by the system by default, unlike in OpenShift Container Platform 3.11 and previous versions. The recommended sizes take that into account.
For large and dense clusters, etcd can suffer from poor performance
if the keyspace grows excessively large and exceeds the space quota.
Periodic maintenance of etcd, including defragmentation, is needed
to free up space in the data store. It is highly recommended that you monitor
Prometheus for etcd metrics and defragment etcd when needed, before it raises
a cluster-wide alarm that puts the cluster into a maintenance mode in which it
only accepts key reads and deletes. Some of the key metrics to monitor are:
etcd_server_quota_backend_bytes, which is the current quota limit;
etcd_mvcc_db_total_size_in_use_in_bytes, which indicates the actual
database usage after a history compaction; and
etcd_debugging_mvcc_db_total_size_in_bytes, which shows the database size
including free space waiting for defragmentation.
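One way to act on these metrics is a pair of Prometheus alerting rules, sketched below in the standard rule-file format. The group name, alert names, thresholds, and durations are all assumptions for illustration. Only the three metric names come from this topic.

```yaml
# Sketch of Prometheus alerting rules (names and thresholds are hypothetical)
groups:
  - name: etcd-space-example
    rules:
      # Fires when in-use data approaches the backend quota.
      - alert: EtcdInUseSizeNearQuota
        expr: etcd_mvcc_db_total_size_in_use_in_bytes / etcd_server_quota_backend_bytes > 0.8
        for: 10m
      # Fires when a large share of the database file is free space
      # that defragmentation could reclaim.
      - alert: EtcdDefragmentationCandidate
        expr: (etcd_debugging_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes) / etcd_debugging_mvcc_db_total_size_in_bytes > 0.5
        for: 10m
```

The first rule warns before the quota alarm would be raised. The second estimates how much of the on-disk size is reclaimable, which can help decide when a defragmentation is worthwhile.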