This topic provides recommended host practices for OpenShift Container Platform.
The OpenShift Container Platform node configuration file contains important options. For example, two parameters control the maximum number of pods that can be scheduled to a node: podsPerCore and maxPods.
When both options are in use, the lower of the two values limits the number of pods on a node. Exceeding these values can result in:
Increased CPU utilization.
Slow pod scheduling.
Potential out-of-memory scenarios, depending on the amount of memory in the node.
Exhausting the pool of IP addresses.
Resource overcommitting, leading to poor user application performance.
In Kubernetes, a pod that is holding a single container actually uses two containers. The second container is used to set up networking prior to the actual container starting. Therefore, a system running 10 pods will actually have 20 containers running.
podsPerCore sets the number of pods the node can run based on the number of processor cores on the node. For example, if podsPerCore is set to 10 on a node with 4 processor cores, the maximum number of pods allowed on the node will be 40.
kubeletConfig:
  podsPerCore: 10
Setting podsPerCore to 0 disables this limit. The default is 0. podsPerCore cannot exceed maxPods.
maxPods sets the number of pods the node can run to a fixed value, regardless of the properties of the node.
kubeletConfig:
  maxPods: 250
The kubelet configuration is currently serialized as an Ignition configuration, so it can be directly edited. However, a kubelet-config-controller has also been added to the Machine Config Controller (MCC). This allows you to create a KubeletConfig custom resource (CR) to edit the kubelet parameters.
Run:
$ oc get machineconfig
This provides a list of the available machine configuration objects you can select. By default, the two kubelet-related configs are 01-master-kubelet and 01-worker-kubelet.
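If the list is long, you can narrow it to the kubelet-related objects with a simple filter, for example:
$ oc get machineconfig | grep kubelet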
To check the current value of maxPods per node, run:
# oc describe node <node-ip> | grep Allocatable -A6
Look for pods: <value> in the Allocatable section of the output.
For example:
# oc describe node ip-172-31-128-158.us-east-2.compute.internal | grep Allocatable -A6
Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         3500m
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      15341844Ki
 pods:                        250
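Alternatively, if you only want the allocatable pod count, a jsonpath query works as well (a minimal sketch; substitute your own node name):
$ oc get node <node_name> -o jsonpath='{.status.allocatable.pods}{"\n"}'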
To set the maxPods value per node on the worker nodes, create a custom resource file that contains the kubelet configuration. For example, change-maxPods-cr.yaml:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 500
The rate at which the kubelet talks to the API server depends on queries per second (QPS) and burst values. The default values, 5 for kubeAPIQPS and 10 for kubeAPIBurst, are sufficient if there are limited pods running on each node. Updating the kubelet QPS and burst rates is recommended if there are enough CPU and memory resources on the node:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: <pod_count>
    kubeAPIBurst: <burst_rate>
    kubeAPIQPS: <QPS>
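For example, the kubeletConfig stanza with the placeholders filled in; these numbers are illustrative only and should be tuned to the CPU and memory headroom of your own nodes:
kubeletConfig:
  maxPods: 500
  kubeAPIBurst: 100
  kubeAPIQPS: 50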
Label the worker machine config pool so that the machineConfigPoolSelector in the KubeletConfig CR matches it:
$ oc label machineconfigpool worker custom-kubelet=large-pods
Create the KubeletConfig CR:
$ oc create -f change-maxPods-cr.yaml
Verify that the KubeletConfig object was created:
$ oc get kubeletconfig
This should return set-max-pods.
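If you prefer a scriptable check, printing just the object name works too (a small sketch using a standard jsonpath query):
$ oc get kubeletconfig set-max-pods -o jsonpath='{.metadata.name}{"\n"}'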
Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.
Check that maxPods has changed for the worker nodes:
$ oc describe node
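To inspect a single node, the same filter used earlier is convenient (substitute your own node name):
$ oc describe node <node_name> | grep Allocatable -A6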
Verify the change by running:
$ oc get kubeletconfigs set-max-pods -o yaml
This should show a status of True and type: Success.
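A trimmed sketch of the condition to look for in that output; fields other than status and type are shown only for orientation and may differ:
status:
  conditions:
  - lastTransitionTime: <timestamp>
    message: <message>
    status: "True"
    type: Success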
By default, only one machine is allowed to be unavailable when applying the kubelet-related configuration to the available worker nodes. For a large cluster, it can take a long time for the configuration change to be reflected. At any time, you can adjust the number of machines that are updating to speed up the process.
Run:
$ oc edit machineconfigpool worker
Set maxUnavailable to the desired value.
spec:
  maxUnavailable: <node_count>
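If you prefer a non-interactive change, a merge patch achieves the same result; the value here is only an example:
$ oc patch machineconfigpool worker --type merge -p '{"spec":{"maxUnavailable":2}}'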
When setting the value, consider the number of worker nodes that can be unavailable without affecting the applications running on the cluster.
The master node resource requirements depend on the number of nodes in the cluster. The following master node size recommendations are based on the results of control plane density focused testing.
Number of worker nodes | CPU cores | Memory (GB)
---|---|---
25 | 4 | 16
100 | 8 | 32
250 | 16 | 64
Because you cannot modify the master node size in a running OpenShift Container Platform 4.2 cluster, you must estimate your total node count and use the suggested master size during installation.
In OpenShift Container Platform 4.2, half of a CPU core (500 millicore) is now reserved by the system by default compared to OpenShift Container Platform 3.11 and previous versions. The sizes are determined taking that into consideration.
For large and dense clusters, etcd can suffer from poor performance if the keyspace grows excessively large and exceeds the space quota. Periodic maintenance of etcd, including defragmentation, needs to be done to free up space in the data store. It is highly recommended that you monitor Prometheus for etcd metrics and defragment etcd when needed, before etcd raises a cluster-wide alarm that puts the cluster into a maintenance mode that only accepts key reads and deletes. Some of the key metrics to monitor are etcd_server_quota_backend_bytes, which is the current quota limit; etcd_mvcc_db_total_size_in_use_in_bytes, which indicates the actual database usage after a history compaction; and etcd_debugging_mvcc_db_total_size_in_bytes, which shows the database size including free space waiting for defragmentation.
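As one illustration of how these metrics can be used (a sketch, not an official procedure): the ratio
etcd_mvcc_db_total_size_in_use_in_bytes / etcd_server_quota_backend_bytes
gives a rough fraction of the quota currently in use and is a reasonable expression to graph or alert on in Prometheus. Defragmentation is then run against each etcd member in turn with etcdctl; the endpoint and certificate paths below are placeholders:
$ etcdctl --endpoints=https://<etcd_member>:2379 --cacert=<ca_cert> --cert=<client_cert> --key=<client_key> defrag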