If you use remote worker nodes, consider which objects to use to run your applications.
Use daemon sets or static pods, depending on the behavior you want in the event of network issues or power loss. In addition, you can use Kubernetes zones and tolerations to control or avoid pod evictions if the control plane cannot reach remote worker nodes.
- Daemon sets
Daemon sets are the best approach to managing pods on remote worker nodes for the following reasons:
Daemon sets do not typically need rescheduling behavior. If a node disconnects from the cluster, pods on the node can continue to run. OpenShift Container Platform does not change the state of daemon set pods, and leaves the pods in the state they last reported. For example, if a daemon set pod is in the Running state when a node stops communicating, the pod keeps running and OpenShift Container Platform assumes that it is running.
Daemon set pods, by default, are created with NoExecute tolerations for the node.kubernetes.io/unreachable and node.kubernetes.io/not-ready taints with no tolerationSeconds value. These default values ensure that daemon set pods are never evicted if the control plane cannot reach a node. For example:
Tolerations added to daemon set pods by default
- key: node.kubernetes.io/not-ready
- key: node.kubernetes.io/unreachable
- key: node.kubernetes.io/disk-pressure
- key: node.kubernetes.io/memory-pressure
- key: node.kubernetes.io/pid-pressure
- key: node.kubernetes.io/unschedulable
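The keys listed above omit the operator and effect of each toleration. As a rough sketch, the full entries in a daemon set pod specification might look like the following. The NoExecute entries follow the description above; the Exists operator and the NoSchedule effect on the remaining entries are assumptions based on upstream Kubernetes daemon set behavior, so check them against a running pod before relying on them.

```yaml
# Sketch of the tolerations section of a daemon set pod spec.
# Operators and NoSchedule effects are assumptions, not taken from this document.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute     # no tolerationSeconds, so the toleration never expires
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute     # no tolerationSeconds, so the toleration never expires
  - key: node.kubernetes.io/disk-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/memory-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/pid-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule
```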
Daemon sets can use labels to ensure that a workload runs on a matching worker node.
You can use an OpenShift Container Platform service endpoint to load balance daemon set pods.
Daemon sets do not schedule pods after a reboot of the node if OpenShift Container Platform cannot reach the node.
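As an illustration of how a daemon set can use labels to run only on matching worker nodes, the following is a minimal sketch. The object name, the example.com/site label, and the image reference are hypothetical placeholders and do not come from this document.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: remote-agent                  # hypothetical name
spec:
  selector:
    matchLabels:
      app: remote-agent
  template:
    metadata:
      labels:
        app: remote-agent             # must match spec.selector.matchLabels
    spec:
      nodeSelector:
        example.com/site: remote      # hypothetical label applied to remote worker nodes
      containers:
        - name: agent
          image: registry.example.com/remote-agent:latest   # hypothetical image
```

With a selector like this, the daemon set creates one pod on each node that carries the label and none on other nodes.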
- Static pods
If you want pods to restart when a node reboots, for example after a power loss, consider static pods. The kubelet on a node automatically restarts static pods when the node restarts.
Static pods cannot use secrets and config maps.
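A static pod is defined by a manifest file that the kubelet reads directly from the node. The following is a minimal sketch, assuming the kubelet watches the /etc/kubernetes/manifests directory (check the staticPodPath value in the kubelet configuration for your cluster). The pod name, image, and environment variable are hypothetical.

```yaml
# Saved on the node as, for example, /etc/kubernetes/manifests/remote-app.yaml
# (the directory is an assumption; it depends on the kubelet's staticPodPath).
apiVersion: v1
kind: Pod
metadata:
  name: remote-app                    # hypothetical name
  namespace: default
spec:
  containers:
    - name: app
      image: registry.example.com/remote-app:latest   # hypothetical image
      env:
        - name: MODE
          value: standalone           # literal value, because secrets and config maps cannot be used
  restartPolicy: Always
```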
- Kubernetes zones
Kubernetes zones can slow down the rate of pod evictions or, in some cases, stop them completely.
When the control plane cannot reach a node, the node controller, by default, applies node.kubernetes.io/unreachable taints and evicts pods at a rate of 0.1 nodes per second. However, in a cluster that uses Kubernetes zones, pod eviction behavior is altered.
If a zone is fully disrupted, where all nodes in the zone have a Ready condition that is False or Unknown, the control plane does not apply the node.kubernetes.io/unreachable taint to the nodes in that zone.
For partially disrupted zones, where more than 55% of the nodes have a False or Unknown condition, the pod eviction rate is reduced to 0.01 nodes per second. Nodes in smaller clusters, with fewer than 50 nodes, are not tainted. Your cluster must have more than three zones for this behavior to take effect.
You assign a node to a specific zone by applying the topology.kubernetes.io/region label in the node specification.
Sample node labels for Kubernetes zones
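The listing that belongs to this caption is not shown here. As a stand-in, the following sketch shows a node that carries the label named above. The node name and the region value east are hypothetical.

```yaml
kind: Node
apiVersion: v1
metadata:
  name: remote-worker-1                    # hypothetical node name
  labels:
    topology.kubernetes.io/region: east    # hypothetical region value
```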
You can adjust how often the kubelet checks the state of each node.
To set the interval that affects when the on-premise node controller marks nodes with the Unreachable condition, create a KubeletConfig object that contains the node-status-update-frequency and node-status-report-frequency parameters.
The kubelet on each node determines the node status as defined by the node-status-update-frequency setting and reports that status to the cluster based on the node-status-report-frequency setting. By default, the kubelet determines the node status every 10 seconds and reports the status every minute. However, if the node state changes, the kubelet reports the change to the cluster immediately. OpenShift Container Platform uses the node-status-report-frequency setting only when the Node Lease feature gate is enabled, which is the default state in OpenShift Container Platform clusters. If the Node Lease feature gate is disabled, the node reports its status based on the node-status-update-frequency setting.
Example kubelet config
machineconfiguration.openshift.io/role: worker (1)
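Only one line of this example is shown above. The following is a fuller sketch of what such a KubeletConfig object might look like, using the default intervals stated earlier (10 seconds and one minute). The object name is hypothetical, and the camelCase field names under kubeletConfig are an assumption: they follow the upstream Kubernetes KubeletConfiguration type, which may differ from the spelling your OpenShift Container Platform version expects.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: remote-worker-kubelet              # hypothetical object name
spec:
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker   # (1)
  kubeletConfig:
    # Field spellings below are assumed from the upstream KubeletConfiguration type.
    nodeStatusUpdateFrequency: "10s"       # (2) node-status-update-frequency
    nodeStatusReportFrequency: "1m"        # (3) node-status-report-frequency
```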
(1) Specify the type of node to which this KubeletConfig object applies using the label from the MachineConfig object.
(2) Specify the frequency that the kubelet checks the status of a node associated with this MachineConfig object. The default value is 10s. If you change this default, the node-status-report-frequency value is changed to the same value.
(3) Specify the frequency that the kubelet reports the status of a node associated with this MachineConfig object. The default value is 1m.
The node-status-update-frequency parameter works with the node-monitor-grace-period and pod-eviction-timeout parameters.
The node-monitor-grace-period parameter specifies how long OpenShift Container Platform waits, when the controller manager does not receive a heartbeat from a node associated with a MachineConfig object, before the node is marked Unhealthy. Workloads on the node continue to run after this time. If the remote worker node rejoins the cluster after node-monitor-grace-period expires, pods continue to run. New pods can be scheduled to that node. The node-monitor-grace-period interval is [...]. The node-status-update-frequency value must be lower than the node-monitor-grace-period value.
The pod-eviction-timeout parameter specifies the amount of time OpenShift Container Platform waits, after marking a node that is associated with a MachineConfig object as Unreachable, before it starts marking pods for eviction. Evicted pods are rescheduled on other nodes. If the remote worker node rejoins the cluster after pod-eviction-timeout expires, the pods running on the remote worker node are terminated because the node controller has evicted the pods on-premise. Pods can then be rescheduled to that node. The pod-eviction-timeout interval is [...].
Modifying the node-monitor-grace-period and pod-eviction-timeout parameters is not supported.