Understanding how to evacuate pods on nodes

Evacuating pods allows you to migrate all or selected pods from a given node or nodes.

You can only evacuate pods backed by a replication controller. The replication controllers create new pods on other nodes and removes the existing pods from the specified node(s).

Bare pods, meaning those not backed by a replication controller, are unaffected by default. You can evacuate a subset of pods by specifying a pod-selector. Pod selectors are based on labels, so all the pods with the specified label will be evacuated.

Nodes must first be marked unschedulable to perform pod evacuation.

$ oc adm cordon <node1>
NAME        STATUS                        ROLES     AGE       VERSION
<node1>     NotReady,SchedulingDisabled   worker   1d        v1.13.4+b626c2fe1

Use oc uncordon to mark the node as schedulable when done.

$ oc adm cordon <node1>
  • The following command evacuates all or selected pods on one or more nodes:

    $ oc adm drain <node1> <node2> [--pod-selector=<pod_selector>]
  • The following command forces deletion of bare pods using the --force option. When set to true, deletion continues even if there are pods not managed by a replication controller, ReplicaSet, job, daemonset, or StatefulSet:

    $ oc adm drain <node1> <node2> --force=true
  • The following command sets a period of time in seconds for each pod to terminate gracefully, use --grace-period. If negative, the default value specified in the pod will be used:

    $ oc adm drain <node1> <node2> --grace-period=-1
  • The following command ignores DaemonSet-managed pods using the --ignore-daemonsets flag set to true:

    $ oc adm drain <node1> <node2> --ignore-daemonsets=true
  • The following command sets the length of time to wait before giving up using the --timeout flag. A value of 0 sets an infinite length of time:

    $ oc adm drain <node1> <node2> --timeout=5s
  • The following command deletes pods even if there are pods using emptyDir using the --delete-local-data flag set to true. Local data is deleted when the node is drained:

    $ oc adm drain <node1> <node2> --delete-local-data=true
  • The following command lists objects that will be migrated without actually performing the evacuation, using the --dry-run option set to true:

    $ oc adm drain <node1> <node2>  --dry-run=true

    Instead of specifying specific node names (for example, <node1> <node2>), you can use the --selector=<node_selector> option to evacuate pods on selected nodes.

Understanding how to update labels on nodes

You can update any label on a node.

Node labels are not persisted after a node is deleted even if the node is backed up by a Machine. If you must ensure that the labels persist, add the labels to the MachineSet that controls the Machine instead.

  • The following command adds or updates labels on a node:

    $ oc label node <node> <key_1>=<value_1> ... <key_n>=<value_n>

    For example:

    $ oc label nodes webconsole-7f7f6 unhealthy=true
  • The following command updates all pods in the namespace:

    $ oc label pods --all <key_1>=<value_1>

    For example:

    $ oc label pods --all status=unhealthy

Understanding how to marking nodes as unschedulable or schedulable

By default, healthy nodes with a Ready status are marked as schedulable, meaning that new pods are allowed for placement on the node. Manually marking a node as unschedulable blocks any new pods from being scheduled on the node. Existing pods on the node are not affected.

  • The following command marks a node or nodes as unschedulable:

    $ oc adm cordon <node>

    For example:

    $ oc adm cordon node1.example.com
    node/node1.example.com cordoned
    
    NAME                 LABELS                                        STATUS
    node1.example.com    kubernetes.io/hostname=node1.example.com      Ready,SchedulingDisabled
  • The following command marks a currently unschedulable node or nodes as schedulable:

    $ oc uncordon <node1>

    Alternatively, instead of specifying specific node names (for example, <node>), you can use the --selector=<node_selector> option to mark selected nodes as schedulable or unschedulable.

Deleting nodes from a cluster

When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.

Procedure

To delete a node from the OpenShift Container Platform cluster edit the appropriate MachineSet:

  1. View the MachineSets that are in the cluster:

    $ oc get machinesets -n openshift-machine-api

    The MachineSets are listed in the form of <clusterid>-worker-<aws-region-az>.

  2. Scale the MachineSet:

    $ oc scale --replicas=2 machineset <machineset> -n openshift-machine-api

For more information on scaling your cluster using a MachineSet, see Manually scaling a MachineSet.

Additional resources

For more information on scaling your cluster using a MachineSet, see Manually scaling a MachineSet.