
Understanding how to evacuate pods on nodes

Evacuating pods allows you to migrate all or selected pods from a given node or nodes.

You can only evacuate pods backed by a replication controller. The replication controller creates new pods on other nodes and removes the existing pods from the specified node(s).

Bare pods, meaning those not backed by a replication controller, are unaffected by default. You can evacuate a subset of pods by specifying a pod selector. Pod selectors are based on labels, so all pods with the specified label are evacuated.

Before you can evacuate pods, you must mark the node as unschedulable:

$ oc adm cordon <node1>
node/<node1> cordoned

$ oc get node <node1>
NAME        STATUS                        ROLES     AGE       VERSION
<node1>     NotReady,SchedulingDisabled   worker    1d        v1.13.4+b626c2fe1

When the evacuation is complete, use oc adm uncordon to mark the node as schedulable again:

$ oc adm uncordon <node1>

  • The following command evacuates all or selected pods on one or more nodes:

    $ oc adm drain <node1> <node2> [--pod-selector=<pod_selector>]
  • The following command forces deletion of bare pods using the --force option. When set to true, deletion continues even if there are pods not managed by a replication controller, ReplicaSet, job, DaemonSet, or StatefulSet:

    $ oc adm drain <node1> <node2> --force=true
  • The following command uses the --grace-period option to set a period of time, in seconds, for each pod to terminate gracefully. If the value is negative, the default value specified in the pod is used:

    $ oc adm drain <node1> <node2> --grace-period=-1
  • The following command ignores DaemonSet-managed pods using the --ignore-daemonsets flag set to true:

    $ oc adm drain <node1> <node2> --ignore-daemonsets=true
  • The following command uses the --timeout flag to set the length of time to wait before giving up. A value of 0 sets an infinite length of time:

    $ oc adm drain <node1> <node2> --timeout=5s
  • The following command uses the --delete-local-data flag, set to true, to delete pods even if they use emptyDir volumes. Local data is deleted when the node is drained:

    $ oc adm drain <node1> <node2> --delete-local-data=true
  • The following command lists objects that will be migrated without actually performing the evacuation, using the --dry-run option set to true:

    $ oc adm drain <node1> <node2> --dry-run=true

    Instead of specifying node names (for example, <node1> <node2>), you can use the --selector=<node_selector> option to evacuate pods on selected nodes.
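
The cordon, drain, and uncordon steps described above can be combined into a small wrapper. The following is a minimal sketch rather than an official procedure: the function names evacuate_node and restore_node, the OC override variable, and the 300s timeout are assumptions made for illustration. Only flags described in this section are used.

```shell
#!/usr/bin/env bash
# Sketch only. OC is a hypothetical override: set OC="echo oc" to print
# the commands instead of running them against a cluster.
OC="${OC:-oc}"

# evacuate_node <node>: mark the node unschedulable, then drain it.
evacuate_node() {
  local node="$1"
  [ -n "$node" ] || { echo "usage: evacuate_node <node>" >&2; return 2; }
  # Block new pods from being scheduled on the node.
  $OC adm cordon "$node" || return 1
  # Evacuate pods, skipping DaemonSet-managed pods and deleting emptyDir data.
  # The 300s timeout is an illustrative value, not a recommendation.
  $OC adm drain "$node" --ignore-daemonsets=true --delete-local-data=true --timeout=300s
}

# restore_node <node>: mark the node schedulable again when done.
restore_node() {
  local node="$1"
  [ -n "$node" ] || { echo "usage: restore_node <node>" >&2; return 2; }
  $OC adm uncordon "$node"
}
```

Setting OC="echo oc" before calling evacuate_node node1.example.com prints the two commands without contacting a cluster, which is one way to review them before running them for real.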

Understanding how to update labels on nodes

You can update any label on a node.

Node labels are not persisted after a node is deleted, even if the node is backed by a Machine.

Any change to a MachineSet is not applied to existing machines owned by the MachineSet. For example, labels edited or added to an existing MachineSet are not propagated to existing machines and Nodes associated with the MachineSet.

  • The following command adds or updates labels on a node:

    $ oc label node <node> <key_1>=<value_1> ... <key_n>=<value_n>

    For example:

    $ oc label nodes webconsole-7f7f6 unhealthy=true
  • The following command adds or updates a label on all pods in the current namespace:

    $ oc label pods --all <key_1>=<value_1>

    For example:

    $ oc label pods --all status=unhealthy
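
To apply the same label to several nodes, the oc label node command shown above can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function name label_nodes and the OC override variable are hypothetical. The --overwrite flag is included because oc label refuses to change a label that already has a different value unless that flag is given.

```shell
#!/usr/bin/env bash
# Sketch only. OC is a hypothetical override: set OC="echo oc" to print
# the commands instead of running them against a cluster.
OC="${OC:-oc}"

# label_nodes <key>=<value> <node>...
# Adds or updates one label on each named node.
label_nodes() {
  local label="$1"
  # Reject arguments that are not in <key>=<value> form.
  case "$label" in
    *=*) ;;
    *) echo "usage: label_nodes <key>=<value> <node>..." >&2; return 2 ;;
  esac
  shift
  local node
  for node in "$@"; do
    # --overwrite allows an existing label value to be replaced.
    $OC label node "$node" "$label" --overwrite || return 1
  done
}
```

For example, label_nodes unhealthy=true node1 node2 would label both nodes; with OC="echo oc" set, it only prints the two oc label commands.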

Understanding how to mark nodes as unschedulable or schedulable

By default, healthy nodes with a Ready status are marked as schedulable, meaning that new pods are allowed for placement on the node. Manually marking a node as unschedulable blocks any new pods from being scheduled on the node. Existing pods on the node are not affected.

  • The following command marks a node or nodes as unschedulable:

    $ oc adm cordon <node>

    For example:

    $ oc adm cordon node1.example.com
    node/node1.example.com cordoned

    $ oc get node node1.example.com
    NAME                 LABELS                                        STATUS
    node1.example.com    kubernetes.io/hostname=node1.example.com      Ready,SchedulingDisabled
  • The following command marks a currently unschedulable node or nodes as schedulable:

    $ oc adm uncordon <node>

    Instead of specifying node names (for example, <node>), you can use the --selector=<node_selector> option to mark selected nodes as schedulable or unschedulable.
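
The two commands above can be wrapped in a single helper that switches scheduling on or off for named nodes or for a node selector. This is a minimal sketch: the function name set_scheduling, its on/off argument, and the OC override variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. OC is a hypothetical override: set OC="echo oc" to print
# the commands instead of running them against a cluster.
OC="${OC:-oc}"

# set_scheduling <on|off> <node>... | --selector=<node_selector>
set_scheduling() {
  local mode="$1"
  [ "$#" -ge 2 ] || { echo "usage: set_scheduling on|off <node>..." >&2; return 2; }
  shift
  case "$mode" in
    # off: block new pods from being scheduled; existing pods are not affected.
    off) $OC adm cordon "$@" ;;
    # on: allow new pods to be scheduled again.
    on)  $OC adm uncordon "$@" ;;
    *)   echo "usage: set_scheduling on|off <node>..." >&2; return 2 ;;
  esac
}
```

For example, set_scheduling off node1.example.com cordons one node, and set_scheduling on --selector=<node_selector> uncordons every node that matches the selector.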

Deleting nodes from a cluster

When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.

Procedure

To delete a node from the OpenShift Container Platform cluster, edit the appropriate MachineSet:

  1. View the MachineSets that are in the cluster:

    $ oc get machinesets -n openshift-machine-api

    The MachineSets are listed in the form of <clusterid>-worker-<aws-region-az>.

  2. Scale the MachineSet:

    $ oc scale --replicas=2 machineset <machineset> -n openshift-machine-api
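
The scaling step can be guarded against a mistyped replica count. The following is a minimal sketch, assuming the MachineSet lives in the openshift-machine-api namespace as in the procedure above; the function name scale_machineset and the OC override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. OC is a hypothetical override: set OC="echo oc" to print
# the command instead of running it against a cluster.
OC="${OC:-oc}"

# scale_machineset <machineset> <replicas>
scale_machineset() {
  local name="$1" replicas="$2"
  [ -n "$name" ] || { echo "usage: scale_machineset <machineset> <replicas>" >&2; return 2; }
  # Accept only a non-negative integer replica count.
  case "$replicas" in
    ''|*[!0-9]*) echo "replicas must be a non-negative integer" >&2; return 2 ;;
  esac
  $OC scale --replicas="$replicas" machineset "$name" -n openshift-machine-api
}
```

For example, scale_machineset <machineset> 2 issues the same oc scale command as step 2, while a value such as -1 or an empty value is rejected before any command runs.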

Additional resources

For more information on scaling your cluster using a MachineSet, see Manually scaling a MachineSet.