Nodes can be placed into maintenance mode by using the oc adm
utility or NodeMaintenance
custom resources (CRs).
For more information on remediation, fencing, and maintaining nodes, see the Workload Availability for Red Hat OpenShift documentation.
Virtual machines (VMs) must have a persistent volume claim (PVC) with a shared ReadWriteMany (RWX) access mode to be live migrated.
The Node Maintenance Operator watches for new or deleted NodeMaintenance
CRs. When a new NodeMaintenance
CR is detected, no new workloads are scheduled and the node is cordoned off from the rest of the cluster. All pods that can be evicted are evicted from the node. When a NodeMaintenance
CR is deleted, the node that is referenced in the CR is made available for new workloads.
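The flow above can be triggered with a CR like the following sketch. The API group and version are an assumption based on the standalone (medik8s) Node Maintenance Operator and may differ in your installation; the name and reason values are placeholders:

```yaml
# Hedged sketch of a NodeMaintenance CR; the apiVersion shown here
# is an assumption and may differ depending on the installed
# Node Maintenance Operator version.
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-example      # placeholder name
spec:
  nodeName: <node_name>          # node to cordon and drain
  reason: "Hardware replacement" # free-form reason (placeholder)
```

Creating a CR like this cordons the node and evicts all evictable pods; deleting the CR makes the node available for new workloads again.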
Using a NodeMaintenance CR requires the Node Maintenance Operator to be installed.
Placing a node into maintenance marks the node as unschedulable and drains all the VMs and pods from it.
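With the oc adm utility, the same cordon-and-drain effect can be sketched as follows. The drain flags shown are standard drain options; whether you need each of them depends on your workloads:

```
$ oc adm cordon <node_name>
$ oc adm drain <node_name> --ignore-daemonsets --delete-emptydir-data --force
$ oc adm uncordon <node_name>
```

Running `oc adm uncordon` after maintenance makes the node schedulable again.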
You can configure eviction strategies for virtual machines (VMs) or for the cluster.
The VM LiveMigrate
eviction strategy ensures that a virtual machine instance (VMI) is not interrupted if the node is placed into maintenance or drained. VMIs with this eviction strategy will be live migrated to another node.
You can configure eviction strategies for virtual machines (VMs) by using the OpenShift Virtualization web console or the command line.
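For the cluster-wide setting, one option is the HyperConverged CR. This sketch assumes a default OpenShift Virtualization installation (the kubevirt-hyperconverged resource in the openshift-cnv namespace):

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv       # default installation namespace (assumption)
spec:
  evictionStrategy: LiveMigrate  # applies to VMs that do not set their own strategy
```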
The default eviction strategy is LiveMigrate. You must set the eviction strategy of non-migratable VMs to LiveMigrateIfPossible or to None.
You can configure an eviction strategy for a virtual machine (VM) by using the command line.
The default eviction strategy is LiveMigrate. You must set the eviction strategy of non-migratable VMs to LiveMigrateIfPossible or to None.
Edit the VirtualMachine
resource by running the following command:
$ oc edit vm <vm_name> -n <namespace>
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: <vm_name>
spec:
  template:
    spec:
      evictionStrategy: LiveMigrateIfPossible (1)
# ...
(1) Specify the eviction strategy. The default value is LiveMigrate.
Restart the VM to apply the changes:
$ virtctl restart <vm_name> -n <namespace>
A virtual machine (VM) configured with spec.running: true
is immediately restarted. The spec.runStrategy
key provides greater flexibility for determining how a VM behaves under certain conditions.
The spec.runStrategy and spec.running keys are mutually exclusive. A VM configuration with both keys is invalid.
The spec.runStrategy
key has four possible values:
Always
A virtual machine instance (VMI) is always present after a virtual machine (VM) is created. If the original VMI stops for any reason, a new one is created on another node. This is the same behavior as running: true
.
RerunOnFailure
The VMI is re-created on another node if the previous instance fails. The instance is not re-created if the VM stops successfully, such as when it is shut down.
Manual
You control the VMI state manually with the start
, stop
, and restart
virtctl client commands. The VM is not automatically restarted.
Halted
No VMI is present when a VM is created. This is the same behavior as running: false
.
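Under the Manual strategy, the VMI state is driven entirely by the virtctl client; a typical session might look like this, with <vm_name> and <namespace> as placeholders:

```
$ virtctl start <vm_name> -n <namespace>
$ virtctl stop <vm_name> -n <namespace>
$ virtctl restart <vm_name> -n <namespace>
```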
Different combinations of the virtctl start, stop, and restart commands affect the run strategy.
The following table describes a VM’s transition between states. The first column shows the VM’s initial run strategy. The remaining columns show a virtctl command and the new run strategy after that command is run.
Initial run strategy | Start | Stop | Restart |
---|---|---|---|
Always | - | Halted | Always |
RerunOnFailure | - | Halted | RerunOnFailure |
Manual | Manual | Manual | Manual |
Halted | Always | - | - |
If a node in a cluster installed by using installer-provisioned infrastructure fails the machine health check and is unavailable, VMs with runStrategy: Always or runStrategy: RerunOnFailure are rescheduled on a new node.
You can configure a run strategy for a virtual machine (VM) by using the command line.
Edit the VirtualMachine
resource by running the following command:
$ oc edit vm <vm_name> -n <namespace>
apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  runStrategy: Always
# ...
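To confirm the new run strategy without opening the editor, you can read the field back with a JSONPath query, for example:

```
$ oc get vm <vm_name> -n <namespace> -o jsonpath='{.spec.runStrategy}'
```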
When you deploy Red Hat OpenShift Service on AWS on bare metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where cluster nodes are considered ephemeral, re-provisioning a bare metal node requires significantly more time and effort for maintenance tasks.
When a bare metal node fails, for example, due to a fatal kernel error or a NIC hardware failure, workloads on the failed node must be restarted elsewhere in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully power down nodes, moving workloads to other parts of the cluster and ensuring that workloads are not interrupted. Detailed progress and node status information is provided during maintenance.
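When the NodeMaintenance CR method is used, that progress is surfaced on the CR itself. A hedged way to inspect it, assuming the resource name registered by your Operator version is nodemaintenance, and to end maintenance by deleting the CR:

```
$ oc get nodemaintenance <maintenance_cr_name> -o yaml
$ oc delete nodemaintenance <maintenance_cr_name>
```

As described earlier, deleting the CR makes the referenced node available for new workloads again.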