A MachineHealthCheck automatically repairs unhealthy Machines in a particular
MachinePool.
To monitor machine health, you create a resource that defines the
configuration for a controller. In that resource, you set a condition to check
for, such as a node staying in the NotReady status for 15 minutes or the
node-problem-detector reporting a permanent condition, and a label that selects
the set of machines to monitor.
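
As a rough illustration of how a condition and a label work together, the following Python sketch filters a list of machines by label and then flags those that have stayed in NotReady for at least 15 minutes. It is a simplified model and not the controller's actual implementation: the `Machine` class, its fields, and the function names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Machine:
    # Hypothetical, simplified view of a machine and its node status.
    name: str
    labels: dict
    node_status: str          # for example "Ready" or "NotReady"
    status_since: datetime    # when the node entered node_status


def matches(labels: dict, selector: dict) -> bool:
    """Return True when every key/value pair in selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def is_unhealthy(machine: Machine, now: datetime,
                 timeout: timedelta = timedelta(minutes=15)) -> bool:
    """Return True when the node has been NotReady for at least the timeout."""
    return (machine.node_status == "NotReady"
            and now - machine.status_since >= timeout)


def unhealthy_machines(machines, selector: dict, now: datetime):
    """Select the labeled machines first, then apply the health condition."""
    return [m for m in machines
            if matches(m.labels, selector) and is_unhealthy(m, now)]
```

In this model, a machine that carries a different label is never evaluated, and a machine that has been NotReady for less than the timeout is still treated as healthy.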
|
You cannot apply a MachineHealthCheck to a machine with the master role.
|
The controller that observes a MachineHealthCheck resource checks for the status
that you defined. If a machine fails the health check, the controller
automatically deletes it and creates a new machine to take its place. When a
machine is deleted, you see a "machine deleted" event. To limit the disruptive
impact of machine deletion, the controller drains and deletes only one node at
a time.
To stop the check, you remove the resource.
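
The one-at-a-time replacement described above can be sketched as a single remediation pass. This is a toy model under stated assumptions and not the real controller: the `Machine` class, the `remediate` function, and the event strings are hypothetical, and draining is represented only by a recorded message. Machines with the master role are skipped, reflecting the restriction that a MachineHealthCheck cannot be applied to them.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical, simplified machine record for illustration only.
    name: str
    role: str        # for example "worker" or "master"
    healthy: bool


def remediate(machines: list, events: list) -> bool:
    """Run one pass that replaces at most one unhealthy machine.

    Returns True if a machine was replaced, False if nothing needed repair.
    """
    for index, machine in enumerate(machines):
        # Healthy machines need no action; master machines are never touched.
        if machine.healthy or machine.role == "master":
            continue
        # Drain first, then delete, and record both steps as events.
        events.append(f"drained node for {machine.name}")
        events.append(f"machine deleted: {machine.name}")
        # A new machine takes the place of the deleted one.
        machines[index] = Machine(f"{machine.name}-replacement",
                                  machine.role, True)
        return True  # stop after one machine to limit disruption
    return False
```

Calling `remediate` repeatedly repairs unhealthy machines one per pass, so at most one node is drained and deleted at any time.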