During the lifecycle of an Operator, more than one instance may be running at
the same time, for example while an upgrade of the Operator is rolling out. In
such a scenario, leader election prevents contention between the Operator
instances. It ensures that only one leader instance handles reconciliation while
the other instances stay inactive but ready to take over when the leader steps
down.
There are two different leader election implementations to choose from, each
with its own trade-off:
- Leader-for-life: The leader Pod gives up leadership only when it is deleted
(the lock is then cleaned up by garbage collection). This implementation
precludes the possibility of two instances mistakenly running as leaders (split
brain). However, electing a new leader can be delayed. For example, when the
leader Pod is on an unresponsive or partitioned node, the pod-eviction-timeout
(default 5m) dictates how long it takes for the leader Pod to be deleted from
the node and step down. See the Leader-for-life Go documentation for more.
- Leader-with-lease: The leader Pod periodically renews the leader lease and
gives up leadership when it cannot renew the lease. This implementation allows
a faster transition to a new leader when the existing leader is isolated, but
split brain is possible in certain situations. See the Leader-with-lease Go
documentation for more.
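To make the trade-off concrete, here is a toy, in-memory model of the two strategies. It is only a sketch: it is not the Operator SDK or controller-runtime implementation, and the type names LeaseLock and LifeLock, the integer clock, and the podExists callback are hypothetical stand-ins for the Kubernetes objects and timers the real libraries use. The lease lock changes hands as soon as the holder stops renewing, while the life lock changes hands only once the owning Pod is gone.

```go
package main

import "fmt"

// LeaseLock is a hypothetical in-memory model of a leader-with-lease lock.
// Times are plain integers (seconds on an imaginary clock) so runs are deterministic.
type LeaseLock struct {
	Holder    string // current leader ID, "" if nobody holds the lease
	RenewedAt int    // time of the holder's last successful renewal
	Duration  int    // how long the lease stays valid without renewal
}

// TryAcquireOrRenew lets id become or stay leader if it already holds the
// lease, or if the current lease is free or has expired.
func (l *LeaseLock) TryAcquireOrRenew(id string, now int) bool {
	expired := l.Holder == "" || now-l.RenewedAt >= l.Duration
	if l.Holder == id || expired {
		l.Holder = id
		l.RenewedAt = now
		return true
	}
	return false
}

// LifeLock is a hypothetical model of a leader-for-life lock: it belongs to a
// Pod and is freed only when that Pod no longer exists (standing in for
// garbage collection of the lock object).
type LifeLock struct {
	Owner string
}

// TryAcquire succeeds if the lock has no owner, id already owns it, or the
// owning Pod has been deleted.
func (l *LifeLock) TryAcquire(id string, podExists func(string) bool) bool {
	if l.Owner == "" || l.Owner == id || !podExists(l.Owner) {
		l.Owner = id
		return true
	}
	return false
}

func main() {
	lease := &LeaseLock{Duration: 15}
	fmt.Println(lease.TryAcquireOrRenew("pod-a", 0))  // true: the lease is free
	fmt.Println(lease.TryAcquireOrRenew("pod-b", 10)) // false: pod-a's lease is still valid
	// pod-a is partitioned and stops renewing, so pod-b takes over at t=20.
	// Nothing in this model tells pod-a it lost the lease; if it keeps
	// reconciling without checking, two leaders act at once (split brain).
	fmt.Println(lease.TryAcquireOrRenew("pod-b", 20)) // true

	pods := map[string]bool{"pod-a": true, "pod-b": true}
	exists := func(name string) bool { return pods[name] }
	life := &LifeLock{}
	fmt.Println(life.TryAcquire("pod-a", exists)) // true
	// pod-b must wait while pod-a exists, even if pod-a's node is unresponsive.
	fmt.Println(life.TryAcquire("pod-b", exists)) // false
	delete(pods, "pod-a")                         // pod-a is finally deleted, e.g. after eviction
	fmt.Println(life.TryAcquire("pod-b", exists)) // true
}
```

In this model the lease variant recovers after Duration seconds but lets two Pods believe they lead, while the life variant never has two owners but can wait as long as the old Pod lingers, which mirrors the trade-off described above.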
By default, the Operator SDK enables the Leader-for-life implementation. Consult
the Go documentation for both approaches to decide which trade-offs make sense
for your use case.
The following examples illustrate how to use the two options.