Applying autoscaling to an OpenShift Container Platform cluster involves deploying a ClusterAutoscaler and then deploying MachineAutoscalers for each Machine type in your cluster.

About the ClusterAutoscaler

The ClusterAutoscaler adjusts the size of an OpenShift Container Platform cluster to meet its current deployment needs. It uses declarative, Kubernetes-style arguments to provide infrastructure management that does not rely on objects of a specific cloud provider. The ClusterAutoscaler has a cluster scope, and is not associated with a particular namespace.

The ClusterAutoscaler increases the size of the cluster when there are pods that failed to schedule on any of the current nodes due to insufficient resources or when another node is necessary to meet deployment needs. The ClusterAutoscaler does not increase the cluster resources beyond the limits that you specify.

The ClusterAutoscaler decreases the size of the cluster when some nodes are consistently not needed for a significant period, such as when a node has low resource use and all of its important pods can fit on other nodes.

If the following types of pods are present on a node, the ClusterAutoscaler does not remove the node:

  • Pods with restrictive PodDisruptionBudgets (PDBs).

  • Kube-system pods that do not run on the node by default.

  • Kube-system pods that do not have a PDB or have a PDB that is too restrictive.

  • Pods that are not backed by a controller object such as a Deployment, ReplicaSet, or StatefulSet.

  • Pods with local storage.

  • Pods that cannot be moved elsewhere because of a lack of resources, incompatible node selectors or affinity, matching anti-affinity, and so on.

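  • Pods that have a "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation.

If a pod that would otherwise block node removal, such as a pod with local storage or a pod that is not backed by a controller object, has a "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation, the ClusterAutoscaler can remove the node.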
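The safe-to-evict annotation is set in the metadata of the pod itself. The following Pod manifest is a minimal sketch: the pod name, image, and volume are hypothetical placeholders, and the emptyDir volume stands in for local storage that would otherwise keep the ClusterAutoscaler from removing the node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-worker                          # hypothetical pod name
  annotations:
    # Lets the ClusterAutoscaler evict this pod when it removes the node
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: worker
      image: registry.example.com/worker:latest # hypothetical image
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}                              # local storage that would otherwise block node removal
```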

If you configure the ClusterAutoscaler, additional usage restrictions apply:

  • Do not directly modify the nodes in autoscaled node groups. All nodes within the same node group have the same capacity and labels and run the same system pods.

  • Specify resource requests for your pods.

  • If you have to prevent pods from being deleted too quickly, configure appropriate PDBs.

  • Confirm that your cloud provider quota is large enough to support the maximum node pools that you configure.

  • Do not run additional node group autoscalers, especially the ones offered by your cloud provider.
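The request and PDB recommendations above can be combined in one manifest file. The following is a minimal sketch that assumes a hypothetical Deployment named web: its pods declare CPU and memory requests, and a PodDisruptionBudget keeps at least two replicas available while the ClusterAutoscaler drains a node. The policy/v1 API version requires Kubernetes 1.21 or later; older clusters use policy/v1beta1.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                                     # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:latest # hypothetical image
          resources:
            requests:                           # used to decide whether pods fit on a node
              cpu: 250m
              memory: 256Mi
---
apiVersion: policy/v1                           # policy/v1beta1 on clusters older than Kubernetes 1.21
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2                               # keep at least 2 replicas running during node drains
  selector:
    matchLabels:
      app: web
```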

The Horizontal Pod Autoscaler (HPA) and the ClusterAutoscaler modify cluster resources in different ways. The HPA changes the deployment’s or ReplicaSet’s number of replicas based on the current CPU load. If the load increases, the HPA creates new replicas, regardless of the amount of resources available to the cluster. If there are not enough resources, the ClusterAutoscaler adds resources so that the HPA-created pods can run. If the load decreases, the HPA stops some replicas. If this action causes some nodes to be underutilized or completely empty, the ClusterAutoscaler deletes the unnecessary nodes.
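For reference, an HPA that scales on CPU load might look like the following minimal sketch, which assumes a hypothetical Deployment named web. When this HPA adds replicas that do not fit on the existing nodes, those pods stay pending until the ClusterAutoscaler adds nodes for them.

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                                 # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                                   # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 75            # add replicas when average CPU use exceeds 75% of requests
```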

The ClusterAutoscaler takes pod priorities into account. The Pod Priority and Preemption feature enables scheduling pods based on priorities if the cluster does not have enough resources, but the ClusterAutoscaler ensures that the cluster has resources to run all pods. To honor the intention of both features, the ClusterAutoscaler includes a priority cutoff function. You can use this cutoff to schedule "best-effort" pods, which do not cause the ClusterAutoscaler to increase resources but instead run only when spare resources are available.

Pods with priority lower than the cutoff value do not cause the cluster to scale up or prevent the cluster from scaling down. No new nodes are added to run the pods, and nodes running these pods might be deleted to free resources.
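As an illustration, assuming the podPriorityThreshold of -10 used in the sample ClusterAutoscaler definition in this section, a best-effort workload could use a PriorityClass whose value is below that cutoff. The PriorityClass and pod names are hypothetical, and the scheduling.k8s.io/v1 API version requires Kubernetes 1.14 or later.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: best-effort-batch                       # hypothetical PriorityClass name
value: -20                                      # below a podPriorityThreshold of -10, so it never triggers scale-up
globalDefault: false
description: "Pods that run only on spare capacity."
---
apiVersion: v1
kind: Pod
metadata:
  name: batch-job                               # hypothetical pod name
spec:
  priorityClassName: best-effort-batch          # ties the pod to the low-priority class
  containers:
    - name: job
      image: registry.example.com/batch:latest  # hypothetical image
```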

About the MachineAutoscaler

The MachineAutoscaler adjusts the number of Machines in the MachineSets that you deploy in an OpenShift Container Platform cluster. You can scale both the default worker MachineSet and any other MachineSets that you create. The MachineAutoscaler makes more Machines when the cluster runs out of resources to support more deployments. Any changes to the values in MachineAutoscaler resources, such as the minimum or maximum number of instances, are immediately applied to the MachineSet they target.

You must deploy a MachineAutoscaler for the ClusterAutoscaler to scale your machines. The ClusterAutoscaler uses the annotations that the MachineAutoscaler sets on MachineSets to determine the resources that it can scale. If you define a ClusterAutoscaler without also defining MachineAutoscalers, the ClusterAutoscaler never scales your cluster.
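To see what a MachineAutoscaler changes, you can inspect the metadata of a MachineSet that it targets. The excerpt below is an illustration only: it assumes the sample MachineAutoscaler values shown later in this section (minReplicas: 1, maxReplicas: 12), and the annotation names are assumptions based on OpenShift 4.x releases, so confirm them against a MachineSet on your own cluster. You do not add these annotations by hand.

```yaml
# Excerpt of a MachineSet after a MachineAutoscaler targets it (not a complete manifest)
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-us-east-1a                       # hypothetical MachineSet name
  namespace: openshift-machine-api
  annotations:
    # Assumed annotation names; values mirror the MachineAutoscaler's minReplicas and maxReplicas
    machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "1"
    machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "12"
```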

Configuring the ClusterAutoscaler

First, deploy the ClusterAutoscaler to manage automatic resource scaling in your OpenShift Container Platform cluster.

Because the ClusterAutoscaler is scoped to the entire cluster, you can create only one ClusterAutoscaler resource for the cluster.

ClusterAutoscaler resource definition

This ClusterAutoscaler resource definition shows the parameters and sample values for the ClusterAutoscaler.

apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10 (1)
  resourceLimits:
    maxNodesTotal: 24 (2)
    cores:
      min: 8 (3)
      max: 128 (4)
    memory:
      min: 4 (5)
      max: 256 (6)
    gpus:
      - type: nvidia.com/gpu (7)
        min: 0 (8)
        max: 16 (9)
      - type: amd.com/gpu (7)
        min: 0 (8)
        max: 4 (9)
  scaleDown: (10)
    enabled: true (11)
    delayAfterAdd: 10m (12)
    delayAfterDelete: 5m (13)
    delayAfterFailure: 30s (14)
    unneededTime: 60s (15)
1 Specify the priority that a pod must exceed to cause the ClusterAutoscaler to deploy additional nodes. Enter a 32-bit integer value. The podPriorityThreshold value is compared to the value of the PriorityClass that you assign to each pod.
2 Specify the maximum number of nodes to deploy.
3 Specify the minimum number of cores to deploy in the cluster.
4 Specify the maximum number of cores to deploy in the cluster.
5 Specify the minimum amount of memory, in GiB, in the cluster.
6 Specify the maximum amount of memory, in GiB, in the cluster.
7 Optionally, specify the type of GPU node to deploy. Only nvidia.com/gpu and amd.com/gpu are valid types.
8 Specify the minimum number of GPUs to deploy.
9 Specify the maximum number of GPUs to deploy.
10 In this section, you can specify the period to wait for each action by using any valid ParseDuration interval, including ns, us, ms, s, m, and h.
11 Specify whether the ClusterAutoscaler can remove unnecessary nodes.
12 Optionally, specify the period to wait before deleting a node after a node has recently been added. If you do not specify a value, the default value of 10m is used.
13 Specify the period to wait before deleting a node after a node has recently been deleted. If you do not specify a value, the default value of 10s is used.
14 Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of 3m is used.
15 Specify the period before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of 10m is used.
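Because the scaleDown delay and timing fields have defaults, a definition can omit them. The following minimal sketch assumes that your release accepts a ClusterAutoscaler with only these fields set; the maxNodesTotal value is carried over from the sample above.

```yaml
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  resourceLimits:
    maxNodesTotal: 24
  scaleDown:
    enabled: true
    # delayAfterAdd, delayAfterDelete, delayAfterFailure, and unneededTime are
    # omitted, so the defaults of 10m, 10s, 3m, and 10m apply
```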

Deploying the ClusterAutoscaler

To deploy the ClusterAutoscaler, you create an instance of the ClusterAutoscaler resource.

Procedure
  1. Create a YAML file for the ClusterAutoscaler resource that contains the customized resource definition.

  2. Create the resource in the cluster:

    $ oc create -f <filename>.yaml (1)
    1 <filename> is the name of the resource file that you customized.
Next steps
  • After you configure the ClusterAutoscaler, you must configure at least one MachineAutoscaler.

Configuring the MachineAutoscalers

After you deploy the ClusterAutoscaler, deploy MachineAutoscaler resources that reference the MachineSets that are used to scale the cluster.

You must deploy at least one MachineAutoscaler resource after you deploy the ClusterAutoscaler resource.

You must configure a separate MachineAutoscaler resource for each MachineSet. Remember that MachineSets are different in each AWS region and availability zone, so consider whether you want to enable machine scaling in multiple regions.
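For example, to scale worker Machines in two availability zones of the same AWS region, you create one MachineAutoscaler per MachineSet. The following sketch assumes hypothetical MachineSets named worker-us-east-1a and worker-us-east-1b; real MachineSet names follow the <clusterid>-<machineset>-<aws-region-az> form. Both resources can be placed in a single file separated by ---, which oc create -f accepts.

```yaml
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1a"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-us-east-1a                     # hypothetical MachineSet in zone us-east-1a
---
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1b"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-us-east-1b                     # hypothetical MachineSet in zone us-east-1b
```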

MachineAutoscaler resource definition

This MachineAutoscaler resource definition shows the parameters and sample values for the MachineAutoscaler.

apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1a" (1)
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1 (2)
  maxReplicas: 12 (3)
  scaleTargetRef: (4)
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet (5)
    name: worker-us-east-1a (6)
1 Specify the MachineAutoscaler name. To make it easier to identify which MachineSet this MachineAutoscaler scales, specify or include the name of the MachineSet to scale. The MachineSet name takes the following form: <clusterid>-<machineset>-<aws-region-az>
2 Specify the minimum number of Machines of the specified type to deploy in the specified AWS zone.
3 Specify the maximum number of Machines of the specified type to deploy in the specified AWS zone.
4 In this section, provide values that describe the existing MachineSet to scale.
5 The kind parameter value is always MachineSet.
6 The name value must match the name of an existing MachineSet, as shown in the metadata.name parameter value.

Deploying the MachineAutoscaler

To deploy the MachineAutoscaler, you create an instance of the MachineAutoscaler resource.

Procedure
  1. Create a YAML file for the MachineAutoscaler resource that contains the customized resource definition.

  2. Create the resource in the cluster:

    $ oc create -f <filename>.yaml (1)
    1 <filename> is the name of the resource file that you customized.

Additional resources