×

As a developer, you can use a horizontal pod autoscaler (HPA) to specify how OpenShift Container Platform should automatically increase or decrease the scale of a replication controller or deployment configuration, based on metrics collected from the pods that belong to that replication controller or deployment configuration.

Understanding horizontal pod autoscalers

You can create a horizontal pod autoscaler to specify the minimum and maximum number of pods you want to run, as well as the CPU utilization or memory utilization your pods should target.

Autoscaling for Memory Utilization is a Technology Preview feature only.

After you create a horizontal pod autoscaler, OpenShift Container Platform begins to query the CPU and/or memory resource metrics on the pods. When these metrics are available, the horizontal pod autoscaler computes the ratio of the current metric utilization with the desired metric utilization, and scales up or down accordingly. The query and scaling occurs at a regular interval, but can take one to two minutes before metrics become available.

For replication controllers, this scaling corresponds directly to the replicas of the replication controller. For deployment configurations, scaling corresponds directly to the replica count of the deployment configuration. Note that autoscaling applies only to the latest deployment in the Complete phase.

OpenShift Container Platform automatically accounts for resources and prevents unnecessary autoscaling during resource spikes, such as during start up. Pods in the unready state have 0 CPU usage when scaling up and the autoscaler ignores the pods when scaling down. Pods without known metrics have 0% CPU usage when scaling up and 100% CPU when scaling down. This allows for more stability during the HPA decision. To use this feature, you must configure readiness checks to determine if a new pod is ready for use.

In order to use horizontal pod autoscalers, your cluster administrator must have properly configured cluster metrics.

Supported metrics

The following metrics are supported by horizontal pod autoscalers:

Table 1. Metrics
Metric Description API version

CPU utilization

Number of CPU cores used. Can be used to calculate a percentage of the pod’s requested CPU.

autoscaling/v1, autoscaling/v2beta2

Memory utilization

Amount of memory used. Can be used to calculate a percentage of the pod’s requested memory.

autoscaling/v2beta2

For memory-based autoscaling, memory usage must increase and decrease proportionally to the replica count. On average:

  • An increase in replica count must lead to an overall decrease in memory (working set) usage per-pod.

  • A decrease in replica count must lead to an overall increase in per-pod memory usage.

Use the OpenShift Container Platform web console to check the memory behavior of your application and ensure that your application meets these requirements before using memory-based autoscaling.

Creating a horizontal pod autoscaler for CPU utilization

You can create a horizontal pod autoscaler (HPA) for an existing DeploymentConfig or ReplicationController object that automatically scales the Pods associated with that object in order to maintain the CPU usage you specify.

The HPA increases and decreases the number of replicas between the minimum and maximum numbers to maintain the specified CPU utilization across all Pods.

When autoscaling for CPU utilization, you can use the oc autoscale command and specify the minimum and maximum number of Pods you want to run at any given time and the average CPU utilization your Pods should target. If you do not specify a minimum, the Pods are given default values from the OpenShift Container Platform server. To autoscale for a specific CPU value, create a HorizontalPodAutoscaler object with the target CPU and Pod limits.

Prerequisites

In order to use horizontal pod autoscalers, your cluster administrator must have properly configured cluster metrics. You can use the oc describe PodMetrics <pod-name> command to determine if metrics are configured. If metrics are configured, the output appears similar to the following, with Cpu and Memory displayed under Usage.

$ oc describe PodMetrics openshift-kube-scheduler-ip-10-0-135-131.ec2.internal
Name:         openshift-kube-scheduler-ip-10-0-135-131.ec2.internal
Namespace:    openshift-kube-scheduler
Labels:       <none>
Annotations:  <none>
API Version:  metrics.k8s.io/v1beta1
Containers:
  Name:  wait-for-host-port
  Usage:
    Memory:  0
  Name:      scheduler
  Usage:
    Cpu:     8m
    Memory:  45440Ki
Kind:        PodMetrics
Metadata:
  Creation Timestamp:  2019-05-23T18:47:56Z
  Self Link:           /apis/metrics.k8s.io/v1beta1/namespaces/openshift-kube-scheduler/pods/openshift-kube-scheduler-ip-10-0-135-131.ec2.internal
Timestamp:             2019-05-23T18:47:56Z
Window:                1m0s
Events:                <none>
Procedure

To create a horizontal pod autoscaler for CPU utilization:

  1. Perform one of the following one of the following:

    • To scale based on the percent of CPU utilization, create a HorizontalPodAutoscaler object for an existing DeploymentConfig:

      $ oc autoscale dc/<dc-name> \(1)
        --min <number> \(2)
        --max <number> \(3)
        --cpu-percent=<percent> (4)
      1 Specify the name of the DeploymentConfig. The object must exist.
      2 Optionally, specify the minimum number of replicas when scaling down.
      3 Specify the maximum number of replicas when scaling up.
      4 Specify the target average CPU utilization over all the Pods, represented as a percent of requested CPU. If not specified or negative, a default autoscaling policy is used.
    • To scale based on the percent of CPU utilization, create a HorizontalPodAutoscaler object for an existing ReplicationController:

      $ oc autoscale rc/<rc-name> (1)
        --min <number> \(2)
        --max <number> \(3)
        --cpu-percent=<percent> (4)
      1 Specify the name of the ReplicationController. The object must exist.
      2 Specify the minimum number of replicas when scaling down.
      3 Specify the maximum number of replicas when scaling up.
      4 Specify the target average CPU utilization over all the Pods, represented as a percent of requested CPU. If not specified or negative, a default autoscaling policy is used.
    • To scale for a specific CPU value, create a YAML file similar to the following for an existing DeploymentConfig or ReplicationController:

      1. Create a YAML file similar to the following:

        apiVersion: autoscaling/v2beta2 (1)
        kind: HorizontalPodAutoscaler
        metadata:
          name: cpu-autoscale (2)
          namespace: default
        spec:
          scaleTargetRef:
            apiVersion: v1 (3)
            kind: ReplicationController (4)
            name: example (5)
          minReplicas: 1 (6)
          maxReplicas: 10 (7)
          metrics: (8)
          - type: Resource
            resource:
              name: cpu (9)
              target:
                type: Utilization (10)
                averageValue: 500m (11)
        1 Use the autoscaling/v2beta2 API.
        2 Specify a name for this horizontal pod autoscaler object.
        3 Specify the API version of the object to scale:
        • For a ReplicationController, use v1,

        • For a DeploymentConfig, use apps.openshift.io/v1.

        4 Specify the kind of object to scale, either ReplicationController or DeploymentConfig.
        5 Specify the name of the object to scale. The object must exist.
        6 Specify the minimum number of replicas when scaling down.
        7 Specify the maximum number of replicas when scaling up.
        8 Use the metrics parameter for memory utilization.
        9 Specify cpu for CPU utilization.
        10 Set to Utilization.
        11 Set the type to averageValue.
      2. Create the horizontal pod autoscaler:

        $ oc create -f <file-name>.yaml
  2. Verify that the horizontal pod autoscaler was created:

    $ oc get hpa cpu-autoscale
    
    NAME            REFERENCE                       TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    cpu-autoscale   ReplicationController/example   173m/500m       1         10        1          20m

For example, the following command creates a horizontal pod autoscaler that maintains between 3 and 7 replicas of the Pods that are controlled by the image-registry DeploymentConfig in order to maintain an average CPU utilization of 75% across all Pods.

$ oc autoscale dc/image-registry --min 3 --max 7 --cpu-percent=75
deploymentconfig "image-registry" autoscaled

The command creates a horizontal pod autoscaler with the following definition:

$ oc edit hpa frontend -n openshift-image-registry
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: "2020-02-21T20:19:28Z"
  name: image-registry
  namespace: default
  resourceVersion: "32452"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/frontend
  uid: 1a934a22-925d-431e-813a-d00461ad7521
spec:
  maxReplicas: 7
  minReplicas: 3
  scaleTargetRef:
    apiVersion: apps.openshift.io/v1
    kind: DeploymentConfig
    name: image-registry
  targetCPUUtilizationPercentage: 75
status:
  currentReplicas: 5
  desiredReplicas: 0

The following example shows autoscaling for the image-registry DeploymentConfig. The initial deployment requires 3 Pods. The HPA object increased that minimum to 5 and will increase the Pods up to 7 if CPU usage on the Pods reaches 75%:

$ oc get dc image-registry
NAME             REVISION   DESIRED   CURRENT   TRIGGERED BY
image-registry   1          3         3         config

$ oc autoscale dc/image-registry --min=5 --max=7 --cpu-percent=75
horizontalpodautoscaler.autoscaling/image-registry autoscaled

$ oc get dc image-registry
NAME             REVISION   DESIRED   CURRENT   TRIGGERED BY
image-registry   1          5         5         config

Creating a horizontal pod autoscaler object for memory utilization

You can create a horizontal pod autoscaler (HPA) for an existing DeploymentConfig or ReplicationController object that automatically scales the Pods associated with that object in order to maintain the average memory utilization you specify, either a direct value or a percentage of requested memory.

The HPA increases and decreases the number of replicas between the minimum and maximum numbers to maintain the specified memory utilization across all Pods.

For memory utilization, you can specify the minimum and maximum number of Pods and the average memory utilization your Pods should target. If you do not specify a minimum, the Pods are given default values from the OpenShift Container Platform server.

Autoscaling for memory utilization is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend to use them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information on Red Hat Technology Preview features support scope, see https://access.redhat.com/support/offerings/techpreview/.

Prerequisites

In order to use horizontal pod autoscalers, your cluster administrator must have properly configured cluster metrics. You can use the oc describe PodMetrics <pod-name> command to determine if metrics are configured. If metrics are configured, the output appears similar to the following, with Cpu and Memory displayed under Usage.

$ oc describe PodMetrics openshift-kube-scheduler-ip-10-0-129-223.compute.internal -n openshift-kube-scheduler
Name:         openshift-kube-scheduler-ip-10-0-129-223.compute.internal
Namespace:    openshift-kube-scheduler
Labels:       <none>
Annotations:  <none>
API Version:  metrics.k8s.io/v1beta1
Containers:
  Name:  scheduler
  Usage:
    Cpu:     2m
    Memory:  41056Ki
  Name:      wait-for-host-port
  Usage:
    Memory:  0
Kind:        PodMetrics
Metadata:
  Creation Timestamp:  2020-02-14T22:21:14Z
  Self Link:           /apis/metrics.k8s.io/v1beta1/namespaces/openshift-kube-scheduler/pods/openshift-kube-scheduler-ip-10-0-129-223.compute.internal
Timestamp:             2020-02-14T22:21:14Z
Window:                5m0s
Events:                <none>
Procedure

To create a horizontal pod autoscaler for memory utilization:

  1. Create a YAML file for one of the following:

    • To scale for a specific memory value, create a HorizontalPodAutoscaler object similar to the following for an existing DeploymentConfig or ReplicationController:

      apiVersion: autoscaling/v2beta2 (1)
      kind: HorizontalPodAutoscaler
      metadata:
        name: hpa-resource-metrics-memory (2)
        namespace: default
      spec:
        scaleTargetRef:
          apiVersion: v1 (3)
          kind: ReplicationController (4)
          name: example (5)
        minReplicas: 1 (6)
        maxReplicas: 10 (7)
        metrics: (8)
        - type: Resource
          resource:
            name: memory (9)
            target:
              type: AverageValue (10)
              averageValue: 500Mi (11)
      1 Use the autoscaling/v2beta2 API.
      2 Specify a name for this horizontal pod autoscaler object.
      3 Specify the API version of the object to scale:
      • For a ReplicationController, use v1,

      • For a DeploymentConfig, use apps.openshift.io/v1.

      4 Specify the kind of object to scale, either ReplicationController or DeploymentConfig.
      5 Specify the name of the object to scale. The object must exist.
      6 Specify the minimum number of replicas when scaling down.
      7 Specify the maximum number of replicas when scaling up.
      8 Use the metrics parameter for memory utilization.
      9 Specify memory for memory utilization.
      10 Set the type to AverageValue.
      11 Specify averageValue and a specific memory value.
    • To scale for a percentage, create a HorizontalPodAutoscaler object similar to the following:

      apiVersion: autoscaling/v2beta2 (1)
      kind: HorizontalPodAutoscaler
      metadata:
        name: memory-autoscale (2)
        namespace: default
      spec:
        scaleTargetRef:
          apiVersion: apps.openshift.io/v1 (3)
          kind: DeploymentConfig (4)
          name: example (5)
        minReplicas: 1 (6)
        maxReplicas: 10 (7)
        metrics: (8)
        - type: Resource
          resource:
            name: memory (9)
            target:
              type: Utilization (10)
              averageUtilization: 50 (11)
      1 Use the autoscaling/v2beta2 API.
      2 Specify a name for this horizontal pod autoscaler object.
      3 Specify the API version of the object to scale:
      • For a ReplicationController, use v1,

      • For a DeploymentConfig, use apps.openshift.io/v1.

      4 Specify the kind of object to scale, either ReplicationController or DeploymentConfig.
      5 Specify the name of the object to scale. The object must exist.
      6 Specify the minimum number of replicas when scaling down.
      7 Specify the maximum number of replicas when scaling up.
      8 Use the metrics parameter for memory utilization.
      9 Specify memory for memory utilization.
      10 Set to Utilization.
      11 Specify averageUtilization and a target average memory utilization over all the Pods, represented as a percent of requested memory. The target pods must have memory requests configured.
  2. Create the horizontal pod autoscaler:

    $ oc create -f <file-name>.yaml

    For example:

    $ oc create -f hpa.yaml
    
    horizontalpodautoscaler.autoscaling/hpa-resource-metrics-memory created
  3. Verify that the horizontal pod autoscaler was created:

    $ oc get hpa hpa-resource-metrics-memory
    
    NAME                          REFERENCE                       TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    hpa-resource-metrics-memory   ReplicationController/example   2441216/500Mi   1         10        1          20m
    $ oc describe hpa hpa-resource-metrics-memory
    Name:                        hpa-resource-metrics-memory
    Namespace:                   default
    Labels:                      <none>
    Annotations:                 <none>
    CreationTimestamp:           Wed, 04 Mar 2020 16:31:37 +0530
    Reference:                   ReplicationController/example
    Metrics:                     ( current / target )
      resource memory on pods:   2441216 / 500Mi
    Min replicas:                1
    Max replicas:                10
    ReplicationController pods:  1 current / 1 desired
    Conditions:
      Type            Status  Reason              Message
      ----            ------  ------              -------
      AbleToScale     True    ReadyForNewScale    recommended size matches current size
      ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
      ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
    Events:
      Type     Reason                   Age                 From                       Message
      ----     ------                   ----                ----                       -------
      Normal   SuccessfulRescale        6m34s               horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Understanding horizontal pod autoscaler status conditions

You can use the status conditions set to determine whether or not the horizontal pod autoscaler (HPA) is able to scale and whether or not it is currently restricted in any way.

The HPA status conditions are available with the v2beta1 version of the autoscaling API.

The HPA responds with the following status conditions:

  • The AbleToScale condition indicates whether HPA is able to fetch and update metrics, as well as whether any backoff-related conditions could prevent scaling.

    • A True condition indicates scaling is allowed.

    • A False condition indicates scaling is not allowed for the reason specified.

  • The ScalingActive condition indicates whether the HPA is enabled (for example, the replica count of the target is not zero) and is able to calculate desired metrics.

    • A True condition indicates metrics is working properly.

    • A False condition generally indicates a problem with fetching metrics.

  • The ScalingLimited condition indicates that the desired scale was capped by the maximum or minimum of the horizontal pod autoscaler.

    • A True condition indicates that you need to raise or lower the minimum or maximum replica count in order to scale.

    • A False condition indicates that the requested scaling is allowed.

      $ oc describe hpa cm-test
      Name:                           cm-test
      Namespace:                      prom
      Labels:                         <none>
      Annotations:                    <none>
      CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
      Reference:                      ReplicationController/cm-test
      Metrics:                        ( current / target )
        "http_requests" on pods:      66m / 500m
      Min replicas:                   1
      Max replicas:                   4
      ReplicationController pods:     1 current / 1 desired
      Conditions: (1)
        Type              Status    Reason              Message
        ----              ------    ------              -------
        AbleToScale       True      ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
        ScalingActive     True      ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_request
        ScalingLimited    False     DesiredWithinRange  the desired replica count is within the acceptable range
      Events:
      1 The horizontal pod autoscaler status messages.

The following is an example of a pod that is unable to scale:

Conditions:
  Type         Status  Reason          Message
  ----         ------  ------          -------
  AbleToScale  False   FailedGetScale  the HPA controller was unable to get the target's current scale: no matches for kind "ReplicationController" in group "apps"
Events:
  Type     Reason          Age               From                       Message
  ----     ------          ----              ----                       -------
  Warning  FailedGetScale  6s (x3 over 36s)  horizontal-pod-autoscaler  no matches for kind "ReplicationController" in group "apps"

The following is an example of a pod that could not obtain the needed metrics for scaling:

Conditions:
  Type                  Status    Reason                    Message
  ----                  ------    ------                    -------
  AbleToScale           True     SucceededGetScale          the HPA controller was able to get the target's current scale
  ScalingActive         False    FailedGetResourceMetric    the HPA was unable to compute the replica count: unable to get metrics for resource cpu: no metrics returned from heapster

The following is an example of a pod where the requested autoscaling was less than the required minimums:

Conditions:
  Type              Status    Reason              Message
  ----              ------    ------              -------
  AbleToScale       True      ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive     True      ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_request
  ScalingLimited    False     DesiredWithinRange  the desired replica count is within the acceptable range

Viewing horizontal pod autoscaler status conditions

You can view the status conditions set on a pod by the horizontal pod autoscaler (HPA).

The horizontal pod autoscaler status conditions are available with the v2beta1 version of the autoscaling API.

Prerequisites

In order to use horizontal pod autoscalers, your cluster administrator must have properly configured cluster metrics. You can use the oc describe PodMetrics <pod-name> command to determine if metrics are configured. If metrics are configured, the output appears similar to the following, with Cpu and Memory displayed under Usage.

$ oc describe PodMetrics openshift-kube-scheduler-ip-10-0-135-131.ec2.internal

Name:         openshift-kube-scheduler-ip-10-0-135-131.ec2.internal
Namespace:    openshift-kube-scheduler
Labels:       <none>
Annotations:  <none>
API Version:  metrics.k8s.io/v1beta1
Containers:
  Name:  wait-for-host-port
  Usage:
    Memory:  0
  Name:      scheduler
  Usage:
    Cpu:     8m
    Memory:  45440Ki
Kind:        PodMetrics
Metadata:
  Creation Timestamp:  2019-05-23T18:47:56Z
  Self Link:           /apis/metrics.k8s.io/v1beta1/namespaces/openshift-kube-scheduler/pods/openshift-kube-scheduler-ip-10-0-135-131.ec2.internal
Timestamp:             2019-05-23T18:47:56Z
Window:                1m0s
Events:                <none>
Procedure

To view the status conditions on a pod, use the following command with the name of the pod:

$ oc describe hpa <pod-name>

For example:

$ oc describe hpa cm-test

The conditions appear in the Conditions field in the output.

Name:                           cm-test
Namespace:                      prom
Labels:                         <none>
Annotations:                    <none>
CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
Reference:                      ReplicationController/cm-test
Metrics:                        ( current / target )
  "http_requests" on pods:      66m / 500m
Min replicas:                   1
Max replicas:                   4
ReplicationController pods:     1 current / 1 desired
Conditions: (1)
  Type              Status    Reason              Message
  ----              ------    ------              -------
  AbleToScale       True      ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive     True      ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_request
  ScalingLimited    False     DesiredWithinRange  the desired replica count is within the acceptable range

Additional resources

For more information on replication controllers and deployment controllers, see Understanding Deployments and DeploymentConfigs.