You can use the Knative Serving - Scaling Debugging dashboard to examine detailed and visualized data for Knative Serving autoscaling. The dashboard is useful for several purposes:

  • Troubleshooting your autoscaled workloads

  • Improving your understanding of how autoscaling works

  • Determining why an application was autoscaled

  • Evaluating the resource footprint of an application, such as the number of pods

Currently, this dashboard only supports the Knative pod autoscaler (KPA). It does not support the horizontal pod autoscaler (HPA).

The dashboard demonstrations in this section use an OpenShift Dedicated cluster with the autoscale-go sample application installed. The load is generated using the hey load generator.

The sample application has a concurrency limit of 5 requests. When the limit is exceeded, the autoscaler requests additional pods for the Knative service from Kubernetes.
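As a rough illustration of how exceeding the limit translates into more pods, the following sketch divides the observed concurrency by the per-pod target and rounds up. The function name `desired_pods` and the simplified formula are assumptions made for illustration. The actual autoscaler also applies averaging windows, scale bounds, and rate limits that are not modeled here.

```python
import math


def desired_pods(observed_concurrency: float, target_per_pod: float = 5.0) -> int:
    """Return a simplified pod count estimate for a given concurrency.

    Hypothetical helper: divides the total number of in-flight requests by
    the per-pod target (5 for the sample application) and rounds up.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    return math.ceil(observed_concurrency / target_per_pod)


# 50 concurrent requests against a target of 5 per pod
print(desired_pods(50))  # 10
# 12 concurrent requests need a third pod for the remaining 2 requests
print(desired_pods(12))  # 3
```

Comparing such an estimate with the pod counts shown on the dashboard can help you judge whether a scaling decision matches the load you generated.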

Navigating to the autoscaling dashboard

You can use the following steps to navigate to the autoscaling dashboard in the OpenShift Dedicated web console.

Prerequisites

  • You have logged in to the OpenShift Dedicated web console.

  • You have installed the OpenShift Serverless Operator and Knative Serving.

Procedure

  1. In the Developer perspective, navigate to the Monitoring → Dashboards page.

  2. In the Dashboard field, select the Knative Serving - Scaling Debugging dashboard.

  3. Use the Namespace, Configuration, and Revision fields to specify the workload you want to examine.

Pod information

The top of the Knative Serving - Scaling Debugging dashboard shows the number of requested pods and the number of pods in each stage of deployment. The Revision Pod Counts (Timeline) graph shows the same data plotted over time. You can use this information for a general assessment of autoscaling, for example to check for problems with pod allocation.

Observed concurrency

The Observed Concurrency graph shows the timeline of a set of concurrency-related metrics, including:

  • request concurrency

  • panic concurrency

  • target concurrency

  • excess burst capacity

Note that ExcessBurstCapacity is a negative number, -200 by default, that increases when a bursty load appears. It equals the spare capacity minus the configured target burst capacity. While ExcessBurstCapacity is negative, the PodAutoscaler controller keeps the activator in the request path.
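The relationship described above can be sketched as follows. This is a minimal sketch, not the autoscaler's implementation: the function names are hypothetical, the target burst capacity of 200 is inferred from the -200 default mentioned above, and spare capacity is assumed to be the total capacity of the ready pods minus the observed concurrency.

```python
def excess_burst_capacity(
    ready_pods: int,
    capacity_per_pod: float,
    observed_concurrency: float,
    target_burst_capacity: float = 200.0,
) -> float:
    """Return spare capacity minus the target burst capacity.

    Assumption: spare capacity is the total capacity of all ready pods
    minus the concurrency currently observed.
    """
    spare_capacity = ready_pods * capacity_per_pod - observed_concurrency
    return spare_capacity - target_burst_capacity


def activator_in_path(ebc: float) -> bool:
    """The activator stays in the request path while the value is negative."""
    return ebc < 0


# No ready pods and no load: the value equals the negative default.
idle = excess_burst_capacity(ready_pods=0, capacity_per_pod=5, observed_concurrency=0)
print(idle, activator_in_path(idle))  # -200.0 True

# 50 ready pods with capacity 5 each and 20 in-flight requests.
busy = excess_burst_capacity(ready_pods=50, capacity_per_pod=5, observed_concurrency=20)
print(busy, activator_in_path(busy))  # 30.0 False
```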

Scrape time

The Scrape Time graph shows the timeline of scrape times for each revision. Because the autoscaler bases its scaling decisions on metrics scraped from service pods, high scrape times can delay autoscaling when the workload changes.

Panic mode

The Panic Mode graph shows the periods when the Knative service faces a bursty load, which causes the autoscaler to quickly adjust the number of service pods.
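To make the idea of a bursty load more concrete, the following sketch flags panic mode when the pods needed for the short-window concurrency reach a multiple of the ready pods. The function name `is_panicking`, the threshold of 2.0, and the per-pod target of 5 are assumptions chosen for illustration. Check the autoscaler configuration of your cluster for the values that actually apply.

```python
import math


def is_panicking(
    panic_concurrency: float,
    ready_pods: int,
    target_per_pod: float = 5.0,
    panic_threshold: float = 2.0,
) -> bool:
    """Return True if the short-window load suggests panic mode.

    Hypothetical helper: compares the pod count needed for the observed
    panic-window concurrency with the number of ready pods.
    """
    needed_pods = math.ceil(panic_concurrency / target_per_pod)
    # Treat zero ready pods as one to avoid division by zero.
    return needed_pods / max(ready_pods, 1) >= panic_threshold


# 40 concurrent requests need 8 pods; with 3 ready, the ratio is about 2.67.
print(is_panicking(panic_concurrency=40, ready_pods=3))  # True
# 20 concurrent requests need 4 pods; with 3 ready, the ratio is about 1.33.
print(is_panicking(panic_concurrency=20, ready_pods=3))  # False
```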

Activator metrics

The Activator graphs Request Concurrency, Request Count by Response Code (last minute), and Response Time (last minute) show the timeline of requests going through the activator until the activator is removed from the request path. You can use these graphs, for example, to evaluate whether the response count and the returned HTTP codes match expectations.

Requests per second

For requests-per-second (RPS) services, an additional Observed RPS dashboard is available, which visualizes different types of requests per second:

  • stable_requests_per_second

  • panic_requests_per_second

  • target_requests_per_second