You can use the Knative Serving - Scaling Debugging dashboard to examine detailed and visualized data for Knative Serving autoscaling. The dashboard is useful for several purposes:
Troubleshooting your autoscaled workloads
Improving understanding of how autoscaling works
Determining why an application was autoscaled
Evaluating resource footprint of an application, such as number of pods
Currently, this dashboard only supports the Knative pod autoscaler (KPA). It does not support the horizontal pod autoscaler (HPA).
The dashboard demonstrations in this section use an OpenShift Dedicated cluster with the autoscale-go sample application installed. The load is generated using the hey load generator.
The sample application has a concurrency limit of 5 requests per pod. When this limit is exceeded, the autoscaler requests additional pods for Knative from Kubernetes.
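A setup similar to the one used in these demonstrations can be sketched as follows. This is a hedged example: the image reference and target annotation mirror the upstream Knative autoscale-go sample, and the route URL is a placeholder for your cluster's actual route.

```shell
# Deploy the autoscale-go sample as a Knative service with a per-pod
# concurrency target of 5 requests (matching the sample's documented
# configuration; adjust namespace and image tag for your environment).
oc apply -f - <<EOF
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
spec:
  template:
    metadata:
      annotations:
        # Soft concurrency target the KPA uses for scaling decisions
        autoscaling.knative.dev/target: "5"
    spec:
      containers:
      - image: gcr.io/knative-samples/autoscale-go:0.1
EOF

# Generate load with hey: 50 concurrent connections for 30 seconds.
# This exceeds the per-pod target of 5 and triggers scale-out.
# Replace the URL with the route of your deployed service.
hey -z 30s -c 50 "http://autoscale-go-default.apps.example.com"
```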
You can use the following steps to navigate to the autoscaling dashboard in the OpenShift Dedicated web console.
You have logged in to the OpenShift Dedicated web console.
You have installed the OpenShift Serverless Operator and Knative Serving.
In the Developer perspective, navigate to the Monitoring → Dashboards page.
In the Dashboard field, select the Knative Serving - Scaling Debugging dashboard.
Use the Namespace, Configuration, and Revision fields to specify the workload you want to examine.
The top of the Knative Serving - Scaling Debugging dashboard shows the count of requested pods, as well as the counts of pods in various stages of deployment. The Revision Pod Counts (Timeline) graph shows the same data visualized on a timeline. This information is useful for a general assessment of autoscaling, such as checking for problems with pod allocation.
The Observed Concurrency graph shows the timeline of a set of concurrency-related metrics, including:
request concurrency
panic concurrency
target concurrency
excess burst capacity
Note that ExcessBurstCapacity is a negative number, -200 by default, that increases when a bursty load appears. It is equal to the difference between spare capacity and the configured target burst capacity. If ExcessBurstCapacity is negative, the activator is threaded into the request path by the PodAutoscaler controller.
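The target burst capacity that feeds into this calculation can be overridden per revision. The following is a sketch using the standard Knative `autoscaling.knative.dev/target-burst-capacity` annotation; the service name assumes the autoscale-go sample from this section:

```shell
# Override the target burst capacity for a revision.
# "200" is the Knative default; "0" keeps the activator out of the
# request path except at scale-to-zero; "-1" keeps the activator in
# the request path permanently.
oc patch ksvc autoscale-go --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/target-burst-capacity":"0"}}}}}'
```

Patching the revision template creates a new revision, so the change takes effect only for traffic routed to that revision.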
The Scrape Time graph shows the timeline of scrape times for each revision. Because the autoscaler makes scaling decisions based on the metrics coming from service pods, high scrape times might delay autoscaling when the workload changes.
The Panic Mode graph shows the timeline of periods when the Knative service faces a bursty load, which causes the autoscaler to quickly adapt the number of service pods.
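Panic behavior can also be tuned per revision. This sketch uses the standard Knative panic annotations; the values shown are the Knative defaults, and the service name again assumes the autoscale-go sample:

```shell
# Panic mode engages when observed concurrency over the panic window
# exceeds panic-threshold-percentage of the target. The values below
# are the Knative defaults: a panic window of 10% of the stable
# window, and a panic threshold of 200% of the target concurrency.
oc patch ksvc autoscale-go --type merge -p '{
  "spec": {"template": {"metadata": {"annotations": {
    "autoscaling.knative.dev/panic-window-percentage": "10.0",
    "autoscaling.knative.dev/panic-threshold-percentage": "200.0"
  }}}}
}'
```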
The Activator graphs Request Concurrency, Request Count by Response Code (last minute), and Response Time (last minute) show the timeline of requests going through the activator until the activator is removed from the request path. These graphs can be used, for example, to evaluate whether response count and the returned HTTP codes match expectations.