After OpenShift Container Platform is installed, core platform monitoring components immediately begin collecting metrics, which you can query and view. The default in-cluster monitoring stack includes the core platform Prometheus instance that collects metrics from your cluster and the core Alertmanager instance that routes alerts, among other components. As a cluster administrator, you can further configure these components to suit the needs of different users in various scenarios. Typical activities include setting up storage and configuring options for Prometheus, Alertmanager, and other monitoring components.
By default, in a newly installed OpenShift Container Platform system, users can query and view collected metrics. You need only configure an alert receiver if you want users to receive alert notifications. All other configuration options listed here are optional.
Create the cluster-monitoring-config ConfigMap object if it does not exist.
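For reference, a minimal sketch of this object follows. The cluster-monitoring-config name and the openshift-monitoring namespace are fixed by the platform; the config.yaml key starts out empty and holds the component options described in the tasks below.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Component options (for example, under prometheusK8s or
    # alertmanagerMain) are added beneath this key.
```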
Configure notifications for default platform alerts so that Alertmanager can send alerts to an external notification system such as email, Slack, or PagerDuty.
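As an illustration, the following sketch of an Alertmanager configuration routes critical alerts to a PagerDuty receiver; the integration key is a placeholder to replace with your own. In OpenShift Container Platform, the default platform Alertmanager configuration is stored in the alertmanager-main secret in the openshift-monitoring namespace and can also be edited through the web console.

```yaml
route:
  group_by: [namespace]
  receiver: default
  routes:
  - matchers:
    - severity = critical
    receiver: pagerduty-critical
receivers:
- name: default
- name: pagerduty-critical
  pagerduty_configs:
  - service_key: <pagerduty-integration-key>  # placeholder value
```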
For short-term data retention, configure persistent storage for Prometheus and Alertmanager to store metrics and alert data, and specify the metrics data retention parameters for Prometheus and Thanos Ruler.
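For example, the following sketch, placed under the config.yaml key of the cluster-monitoring-config ConfigMap, requests persistent volumes for Prometheus and Alertmanager and sets a 15-day Prometheus retention period. The storage class name and volume sizes are assumptions to adapt to your environment.

```yaml
prometheusK8s:
  retention: 15d
  volumeClaimTemplate:
    spec:
      storageClassName: gp3-csi   # assumed storage class
      resources:
        requests:
          storage: 40Gi
alertmanagerMain:
  volumeClaimTemplate:
    spec:
      storageClassName: gp3-csi   # assumed storage class
      resources:
        requests:
          storage: 10Gi
```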
For longer-term data retention, configure the remote write feature so that Prometheus sends ingested metrics to remote systems for storage. Be sure to add cluster ID labels to metrics for use with your remote write storage configuration.
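A sketch of a remote write configuration under prometheusK8s follows; the endpoint URL is a placeholder. The write relabel rule copies the temporary __tmp_openshift_cluster_id__ label into a cluster_id label on outgoing samples so that the remote system can distinguish metrics from this cluster.

```yaml
prometheusK8s:
  remoteWrite:
  - url: "https://remote-write.example.com/api/v1/write"  # placeholder endpoint
    writeRelabelConfigs:
    - sourceLabels: [__tmp_openshift_cluster_id__]
      targetLabel: cluster_id
      action: replace
```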
Grant monitoring cluster roles to any non-administrator users who need to access certain monitoring features.
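For example, a ClusterRoleBinding like the following sketch grants the cluster-monitoring-view role, which allows read access to platform metrics, to a hypothetical user:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-view-example     # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: example-user                # hypothetical user
```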
Assign tolerations to monitoring stack components so that administrators can move them to tainted nodes.
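For instance, the following sketch under config.yaml moves Alertmanager to nodes that carry an assumed infra label and taint:

```yaml
alertmanagerMain:
  nodeSelector:
    node-role.kubernetes.io/infra: ""     # assumed node label
  tolerations:
  - key: node-role.kubernetes.io/infra    # assumed taint key
    operator: Exists
    effect: NoSchedule
```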
Set the body size limit for metrics collection to help avoid situations in which Prometheus consumes excessive amounts of memory when scraped targets return a response that contains a large amount of data.
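For example, under config.yaml; the 40MB value is an assumption, and scrapes whose response body exceeds the limit fail:

```yaml
prometheusK8s:
  enforcedBodySizeLimit: 40MB   # assumed limit value
```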
Modify or create alerting rules for your cluster. These rules specify the conditions that trigger alerts, such as high CPU or memory usage or excessive network latency.
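A sketch of a PrometheusRule object follows; the rule name, the threshold, and the recording rule used in the expression are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerting-rules     # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
  - name: example.rules
    rules:
    - alert: HighNodeCPU
      expr: 'instance:node_cpu_utilisation:rate1m > 0.9'  # assumed recording rule
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Node CPU utilization has stayed above 90% for 15 minutes.
```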
Specify resource limits and requests for monitoring components to ensure that the containers that run monitoring components have enough CPU and memory resources.
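For example, under config.yaml; the CPU and memory figures are placeholders to size for your cluster:

```yaml
prometheusK8s:
  resources:
    requests:
      cpu: 500m      # placeholder values; size for your workload
      memory: 4Gi
    limits:
      cpu: "2"
      memory: 8Gi
```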
With the monitoring stack configured to suit your needs, Prometheus collects metrics from the specified services and stores these metrics according to your settings. You can go to the Observe pages in the OpenShift Container Platform web console to view and query collected metrics, manage alerts, identify performance bottlenecks, and scale resources as needed:
View dashboards to visualize collected metrics, troubleshoot alerts, and monitor other information about your cluster.
Query collected metrics by creating PromQL queries or using predefined queries.
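For example, a simple PromQL query that sums per-pod CPU usage over the last five minutes, filtered to an assumed namespace:

```promql
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace="openshift-monitoring"}[5m])
)
```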