You can configure the monitoring stack to optimize the performance and scale of your clusters. The following documentation provides information about how to distribute the monitoring components and control the impact of the monitoring stack on CPU and memory resources.
You can move the monitoring stack components to specific nodes:
Use the nodeSelector constraint with labeled nodes to move any of the monitoring stack components to specific nodes.
Assign tolerations to enable moving components to tainted nodes.
By controlling the placement and distribution of monitoring components across a cluster, you can optimize system resource use, improve performance, and separate workloads based on specific requirements or policies.
To specify the nodes in your cluster on which monitoring stack components will run, configure the nodeSelector constraint for the components in the cluster-monitoring-config config map to match labels assigned to the nodes.
You cannot add a node selector constraint directly to an existing scheduled pod.
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
If you have not done so yet, add a label to the nodes on which you want to run the monitoring components:
$ oc label nodes <node_name> <node_label> (1)
1 | Replace <node_name> with the name of the node where you want to add the label. Replace <node_label> with the name of the label that you want to add.
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Specify the node labels for the nodeSelector constraint for the component under data/config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # ...
    <component>: (1)
      nodeSelector:
        <node_label_1> (2)
        <node_label_2> (3)
    # ...
1 | Substitute <component> with the appropriate monitoring stack component name.
2 | Substitute <node_label_1> with the label you added to the node.
3 | Optional: Specify additional labels. If you specify additional labels, the pods for the component are only scheduled on the nodes that contain all of the specified labels.
If monitoring components remain in a Pending state after you configure the nodeSelector constraint, check the pod events for errors relating to taints and tolerations.
Save the file to apply the changes. The components specified in the new configuration are automatically moved to the new nodes, and the pods affected by the new configuration are redeployed.
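For example, assuming you labeled nodes with a hypothetical monitoring: prod label, the following configuration moves the prometheusK8s component to those nodes (the label name and component choice are illustrative only):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      nodeSelector:
        monitoring: prod
```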
You can assign tolerations to any of the monitoring stack components to enable moving them to tainted nodes.
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Specify tolerations for the component:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>:
      tolerations:
        <toleration_specification>
Substitute <component> and <toleration_specification> accordingly.
For example, oc adm taint nodes node1 key1=value1:NoSchedule adds a taint to node1 with the key key1 and the value value1. This prevents monitoring components from deploying pods on node1 unless a toleration is configured for that taint. The following example configures the alertmanagerMain component to tolerate the example taint:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
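To build intuition for why this toleration matches the example taint, the following Python sketch models the Kubernetes taint-and-toleration matching rule in simplified form. It is illustrative only and not part of OpenShift or Kubernetes: a toleration with operator Equal matches a taint when the key, value, and effect all match, and an Exists toleration matches any value for the key.

```python
# Simplified model of the Kubernetes taint/toleration matching rule.
# Illustrative only; not OpenShift or Kubernetes code.

def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the toleration matches the taint."""
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    # An empty effect in the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # "Exists" tolerates any value for the key
    return toleration.get("value") == taint["value"]

taint = {"key": "key1", "value": "value1", "effect": "NoSchedule"}
toleration = {"key": "key1", "operator": "Equal",
              "value": "value1", "effect": "NoSchedule"}
print(tolerates(toleration, taint))  # True: the example toleration matches
```

Because the toleration in the alertmanagerMain example mirrors the taint's key, value, and effect, the Alertmanager pods remain schedulable on node1.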
By default, no limit exists for the uncompressed body size for data returned from scraped metrics targets. You can set a body size limit to help avoid situations in which Prometheus consumes excessive amounts of memory when scraped targets return a response that contains a large amount of data. In addition, by setting a body size limit, you can reduce the impact that a malicious target might have on Prometheus and on the cluster as a whole.
After you set a value for enforcedBodySizeLimit, the PrometheusScrapeBodySizeLimitHit alert fires when at least one Prometheus scrape target replies with a response body larger than the configured value.
If metrics data scraped from a target has an uncompressed body size exceeding the configured size limit, the scrape fails. Prometheus then considers this target to be down and sets its up metric value to 0, which can trigger the TargetDown alert.
You have access to the cluster as a user with the cluster-admin cluster role.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add a value for enforcedBodySizeLimit to data/config.yaml/prometheusK8s to limit the body size that can be accepted per target scrape:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      enforcedBodySizeLimit: 40MB (1)
1 | Specify the maximum body size for scraped metrics targets. This enforcedBodySizeLimit example limits the uncompressed size per target scrape to 40 megabytes. Valid numeric values use the Prometheus data size format: B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), and EB (exabytes). The default value is 0, which specifies no limit. You can also set the value to automatic to calculate the limit automatically based on cluster capacity.
Save the file to apply the changes. The new configuration is applied automatically.
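To illustrate the data size format named above, here is a small Python sketch that converts a value such as 40MB into bytes. It is not part of Prometheus or OpenShift, and the metric (powers-of-1000) multipliers are an assumption based on the unit names listed in the callout:

```python
# Illustrative parser for values like "40MB" in the Prometheus data size
# format. Assumes metric (powers of 1000) multipliers for the units
# B, KB, MB, GB, TB, PB, EB; not an official Prometheus API.
import re

_MULTIPLIERS = {"B": 1, "KB": 1000, "MB": 1000**2, "GB": 1000**3,
                "TB": 1000**4, "PB": 1000**5, "EB": 1000**6}

def size_to_bytes(value: str) -> int:
    """Convert a size string such as '40MB' into a byte count."""
    match = re.fullmatch(r"(\d+)(B|KB|MB|GB|TB|PB|EB)", value)
    if not match:
        raise ValueError(f"not a valid size: {value!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIERS[unit]

print(size_to_bytes("40MB"))  # 40000000
```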
scrape_config configuration (Prometheus documentation)
You can ensure that the containers that run monitoring components have enough CPU and memory resources by specifying values for resource limits and requests for those components. You can configure these limits and requests for core platform monitoring components in the openshift-monitoring namespace.
To configure CPU and memory resources, specify values for resource limits and requests in the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace.
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the ConfigMap object named cluster-monitoring-config.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add values to define resource limits and requests for each component you want to configure.
Ensure that the value set for a limit is always equal to or higher than the value set for a request. Otherwise, an error occurs, and the container does not run.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    prometheusK8s:
      resources:
        limits:
          cpu: 500m
          memory: 3Gi
        requests:
          cpu: 200m
          memory: 500Mi
    thanosQuerier:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    prometheusOperator:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    metricsServer:
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
        limits:
          cpu: 50m
          memory: 500Mi
    kubeStateMetrics:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    telemeterClient:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    openshiftStateMetrics:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    nodeExporter:
      resources:
        limits:
          cpu: 50m
          memory: 150Mi
        requests:
          cpu: 20m
          memory: 50Mi
    monitoringPlugin:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    prometheusOperatorAdmissionWebhook:
      resources:
        limits:
          cpu: 50m
          memory: 100Mi
        requests:
          cpu: 20m
          memory: 50Mi
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
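The limit-versus-request rule noted above can be checked mechanically. This hypothetical Python sketch, not an OpenShift tool, parses the Kubernetes quantity suffixes used in the example (m for millicores, Mi and Gi for memory) and verifies that each limit is at least as large as the corresponding request:

```python
# Hypothetical checker for the rule that a resource limit must not be
# lower than its request. Handles the quantity suffixes used in the
# example above: "m" for millicores, "Ki"/"Mi"/"Gi" for memory.

def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)

def memory_to_bytes(value: str) -> int:
    """Convert a memory quantity such as '1Gi' or '500Mi' to bytes."""
    suffixes = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, mult in suffixes.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * mult
    return int(value)

def limits_cover_requests(resources: dict) -> bool:
    """True if every limit is equal to or higher than its request."""
    limits, requests = resources["limits"], resources["requests"]
    return (cpu_to_millicores(limits["cpu"]) >= cpu_to_millicores(requests["cpu"])
            and memory_to_bytes(limits["memory"]) >= memory_to_bytes(requests["memory"]))

alertmanager = {"limits": {"cpu": "500m", "memory": "1Gi"},
                "requests": {"cpu": "200m", "memory": "500Mi"}}
print(limits_cover_requests(alertmanager))  # True
```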
Kubernetes requests and limits documentation (Kubernetes documentation)
The metrics collection profile is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
To choose a metrics collection profile for core OpenShift Container Platform monitoring components, edit the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
You have enabled Technology Preview features by using the FeatureGate custom resource (CR).
You have created the cluster-monitoring-config ConfigMap object.
You have access to the cluster as a user with the cluster-admin cluster role.
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add the metrics collection profile setting under data/config.yaml/prometheusK8s:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      collectionProfile: <metrics_collection_profile_name> (1)
1 | The name of the metrics collection profile. The available values are full or minimal. If you do not specify a value or if the collectionProfile key name does not exist in the config map, the default setting of full is used.
The following example sets the metrics collection profile to minimal for the core platform instance of Prometheus:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      collectionProfile: minimal
Save the file to apply the changes. The new configuration is applied automatically.
You can configure pod topology spread constraints for all the pods deployed by the Cluster Monitoring Operator to control how pod replicas are scheduled to nodes across zones. This ensures that the pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.
You can configure pod topology spread constraints for monitoring pods by using the cluster-monitoring-config config map.
You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).
Edit the cluster-monitoring-config config map in the openshift-monitoring project:
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
Add the following settings under the data/config.yaml field to configure pod topology spread constraints:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>: (1)
      topologySpreadConstraints:
      - maxSkew: <n> (2)
        topologyKey: <key> (3)
        whenUnsatisfiable: <value> (4)
        labelSelector: (5)
          <match_option>
1 | Specify a name of the component for which you want to set up pod topology spread constraints.
2 | Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed.
3 | Specify a key of node labels for topologyKey. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler tries to put a balanced number of pods into each domain.
4 | Specify a value for whenUnsatisfiable. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
5 | Specify labelSelector to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: monitoring
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus
Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
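To build intuition for maxSkew, the following Python sketch models the skew calculation in simplified form. It is illustrative only, not scheduler code: skew is the difference between the most and least loaded topology domains, and under DoNotSchedule a placement is allowed only if the resulting skew stays within maxSkew.

```python
# Simplified model of the pod topology spread "skew" calculation:
# skew = (max matching pods in any domain) - (min matching pods in any domain).
# Illustrative only; the real scheduler's calculation has more nuances.

def skew(pods_per_domain: dict) -> int:
    """Difference between the most and least loaded topology domains."""
    counts = pods_per_domain.values()
    return max(counts) - min(counts)

def can_schedule(pods_per_domain: dict, target_domain: str, max_skew: int) -> bool:
    """DoNotSchedule semantics: allow placement only if skew stays <= max_skew."""
    after = dict(pods_per_domain)
    after[target_domain] += 1
    return skew(after) <= max_skew

domains = {"zone-a": 2, "zone-b": 1}
print(can_schedule(domains, "zone-b", max_skew=1))  # True: counts become 2 and 2
print(can_schedule(domains, "zone-a", max_skew=1))  # False: counts become 3 and 1
```

In the prometheusK8s example above, maxSkew: 1 with DoNotSchedule therefore keeps the number of Prometheus pods in any two domains (as defined by the monitoring topology key) within one of each other.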