apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
name: cluster
spec:
processor:
metrics:
disableAlerts: [NetObservLokiError, NetObservNoFlows] (1)
You can use the web console to monitor alerts related to the health of the Network Observability Operator.
You can access metrics about health and resource usage of the Network Observability Operator from the Dashboards page in the web console. A health alert banner that directs you to the dashboard can appear on the Network Traffic and Home pages in the event that an alert is triggered. Alerts are generated in the following cases:
The NetObservLokiError
alert occurs if the flowlogs-pipeline
workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
The NetObservNoFlows
alert occurs if no flows are ingested for a certain amount of time.
You can also view metrics about the health of the Operator in the following categories:
Flows
Flows Overhead
Top flow rates per source and destination nodes
Top flow rates per source and destination namespaces
Top flow rates per source and destination workloads
Agents
Processor
Operator
You have the Network Observability Operator installed.
You have access to the cluster as a user with the cluster-admin
role or with view permissions for all projects.
From the Administrator perspective in the web console, navigate to Observe → Dashboards.
From the Dashboards dropdown, select Netobserv/Health.
View the metrics about the health of the Operator that are displayed on the page.
You can opt out of health alerting by editing the FlowCollector
resource:
In the web console, navigate to Operators → Installed Operators.
Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
Select cluster then select the YAML tab.
Add spec.processor.metrics.disableAlerts
to disable health alerts, as in the following YAML sample:
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
name: cluster
spec:
processor:
metrics:
disableAlerts: [NetObservLokiError, NetObservNoFlows] (1)
1 | You can specify one or a list with both types of alerts to disable. |
You can create custom alerting rules for the Netobserv dashboard metrics to trigger alerts when Loki rate limits have been reached.
You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
You have the Network Observability Operator installed.
Create a YAML file by clicking the import icon, +.
Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when Loki rate limits have been reached:
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
name: loki-alerts
namespace: openshift-monitoring
spec:
groups:
- name: LokiRateLimitAlerts
rules:
- alert: LokiTenantRateLimit
annotations:
message: |-
{{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
summary: "At any number of requests are responded with the rate limit error code."
expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
for: 10s
labels:
severity: warning
Click Create to apply the configuration file to the cluster.
For more information about creating alerts that you can see on the dashboard, see Creating alerting rules for user-defined projects.