OpenShift Container Platform uses Fluentd to collect operations and application logs from your cluster and enriches the data with Kubernetes Pod and Namespace metadata.

You can configure the CPU and memory limits for the log collector and move the log collector pods to specific nodes. All supported modifications to the log collector can be performed though the spec.collection.log.fluentd stanza in the Cluster Logging Custom Resource CR.

About unsupported configurations

The supported way of configuring cluster logging is by configuring it using the options described in this documentation. Do not use other configurations, as they are unsupported. Configuration paradigms might change across OpenShift Container Platform releases, and such cases can only be handled gracefully if all configuration possibilities are controlled. If you use configurations other than those described in this documentation, your changes will disappear because the Elasticsearch Operator and Cluster Logging Operator reconcile any differences. The Operators reverse everything to the defined state by default and by design.

If you must perform configurations not described in the OpenShift Container Platform documentation, you must set your Cluster Logging Operator or Elasticsearch Operator to Unmanaged. An unmanaged cluster logging environment is not supported and does not receive updates until you return cluster logging to Managed.

Viewing logging collector pods

You can use the oc get pods --all-namespaces -o wide command to see the nodes where the Fluentd are deployed.

Procedure

Run the following command in the openshift-logging project:

$oc get pods --selector component=fluentd -o wide -n openshift-logging

NAME           READY  STATUS    RESTARTS   AGE     IP            NODE                  NOMINATED NODE   READINESS GATES
fluentd-8d69v  1/1    Running   0          134m    10.130.2.30   master1.example.com   <none>           <none>
fluentd-bd225  1/1    Running   0          134m    10.131.1.11   master2.example.com   <none>           <none>
fluentd-cvrzs  1/1    Running   0          134m    10.130.0.21   master3.example.com   <none>           <none>
fluentd-gpqg2  1/1    Running   0          134m    10.128.2.27   worker1.example.com   <none>           <none>
fluentd-l9j7j  1/1    Running   0          134m    10.129.2.31   worker2.example.com   <none>           <none>

Configure log collector CPU and memory limits

The log collector allows for adjustments to both the CPU and memory limits.

Procedure
  1. Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      collection:
        logs:
          fluentd:
            resources:
              limits: (1)
                cpu: 250m
                memory: 1Gi
              requests:
                cpu: 250m
                memory: 1Gi
    1 Specify the CPU and memory limits and requests as needed. The values shown are the default values.

About logging collector alerts

The following alerts are generated by the logging collector. You can view these alerts in the OpenShift Container Platform web console, on the Alerts page of the Alerting UI.

Table 1. Fluentd Prometheus alerts
Alert Message Description Severity

FluentdErrorsHigh

In the last minute, <value> errors reported by fluentd <instance>.

Fluentd is reporting a higher number of issues than the specified number, default 10.

Critical

FluentdNodeDown

Prometheus could not scrape fluentd <instance> for more than 10m.

Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance.

Critical

FluentdQueueLengthBurst

In the last minute, fluentd <instance> buffer queue length increased more than 32. Current value is <value>.

Fluentd is reporting that it is overwhelmed.

Warning

FluentdQueueLengthIncreasing

In the last 12h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.

Fluentd is reporting queue usage issues.

Critical