OpenShift Container Platform uses Fluentd to collect operations and application logs from your cluster and enriches the data with Kubernetes Pod and Namespace metadata.

You can configure the log location, use an external log aggregator, and adjust other settings for the log collector.

You must set cluster logging to the Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state. Operators in an unmanaged state are unsupported, and the cluster administrator assumes full control of the individual component configurations and upgrades. For more information, see Support policy for unmanaged Operators.

Viewing logging collector pods

You can use the oc get pods --all-namespaces -o wide command to see the nodes where the Fluentd pods are deployed.

Procedure

Run the following command in the openshift-logging project:

$ oc get pods --all-namespaces -o wide | grep fluentd

NAME                         READY     STATUS    RESTARTS   AGE     IP            NODE                           NOMINATED NODE   READINESS GATES
fluentd-5mr28                1/1       Running   0          4m56s   10.129.2.12   ip-10-0-164-233.ec2.internal   <none>           <none>
fluentd-cnc4c                1/1       Running   0          4m56s   10.128.2.13   ip-10-0-155-142.ec2.internal   <none>           <none>
fluentd-nlp8z                1/1       Running   0          4m56s   10.131.0.13   ip-10-0-138-77.ec2.internal    <none>           <none>
fluentd-rknlk                1/1       Running   0          4m56s   10.128.0.33   ip-10-0-128-130.ec2.internal   <none>           <none>
fluentd-rsm49                1/1       Running   0          4m56s   10.129.0.37   ip-10-0-163-191.ec2.internal   <none>           <none>
fluentd-wjt8s                1/1       Running   0          4m56s   10.130.0.42   ip-10-0-156-251.ec2.internal   <none>           <none>

Configuring log collector CPU and memory limits

The log collector allows for adjustments to both the CPU and memory limits.

Procedure
  1. Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      collection:
        logs:
          fluentd:
            resources:
              limits: (1)
                cpu: 250m
                memory: 1Gi
              requests:
                cpu: 250m
                memory: 1Gi
    1 Specify the CPU and memory limits and requests as needed. The values shown are the default values.

Configuring Buffer Chunk Limiting for Fluentd

If the Fluentd log collector is unable to keep up with a high number of logs, Fluentd performs file buffering to reduce memory usage and prevent data loss.

Fluentd file buffering stores records in chunks. Chunks are stored in buffers.

You can tune file buffering in your cluster by editing environment variables in the Fluentd Daemonset:

To modify the FILE_BUFFER_LIMIT or BUFFER_SIZE_LIMIT parameters in the Fluentd Daemonset, you must set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

  • BUFFER_SIZE_LIMIT. This parameter determines the maximum size of each chunk file before Fluentd creates a new chunk. The default is 8M. This parameter sets the Fluentd chunk_limit_size variable.

    A high BUFFER_SIZE_LIMIT collects more records per chunk file. However, larger chunks take longer to be sent to the log store.

  • FILE_BUFFER_LIMIT. This parameter determines the file buffer size per logging output. This value is only a request based on the available space on the node where a Fluentd pod is scheduled. OpenShift Container Platform does not allow Fluentd to exceed the node capacity. The default is 256Mi.

    A high FILE_BUFFER_LIMIT could translate to a higher BUFFER_QUEUE_LIMIT based on the number of outputs. However, if the node's space is under pressure, Fluentd can fail.

    By default, the number_of_outputs is 1 if all the logs are sent to a single resource, and is incremented by 1 for each additional resource. You might have multiple outputs if you use the Log Forwarding API, the Fluentd Forward protocol, or syslog protocol to forward logs to external locations.

    The permanent volume size must be larger than FILE_BUFFER_LIMIT multiplied by the number of outputs.

  • BUFFER_QUEUE_LIMIT. This parameter is the maximum number of buffer chunks allowed. The BUFFER_QUEUE_LIMIT parameter is not directly tunable. OpenShift Container Platform calculates this value based on the number of logging outputs, the chunk size, and the filesystem space available. The default is 32 chunks. To change the BUFFER_QUEUE_LIMIT, you must change the value of FILE_BUFFER_LIMIT. The BUFFER_QUEUE_LIMIT parameter sets the Fluentd queue_limit_length parameter.

    OpenShift Container Platform calculates the BUFFER_QUEUE_LIMIT as (FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)).

    Using the default set of values, the value of BUFFER_QUEUE_LIMIT is 32:

    • FILE_BUFFER_LIMIT = 256Mi

    • number_of_outputs = 1

    • BUFFER_SIZE_LIMIT = 8Mi
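
The calculation above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the code OpenShift Container Platform runs: the function names and the unit table are hypothetical, and rounding down to a whole number of chunks is an assumption. Only the formula and the volume-size rule come from the text.

```python
# Sketch of the documented formula:
#   BUFFER_QUEUE_LIMIT = FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)
# All helper names are hypothetical; they are not part of any OpenShift API.

UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '256Mi' to bytes; a bare integer is taken as bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def buffer_queue_limit(file_buffer_limit: str, buffer_size_limit: str,
                       number_of_outputs: int = 1) -> int:
    """Apply the formula, rounding down to whole chunks (an assumption)."""
    return to_bytes(file_buffer_limit) // (number_of_outputs * to_bytes(buffer_size_limit))


def minimum_volume_bytes(file_buffer_limit: str, number_of_outputs: int = 1) -> int:
    """The volume must be larger than FILE_BUFFER_LIMIT multiplied by the number of outputs."""
    return to_bytes(file_buffer_limit) * number_of_outputs


print(buffer_queue_limit("256Mi", "8Mi"))            # default values: 32
print(buffer_queue_limit("256Mi", "8Mi", 2))         # two outputs: 16
print(minimum_volume_bytes("256Mi", 2) // 1024**2)   # 512 (MiB); the volume must exceed this
```

With the default values the result is 32 chunks. Adding a second output while leaving FILE_BUFFER_LIMIT unchanged halves the queue limit to 16.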

OpenShift Container Platform uses the Fluentd file buffer plug-in to configure how the chunks are stored. You can see the location of the buffer file using the following command:

$ oc get cm fluentd -o json | jq -r '.data."fluent.conf"'
<buffer>
   @type file (1)
   path '/var/lib/fluentd/retry-elasticsearch' (2)
1 The Fluentd file buffer plugin. Do not change this value.
2 The path where buffer chunks are stored.
Prerequisite
  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

To configure Buffer Chunk Limiting:

  1. Edit either of the following parameters in the fluentd Daemonset.

    spec:
      template:
        spec:
          containers:
          - env:
            - name: FILE_BUFFER_LIMIT (1)
              value: "256Mi"
            - name: BUFFER_SIZE_LIMIT (2)
              value: 8Mi
    1 Specify the Fluentd file buffer size per output.
    2 Specify the maximum size of each Fluentd buffer chunk.
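
Because BUFFER_QUEUE_LIMIT is derived rather than set directly, it can help to work backwards from the queue limit you want to the FILE_BUFFER_LIMIT value to enter. The following is a minimal sketch that rearranges the formula given earlier in this section. The function name is hypothetical, and it assumes that both sizes are expressed in Mi.

```python
# Rearranges the documented formula
#   BUFFER_QUEUE_LIMIT = FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)
# to solve for FILE_BUFFER_LIMIT. The helper name is hypothetical.

def file_buffer_limit_mi(target_queue_limit: int, buffer_size_limit_mi: int,
                         number_of_outputs: int) -> str:
    """Return the FILE_BUFFER_LIMIT value (in Mi) that yields the target queue limit."""
    return f"{target_queue_limit * number_of_outputs * buffer_size_limit_mi}Mi"


print(file_buffer_limit_mi(32, 8, 1))  # 256Mi, the default configuration
print(file_buffer_limit_mi(32, 8, 3))  # 768Mi, keeps 32 chunks per output with three outputs
```

Remember that the volume backing the buffer must still be larger than the resulting FILE_BUFFER_LIMIT multiplied by the number of outputs.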

Configuring the logging collector using environment variables

You can use environment variables to modify the configuration of the Fluentd log collector.

See the Fluentd README in GitHub for lists of the available environment variables.

Prerequisite
  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

Set any of the Fluentd environment variables as needed:

$ oc set env ds/fluentd <env-var>=<value>

For example:

$ oc set env ds/fluentd LOGGING_FILE_AGE=30

About logging collector alerts

The following alerts are generated by the logging collector and can be viewed on the Alerts tab of the Prometheus UI.

All the logging collector alerts are listed on the Monitoring → Alerts page of the OpenShift Container Platform web console. Alerts are in one of the following states:

  • Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.

  • Pending. The alert condition is currently true, but the timeout has not been reached.

  • Not Firing. The alert is not currently triggered.
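
The relationship between the three states can be sketched as a small function. This illustrates the descriptions above and is not how Prometheus is implemented. The function and parameter names are hypothetical, and it assumes the alert fires as soon as the condition has held for the full timeout.

```python
# Illustrative only: maps how long an alert condition has been continuously
# true onto the three states described above. Names are hypothetical.

def alert_state(condition_true_seconds: float, timeout_seconds: float) -> str:
    """condition_true_seconds is 0 when the condition is not currently true."""
    if condition_true_seconds <= 0:
        return "Not Firing"
    if condition_true_seconds < timeout_seconds:
        return "Pending"
    return "Firing"


# Example with a 10-minute (600-second) timeout:
print(alert_state(0, 600))    # Not Firing
print(alert_state(300, 600))  # Pending
print(alert_state(600, 600))  # Firing
```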

Table 1. Fluentd Prometheus alerts

| Alert | Message | Description | Severity |
|---|---|---|---|
| FluentdErrorsHigh | In the last minute, <value> errors reported by fluentd <instance>. | Fluentd is reporting a higher number of issues than the specified number, default 10. | Critical |
| FluentdNodeDown | Prometheus could not scrape fluentd <instance> for more than 10m. | Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance. | Critical |
| FluentdQueueLengthBurst | In the last minute, fluentd <instance> buffer queue length increased more than 32. Current value is <value>. | Fluentd is reporting that it is overwhelmed. | Warning |
| FluentdQueueLengthIncreasing | In the last 12h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>. | Fluentd is reporting queue usage issues. | Critical |