OpenShift Container Platform uses Fluentd to collect operations and application logs from your cluster and enriches the data with Kubernetes Pod and Namespace metadata.

You can configure log location, use an external log aggregator, and make other configurations for the log collector.

You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades. For more information, see Support policy for unmanaged Operators.

Viewing logging collector pods

You can use the oc get pods --all-namespaces -o wide command to see the nodes where the Fluentd are deployed.

Procedure

Run the following command in the openshift-logging project:

$ oc get pods --all-namespaces -o wide | grep fluentd

NAME                         READY     STATUS    RESTARTS   AGE     IP            NODE                           NOMINATED NODE   READINESS GATES
fluentd-5mr28                1/1       Running   0          4m56s   10.129.2.12   ip-10-0-164-233.ec2.internal   <none>           <none>
fluentd-cnc4c                1/1       Running   0          4m56s   10.128.2.13   ip-10-0-155-142.ec2.internal   <none>           <none>
fluentd-nlp8z                1/1       Running   0          4m56s   10.131.0.13   ip-10-0-138-77.ec2.internal    <none>           <none>
fluentd-rknlk                1/1       Running   0          4m56s   10.128.0.33   ip-10-0-128-130.ec2.internal   <none>           <none>
fluentd-rsm49                1/1       Running   0          4m56s   10.129.0.37   ip-10-0-163-191.ec2.internal   <none>           <none>
fluentd-wjt8s                1/1       Running   0          4m56s   10.130.0.42   ip-10-0-156-251.ec2.internal   <none>           <none>

Configure log collector CPU and memory limits

The log collector allows for adjustments to both the CPU and memory limits.

Procedure
  1. Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      collection:
        logs:
          fluentd:
            resources:
              limits: (1)
                cpu: 250m
                memory: 1Gi
              requests:
                cpu: 250m
                memory: 1Gi
    1 Specify the CPU and memory limits and requests as needed. The values shown are the default values.

Configuring the collected log location

The log collector writes logs to a specified file or to the default location, /var/log/fluentd/fluentd.log based on the LOGGING_FILE_PATH environment variable.

Prerequisite
  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

To set the output location for the Fluentd logs:

  1. Edit the LOGGING_FILE_PATH parameter in the fluentd daemonset. You can specify a particular file or console:

    spec:
      template:
        spec:
          containers:
              env:
                - name: LOGGING_FILE_PATH
                  value: console (1)
    1 Specify the log output method:
    • use console to use the Fluentd default location. Retrieve the logs with the oc logs [-f] <pod_name> command.

    • use <path-to-log/fluentd.log> to send the log output to the specified file. Retrieve the logs with the oc exec <pod_name> — logs command. This is the default setting.

      Or, use the CLI:

      $ oc -n openshift-logging set env daemonset/fluentd LOGGING_FILE_PATH=/logs/fluentd.log

Understanding Buffer Chunk Limiting for Fluentd

If the Fluentd logger is unable to keep up with a high number of logs, it will need to switch to file buffering to reduce memory usage and prevent data loss.

Fluentd file buffering stores records in chunks. Chunks are stored in buffers.

To modify the FILE_BUFFER_LIMIT or BUFFER_SIZE_LIMIT parameters in the Fluentd daemonset as described below, you must set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

The Fluentd buffer_chunk_limit is determined by the environment variable BUFFER_SIZE_LIMIT, which has the default value 8m. The file buffer size per output is determined by the environment variable FILE_BUFFER_LIMIT, which has the default value 256Mi. The permanent volume size must be larger than FILE_BUFFER_LIMIT multiplied by the output.

On the Fluentd pods, permanent volume /var/lib/fluentd should be prepared by the PVC or hostmount, for example. That area is then used for the file buffers.

The buffer_type and buffer_path are configured in the Fluentd configuration files as follows:

$ egrep "buffer_type|buffer_path" *.conf
output-es-config.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-output-es-config`
output-es-ops-config.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-output-es-ops-config`

The Fluentd buffer_queue_limit is the value of the variable BUFFER_QUEUE_LIMIT. This value is 32 by default.

The environment variable BUFFER_QUEUE_LIMIT is calculated as (FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)).

If the BUFFER_QUEUE_LIMIT variable has the default set of values:

  • FILE_BUFFER_LIMIT = 256Mi

  • number_of_outputs = 1

  • BUFFER_SIZE_LIMIT = 8Mi

The value of buffer_queue_limit will be 32. To change the buffer_queue_limit, you must change the value of FILE_BUFFER_LIMIT.

In this formula, number_of_outputs is 1 if all the logs are sent to a single resource, and it is incremented by 1 for each additional resource. For example, the value of number_of_outputs is:

  • 1 - if all logs are sent to a single Elasticsearch pod

  • 2 - if application logs are sent to an Elasticsearch pod and ops logs are sent to another Elasticsearch pod

  • 4 - if application logs are sent to an Elasticsearch pod, ops logs are sent to another Elasticsearch pod, and both of them are forwarded to other Fluentd instances

Configuring the logging collector using environment variables

You can use environment variables to modify the configuration of the Fluentd log collector.

See the Fluentd README in Github for lists of the available environment variables.

Prerequisite
  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

Set any of the Fluentd environment variables as needed:

oc set env ds/fluentd <env-var>=<value>

For example:

oc set env ds/fluentd LOGGING_FILE_AGE=30

About logging collector alerts

The following alerts are generated by the logging collector and can be viewed on the Alerts tab of the Prometheus UI.

All the logging collector alerts are listed on the MonitoringAlerts page of the OpenShift Container Platform web console. Alerts are in one of the following states:

  • Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.

  • Pending The alert condition is currently true, but the timeout has not been reached.

  • Not Firing. The alert is not currently triggered.

Table 1. Fluentd Prometheus alerts
Alert Message Description Severity

FluentdErrorsHigh

In the last minute, <value> errors reported by fluentd <instance>.

Fluentd is reporting a higher number of issues than the specified number, default 10.

Critical

FluentdNodeDown

Prometheus could not scrape fluentd <instance> for more than 10m.

Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance.

Critical

FluentdQueueLengthBurst

In the last minute, fluentd <instance> buffer queue length increased more than 32. Current value is <value>.

Fluentd is reporting that it is overwhelmed.

Warning

FluentdQueueLengthIncreasing

In the last 12h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.

Fluentd is reporting queue usage issues.

Critical