OpenShift Container Platform uses Fluentd to collect operations and application logs from your cluster, which it then enriches with Kubernetes pod and namespace metadata.

You can configure log rotation and the log location, use an external log aggregator, and make other configuration changes for the log collector.

You must set cluster logging to the Unmanaged state before making these changes, unless otherwise noted. For more information, see Changing the cluster logging management state.

Viewing logging collector pods

You can use the oc get pods -o wide command to see the nodes where the Fluentd pods are deployed.

Procedure

Run the following command in the openshift-logging project:

$ oc get pods -o wide | grep fluentd

NAME                         READY     STATUS    RESTARTS   AGE     IP            NODE                           NOMINATED NODE   READINESS GATES
fluentd-5mr28                1/1       Running   0          4m56s   10.129.2.12   ip-10-0-164-233.ec2.internal   <none>           <none>
fluentd-cnc4c                1/1       Running   0          4m56s   10.128.2.13   ip-10-0-155-142.ec2.internal   <none>           <none>
fluentd-nlp8z                1/1       Running   0          4m56s   10.131.0.13   ip-10-0-138-77.ec2.internal    <none>           <none>
fluentd-rknlk                1/1       Running   0          4m56s   10.128.0.33   ip-10-0-128-130.ec2.internal   <none>           <none>
fluentd-rsm49                1/1       Running   0          4m56s   10.129.0.37   ip-10-0-163-191.ec2.internal   <none>           <none>
fluentd-wjt8s                1/1       Running   0          4m56s   10.130.0.42   ip-10-0-156-251.ec2.internal   <none>           <none>

Configuring log collector CPU and memory limits

You can adjust both the CPU and memory limits of the log collector.

Procedure
  1. Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      collection:
        logs:
          fluentd:
            resources:
              limits: (1)
                cpu: 250m
                memory: 1Gi
              requests:
                cpu: 250m
                memory: 1Gi
    1 Specify the CPU and memory limits and requests as needed. The values shown are the default values.
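As a rough sanity check before editing the CR, the following Python sketch parses the quantity strings shown above and confirms that each request does not exceed its limit. It is an illustration only and not part of OpenShift Container Platform: the helper names (parse_cpu_millicores, parse_memory_bytes, requests_within_limits) are hypothetical, and only the suffixes m, Ki, Mi, and Gi are handled.

```python
# Hypothetical helpers for checking a resources stanza; not an OpenShift API.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as '250m' or '1' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_bytes(value: str) -> int:
    """Convert a memory quantity such as '1Gi' or '512Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # assume a plain byte count


def requests_within_limits(resources: dict) -> bool:
    """Return True when CPU and memory requests do not exceed the limits."""
    limits, requests = resources["limits"], resources["requests"]
    cpu_ok = parse_cpu_millicores(requests["cpu"]) <= parse_cpu_millicores(limits["cpu"])
    mem_ok = parse_memory_bytes(requests["memory"]) <= parse_memory_bytes(limits["memory"])
    return cpu_ok and mem_ok


# The default values from the example above.
default_resources = {
    "limits": {"cpu": "250m", "memory": "1Gi"},
    "requests": {"cpu": "250m", "memory": "1Gi"},
}
print(requests_within_limits(default_resources))
```

A request larger than its limit is normally rejected by Kubernetes, so a check like this can catch a typo before you save the CR.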

Configuring the collected log location

The log collector writes logs to a specified file or to the default location, /var/log/fluentd/fluentd.log, based on the LOGGING_FILE_PATH environment variable.

Prerequisite

Set cluster logging to the unmanaged state.

Procedure

To set the output location for the Fluentd logs:

  1. Edit the LOGGING_FILE_PATH parameter in the fluentd daemonset. You can specify a particular file or console:

    spec:
      template:
        spec:
          containers:
              env:
                - name: LOGGING_FILE_PATH
                  value: console (1)
    1 Specify the log output method:
    • Use console to use the Fluentd default location. Retrieve the logs with the oc logs [-f] <pod_name> command.

    • Use <path-to-log/fluentd.log> to send the log output to the specified file. Retrieve the logs with the oc exec <pod_name> -- logs command. This is the default setting.

      Or, use the CLI:

      oc -n openshift-logging set env daemonset/fluentd LOGGING_FILE_PATH=/logs/fluentd.log

Throttling log collection

For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by the log collector before being processed. By throttling, you deliberately slow down the rate at which you are reading logs, so Kibana might take longer to display records.

Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.

Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source and no log files, so file-based throttling is not available. There is no method for restricting the log entries that are read into the Fluentd process.

Prerequisite

Set cluster logging to the unmanaged state.

Procedure
  1. To configure Fluentd to restrict specific projects, edit the throttle configuration in the Fluentd ConfigMap after deployment:

    $ oc edit configmap/fluentd

    The throttle-config.yaml key contains YAML that lists project names and the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:

    throttle-config.yaml: |
      - openshift-logging:
          read_lines_limit: 10
      - .operations:
          read_lines_limit: 100
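To make the shape of the throttle configuration concrete, the following Python sketch treats the parsed YAML as a list of one-key maps and looks up the effective read limit for a project. It is a minimal illustration, not Fluentd's implementation: the function name and the project my-verbose-project are hypothetical, and the fallback of 1000 is the default stated above.

```python
# Default stated in the text: 1000 lines at a time per node.
DEFAULT_READ_LINES_LIMIT = 1000

# Parsed form of a throttle-config.yaml value: a list of one-key maps.
# "my-verbose-project" is a hypothetical project name used for illustration.
throttle_config = [
    {"my-verbose-project": {"read_lines_limit": 10}},
    {".operations": {"read_lines_limit": 100}},
]


def read_lines_limit(config: list, project: str) -> int:
    """Return the configured limit for a project, or the default if unset."""
    for entry in config:
        if project in entry:
            return entry[project].get("read_lines_limit", DEFAULT_READ_LINES_LIMIT)
    return DEFAULT_READ_LINES_LIMIT


print(read_lines_limit(throttle_config, ".operations"))
print(read_lines_limit(throttle_config, "some-other-project"))
```

Projects that do not appear in the list are not throttled beyond the default.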

Configuring log collection JSON parsing

You can configure the Fluentd log collector to determine if a log message is in JSON format and merge the message into the JSON payload document posted to Elasticsearch. This feature is disabled by default.

You can enable or disable this feature by editing the MERGE_JSON_LOG environment variable in the fluentd daemonset.

Enabling this feature comes with risks, including:

  • Possible log loss because Elasticsearch rejects documents with inconsistent type mappings.

  • Potential buffer storage leak caused by rejected message cycling.

  • Overwriting of data for fields with the same name.

The features in this topic should be used only by experienced Fluentd and Elasticsearch users.

Prerequisites

Set cluster logging to the unmanaged state.

Procedure

Use the following command to enable this feature:

oc set env ds/fluentd MERGE_JSON_LOG=true (1)
1 Set this to false to disable this feature or true to enable this feature.
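The overwrite risk listed above can be illustrated with a short Python sketch of what merging a JSON message into a record might look like. This is a simplified model under stated assumptions, not the collector's actual code: the function name merge_json_log is hypothetical, and it assumes that fields parsed from the message replace existing fields with the same name.

```python
import json


def merge_json_log(record: dict) -> dict:
    """Merge a JSON-formatted message into the record (illustrative model only)."""
    try:
        parsed = json.loads(record.get("message", ""))
    except (TypeError, ValueError):
        return dict(record)  # message is not JSON: leave the record unchanged
    if not isinstance(parsed, dict):
        return dict(record)  # only JSON objects can be merged as fields
    merged = dict(record)
    merged.update(parsed)  # assumption: same-name fields are overwritten
    return merged


record = {"message": '{"level": 3, "user": "alice"}', "level": "info"}
merged = merge_json_log(record)
print(merged["level"], type(merged["level"]).__name__)
```

In this model the level field changes from the string "info" to the integer 3, which is the kind of type change that can lead Elasticsearch to reject later documents.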

Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING

If you set the MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING environment variables to true, you might receive an Elasticsearch 400 error. The error occurs because when MERGE_JSON_LOG=true, Fluentd adds fields with data types other than string. When you set CDM_UNDEFINED_TO_STRING=true, Fluentd attempts to add those fields as a string value, resulting in the Elasticsearch 400 error. The error clears when the indices roll over for the next day.

When Fluentd rolls over the indices for the next day’s logs, it will create a brand new index. The field definitions are updated and you will not get the 400 error.
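The following Python sketch is a toy model of why the error appears and why it clears after rollover. It is not how Elasticsearch is implemented: the ToyIndex class is hypothetical, it treats any change of value type as a rejection (real Elasticsearch coerces some values), and the returned numbers only imitate HTTP status codes.

```python
class ToyIndex:
    """Toy model of one daily index: the first value seen fixes a field's type."""

    def __init__(self):
        self.mapping = {}

    def index(self, doc: dict) -> int:
        """Return 201 if the document is accepted, 400 on a type conflict."""
        for field, value in doc.items():
            seen = self.mapping.get(field)
            if seen is not None and seen is not type(value):
                return 400
        for field, value in doc.items():
            self.mapping.setdefault(field, type(value))
        return 201


today = ToyIndex()
# With MERGE_JSON_LOG=true, a field first arrives as a structured value.
first = today.index({"extra": {"key": "value"}})
# After CDM_UNDEFINED_TO_STRING=true, the same field arrives as a string.
second = today.index({"extra": '{"key":"value"}'})

# The next day's index starts with no field definitions.
tomorrow = ToyIndex()
third = tomorrow.index({"extra": '{"key":"value"}'})
print(first, second, third)
```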

Records that have hard errors, such as schema violations, corrupted data, and so forth, cannot be retried. The log collector sends the records for error handling. If you add a <label @ERROR> section to your Fluentd config, as the last <label>, you can handle these records as needed.

For example:

data:
  fluent.conf:

....

    <label @ERROR>
      <match **>
        @type file
        path /var/log/fluent/dlq
        time_slice_format %Y%m%d
        time_slice_wait 10m
        time_format %Y%m%dT%H%M%S%z
        compress gzip
      </match>
    </label>

This section writes error records to the Elasticsearch dead letter queue (DLQ) file. See the Fluentd documentation for more information about the file output.

You can then edit the file to clean up the records manually, edit the file for use with the Elasticsearch /_bulk index API, and use cURL to add those records. For more information on the Elasticsearch Bulk API, see the Elasticsearch documentation.
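As a sketch of the last step, the following Python function builds a request body in the newline-delimited format that the Bulk API expects: one action line followed by one document line per record, ending with a newline. The function name and the index name recovered-logs are hypothetical, and the records are assumed to have been cleaned up and loaded as dictionaries already. Depending on your Elasticsearch version, the action line might also need a document type.

```python
import json


def build_bulk_body(records: list, index_name: str) -> str:
    """Build a newline-delimited Bulk API body that indexes each record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(record))  # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


# "recovered-logs" is a hypothetical index name used for illustration.
body = build_bulk_body(
    [{"message": "first repaired record"}, {"message": "second repaired record"}],
    "recovered-logs",
)
print(body, end="")
```

You could save the resulting text to a file and post it to the /_bulk endpoint with cURL.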

Configuring how the log collector normalizes logs

Cluster Logging uses a specific data model, like a database schema, to store log records and their metadata in the logging store. There are some restrictions on the data:

  • There must be a "message" field containing the actual log message.

  • There must be a "@timestamp" field containing the log record timestamp in RFC 3339 format, preferably millisecond or better resolution.

  • There must be a "level" field with the log level, such as err, info, unknown, and so forth.

For more information on the data model, see Exported Fields.

Because of these requirements, conflicts and inconsistencies can arise with log data collected from different subsystems.

For example, if you use the MERGE_JSON_LOG feature (MERGE_JSON_LOG=true), it can be extremely useful to have your applications log their output in JSON, and have the log collector automatically parse and index the data in Elasticsearch. However, this leads to several problems, including:

  • field names can be empty, or contain characters that are illegal in Elasticsearch;

  • different applications in the same namespace might output the same field name with different value data types;

  • applications might emit too many fields;

  • fields may conflict with the cluster logging built-in fields.

You can configure how cluster logging treats fields from disparate sources by editing the Fluentd log collector daemonset and setting environment variables in the table below.

  • Undefined fields. One of the problems with log data from disparate systems is that some fields might be unknown to the ViaQ data model. Such fields are called undefined. ViaQ requires all top-level fields to be defined and described.

    Use the parameters to configure how OpenShift Container Platform moves any undefined fields under a top-level field called undefined to avoid conflicting with the well known ViaQ top-level fields. You can add undefined fields to the top-level fields and move others to an undefined container.

    You can also replace special characters in undefined fields and convert undefined fields to their JSON string representation. Converting to a JSON string preserves the structure of the value, so that you can retrieve the value later and convert it back to a map or an array.

    • Simple scalar values like numbers and booleans are changed to a quoted string. For example: 10 becomes "10", 3.1415 becomes "3.1415", false becomes "false".

    • Map/dict values and array values are converted to their JSON string representation: "mapfield":{"key":"value"} becomes "mapfield":"{\"key\":\"value\"}" and "arrayfield":[1,2,"three"] becomes "arrayfield":"[1,2,\"three\"]".

  • Defined fields. You can also configure which defined fields appear in the top levels of the logs.

    The default top-level fields, defined through the CDM_DEFAULT_KEEP_FIELDS parameter, are CEE, time, @timestamp, aushape, ci_job, collectd, docker, fedora-ci, file, foreman, geoip, hostname, ipaddr4, ipaddr6, kubernetes, level, message, namespace_name, namespace_uuid, offset, openstack, ovirt, pid, pipeline_metadata, service, systemd, tags, testcase, tlog, viaq_msg_id.

    Any fields not included in ${CDM_DEFAULT_KEEP_FIELDS} or ${CDM_EXTRA_KEEP_FIELDS} are moved to ${CDM_UNDEFINED_NAME} if CDM_USE_UNDEFINED is true.

    The CDM_DEFAULT_KEEP_FIELDS parameter is intended only for advanced users, or for use when Red Hat support instructs you to change it.

  • Empty fields. You can determine which empty fields to retain from disparate logs.
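The JSON string conversion described for undefined fields above can be reproduced with a few lines of Python. This is a minimal sketch rather than the collector's implementation: the function name is hypothetical, and leaving values that are already strings unchanged is an assumption.

```python
import json


def undefined_value_to_string(value):
    """Convert an undefined field value to its JSON string representation."""
    if isinstance(value, str):
        return value  # assumption: strings are left as they are
    # Compact separators match the examples given in the text.
    return json.dumps(value, separators=(",", ":"))


for sample in (10, 3.1415, False, {"key": "value"}, [1, 2, "three"]):
    print(repr(undefined_value_to_string(sample)))

# The structure survives the conversion and can be recovered later.
restored = json.loads(undefined_value_to_string({"key": "value"}))
```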

Table 1. Environment parameters for log normalization

CDM_EXTRA_KEEP_FIELDS
  Definition: Specify an extra set of defined fields to be kept at the top level of the logs in addition to the CDM_DEFAULT_KEEP_FIELDS. The default is "".
  Example: CDM_EXTRA_KEEP_FIELDS="broker"

CDM_KEEP_EMPTY_FIELDS
  Definition: Specify, in CSV format, the fields to retain even if they are empty. Empty defined fields that are not specified are dropped. The default is "message", which keeps empty messages.
  Example: CDM_KEEP_EMPTY_FIELDS="message"

CDM_USE_UNDEFINED
  Definition: Set to true to move undefined fields to the undefined top-level field. The default is false. If true, values in CDM_DEFAULT_KEEP_FIELDS and CDM_EXTRA_KEEP_FIELDS are not moved to undefined.
  Example: CDM_USE_UNDEFINED=true

CDM_UNDEFINED_NAME
  Definition: Specify a name for the undefined top-level field if using CDM_USE_UNDEFINED. The default is undefined. Enabled only when CDM_USE_UNDEFINED is true.
  Example: CDM_UNDEFINED_NAME="undef"

CDM_UNDEFINED_MAX_NUM_FIELDS
  Definition: If a record contains more undefined fields than this number, no further processing takes place on those fields. Instead, all undefined fields are converted to a single JSON string value and stored in the top-level CDM_UNDEFINED_NAME field. Keeping the default of -1 allows for an unlimited number of undefined fields, which is not recommended.
  NOTE: This parameter is honored even if CDM_USE_UNDEFINED is false.
  Example: CDM_UNDEFINED_MAX_NUM_FIELDS=4

CDM_UNDEFINED_TO_STRING
  Definition: Set to true to convert all undefined fields to their JSON string representation. The default is false.
  Example: CDM_UNDEFINED_TO_STRING=true

CDM_UNDEFINED_DOT_REPLACE_CHAR
  Definition: Specify a character to use in place of a dot character '.' in an undefined field. MERGE_JSON_LOG must be true. The default is UNUSED. If you set the MERGE_JSON_LOG parameter to true, see the Note below.
  Example: CDM_UNDEFINED_DOT_REPLACE_CHAR="_"

NOTE: If you set both the MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING environment variables to true in the Fluentd log collector daemonset, you might receive an Elasticsearch 400 error. The error occurs because when MERGE_JSON_LOG=true, the log collector adds fields with data types other than string. When you set CDM_UNDEFINED_TO_STRING=true, the log collector attempts to add those fields as a string value, resulting in the Elasticsearch 400 error. The error clears when the log collector rolls over the indices for the next day's logs.

When the log collector rolls over the indices, it creates a brand new index. The field definitions are updated and you will not get the 400 error.

Procedure

Use the CDM_* parameters to configure undefined and empty field processing.

  1. Configure how to process fields, as needed:

    1. Specify any additional fields to keep at the top level using CDM_EXTRA_KEEP_FIELDS.

    2. Specify any empty fields to retain in the CDM_KEEP_EMPTY_FIELDS parameter in CSV format.

  2. Configure how to process undefined fields, as needed:

    1. Set CDM_USE_UNDEFINED to true to move undefined fields to the top-level undefined field.

    2. Specify a name for the undefined fields using the CDM_UNDEFINED_NAME parameter.

    3. Set CDM_UNDEFINED_MAX_NUM_FIELDS to a value other than the default -1, to set an upper bound on the number of undefined fields in a single record.

  3. Specify CDM_UNDEFINED_DOT_REPLACE_CHAR to change any dot . characters in an undefined field name to another character. For example, if CDM_UNDEFINED_DOT_REPLACE_CHAR=@@@ and there is a field named foo.bar.baz the field is transformed into foo@@@bar@@@baz.

  4. Set CDM_UNDEFINED_TO_STRING to true to convert undefined fields to their JSON string representation.

If you configure the CDM_UNDEFINED_TO_STRING or CDM_UNDEFINED_MAX_NUM_FIELDS parameters, use CDM_UNDEFINED_NAME to change the undefined field name. This is needed because CDM_UNDEFINED_TO_STRING or CDM_UNDEFINED_MAX_NUM_FIELDS can change the value type of the undefined field. When CDM_UNDEFINED_TO_STRING is set to true, or when CDM_UNDEFINED_MAX_NUM_FIELDS is set and a log record contains more undefined fields than that limit, the value type becomes string. Elasticsearch stops accepting records if the value type is changed, for example, from JSON to JSON string.

For example, when CDM_UNDEFINED_TO_STRING is false or CDM_UNDEFINED_MAX_NUM_FIELDS is the default, -1, the value type of the undefined field is json. If you change CDM_UNDEFINED_MAX_NUM_FIELDS to a value other than the default and a log record contains more undefined fields than that value, the value type becomes string (json string). Elasticsearch stops accepting records if the value type is changed.
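The following Python sketch models how the CDM_* parameters described in this section could combine on a single record, and shows how the value type of the undefined field changes. It is an approximation under stated assumptions, not the collector's code: the function name normalize and its keyword arguments are hypothetical stand-ins for the environment variables, and the order in which the steps are applied is assumed.

```python
import json


def normalize(record, keep_fields, use_undefined=False, undefined_name="undefined",
              max_num_fields=-1, to_string=False, dot_replace_char=None):
    """Illustrative model of undefined-field handling; step order is an assumption."""
    kept = {k: v for k, v in record.items() if k in keep_fields}
    undefined = {k: v for k, v in record.items() if k not in keep_fields}

    # Models CDM_UNDEFINED_DOT_REPLACE_CHAR: replace dots in undefined field names.
    if dot_replace_char is not None:
        undefined = {k.replace(".", dot_replace_char): v for k, v in undefined.items()}

    # Models CDM_UNDEFINED_MAX_NUM_FIELDS: too many fields become one JSON string.
    if max_num_fields != -1 and len(undefined) > max_num_fields:
        kept[undefined_name] = json.dumps(undefined, separators=(",", ":"))
        return kept

    # Models CDM_UNDEFINED_TO_STRING: each undefined value becomes a JSON string.
    if to_string:
        undefined = {k: v if isinstance(v, str) else json.dumps(v, separators=(",", ":"))
                     for k, v in undefined.items()}

    # Models CDM_USE_UNDEFINED: move undefined fields under one top-level field.
    if use_undefined:
        if undefined:
            kept[undefined_name] = undefined
    else:
        kept.update(undefined)
    return kept


record = {"message": "hi", "level": "info", "foo.bar.baz": 1, "extra": {"a": 1}}
keep = {"message", "level"}

as_map = normalize(record, keep, use_undefined=True, dot_replace_char="@@@")
as_string = normalize(record, keep, use_undefined=True, max_num_fields=1)
print(type(as_map["undefined"]).__name__, type(as_string["undefined"]).__name__)
```

In the first call the undefined field holds a map, and in the second it holds a string. Sending both shapes under the same field name to one index is the type change that the preceding paragraphs warn about, which is why a different CDM_UNDEFINED_NAME is suggested.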

Configuring the logging collector using environment variables

You can use environment variables to modify the configuration of the Fluentd log collector.

See the Fluentd README in GitHub for lists of the available environment variables.

Prerequisite

Set cluster logging to the unmanaged state.

Procedure

Set any of the Fluentd environment variables as needed:

oc set env ds/fluentd <env-var>=<value>

For example:

oc set env ds/fluentd LOGGING_FILE_AGE=30

About logging collector alerts

The following alerts are generated by the logging collector and can be viewed on the Alerts tab of the Prometheus UI.

All the logging collector alerts are listed on the Monitoring → Alerts page of the OpenShift Container Platform web console. Alerts are in one of the following states:

  • Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.

  • Pending. The alert condition is currently true, but the timeout has not been reached.

  • Not Firing. The alert is not currently triggered.
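The three states above amount to a small decision rule, sketched here in Python. The function and parameter names are hypothetical, and the rule assumes you know how long the alert condition has been continuously true.

```python
from typing import Optional


def alert_state(condition_true_seconds: Optional[float], timeout_seconds: float) -> str:
    """Classify an alert from how long its condition has been continuously true.

    Pass None when the condition is not currently true.
    """
    if condition_true_seconds is None:
        return "Not Firing"
    if condition_true_seconds >= timeout_seconds:
        return "Firing"
    return "Pending"


# Example with a hypothetical 10-minute timeout (600 seconds).
for elapsed in (None, 120, 600):
    print(elapsed, alert_state(elapsed, 600))
```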

Table 2. Fluentd Prometheus alerts

FluentdErrorsHigh
  Message: In the last minute, <value> errors reported by fluentd <instance>.
  Description: Fluentd is reporting a higher number of issues than the specified number, default 10.
  Severity: Critical

FluentdNodeDown
  Message: Prometheus could not scrape fluentd <instance> for more than 10m.
  Description: Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance.
  Severity: Critical

FluentdQueueLengthBurst
  Message: In the last minute, fluentd <instance> buffer queue length increased more than 32. Current value is <value>.
  Description: Fluentd is reporting that it is overwhelmed.
  Severity: Warning

FluentdQueueLengthIncreasing
  Message: In the last 12h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.
  Description: Fluentd is reporting queue usage issues.
  Severity: Critical