Aggregating Container Logs | Installation and Configuration

Overview
Pre-deployment Configuration
Specifying Deployer Parameters
Deploying the EFK Stack
Understanding and Adjusting the Deployment
- Ops Cluster
- Elasticsearch
- Fluentd
- Kibana
- Curator
Cleanup
Upgrading
Troubleshooting Kibana
Sending Logs to an External Elasticsearch Instance
Performing Administrative Elasticsearch Operations

Overview

As an OpenShift Container Platform cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OpenShift Container Platform services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.

The EFK stack is a modified version of the ELK stack and is comprised of:

Elasticsearch: An object store where all logs are stored.
Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
Kibana: A web UI for Elasticsearch.

Once deployed in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.

Managing Docker Container Logs discusses the use of json-file logging driver options to manage container logs and prevent filling node disks.

Pre-deployment Configuration

Ensure that you have deployed a router for the cluster.
Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch replica requires its own storage volume. See Elasticsearch for more information.
Ansible-based installs should create the logging-deployer-template template in the openshift project. Otherwise you can create it with the following command:
$ oc apply -n openshift -f \ /usr/share/ansible/openshift-ansible/roles/openshift_hosted_templates/files/v1.3/enterprise/logging-deployer.yaml

Create a new project. Once implemented in a single project, the EFK stack collects logs for every project within your OpenShift Container Platform cluster. The examples in this topic use logging as an example project:

$ oadm new-project logging --node-selector=""
$ oc project logging

Specifying an empty node selector on the project is recommended, as Fluentd should be deployed throughout the cluster and any selector would restrict where it is deployed. To control component placement, specify node selectors per component to be applied to their deployment configurations.

Create the logging service accounts and custom roles:
```
$ oc new-app logging-deployer-account-template
```
If you deployed logging previously, for example in a different project, then it is normal for the cluster roles to fail to be created because they already exist.
Enable the deployer service account to create an OAuthClient (normally a cluster administrator privilege) for Kibana to use later when authenticating against the master.
$ oadm policy add-cluster-role-to-user oauth-editor \ system:serviceaccount:logging:logging-deployer (1)
1 Use the project you created earlier (for example, logging) when specifying this service account.
Enable the Fluentd service account to mount and read system logs by adding it to the privileged security context, and also enable it to read pod metadata by giving it the cluster-reader role:
$ oadm policy add-scc-to-user privileged \ system:serviceaccount:logging:aggregated-logging-fluentd (1) $ oadm policy add-cluster-role-to-user cluster-reader \ system:serviceaccount:logging:aggregated-logging-fluentd (1)
1 Use the project you created earlier (for example, logging) when specifying this service account.
Ensure that port 9300 is open. By default the Elasticsearch service uses port 9300 for TCP communication between nodes in a cluster.

Specifying Deployer Parameters

Parameters for the EFK deployment may be specified in the form of a ConfigMap, a secret, or template parameters (which are passed to the deployer in environment variables). The deployer looks for each value first in a logging-deployer ConfigMap, then a logging-deployer secret, then as an environment variable. Any or all may be omitted if not needed. If you are specifying values within a ConfigMap, the values will not be reflected in the output from oc new-app, but will still precede the corresponding template values within the deployer pod.

The available parameters are outlined below. Typically, you should at least specify the host name at which Kibana should be exposed to client browsers, and also the master URL where client browsers will be directed to for authenticating to OpenShift Container Platform.

Create a ConfigMap to provide most deployer parameters. An invocation supplying the most important parameters might be:

$ oc create configmap logging-deployer \
   --from-literal kibana-hostname=kibana.example.com \
   --from-literal public-master-url=https://master.example.com:8443 \
   --from-literal es-cluster-size=3 \
   --from-literal es-instance-ram=8G

Edit the ConfigMap YAML file after creating it:

$ oc edit configmap logging-deployer

Other parameters are available. Read the ElasticSearch section before choosing ElasticSearch parameters for the deployer, and the Fluentd section for some possible parameters:

Parameter Description

Parameter	Description
*kibana-hostname*	The external host name for web clients to reach Kibana.
*public-master-url*	The external URL for the master; used for OAuth purposes.
*es-cluster-size* (default: 1)	The number of instances of Elasticsearch to deploy. Redundancy requires at least three, and more can be used for scaling.
*es-instance-ram* (default: 8G)	Amount of RAM to reserve per Elasticsearch instance. The default is 8G (for 8GB), and it must be at least 512M. Possible suffixes are G,g,M,m.
*es-pvc-prefix* (default: logging-es-)	Prefix for the names of persistent volume claims to be used as storage for Elasticsearch instances; a number will be appended per instance (for example, logging-es-1). If they do not already exist, they will be created with size *es-pvc-size*.
*es-pvc-size*	Size of the persistent volume claim to create per ElasticSearch instance, 100G, for example. If omitted, no PVCs are created and ephemeral volumes are used instead.
*es-pvc-dynamic*	Set to `true` to have created persistent volume claims annotated so that their backing storage can be dynamically provisioned (if that is available for your cluster).
*storage-group*	Number of a supplemental group ID for access to Elasticsearch storage volumes; backing volumes should allow access by this group ID (defaults to 65534).
*fluentd-nodeselector* (default: logging-infra-fluentd=true)	A node selector that specifies which nodes are eligible targets for deploying Fluentd instances. All nodes where Fluentd should run (typically, all) must have this label before Fluentd will be able to run and collect logs.
*es-nodeselector*	A node selector that specifies which nodes are eligible targets for deploying Elasticsearch instances. This can be used to place these instances on nodes reserved and/or optimized for running them. For example, the selector could be `node-type=infrastructure`. At least one active node must have this label before Elasticsearch will deploy.
*kibana-nodeselector*	A node selector that specifies which nodes are eligible targets for deploying Kibana instances.
*curator-nodeselector*	A node selector that specifies which nodes are eligible targets for deploying Curator instances.
*enable-ops-cluster*	If set to `true`, configures a second Elasticsearch cluster and Kibana for operations logs. Fluentd splits logs between the main cluster and a cluster reserved for operations logs (which consists of */var/log/messages* on nodes and the logs from the projects default, openshift, and openshift-infra). This means a second Elasticsearch and Kibana are deployed. The deployments are distinguishable by the -ops included in their names and have parallel deployment options listed below.
*kibana-ops-hostname, es-ops-instance-ram, es-ops-pvc-size, es-ops-pvc-prefix, es-ops-cluster-size, es-ops-nodeselector, kibana-ops-nodeselector, curator-ops-nodeselector*	Parallel parameters for the ops log cluster.
*image-pull-secret*	Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry.

kibana-hostname

The external host name for web clients to reach Kibana.

public-master-url

The external URL for the master; used for OAuth purposes.

es-cluster-size (default: 1)

The number of instances of Elasticsearch to deploy. Redundancy requires at least three, and more can be used for scaling.

es-instance-ram (default: 8G)

Amount of RAM to reserve per Elasticsearch instance. The default is 8G (for 8GB), and it must be at least 512M. Possible suffixes are G,g,M,m.

es-pvc-prefix (default: logging-es-)

Prefix for the names of persistent volume claims to be used as storage for Elasticsearch instances; a number will be appended per instance (for example, logging-es-1). If they do not already exist, they will be created with size es-pvc-size.

es-pvc-size

Size of the persistent volume claim to create per ElasticSearch instance, 100G, for example. If omitted, no PVCs are created and ephemeral volumes are used instead.

es-pvc-dynamic

Set to true to have created persistent volume claims annotated so that their backing storage can be dynamically provisioned (if that is available for your cluster).

storage-group

Number of a supplemental group ID for access to Elasticsearch storage volumes; backing volumes should allow access by this group ID (defaults to 65534).

fluentd-nodeselector (default: logging-infra-fluentd=true)

A node selector that specifies which nodes are eligible targets for deploying Fluentd instances. All nodes where Fluentd should run (typically, all) must have this label before Fluentd will be able to run and collect logs.

es-nodeselector

A node selector that specifies which nodes are eligible targets for deploying Elasticsearch instances. This can be used to place these instances on nodes reserved and/or optimized for running them. For example, the selector could be node-type=infrastructure. At least one active node must have this label before Elasticsearch will deploy.

kibana-nodeselector

A node selector that specifies which nodes are eligible targets for deploying Kibana instances.

curator-nodeselector

A node selector that specifies which nodes are eligible targets for deploying Curator instances.

enable-ops-cluster

If set to true, configures a second Elasticsearch cluster and Kibana for operations logs. Fluentd splits logs between the main cluster and a cluster reserved for operations logs (which consists of /var/log/messages on nodes and the logs from the projects default, openshift, and openshift-infra). This means a second Elasticsearch and Kibana are deployed. The deployments are distinguishable by the -ops included in their names and have parallel deployment options listed below.

kibana-ops-hostname, es-ops-instance-ram, es-ops-pvc-size, es-ops-pvc-prefix, es-ops-cluster-size, es-ops-nodeselector, kibana-ops-nodeselector, curator-ops-nodeselector

Parallel parameters for the ops log cluster.

image-pull-secret

Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry.

Create a secret to provide security-related files to the deployer. Providing the secret is optional, and the objects will be randomly generated if not supplied.

You can supply the following files when creating a new secret, for example:

$ oc create secret generic logging-deployer \
   --from-file kibana.crt=/path/to/cert \
   --from-file kibana.key=/path/to/key

File Name	Description
*kibana.crt*	A browser-facing certificate for the Kibana server.
*kibana.key*	A key to be used with the Kibana certificate.
*kibana-ops.crt*	A browser-facing certificate for the Ops Kibana server.
*kibana-ops.key*	A key to be used with the Ops Kibana certificate.
*server-tls.json*	JSON TLS options to override the Kibana server defaults. Refer to Node.JS docs for available options.
*ca.crt*	A certificate for a CA that will be used to sign all certificates generated by the deployer.
*ca.key*	A matching CA key.

File Name

Description

kibana.crt

A browser-facing certificate for the Kibana server.

kibana.key

A key to be used with the Kibana certificate.

kibana-ops.crt

A browser-facing certificate for the Ops Kibana server.

kibana-ops.key

A key to be used with the Ops Kibana certificate.

server-tls.json

JSON TLS options to override the Kibana server defaults. Refer to Node.JS docs for available options.

ca.crt

A certificate for a CA that will be used to sign all certificates generated by the deployer.

ca.key

A matching CA key.

Deploying the EFK Stack

The EFK stack is deployed using a template to create a deployer pod that reads the deployment parameters and manages the deployment.

Run the deployer, optionally specifying parameters (described in the table below), for example:

Without template parameters:

$ oc new-app logging-deployer-template

With parameters:

$ oc new-app logging-deployer-template \
             --param IMAGE_VERSION=<tag> \
             --param MODE=install

Replace <tag> with v3.3 for the latest version.

Parameter Name Description

Parameter Name	Description
*IMAGE_PREFIX*	The prefix for logging component images. For example, setting the prefix to registry.access.redhat.com/openshift3/ creates registry.access.redhat.com/openshift3/logging-deployer:latest.
*IMAGE_VERSION*	The version for logging component images. For example, setting the version to v3.3 creates registry.access.redhat.com/openshift3/logging-deployer:v3.3.
*MODE* (default: install)	Mode to run the deployer in; one of `install`, `uninstall`, `reinstall`, `upgrade`, `migrate`, `start`, `stop`.

IMAGE_PREFIX

The prefix for logging component images. For example, setting the prefix to registry.access.redhat.com/openshift3/ creates registry.access.redhat.com/openshift3/logging-deployer:latest.

IMAGE_VERSION

The version for logging component images. For example, setting the version to v3.3 creates registry.access.redhat.com/openshift3/logging-deployer:v3.3.

MODE (default: install)

Mode to run the deployer in; one of install, uninstall, reinstall, upgrade, migrate, start, stop.

Running the deployer creates a deployer pod and prints its name. Wait until the pod is running. This can take up to a few minutes for OpenShift Container Platform to retrieve the deployer image from the registry. Watch its process with:

$ oc get pod/<pod_name> -w

It will eventually enter Running status and end in Complete status. If takes too long to start, retrieve more details about the pod and any associated events with:

$ oc describe pod/<pod_name>

Check the logs if the deployment does not complete successfully:

$ oc logs -f <pod_name>

Once deployment completes successfully, you may need to label the nodes for Fluentd to deploy on, and may have other adjustments to make to the deployed components. These tasks are described in the next section.

Understanding and Adjusting the Deployment

This section describes adjustments that you can make to deployed components.

Ops Cluster

The logs for the default, openshift, and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface.

The project where you have deployed the EFK stack (logging, as documented here) is not aggregated into .operations and is found under its ID.

If you set enable-ops-cluster to true for the deployer, Fluentd is configured to split logs between the main ElasticSearch cluster and another cluster reserved for operations logs (which are defined as node system logs and the projects default, openshift, and openshift-infra). Therefore, a separate Elasticsearch cluster, a separate Kibana, and a separate Curator are deployed to index, access, and manage operations logs. These deployments are set apart with names that include -ops. Keep these separate deployments in mind if you enabled this option. Most of the following discussion also applies to the operations cluster if present, just with the names changed to include -ops.

Elasticsearch

A highly-available environment requires at least three replicas of Elasticsearch; each on a different host. Elasticsearch replicas require their own storage, but an OpenShift Container Platform deployment configuration shares storage volumes between all its pods. So, when scaled up, the EFK deployer ensures each replica of Elasticsearch has its own deployment configuration.

It is possible to scale your cluster up after creation by adding more deployments from a template; however, scaling up (or down) requires the correct procedure and an awareness of clustering parameters (to be described in a separate section). It is best to indicate the desired scale at first deployment.

Refer to Elastic’s documentation for considerations involved in choosing storage and network location as directed below.

Viewing all Elasticsearch Deployments

To view all current Elasticsearch deployments:

$ oc get dc --selector logging-infra=elasticsearch

Node Selector

Because Elasticsearch can use a lot of resources, all members of a cluster should have low latency network connections to each other and to any remote storage. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.

To configure a node selector, specify the es-nodeselector configuration option at deployment. This applies to all Elasticsearch deployments; if you need to individualize the node selectors, you must manually edit each deployment configuration after deployment.

Persistent Elasticsearch Storage

By default, the deployer creates an ephemeral deployment in which all of a pod’s data is lost upon restart. For production usage, specify a persistent storage volume for each Elasticsearch deployment configuration. You can create the necessary persistent volume claims before deploying or have them created for you. The PVCs must be named based on the es-pvc-prefix setting, which defaults to logging-es-; each PVC name will have a sequence number added to it, so logging-es-1, logging-es-2, and so on. If a PVC needed for the deployment exists already, it is used; if not, and es-pvc-size has been specified, it is created with a request for that size.

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur. If NFS storage is a requirement, you can allocate a large file on a volume to serve as a storage device and mount it locally on one host. For example, if your NFS storage volume is mounted at /nfs/storage:

$ truncate -s 1T /nfs/storage/elasticsearch-1
$ mkfs.xfs /nfs/storage/elasticsearch-1
$ mount -o loop /nfs/storage/elasticsearch-1 /usr/local/es-storage
$ chown 1000:1000 /usr/local/es-storage

Then, use /usr/local/es-storage as a host-mount as described below. Use a different backing file as storage for each Elasticsearch replica.

This loopback must be maintained manually outside of OpenShift Container Platform, on the node. You must not maintain it from inside a container.

It is possible to use a local disk volume (if available) on each node host as storage for an Elasticsearch replica. Doing so requires some preparation as follows.

The relevant service account must be given the privilege to mount and edit a local volume:
$ oadm policy add-scc-to-user privileged \ system:serviceaccount:logging:aggregated-logging-elasticsearch (1)
1 Use the project you created earlier (for example, logging) when specifying this service account.

Each Elasticsearch replica definition must be patched to claim that privilege, for example:

$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done

The Elasticsearch replicas must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it. To configure a node selector, edit each Elasticsearch deployment configuration and add or edit the nodeSelector section to specify a unique label that you have applied for each desired node:
apiVersion: v1 kind: DeploymentConfig spec: template: spec: nodeSelector: logging-es-node: "1" (1)
1 This label should uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1. Use the oc label command to apply labels to nodes as needed.

To automate applying the node selector you can instead use the oc patch command:
$ oc patch dc/logging-es-<suffix> \ -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1"}}}}}'

Once these steps are taken, a local host mount can be applied to each replica as in this example (where we assume storage is mounted at the same path on each node):

$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc set volume $dc \
          --add --overwrite --name=elasticsearch-storage \
          --type=hostPath --path=/usr/local/es-storage
    oc deploy --latest $dc
    oc scale $dc --replicas=1
  done

Changing the Scale of Elasticsearch

If you need to scale up the number of Elasticsearch instances your cluster uses, it is not as simple as scaling up an Elasticsearch deployment configuration. This is due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster. Instead, scaling up requires creating a deployment configuration for each Elasticsearch cluster node.

By far the simplest way to change the scale of Elasticsearch is to reinstall the whole deployment. Assuming you have supplied persistent storage for the deployment, this should not be very disruptive. Simply re-run the deployer with the updated es-cluster-size configuration value and the MODE=reinstall template parameter. For example:

$ oc edit configmap logging-deployer
  [change es-cluster-size value to 5]
$ oc new-app logging-deployer-template --param MODE=reinstall

If you previously deployed using template parameters rather than a ConfigMap, this would be a good time to create a ConfigMap instead for future deployer execution.

If you do not wish to reinstall, for instance because you have made customizations that you would like to preserve, then it is possible to add new Elasticsearch deployment configurations to the cluster using a template supplied by the deployer. This requires a more complicated procedure however.

During installation, the deployer creates templates with the Elasticsearch configurations provided to it: logging-es-template (and logging-es-ops-template if the deployer was run with ENABLE_OPS_CLUSTER=true). You can use these for scaling, but you need to adjust the size-related parameters in the templates:

Parameter Description

Parameter	Description
`NODE_QUORUM`	The quorum required to elect a new master. Should be more than half the intended cluster size.
`RECOVER_AFTER_NODES`	When restarting the cluster, require this many nodes to be present before starting recovery. Defaults to one less than the cluster size to allow for one missing node.
`RECOVER_EXPECTED_NODES`	When restarting the cluster, wait for this number of nodes to be present before starting recovery. By default, the same as the cluster size.

NODE_QUORUM

The quorum required to elect a new master. Should be more than half the intended cluster size.

RECOVER_AFTER_NODES

When restarting the cluster, require this many nodes to be present before starting recovery. Defaults to one less than the cluster size to allow for one missing node.

RECOVER_EXPECTED_NODES

When restarting the cluster, wait for this number of nodes to be present before starting recovery. By default, the same as the cluster size.

The node quorum and recovery settings in the template were set based on the es-[ops-]cluster-size value initially provided to the deployer. Since the cluster size is changing, those values need to be overridden.

The existing deployment configurations for that cluster also need to have the three environment variable values above updated. To edit each of the configurations for the cluster in series, you may use the following command:
$ oc edit $(oc get dc -l component=es[-ops] -o name)
Edit the environment variables supplied so that the next time they restart, they will begin with the correct values. For example, for a cluster of size 5, you would set NODE_QUORUM to 3, RECOVER_AFTER_NODES to 4, and RECOVER_EXPECTED_NODES to 5.
Create additional deployment configurations by running the following command against the Elasticsearch cluster you want to to scale up for (logging-es-template or logging-es-ops-template), overriding the parameters as above.
$ oc new-app logging-es[-ops]-template \ --param NODE_QUORUM=3 \ --param RECOVER_AFTER_NODES=4 \ --param RECOVER_EXPECTED_NODES=5
These deployments will be named differently, but all will have the logging-es prefix.
Each new deployment configuration is created without a persistent volume. If you want to attach a persistent volume to it, after creation you can use the oc set volume command to do so, for example:
```
$ oc volume dc/logging-es-<suffix> \
          --add --overwrite --name=elasticsearch-storage \
          --type=persistentVolumeClaim --claim-name=<your_pvc>
```
After the intended number of deployment configurations are created, scale up each new one to deploy it:
```
$ oc scale --replicas=1 dc/logging-es-<suffix>
```

Fluentd

Fluentd is deployed as a DaemonSet that deploys replicas according to a node label selector (which you can specify with the deployer parameter fluentd-nodeselector; the default is logging-infra-fluentd).

Once you have ElasticSearch running as desired, label the nodes intended for Fluentd deployment to feed their logs into ES. The example below would label a node named node.example.com using the default Fluentd node selector:

$ oc label node/node.example.com logging-infra-fluentd=true

Alternatively, you can label all nodes with:

$ oc label node --all logging-infra-fluentd=true

Labeling nodes requires cluster administrator capability.

Having Fluentd Use the Systemd Journal as the Log Source

By default, Fluentd reads from /var/log/messages and /var/log/containers/<container>.log for system logs and container logs, respectively. You can instead use the systemd journal as the log source. There are three deployer configuration parameters available in the deployer ConfigMap:

Parameter Description

Parameter	Description
`use-journal`	The default is empty, which tells the deployer to have Fluentd check which log driver Docker is using. If Docker is using `--log-driver=journald`, Fluentd reads from the systemd journal, otherwise, it assumes docker is using the `json-file` log driver and reads from the */var/log* file sources. You can specify the `use-journal` option as `true` or `false` to be explicit about which log source to use. Using the systemd journal requires `docker-1.10` or later, and Docker must be configured to use `--log-driver=journald`.
`journal-source`	The default is empty, so that when using the systemd journal, Fluentd first looks for */var/log/journal, and if that is not available, uses /run/log/journal* as the journal source. You can specify `journal-source` with an explicit journal path. For example, if you want Fluentd to always read logs from the transient in-memory journal, set `journal-source`=*/run/log/journal*.
`journal-read-from-head`	If this setting is false, Fluentd starts reading from the end of the journal, ignoring historical logs. If this setting is true, Fluentd starts reading logs from the beginning of the journal.

use-journal

The default is empty, which tells the deployer to have Fluentd check which log driver Docker is using. If Docker is using --log-driver=journald, Fluentd reads from the systemd journal, otherwise, it assumes docker is using the json-file log driver and reads from the /var/log file sources. You can specify the use-journal option as true or false to be explicit about which log source to use. Using the systemd journal requires docker-1.10 or later, and Docker must be configured to use --log-driver=journald.

journal-source

The default is empty, so that when using the systemd journal, Fluentd first looks for /var/log/journal, and if that is not available, uses /run/log/journal as the journal source. You can specify journal-source with an explicit journal path. For example, if you want Fluentd to always read logs from the transient in-memory journal, set journal-source=/run/log/journal.

journal-read-from-head

If this setting is false, Fluentd starts reading from the end of the journal, ignoring historical logs. If this setting is true, Fluentd starts reading logs from the beginning of the journal.

As of OpenShift Container Platform 3.3, Fluentd no longer reads historical log files when using the JSON file log driver. In situations where clusters have a large number of log files and are older than the EFK deployment, this avoids delays when pushing the most recent logs into Elasticsearch. Curator deleting logs are migrated soon after they are added to Elasticsearch.

It may require several minutes, or hours, depending on the size of your journal, before any new log entries are available in Elasticsearch, when using journal-read-from-head=true.

Having Fluentd Send Logs to Another Elasticsearch

The use of ES_COPY is being deprecated. To configure FluentD to send a copy of its logs to an external aggregator, use Fluentd Secure Forward instead.

You can configure Fluentd to send a copy of each log message to both the Elasticsearch instance included with OpenShift Container Platform aggregated logging, and to an external Elasticsearch instance. For example, if you already have an Elasticsearch instance set up for auditing purposes, or data warehousing, you can send a copy of each log message to that Elasticsearch.

This feature is controlled via environment variables on Fluentd, which can be modified as described below.

If its environment variable ES_COPY is true, Fluentd sends a copy of the logs to another Elasticsearch. The names for the copy variables are just like the current ES_HOST, OPS_HOST, and other variables, except that they add _COPY: ES_COPY_HOST, OPS_COPY_HOST, and so on. There are some additional parameters added:

ES_COPY_SCHEME, OPS_COPY_SCHEME - can use either http or https - defaults to https
ES_COPY_USERNAME, OPS_COPY_USERNAME - user name to use to authenticate to Elasticsearch using username/password auth
ES_COPY_PASSWORD, OPS_COPY_PASSWORD - password to use to authenticate to Elasticsearch using username/password auth

Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service plug-in.

To set the parameters:

Edit the template for the Fluentd daemonset:
```
$ oc edit -n logging template logging-fluentd-template
```
Add or edit the environment variable ES_COPY to have the value "true" (with the quotes), and add or edit the COPY variables listed above.

Recreate the Fluentd daemonset from the template:

$ oc delete daemonset logging-fluentd
$ oc new-app logging-fluentd-template

Configuring Fluentd to Send Logs to an External Log Aggregator

You can configure Fluentd to send a copy of its logs to an external log aggregator, and not the default Elasticsearch, using the secure-forward plug-in. From there, you can further process log records after the locally hosted Fluentd has processed them.

The deployer provides a secure-forward.conf section in the Fluentd configmap for configuring the external aggregator:

<store>
@type secure_forward
self_hostname pod-${HOSTNAME}
shared_key thisisasharedkey
secure yes
enable_strict_verification yes
ca_cert_path /etc/fluent/keys/your_ca_cert
ca_private_key_path /etc/fluent/keys/your_private_key
ca_private_key_passphrase passphrase
<server>
  host ose1.example.com
  port 24284
</server>
<server>
  host ose2.example.com
  port 24284
  standby
</server>
<server>
  host ose3.example.com
  port 24284
  standby
</server>
</store>

This can be updated using the oc edit command:

$ oc edit configmap/logging-fluentd

Certificates to be used in secure-forward.conf can be added to the existing secret that is mounted on the Fluentd pods. The your_ca_cert and your_private_key values must match what is specified in secure-forward.conf in configmap/logging-fluentd:

$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_ca_cert','value':'$(base64 /path/to/your_ca_cert.pem)'}]"
$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_private_key','value':'$(base64 /path/to/your_private_key.pem)'}]"

Avoid using secret names such as 'cert', 'key', and 'ca' so that the values do not conflict with the keys generated by the Deployer pod for Fluentd to talk to the OpenShift Container Platform hosted Elasticsearch.

When configuring the external aggregator, it must be able to accept messages securely from Fluentd.

If the external aggregator is another Fluentd process, it must have the fluent-plugin-secure-forward plug-in installed and make use of the input plug-in it provides:

<source>
  @type secure_forward

  self_hostname ${HOSTNAME}
  bind 0.0.0.0
  port 24284

  shared_key thisisasharedkey

  secure yes
  cert_path        /path/for/certificate/cert.pem
  private_key_path /path/for/certificate/key.pem
  private_key_passphrase secret_foo_bar_baz
</source>

Further explanation of how to set up the fluent-plugin-secure-forward plug-in can be found here.

Throttling logs in Fluentd

For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by Fluentd before being processed.

Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.

Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source, no log files, so no file-based throttling is available. There is not a method of restricting the log entries that are read into the Fluentd process.

To tell Fluentd which projects it should be restricting, edit the throttle configuration in its ConfigMap after deployment:

$ oc edit configmap/logging-fluentd

The format of the throttle-config.yaml key is a YAML file that contains project names and the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:

logging:
  read_lines_limit: 500

test-project:
  read_lines_limit: 10

.operations:
  read_lines_limit: 100

When you make changes to any part of the EFK stack, specifically Elasticsearch or Fluentd, you should first scale Elasicsearch down to zero and scale Fluentd so it does not match any other nodes. Then, make the changes and scale Elasicsearch and Fluentd back.

To scale Elasicsearch to zero:

$ oc scale --replicas=0 dc/<ELASTICSEARCH_DC>

Change nodeSelector in the daemonset configuration to match zero:

Get the fluentd node selector:

$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       logging-infra-fluentd: "true"

Use the oc patch command to modify the daemonset nodeSelector:

$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'

Get the fluentd node selector:

$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       "nonexistlabel: "true"

Scale Elastcsearch back up from zero:

$ oc scale --replicas=# dc/<ELASTICSEARCH_DC>

Change nodeSelector in the daemonset configuration back to logging-infra-fluentd: "true".

Use the oc patch command to modify the daemonset nodeSelector:

oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true"}}}}}'

Kibana

To access the Kibana console from the OpenShift Container Platform web console, add the loggingPublicURL parameter in the /etc/origin/master/master-config.yaml file, with the URL of the Kibana console (the kibana-hostname parameter). The value must be an HTTPS URL:

...
assetConfig:
  ...
  loggingPublicURL: "https://kibana.example.com"
...

Setting the loggingPublicURL parameter creates a View Archive button on the OpenShift Container Platform web console under the Browse → Pods → <pod_name> → Logs tab. This links to the Kibana console.

You can scale the Kibana deployment as usual for redundancy:

$ oc scale dc/logging-kibana --replicas=2

You can see the user interface by visiting the site specified at the KIBANA_HOSTNAME variable.

See the Kibana documentation for more information on Kibana.

Curator

Curator allows administrators to configure scheduled Elasticsearch maintenance operations to be performed automatically on a per-project basis. It is scheduled to perform actions daily based on its configuration. Only one Curator pod is recommended per Elasticsearch cluster. Curator is configured via a YAML configuration file with the following structure:

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE
 ...

The available parameters are:

Variable Name Description

Variable Name	Description
`$PROJECT_NAME`	The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name `.operations` as the project name.
`$ACTION`	The action to take, currently only `delete` is allowed.
`$UNIT`	One of `days`, `weeks`, or `months`.
`$VALUE`	An integer for the number of units.
`.defaults`	Use `.defaults` as the `$PROJECT_NAME` to set the defaults for projects that are not specified.
`runhour`	(Number) the hour of the day in 24-hour format at which to run the Curator jobs. For use with `.defaults`.
`runminute`	(Number) the minute of the hour at which to run the Curator jobs. For use with `.defaults`.

$PROJECT_NAME

The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name .operations as the project name.

$ACTION

The action to take, currently only delete is allowed.

$UNIT

One of days, weeks, or months.

$VALUE

An integer for the number of units.

.defaults

Use .defaults as the $PROJECT_NAME to set the defaults for projects that are not specified.

runhour

(Number) the hour of the day in 24-hour format at which to run the Curator jobs. For use with .defaults.

runminute

(Number) the minute of the hour at which to run the Curator jobs. For use with .defaults.

For example, to configure Curator to:

delete indices in the myapp-dev project older than 1 day
delete indices in the myapp-qe project older than 1 week
delete operations logs older than 8 weeks
delete all other projects indices after they are 30 days old
run the Curator jobs at midnight every day

Use:

myapp-dev:
 delete:
   days: 1

myapp-qe:
  delete:
    weeks: 1

.operations:
  delete:
    weeks: 8

.defaults:
  delete:
    days: 30
  runhour: 0
  runminute: 0

When you use month as the $UNIT for an operation, Curator starts counting at the first day of the current month, not the current day of the current month. For example, if today is April 15, and you want to delete indices that are 2 months older than today (delete: months: 2), Curator does not delete indices that are dated older than February 15; it deletes indices older than February 1. That is, it goes back to the first day of the current month, then goes back two whole months from that date. If you want to be exact with Curator, it is best to use days (for example, delete: days: 30).

Creating the Curator Configuration

The deployer provides a ConfigMap from which Curator reads its configuration. You may edit or replace this ConfigMap to reconfigure Curator. Currently the logging-curator ConfigMap is used to configure both your ops and non-ops Curator instances. Any .operations configurations will be in the same location as your application logs configurations.

To edit the provided ConfigMap to configure your Curator instances:
```
$ oc edit configmap/logging-curator
```

To replace the provided ConfigMap instead:

$ create /path/to/mycuratorconfig.yaml
$ oc create configmap logging-curator -o yaml \
  --from-file=config.yaml=/path/to/mycuratorconfig.yaml | \
  oc replace -f -

After you make your changes, redeploy Curator:

$ oc deploy --latest dc/logging-curator
$ oc deploy --latest dc/logging-curator-ops

Cleanup

Remove everything generated during the deployment while leaving other project contents intact:

$ oc new-app logging-deployer-template --param MODE=uninstall

Upgrading

To upgrade the EFK logging stack, see Manual Upgrades.

Troubleshooting Kibana

Using the Kibana console with OpenShift Container Platform can cause problems that are easily solved, but are not accompanied with useful error messages. Check the following troubleshooting sections if you are experiencing any problems when deploying Kibana on OpenShift Container Platform:

Login Loop

The OAuth2 proxy on the Kibana console must share a secret with the master host’s OAuth2 server. If the secret is not identical on both servers, it can cause a login loop where you are continuously redirected back to the Kibana login page.

To fix this issue, delete the current OAuthClient, and create a new one, using the same template as before:

$ oc delete oauthclient/kibana-proxy
$ oc new-app logging-support-template

Cryptic Error When Viewing the Console

When attempting to visit the Kibana console, you may receive a browser error instead:

{"error":"invalid_request","error_description":"The request is missing a required parameter,
 includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}

This can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in.

Fix this issue by replacing the OAuthClient entry:

$ oc delete oauthclient/kibana-proxy
$ oc new-app logging-support-template

If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port. You can adjust the server whitelist by editing the OAuth client:

$ oc edit oauthclient/kibana-proxy

503 Error When Viewing the Console

If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.

First, Kibana may not be recognizing pods. If Elasticsearch is slow in starting up, Kibana may timeout trying to reach it. Check whether the relevant service has any endpoints:

$ oc describe service logging-kibana
Name:                   logging-kibana
[...]
Endpoints:              <none>

If any Kibana pods are live, endpoints will be listed. If they are not, check the state of the Kibana pods and deployment. You may need to scale the deployment down and back up again.

The second possible issue may be caused if the route for accessing the Kibana service is masked. This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router will only route to the first created. Check the problematic route to see if it is defined in multiple places:

$ oc get route  --all-namespaces --selector logging-infra=support

F-5 Load Balancer and X-Forwarded-For Enabled

If you are attempting to use a F-5 load balancer in front of Kibana with X-Forwarded-For enabled, this can cause an issue in which the Elasticsearch Searchguard plug-in is unable to correctly accept connections from Kibana.

Example Kibana Error Message

Kibana: Unknown error while connecting to Elasticsearch

Error: Unknown error while connecting to Elasticsearch
Error: UnknownHostException[No trusted proxies]

To configure Searchguard to ignore the extra header:

Scale down all Fluentd pods.
Scale down Elasticsearch after the Fluentd pods have terminated.
Add searchguard.http.xforwardedfor.header: DUMMY to the Elasticsearch configuration section.
```
$ oc edit configmap/logging-elasticsearch (1)
```
1 This approach requires that Elasticsearch’s configurations are within a ConfigMap.
Scale Elasticsearch back up.
Scale up all Fluentd pods.

Sending Logs to an External Elasticsearch Instance

Fluentd sends logs to the value of the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables of the Elasticsearch deployment configuration. The application logs are directed to the ES_HOST destination, and operations logs to OPS_HOST.

To direct logs to a specific Elasticsearch instance, edit the deployment configuration and replace the value of the above variables with the desired instance:

$ oc edit dc/<deployment_configuration>

For an external Elasticsearch instance to contain both application and operations logs, you can set ES_HOST and OPS_HOST to the same destination, while ensuring that ES_PORT and OPS_PORT also have the same value.

If your externally hosted Elasticsearch instance does not use TLS, update the _CLIENT_CERT, _CLIENT_KEY, and _CA variables to be empty. If it does use TLS, but not mutual TLS, update the _CLIENT_CERT and _CLIENT_KEY variables to be empty and patch or recreate the logging-fluentd secret with the appropriate _CA value for communicating with your Elasticsearch instance. If it uses Mutual TLS as the provided Elasticsearch instance does, patch or recreate the logging-fluentd secret with your client key, client cert, and CA.

Since Fluentd is deployed by a DaemonSet, update the logging-fluentd-template template, delete your current DaemonSet, and recreate it with oc new-app logging-fluentd-template after seeing all previous Fluentd pods have terminated.

If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.

Performing Administrative Elasticsearch Operations

As of the Deployer version 3.2.0, an administrator certificate, key, and CA that can be used to communicate with and perform administrative operations on Elasticsearch are provided within the logging-elasticsearch secret.

To confirm whether or not your EFK installation provides these, run:

$ oc describe secret logging-elasticsearch

If they are not available, refer to Manual Upgrades to ensure you are on the latest version first.

Connect to an Elasticsearch pod that is in the cluster on which you are attempting to perform maintenance.

To find a pod in a cluster use either:

$ oc get pods -l component=es -o name | head -1
$ oc get pods -l component=es-ops -o name | head -1

Connect to a pod:
$ oc rsh <your_Elasticsearch_pod>
Once connected to an Elasticsearch container, you can use the certificates mounted from the secret to communicate with Elasticsearch per its 1.5 Document APIs.

Fluentd sends its logs to Elasticsearch using the index format {project_name}.{project_uuid}.YYYY.MM.DD where YYYY.MM.DD is the date of the log record.

For example, to delete all logs for the logging project with uuid 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 from June 15, 2016, we can run:
$ curl --key /etc/elasticsearch/secret/admin-key \ --cert /etc/elasticsearch/secret/admin-cert \ --cacert /etc/elasticsearch/secret/admin-ca -XDELETE \ "https://localhost:9200/logging.3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3.2016.06.15"