As an OpenShift Container Platform cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OpenShift Container Platform services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.
The EFK stack is a modified version of the ELK stack and consists of:
Elasticsearch (ES): An object store where all logs are stored.
Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
Kibana: A web UI for Elasticsearch.
After deployment in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.
Aggregated logging is supported using the json-file or journald driver in Docker.
The Docker log driver is set to journald as the default for all nodes. See Updating Fluentd’s Log Source After a Docker Log Driver Update for more information about switching between json-file and journald.
Fluentd automatically determines which log driver (journald or json-file) the container runtime is using. When the log driver is set to journald, Fluentd reads journald logs. When it is set to json-file, Fluentd reads from /var/log/containers.
See Managing Docker Container Logs for information on json-file logging driver options to manage container logs and prevent filling node disks.
If Docker log-driver is set to journald, there is no log rate throttling with the |
An Ansible playbook is available to deploy and upgrade aggregated logging. You should familiarize yourself with the advanced installation and configuration section. This provides information for preparing to use Ansible and includes information about configuration. Parameters are added to the Ansible inventory file to configure various areas of the EFK stack.
Review the sizing guidelines to determine how best to configure your deployment.
Ensure that you have deployed a router for the cluster.
Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch replica requires its own storage volume. See Elasticsearch for more information.
Determine if you need highly available Elasticsearch. A highly available environment requires multiple replicas of each shard. By default, OpenShift Container Platform creates one shard for each index and zero replicas of those shards. To create high availability, set the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher. High availability also requires at least three Elasticsearch nodes, each on a different host. See Elasticsearch for more information.
Choose a project. Once deployed, the EFK stack collects logs for every project within your OpenShift Container Platform cluster. The examples in this section use the default project, logging. The Ansible playbook creates the project for you if it does not already exist. You only need to create a project yourself if you want to specify a node selector on it. Otherwise, the openshift-logging role creates the project.
$ oc adm new-project logging --node-selector=""
$ oc project logging
Specifying an empty node selector on the project is recommended, as Fluentd should be deployed throughout the cluster and any selector would restrict where it is deployed. To control component placement, specify node selectors per component to be applied to their deployment configurations.
Parameters for the EFK deployment may be specified in the inventory host file to override the default parameter values. Read the Elasticsearch and the Fluentd sections before choosing parameters:
By default, the Elasticsearch service uses port 9300 for TCP communication between nodes in a cluster.
Parameter | Description | ||
---|---|---|---|
|
The prefix for logging component images. For example, setting the prefix to registry.access.redhat.com/openshift3/ creates registry.access.redhat.com/openshift3/logging-fluentd:latest. |
||
|
The version for logging component images. For example, setting the version to v3.7 creates registry.access.redhat.com/openshift3/logging-fluentd:v3.7. |
||
|
If set to |
||
|
The URL for the Kubernetes master. This does not need to be public facing but should be accessible from within the cluster. For example, https://<PRIVATE-MASTER-URL>:8443. |
||
|
The public facing URL for the Kubernetes master. This is used for authentication redirection by the Kibana proxy. For example, https://<CONSOLE-PUBLIC-URL-MASTER>:8443. |
||
|
The namespace where Aggregated Logging is deployed. |
||
|
Set to |
||
|
The common uninstall keeps PVCs to prevent unwanted data loss during
reinstalls. To ensure that the Ansible playbook completely and irreversibly
removes all logging persistent data, including PVCs, set
|
||
|
Coupled with |
||
|
The prefix for the eventrouter logging image. The default is set to
|
||
|
The image version for the logging eventrouter. The default is set to 'latest'. |
||
|
Select a sink for eventrouter, supported |
||
|
A map of labels, such as |
||
|
The default is set to '1'. |
||
|
The minimum amount of CPU to allocate to eventrouter. The default is set to '100m'. |
||
|
The memory limit for eventrouter pods. The default is set to '128Mi'. |
||
|
The project where eventrouter is deployed. The default is set to 'default'.
|
||
|
Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry. |
||
|
The default minimum age (in days) Curator uses for deleting log records. |
||
|
The hour of the day Curator will run. |
||
|
The minute of the hour Curator will run. |
||
|
The timezone Curator uses for figuring out its run time. Provide the
timezone as a string in the tzselect(8) or timedatectl(1) "Region/Locality"
format, for example |
||
|
The script log level for Curator. |
||
|
The log level for the Curator process. |
||
|
The amount of CPU to allocate to Curator. |
||
|
The amount of memory to allocate to Curator. |
||
|
A node selector that specifies which nodes are eligible targets for deploying Curator instances. |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
The external host name for web clients to reach Kibana. |
||
|
The amount of CPU to allocate to Kibana. |
||
|
The amount of memory to allocate to Kibana. |
||
|
When |
||
|
The amount of CPU to allocate to Kibana proxy. |
||
|
The amount of memory to allocate to Kibana proxy. |
||
|
The number of replicas to which Kibana should be scaled up. |
||
|
A node selector that specifies which nodes are eligible targets for deploying Kibana instances. |
||
|
A map of environment variables to add to the Kibana deployment configuration. For example, {"ELASTICSEARCH_REQUESTTIMEOUT":"30000"}. |
||
|
The public facing key to use when creating the Kibana route. |
||
|
The cert that matches the key when creating the Kibana route. |
||
|
Optional. The CA that goes with the key and cert used when creating the Kibana route. |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Set to |
||
|
The external-facing hostname to use for the route and the TLS server
certificate. The default is set to For example, if |
||
|
The location of the certificate Elasticsearch uses for the external TLS server cert. The default is a generated cert. |
||
|
The location of the key Elasticsearch uses for the external TLS server cert. The default is a generated key. |
||
|
The location of the CA cert Elasticsearch uses for the external TLS server cert. The default is the internal CA. |
||
|
Set to |
||
|
The external-facing hostname to use for the route and the TLS server certificate.
The default is set to For example, if |
||
|
The location of the certificate Elasticsearch uses for the external TLS server cert. The default is a generated cert. |
||
|
The location of the key Elasticsearch uses for the external TLS server cert. The default is a generated key. |
||
|
The location of the CA cert Elasticsearch uses for the external TLS server cert. The default is the internal CA. |
||
|
A node selector that specifies which nodes are eligible targets for deploying Fluentd instances. Any node where Fluentd should run (typically, all) must have this label before Fluentd is able to run and collect logs. When scaling up the Aggregated Logging cluster after installation,
the As part of the installation, it is recommended that you add the Fluentd node selector label to the list of persisted node labels. |
||
|
The CPU limit for Fluentd pods. |
||
|
The memory limit for Fluentd pods. |
||
|
Set to |
||
|
List of nodes that should be labeled for Fluentd to be deployed. The default is
to label all nodes with ['--all']. The null value is
|
||
|
When |
||
|
Location of audit log file. The default is |
||
|
Location of the Fluentd |
||
|
The name of the Elasticsearch service where Fluentd should send logs. |
||
|
The port for the Elasticsearch service where Fluentd should send logs. |
||
|
The location of the CA Fluentd uses to communicate with |
||
|
The location of the client certificate Fluentd uses for |
||
|
The location of the client key Fluentd uses for |
||
|
The number of Elasticsearch nodes to deploy. High availability requires at least three. |
||
|
The CPU limit for the Elasticsearch cluster. |
||
|
Amount of RAM to reserve per Elasticsearch instance. It must be at least 512M. Possible suffixes are G,g,M,m. |
||
|
The number of replicas per primary shard for each new index. Defaults to '0'. A minimum of |
||
|
The number of primary shards for every new index created in ES. Defaults to '1'. |
||
|
A key/value map added to a PVC in order to select specific PVs. |
||
|
Set to |
||
|
To use a non-default storage class, set the variable with the storage class
name. For example, set to one of the following,
|
||
|
Size of the persistent volume claim to
create per Elasticsearch instance. For example, 100G. If omitted, no PVCs are
created and ephemeral volumes are used instead. If you set this parameter, the logging installer sets |
||
|
Sets the Elasticsearch storage type. If you are using Persistent Elasticsearch Storage, the logging installer sets this to |
||
|
Prefix for the names of persistent volume claims to be used as storage for
Elasticsearch nodes. A number is appended per node, such as
logging-es-1. If they do not already exist, they are created with size
When
|
||
|
The amount of time Elasticsearch will wait before it tries to recover. |
||
|
Number of a supplemental group ID for access to Elasticsearch storage volumes. Backing volumes should allow access by this group ID. |
||
|
A node selector specified as a map that determines which nodes are eligible targets
for deploying Elasticsearch nodes. Use this map to place these instances on nodes that are reserved or optimized for running them.
For example, the selector could be |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
Equivalent to |
||
|
A node selector that specifies which nodes are eligible targets
for deploying Elasticsearch nodes. This can be used to place
these instances on nodes reserved or optimized for running them.
For example, the selector could be |
||
|
The default value, You may also set the value |
||
|
A node selector that specifies which nodes are eligible targets for deploying Kibana instances. |
||
|
A node selector that specifies which nodes are eligible targets for deploying Curator instances. |
Custom Certificates
You can specify custom certificates using the following inventory variables instead of relying on those generated during the deployment process. These certificates are used to encrypt and secure communication between a user’s browser and Kibana. The security-related files will be generated if they are not supplied.
File Name | Description |
---|---|
|
A browser-facing certificate for the Kibana server. |
|
A key to be used with the browser-facing Kibana certificate. |
|
The absolute path on the control node to the CA file to use for the browser-facing Kibana certs. |
|
A browser-facing certificate for the Ops Kibana server. |
|
A key to be used with the browser-facing Ops Kibana certificate. |
|
The absolute path on the control node to the CA file to use for the browser-facing Ops Kibana certs. |
The EFK components are deployed using an Ansible playbook. Run the playbook from the default OpenShift Ansible location using the default inventory file.
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
Running the playbook deploys all resources needed to support the stack, such as Secrets, ServiceAccounts, and DeploymentConfigs. The playbook waits to deploy the component pods until the stack is running. If the wait steps fail, the deployment could still be successful; it may be retrieving the component images from the registry, which can take up to a few minutes. You can watch the process with:
$ oc get pods -w
The pods eventually enter Running status. For additional details about the status of the pods during deployment, retrieve the associated events:
$ oc describe pods/<pod_name>
Check the logs if the pods do not run successfully:
$ oc logs -f <pod_name>
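For a one-shot check instead of watching `oc get pods -w`, a small filter over the non-watch output can list pods that have not reached Running. This bash sketch assumes the default `oc get pods --no-headers` column order (NAME, READY, STATUS, RESTARTS, AGE). The function name and sample pod names are hypothetical. Finished deployer pods report Completed, so they are also listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get pods --no-headers` output on stdin and
# prints "<name> <status>" for every pod whose STATUS column is not Running.
# Assumes columns: NAME READY STATUS RESTARTS AGE.
pods_not_running() {
  awk '$3 != "Running" { print $1, $3 }'
}

# On a cluster:  oc get pods --no-headers | pods_not_running
# Here, sample text (placeholder pod names) stands in for live output:
pods_not_running <<'EOF'
logging-es-data-master-abc123-1-xyz   2/2   Running             0   5m
logging-fluentd-k8s7d                 1/1   Running             0   5m
logging-kibana-1-q2w3e                0/2   ContainerCreating   0   1m
EOF
```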
This section describes adjustments that you can make to deployed components.
The logs for the default, openshift, and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface. The project where you have deployed the EFK stack (logging, as documented here) is not aggregated into .operations and is found under its ID.
If you set openshift_logging_use_ops to true in your inventory file, Fluentd is configured to split logs between the main Elasticsearch cluster and another cluster reserved for operations logs, which are defined as node system logs and the projects default, openshift, and openshift-infra. Therefore, a separate Elasticsearch cluster, a separate Kibana, and a separate Curator are deployed to index, access, and manage operations logs. These deployments are set apart with names that include -ops. Keep these separate deployments in mind if you enable this option. Most of the following discussion also applies to the operations cluster if present, just with the names changed to include -ops.
Elasticsearch (ES) is an object store where all logs are stored.
Elasticsearch organizes the log data into datastores, each called an index. Elasticsearch subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in your cluster. You can configure Elasticsearch to make copies of the shards, called replicas. Elasticsearch also spreads replicas across the Elasticsearch nodes. The combination of shards and replicas is intended to provide redundancy and resilience to failure. For example, if you configure three shards for the index with one replica, Elasticsearch generates a total of six shards for that index: three primary shards and three replicas as a backup.
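The shard arithmetic in the example above fits in a one-line bash function. The name `total_shard_copies` is hypothetical. It multiplies the number of primary shards by one plus the number of replicas per primary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: total shard copies stored for one index,
# given the primary shard count and the replicas per primary shard.
total_shard_copies() {
  local primaries="$1" replicas="$2"
  echo $(( primaries * (1 + replicas) ))
}

total_shard_copies 3 1   # prints 6: three primaries plus three replicas
```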
The OpenShift Container Platform logging installer ensures each Elasticsearch node is deployed using a unique deployment configuration that includes its own storage volume.
You can create an additional deployment configuration for each Elasticsearch node you add to the logging system.
During installation, you can use the openshift_logging_es_cluster_size Ansible variable to specify the number of Elasticsearch nodes. Alternatively, you can scale up your existing cluster by modifying the openshift_logging_es_cluster_size variable in the inventory file and re-running the logging playbook. Additional clustering parameters can be modified and are described in Specifying Logging Ansible Variables.
Refer to Elastic’s documentation for considerations involved in choosing storage and network location as directed below.
A highly available Elasticsearch environment requires at least three Elasticsearch nodes,
each on a different host, and setting the |
Viewing all Elasticsearch Deployments
To view all current Elasticsearch deployments:
$ oc get dc --selector logging-infra=elasticsearch
Configuring Elasticsearch for High Availability
A highly available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host, and setting the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher to create replicas.
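Before re-running the playbook, you might sanity-check planned values against these minimums. This bash sketch is hypothetical. The function name is illustrative, it takes the two values as arguments rather than parsing your inventory, and it cannot verify that the nodes run on different hosts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check against the HA minimums described above:
# >= 3 Elasticsearch nodes and >= 1 replica per primary shard.
# Does not read the inventory file or check host placement.
es_ha_ready() {
  local cluster_size="$1" replicas="$2"
  if (( cluster_size < 3 )); then
    echo "not HA: openshift_logging_es_cluster_size=${cluster_size} (need 3 or more)"
    return 1
  fi
  if (( replicas < 1 )); then
    echo "not HA: openshift_logging_es_number_of_replicas=${replicas} (need 1 or more)"
    return 1
  fi
  echo "HA minimums met"
}

es_ha_ready 3 1          # prints "HA minimums met"
es_ha_ready 3 0 || true  # prints the replica warning; the function returns 1
```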
Use the following scenarios as a guide for an OpenShift Container Platform cluster with three Elasticsearch nodes:
If you can tolerate one Elasticsearch node going down, set openshift_logging_es_number_of_replicas to 1. This ensures that two nodes have a copy of all of the Elasticsearch data in the cluster.
If you must tolerate two Elasticsearch nodes going down, set openshift_logging_es_number_of_replicas to 2. This ensures that every node has a copy of all of the Elasticsearch data in the cluster.
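The two scenarios follow from a simple count. This bash sketch rests on one assumption: Elasticsearch never places two copies of the same shard on one node, so a shard has at most as many copies as there are nodes. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: each copy of a shard lives on a different
# node, so copies per shard = min(replicas + 1, nodes).
es_copies_per_shard() {
  local nodes="$1" replicas="$2"
  local copies=$(( replicas + 1 ))
  if (( copies > nodes )); then
    copies=$nodes   # replicas beyond (nodes - 1) cannot be assigned
  fi
  echo "$copies"
}

# Node failures that still leave at least one copy of every shard.
es_tolerated_failures() {
  echo $(( $(es_copies_per_shard "$1" "$2") - 1 ))
}

es_copies_per_shard 3 1    # prints 2: two nodes hold each shard
es_tolerated_failures 3 2  # prints 2: every node holds each shard
```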
Note that there is a trade-off between high availability and performance. For example, having openshift_logging_es_number_of_replicas=2 and openshift_logging_es_number_of_shards=3 requires Elasticsearch to spend significant resources replicating the shard data among the nodes in the cluster. Also, using a higher number of replicas requires doubling or tripling the data storage requirements on each node, so you must take that into account when planning persistent storage for Elasticsearch.
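For a first-pass storage estimate, the following bash sketch applies the doubling/tripling effect. It is a rough heuristic, not an official sizing formula. It assumes shard copies spread evenly over the nodes and ignores index overhead, segment merges, and disk watermarks. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical rough estimate: storage each node needs when PRIMARY_GB of
# primary shard data is stored with REPLICAS copies, spread evenly over
# NODES. Rounds up; ignores overhead and free-space watermarks.
es_storage_per_node_gb() {
  local primary_gb="$1" replicas="$2" nodes="$3"
  local total=$(( primary_gb * (1 + replicas) ))
  echo $(( (total + nodes - 1) / nodes ))
}

es_storage_per_node_gb 100 0 3   # prints 34
es_storage_per_node_gb 100 1 3   # prints 67
es_storage_per_node_gb 100 2 3   # prints 100
```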
Considerations when Configuring the Number of Shards
For the openshift_logging_es_number_of_shards parameter, consider:
For higher performance, increase the number of shards. For example, in a three-node cluster, set openshift_logging_es_number_of_shards=3. This causes each index to be split into three parts (shards), and the load for processing the index is spread out over all three nodes.
If you have a large number of projects, you might see performance degradation if you have more than a few thousand shards in the cluster. Either reduce the number of shards or reduce the curation time.
If you have a small number of very large indices, you might want to configure openshift_logging_es_number_of_shards=3 or higher. Elasticsearch recommends using a maximum shard size of less than 50 GB.
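The considerations above can be turned into quick estimates. This bash sketch uses two hypothetical helpers. The first finds the smallest primary shard count that keeps each shard strictly below a size limit, defaulting to the 50 GB guidance. The second estimates total shard copies in the cluster under an assumption: one index per project per day, kept for the Curator retention period.

```shell
#!/usr/bin/env bash
# Hypothetical helper: smallest primary shard count so that an index of
# INDEX_GB splits into shards strictly smaller than MAX_GB (default 50).
min_shards_for_size() {
  local index_gb="$1" max_gb="${2:-50}"
  echo $(( index_gb / max_gb + 1 ))
}

# Hypothetical helper: estimated shard copies in the cluster, ASSUMING one
# index per project per day retained for RETENTION_DAYS.
estimate_cluster_shards() {
  local projects="$1" retention_days="$2" shards="$3" replicas="$4"
  echo $(( projects * retention_days * shards * (1 + replicas) ))
}

min_shards_for_size 120              # prints 3 (three shards of 40 GB)
estimate_cluster_shards 100 30 1 0   # prints 3000
```

Under that assumption, shortening retention (the curation time) or lowering the shard count reduces the second figure proportionally.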
Node Selector
Because Elasticsearch can use a lot of resources, all members of a cluster should have low latency network connections to each other and to any remote storage. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.
To configure a node selector, specify the openshift_logging_es_nodeselector configuration option in the inventory file. This applies to all Elasticsearch deployments; if you need to individualize the node selectors, you must manually edit each deployment configuration after deployment. The node selector is specified as a Python-compatible dict. For example, {"node-type":"infra", "region":"east"}.
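If you keep node labels as `key=value` pairs, a small bash helper can assemble the dict-style value for the inventory. This sketch is hypothetical (the function name is illustrative). It assumes keys and values contain no double quotes or commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a dict-style selector such as
# {"node-type":"infra","region":"east"} from key=value arguments.
# Assumes no double quotes or commas inside keys or values.
to_selector_dict() {
  local out="" pair key value
  for pair in "$@"; do
    key="${pair%%=*}"     # text before the first '='
    value="${pair#*=}"    # text after the first '='
    out+="${out:+,}\"${key}\":\"${value}\""
  done
  printf '{%s}\n' "$out"
}

echo "openshift_logging_es_nodeselector=$(to_selector_dict node-type=infra region=east)"
```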
By default, the openshift_logging Ansible role creates an ephemeral deployment in which all data in a pod is lost upon pod restart.
For production environments, each Elasticsearch deployment configuration requires a persistent storage volume. You can specify an existing persistent volume claim or allow OpenShift Container Platform to create one.
Use existing PVCs. If you create your own PVCs for the deployment, OpenShift Container Platform uses those PVCs.
Name the PVCs to match the openshift_logging_es_pvc_prefix setting, which defaults to logging-es. Assign each PVC a name with a sequence number added to it: logging-es-0, logging-es-1, logging-es-2, and so on.
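When you pre-create PVCs, it can help to generate the expected names rather than type them. This bash sketch uses a hypothetical function name. It follows the `<prefix>-<n>` pattern above, starting at 0, and defaults the prefix to logging-es.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the PVC names expected for COUNT Elasticsearch
# nodes, using <prefix>-<n> with n starting at 0. PREFIX defaults to
# logging-es, the default openshift_logging_es_pvc_prefix.
es_pvc_names() {
  local count="$1" prefix="${2:-logging-es}" i
  for (( i = 0; i < count; i++ )); do
    echo "${prefix}-${i}"
  done
}

es_pvc_names 3             # logging-es-0, logging-es-1, logging-es-2
es_pvc_names 2 es-logging  # es-logging-0, es-logging-1
```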
Allow OpenShift Container Platform to create a PVC. If a PVC for Elasticsearch does not exist, OpenShift Container Platform creates the PVC based on parameters in the Ansible inventory file.
Parameter | Description | ||
---|---|---|---|
|
Specify the size of the PVC request. |
||
|
Specify the storage type as
|
||
|
Optionally, specify a custom prefix for the PVC. |
For example:
openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_size=104802308Ki
openshift_logging_es_pvc_prefix=es-logging
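The example size above is expressed in kibibytes (Ki), while other examples in this section use Gi. As a cross-check, this bash sketch converts between the two binary suffixes, where 1Gi = 1024 × 1024 Ki. The function names are hypothetical. It shows that 104802308Ki is just under 100Gi.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for binary size suffixes: 1Gi = 1024 * 1024 Ki.
gi_to_ki() {
  echo "$(( $1 * 1024 * 1024 ))Ki"
}

# Rounds down to whole GiB; accepts "104802308Ki" or a bare number.
ki_to_gi() {
  local ki="${1%Ki}"
  echo "$(( ki / 1024 / 1024 ))Gi"
}

gi_to_ki 100          # prints 104857600Ki
ki_to_gi 104802308Ki  # prints 99Gi
```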
If using dynamically provisioned PVs, the OpenShift Container Platform logging installer creates PVCs that use the default storage class or the storage class specified with the openshift_logging_elasticsearch_pvc_storage_class_name parameter.
If using NFS storage, the OpenShift Container Platform installer creates the persistent volumes based on the openshift_logging_storage_* parameters, and the OpenShift Container Platform logging installer creates PVCs using the openshift_logging_es_pvc_* parameters.
Make sure you specify the correct parameters in order to use persistent volumes with EFK. Also set the openshift_enable_unsupported_configurations=true parameter in the Ansible inventory file, as the logging installer blocks the installation of NFS with core infrastructure by default.
Using NFS storage as a volume or a persistent volume, or using NAS such as Gluster, is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.
If your environment requires NFS storage, use one of the following methods:
You can deploy NFS as an automatically provisioned persistent volume or using a predefined NFS volume.
For more information, see Sharing an NFS mount across two persistent volume claims to leverage shared storage for use by two separate containers.
Using automatically provisioned NFS
To use NFS as a persistent volume where NFS is automatically provisioned:
Add the following lines to the Ansible inventory file to create an NFS auto-provisioned storage class and dynamically provision the backing storage:
openshift_logging_es_pvc_storage_class_name=$nfsclass
openshift_logging_es_pvc_dynamic=true
Use the following command to deploy the NFS volume using the logging playbook:
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml
Use the following steps to create a PVC:
Edit the Ansible inventory file to set the PVC size:
openshift_logging_es_pvc_size=50Gi
The logging playbook selects a volume based on size and might use an unexpected volume if any other persistent volume has the same size.
Use the following command to rerun the Ansible deploy_cluster.yml playbook:
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
The installer playbook creates the NFS volume based on the openshift_logging_storage variables.
Using a predefined NFS volume
To deploy logging alongside the OpenShift Container Platform cluster using an existing NFS volume:
Edit the Ansible inventory file to configure the NFS volume and set the PVC size:
openshift_logging_storage_kind=nfs
openshift_enable_unsupported_configurations=true
openshift_logging_storage_access_modes=["ReadWriteOnce"]
openshift_logging_storage_nfs_directory=/srv/nfs
openshift_logging_storage_nfs_options=*(rw,root_squash)
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=100Gi
openshift_logging_storage_labels={:storage=>"logging"}
openshift_logging_install_logging=true
Use the following command to redeploy the EFK stack:
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
You can allocate a large file on an NFS server and mount the file to the nodes. You can then use the file as a host path device.
$ mount -t nfs nfserver:/nfs/storage/elasticsearch-1 /usr/local/es-storage
$ chown 1000:1000 /usr/local/es-storage
Then, use /usr/local/es-storage as a host-mount as described below. Use a different backing file as storage for each Elasticsearch replica.
This loopback must be maintained manually outside of OpenShift Container Platform, on the node. You must not maintain it from inside a container.
It is possible to use a local disk volume (if available) on each node host as storage for an Elasticsearch replica. Doing so requires some preparation as follows.
The relevant service account must be given the privilege to mount and edit a local volume:
$ oc adm policy add-scc-to-user privileged \
    system:serviceaccount:logging:aggregated-logging-elasticsearch (1)
1 | Use the project you created earlier (for example, logging) when running the logging playbook. |
Each Elasticsearch replica definition must be patched to claim that privilege, for example (change to --selector component=es-ops for the Ops cluster):
$ for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done
The Elasticsearch replicas must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it. To configure a node selector, edit each Elasticsearch deployment configuration and add or edit the nodeSelector section to specify a unique label that you have applied for each desired node:
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1" (1)
1 | This label should uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1. Use the oc label command to apply labels to nodes as needed. |
To automate applying the node selector, you can instead use the oc patch command:
$ oc patch dc/logging-es-<suffix> \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1"}}}}}'
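With several replicas, pairing each deployment configuration with its node by hand is error-prone. This bash sketch is a hypothetical dry-run helper. It prints, but does not run, an `oc label` command and an `oc patch` command for each deployment-configuration/node pair, numbering the logging-es-node label 1, 2, 3, and so on. The names in the usage line are placeholders. Review the output before running it.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper. Arguments alternate: <dc-name> <node-name> ...
# Prints one `oc label` and one `oc patch` command per pair; executes nothing.
print_es_pinning_cmds() {
  local n=1 dc node
  while (( $# >= 2 )); do
    dc="$1"; node="$2"
    shift 2
    printf 'oc label node %s logging-es-node=%s\n' "$node" "$n"
    printf "oc patch dc/%s -p '{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"logging-es-node\":\"%s\"}}}}}'\n" "$dc" "$n"
    n=$(( n + 1 ))
  done
}

# Placeholder deployment configuration and node names:
print_es_pinning_cmds logging-es-abc node1.example.com logging-es-def node2.example.com
```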
Once these steps are taken, a local host mount can be applied to each replica as in the following example, which assumes storage is mounted at the same path on each node. For the Ops cluster, change the selector to --selector component=es-ops:
$ for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc set volume $dc \
          --add --overwrite --name=elasticsearch-storage \
          --type=hostPath --path=/usr/local/es-storage
    oc rollout latest $dc
    oc scale $dc --replicas=1
  done
If you need to scale up the number of Elasticsearch nodes in your cluster, you can create a deployment configuration for each Elasticsearch node you want to add.
Due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster, you cannot simply increase the replicas in an Elasticsearch deployment configuration.
The simplest way to change the scale of Elasticsearch is to modify the inventory host file and re-run the logging playbook as described previously. If you have supplied persistent storage for the deployment, this should not be disruptive.
Resizing an Elasticsearch cluster using the logging playbook is only possible when
the new |
By default, Elasticsearch deployed with OpenShift aggregated logging is not accessible from outside the logging cluster. You can enable a route for external access to Elasticsearch for those tools that want to access its data.
You have access to Elasticsearch using your OpenShift token, and you can provide the external Elasticsearch and Elasticsearch Ops hostnames when creating the server certificate (similar to Kibana).
To access Elasticsearch as a reencrypt route, define the following variables:
openshift_logging_es_allow_external=True
openshift_logging_es_hostname=elasticsearch.example.com
Run the openshift-logging.yml Ansible playbook:
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
To log in to Elasticsearch remotely, the request must contain three HTTP headers:
Authorization: Bearer $token
X-Proxy-Remote-User: $username
X-Forwarded-For: $ip_address
You must have access to the project in order to access its logs. For example:
$ oc login <user1>
$ oc new-project <user1project>
$ oc new-app <httpd-example>
Get the token for this user to use in the request:
$ token=$(oc whoami -t)
Using the token previously configured, you should be able to access Elasticsearch through the exposed route:
$ curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" https://es.example.test/_cat/indices
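When scripting several requests, collecting the three headers in a bash array keeps each header value intact even though it contains spaces. In this sketch, the function name and the `ES_CURL_ARGS` variable are hypothetical, and the token, user, and IP values are placeholders. On a cluster, they would come from `oc whoami -t` and `oc whoami`, as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stores the three required headers as curl arguments
# in the global array ES_CURL_ARGS. The IP defaults to 127.0.0.1.
es_auth_args() {
  local token="$1" user="$2" ip="${3:-127.0.0.1}"
  ES_CURL_ARGS=(
    -H "Authorization: Bearer ${token}"
    -H "X-Proxy-Remote-User: ${user}"
    -H "X-Forwarded-For: ${ip}"
  )
}

es_auth_args "example-token" "user1"
printf '%s\n' "${ES_CURL_ARGS[@]}"

# With a live route, for example:
#   es_auth_args "$(oc whoami -t)" "$(oc whoami)"
#   curl -k "${ES_CURL_ARGS[@]}" https://es.example.test/_cat/indices
```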
Fluentd is deployed as a DaemonSet that deploys replicas according to a node label selector, which you can specify with the inventory parameter openshift_logging_fluentd_nodeselector; the default is logging-infra-fluentd.
As part of the OpenShift cluster installation, it is recommended that you add the Fluentd node selector to the list of persisted node labels.
Fluentd uses journald as the system log source. These are log messages from the operating system, Docker, and OpenShift. For container logs, Fluentd determines which log driver Docker is using, json-file or journald, and automatically reads the logs from that source.
As of OpenShift Container Platform 3.3, Fluentd no longer reads historical log files when using the JSON file log driver. In situations where clusters have a large number of log files and are older than the EFK deployment, this avoids delays when pushing the most recent logs into Elasticsearch. Curator deleting logs are migrated soon after they are added to Elasticsearch.
It may require several minutes, or hours, depending on the size of your
journal, before any new log entries are available in Elasticsearch, when using
|
It is highly recommended that you use the default value for See Updating Fluentd’s Log Source After a Docker Log Driver Update for more information. |
Configuring Fluentd to Send Logs to an External Log Aggregator
You can configure Fluentd to send a copy of its logs to an external log aggregator, and not the default Elasticsearch, using the secure-forward plug-in. From there, you can further process log records after the locally hosted Fluentd has processed them.
The logging deployment provides a secure-forward.conf section in the Fluentd configmap for configuring the external aggregator:
<store>
  @type secure_forward

  self_hostname pod-${HOSTNAME}
  shared_key thisisasharedkey

  secure yes
  enable_strict_verification yes

  ca_cert_path /etc/fluent/keys/your_ca_cert
  ca_private_key_path /etc/fluent/keys/your_private_key
  ca_private_key_passphrase passphrase

  <server>
    host ose1.example.com
    port 24284
  </server>
  <server>
    host ose2.example.com
    port 24284
    standby
  </server>
  <server>
    host ose3.example.com
    port 24284
    standby
  </server>
</store>
This can be updated using the oc edit command:
$ oc edit configmap/logging-fluentd
Certificates to be used in secure-forward.conf can be added to the existing secret that is mounted on the Fluentd pods. The your_ca_cert and your_private_key values must match what is specified in secure-forward.conf in configmap/logging-fluentd:
$ oc patch secrets/logging-fluentd --type=json \
    --patch "[{'op':'add','path':'/data/your_ca_cert','value':'$(base64 -w 0 /path/to/your_ca_cert.pem)'}]"
$ oc patch secrets/logging-fluentd --type=json \
    --patch "[{'op':'add','path':'/data/your_private_key','value':'$(base64 -w 0 /path/to/your_private_key.pem)'}]"
Replace |
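The patch body can also be generated by a function, which keeps the encoded value on a single line. GNU `base64` wraps output at 76 characters by default, and `-w 0` disables that wrapping. This bash sketch uses a hypothetical function name. It prints the JSON Patch "add" operation for one file and key, and it uses a temporary file so it runs without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON Patch "add" operation storing FILE,
# base64-encoded without line wrapping (GNU coreutils `base64 -w 0`),
# under KEY in a secret's data map.
secret_add_patch() {
  local key="$1" file="$2"
  printf '[{"op":"add","path":"/data/%s","value":"%s"}]\n' \
    "$key" "$(base64 -w 0 "$file")"
}

# Usage on a cluster (not run here):
#   oc patch secrets/logging-fluentd --type=json \
#     --patch "$(secret_add_patch your_ca_cert /path/to/your_ca_cert.pem)"
tmp="$(mktemp)"
printf 'example\n' > "$tmp"
secret_add_patch your_ca_cert "$tmp"
rm -f "$tmp"
```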
When configuring the external aggregator, it must be able to accept messages securely from Fluentd.
If the external aggregator is another Fluentd server, it must have the fluent-plugin-secure-forward plug-in installed and make use of the input plug-in it provides:
<source>
  @type secure_forward

  self_hostname ${HOSTNAME}
  bind 0.0.0.0
  port 24284

  shared_key thisisasharedkey

  secure yes
  cert_path        /path/for/certificate/cert.pem
  private_key_path /path/for/certificate/key.pem
  private_key_passphrase secret_foo_bar_baz
</source>
Further explanation of how to set up the fluent-plugin-secure-forward plug-in can be found in the plug-in's documentation.
Reducing the Number of Connections from Fluentd to the API Server
With mux, you can deploy N mux services, where N is fewer than the number of nodes. Each Fluentd is configured with USE_MUX_CLIENT=1. This tells Fluentd to send the raw logs to mux without any filtering, including Kubernetes metadata filtering, which involves connections to the API server. All of the processing and Kubernetes metadata filtering can then be performed by mux.
For more information on Red Hat Technology Preview features support scope, see https://access.redhat.com/support/offerings/techpreview/.
Parameter | Description |
---|---|
|
The default is set to |
|
The default is set to |
|
The default is set to |
|
The default is |
|
24284 |
|
500M |
|
1Gi |
|
The default is |
|
The default value is empty, allowing for additional namespaces to create for
external |
Throttling logs in Fluentd
For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by Fluentd before being processed.
Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.
Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source and no log files, so no file-based throttling is available. There is no method of restricting the log entries that are read into the Fluentd process.
To tell Fluentd which projects it should be restricting, edit the throttle configuration in its ConfigMap after deployment:
$ oc edit configmap/logging-fluentd
The format of the throttle-config.yaml key is a YAML file that contains project names and the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:
logging:
  read_lines_limit: 500

test-project:
  read_lines_limit: 10

.operations:
  read_lines_limit: 100
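If you maintain the throttle settings from a script, a small generator can render the throttle-config.yaml content shown above from project and limit pairs. This is a bash sketch under assumptions: the function name and the PROJECT=LIMIT argument format are hypothetical, and you still paste the output into the throttle-config.yaml key with oc edit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print throttle-config.yaml content from PROJECT=LIMIT
# arguments, one YAML mapping per project.
throttle_config() {
  local pair project limit
  for pair in "$@"; do
    project="${pair%%=*}"
    limit="${pair#*=}"
    # Reject limits that are not plain non-negative integers.
    case "$limit" in
      ''|*[!0-9]*) echo "invalid limit for ${project}: ${limit}" >&2; return 1 ;;
    esac
    printf '%s:\n  read_lines_limit: %s\n' "$project" "$limit"
  done
}
```

Running throttle_config logging=500 test-project=10 .operations=100 reproduces the example configuration.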
When you make changes to any part of the EFK stack, specifically Elasticsearch or Fluentd, you should first scale Elasticsearch down to zero and scale Fluentd so that it does not match any nodes. Then, make the changes and scale Elasticsearch and Fluentd back up.
To scale Elasticsearch to zero:
$ oc scale --replicas=0 dc/<ELASTICSEARCH_DC>
Change the nodeSelector in the daemonset configuration so that it matches zero nodes. First, check the current nodeSelector:
$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       logging-infra-fluentd: "true"
Use the oc patch command to modify the daemonset nodeSelector:
$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'
$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       nonexistlabel: "true"
Scale Elasticsearch back up from zero:
$ oc scale --replicas=# dc/<ELASTICSEARCH_DC>
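Change nodeSelector in the daemonset configuration back to logging-infra-fluentd: "true".
Use the oc patch command to modify the daemonset nodeSelector. Setting nonexistlabel to null removes the label added earlier, because the patch merges keys into the existing nodeSelector:
$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":null,"logging-infra-fluentd":"true"}}}}}'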
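The scale-down and scale-up steps can be grouped into two functions. This is a bash sketch, not a supported tool: the function names are hypothetical, and the Elasticsearch dc name and replica count are arguments you supply. A strategic merge patch on the nodeSelector map adds keys rather than replacing the map, so the resume step sets nonexistlabel to null to remove it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pausing and resuming the EFK stack around changes.

# Scale Elasticsearch to zero, then point Fluentd at a label no node has.
efk_pause() {
  local es_dc="$1"
  oc scale --replicas=0 "dc/${es_dc}" || return 1
  oc patch ds logging-fluentd \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'
}

# Scale Elasticsearch back to REPLICAS, then remove the placeholder label
# and restore the default Fluentd node selector.
efk_resume() {
  local es_dc="$1" replicas="$2"
  oc scale --replicas="${replicas}" "dc/${es_dc}" || return 1
  oc patch ds logging-fluentd \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":null,"logging-infra-fluentd":"true"}}}}}'
}
```

Call efk_pause <ELASTICSEARCH_DC> before making changes and efk_resume <ELASTICSEARCH_DC> <replicas> afterwards.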
To access the Kibana console from the OpenShift Container Platform web console, add the loggingPublicURL parameter to the /etc/origin/master/master-config.yaml file, with the URL of the Kibana console (the kibana-hostname parameter). The value must be an HTTPS URL:
...
assetConfig:
  ...
  loggingPublicURL: "https://kibana.example.com"
...
Setting the loggingPublicURL parameter creates a View Archive button on the OpenShift Container Platform web console under the Browse → Pods → <pod_name> → Logs tab. This button links to the Kibana console.
You can scale the Kibana deployment as usual for redundancy:
$ oc scale dc/logging-kibana --replicas=2
To ensure the scale persists across multiple executions of the logging playbook,
make sure to update the |
You can see the user interface by visiting the site specified by the openshift_logging_kibana_hostname variable.
See the Kibana documentation for more information on Kibana.
Kibana Visualize
Kibana Visualize enables you to create visualizations and dashboards for monitoring container and pod logs, and allows administrator users (cluster-admin or cluster-reader) to view logs by deployment, namespace, pod, and container.
Kibana Visualize exists inside the Elasticsearch and ES-OPS pods, and must be run inside those pods. To load dashboards and other Kibana UI objects, you must first log in to Kibana as the user you want to add the dashboards to, then log out. This creates the necessary per-user configuration that the next step relies on. Then, run:
$ oc exec <$espod> -- es_load_kibana_ui_objects <user-name>
Where $espod is the name of any one of your Elasticsearch pods.
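To avoid looking up a pod name by hand, a helper can pick the first pod with the component=es label (the same label used elsewhere in this document to find Elasticsearch pods) and run the loader there. This is a bash sketch; the function name is hypothetical, and the user must already have logged in to Kibana and out again as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run es_load_kibana_ui_objects for USER in the first
# pod labeled component=COMPONENT (default "es"; pass "es-ops" for ops).
load_kibana_ui_objects() {
  local user="$1" component="${2:-es}"
  local espod
  espod="$(oc get pods -l "component=${component}" \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  if [ -z "$espod" ]; then
    echo "no pod found with label component=${component}" >&2
    return 1
  fi
  oc exec "$espod" -- es_load_kibana_ui_objects "$user"
}
```

For example, load_kibana_ui_objects <user-name> es-ops loads the objects through the ops cluster.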
Curator allows administrators to configure scheduled Elasticsearch maintenance operations to be performed automatically on a per-project basis. It is scheduled to perform actions daily based on its configuration. Only one Curator pod is recommended per Elasticsearch cluster. Curator is configured via a YAML configuration file with the following structure:
$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE
...
The available parameters are:
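Variable Name | Description |
---|---|
$PROJECT_NAME | The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name .operations as the project name. |
$ACTION | The action to take, currently only delete is allowed. |
$UNIT | The unit of time, such as days or weeks. |
$VALUE | An integer for the number of units. |
.defaults | Use .defaults as the $PROJECT_NAME to set the defaults for projects that are not specified. |
runhour | (Number) the hour of the day in 24-hour format at which to run the Curator jobs. For use with .defaults. |
runminute | (Number) the minute of the hour at which to run the Curator jobs. For use with .defaults. |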
For example, to configure Curator to:
delete indices in the myapp-dev project older than 1 day
delete indices in the myapp-qe project older than 1 week
delete operations logs older than 8 weeks
delete all other projects' indices after they are 30 days old
run the Curator jobs at midnight every day
Use:
myapp-dev:
  delete:
    days: 1

myapp-qe:
  delete:
    weeks: 1

.operations:
  delete:
    weeks: 8

.defaults:
  delete:
    days: 30
  runhour: 0
  runminute: 0
When you use |
The openshift_logging Ansible role provides a ConfigMap from which Curator reads its configuration. You may edit or replace this ConfigMap to reconfigure Curator. Currently the logging-curator ConfigMap is used to configure both your ops and non-ops Curator instances. Any .operations configurations are in the same location as your application logs configurations.
To edit the provided ConfigMap to configure your Curator instances:
$ oc edit configmap/logging-curator
To replace the provided ConfigMap instead, create your configuration file (for example, /path/to/mycuratorconfig.yaml), then run:
$ oc create configmap logging-curator --dry-run -o yaml \
    --from-file=config.yaml=/path/to/mycuratorconfig.yaml | \
    oc replace -f -
After you make your changes, redeploy Curator:
$ oc rollout latest dc/logging-curator
$ oc rollout latest dc/logging-curator-ops
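The replace and redeploy steps can be combined so that both Curator instances always pick up the new configuration. This is a bash sketch with a hypothetical function name. It assumes an oc client that accepts the boolean --dry-run flag (newer clients spell it --dry-run=client), which renders the ConfigMap locally instead of failing because logging-curator already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the logging-curator ConfigMap from a local
# file, then redeploy the given Curator dcs (default: ops and non-ops).
replace_curator_config() {
  local config_file="$1"
  shift
  [ -r "$config_file" ] || { echo "cannot read ${config_file}" >&2; return 1; }
  # Render the ConfigMap locally and replace the existing object with it.
  oc create configmap logging-curator --dry-run -o yaml \
    --from-file=config.yaml="$config_file" | oc replace -f - || return 1
  local dcs=("$@")
  [ "${#dcs[@]}" -gt 0 ] || dcs=(logging-curator logging-curator-ops)
  local dc
  for dc in "${dcs[@]}"; do
    oc rollout latest "dc/${dc}" || return 1
  done
}
```

If you do not run a separate ops cluster, pass only logging-curator as the second argument.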
To remove everything generated during the deployment, run:
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml \
    -e openshift_logging_install_logging=False
Using the Kibana console with OpenShift Container Platform can cause problems that are easily solved, but are not accompanied by useful error messages. Check the following troubleshooting sections if you are experiencing any problems when deploying Kibana on OpenShift Container Platform:
Login Loop
The OAuth2 proxy on the Kibana console must share a secret with the master host’s OAuth2 server. If the secret is not identical on both servers, it can cause a login loop where you are continuously redirected back to the Kibana login page.
To fix this issue, delete the current OAuthClient, and use openshift-ansible to re-run the openshift_logging role:
$ oc delete oauthclient/kibana-proxy
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
Cryptic Error When Viewing the Console
When attempting to visit the Kibana console, you may receive a browser error instead:
{"error":"invalid_request","error_description":"The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}
This can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in.
Fix this issue by replacing the OAuthClient entry:
$ oc delete oauthclient/kibana-proxy
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port. You can adjust the server whitelist by editing the OAuth client:
$ oc edit oauthclient/kibana-proxy
503 Error When Viewing the Console
If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.
First, Kibana may not be recognizing pods. If Elasticsearch is slow in starting up, Kibana may time out trying to reach it. Check whether the relevant service has any endpoints:
$ oc describe service logging-kibana
Name:           logging-kibana
[...]
Endpoints:      <none>
If any Kibana pods are live, endpoints are listed. If they are not, check the state of the Kibana pods and deployment. You may need to scale the deployment down and back up again.
The second possible issue may be caused if the route for accessing the Kibana service is masked. This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router will only route to the first created. Check the problematic route to see if it is defined in multiple places:
$ oc get route --all-namespaces --selector logging-infra=support
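Both checks can be run together from a script. This bash sketch (function names are hypothetical) reads the ready addresses of the logging-kibana endpoints object, which shares the service's name, instead of parsing oc describe output, and then lists the logging routes so a masked route stands out.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic for the 503 error: report whether logging-kibana
# has ready endpoints, then list routes labeled logging-infra=support.
kibana_endpoints() {
  oc get endpoints logging-kibana -o jsonpath='{.subsets[*].addresses[*].ip}'
}

check_kibana_503() {
  local ips
  ips="$(kibana_endpoints)" || return 1
  if [ -z "$ips" ]; then
    echo "logging-kibana has no endpoints: check the Kibana pods and deployment"
  else
    echo "logging-kibana endpoints: ${ips}"
  fi
  oc get route --all-namespaces --selector logging-infra=support
}
```

If the route list shows the same host in more than one project, remove the leftover deployment's route.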
F-5 Load Balancer and X-Forwarded-For Enabled
If you are attempting to use an F-5 load balancer in front of Kibana with X-Forwarded-For enabled, this can cause an issue in which the Elasticsearch Searchguard plug-in is unable to correctly accept connections from Kibana.
Kibana: Unknown error while connecting to Elasticsearch

Error: Unknown error while connecting to Elasticsearch
Error: UnknownHostException[No trusted proxies]
To configure Searchguard to ignore the extra header:
Scale down all Fluentd pods.
Scale down Elasticsearch after the Fluentd pods have terminated.
Add searchguard.http.xforwardedfor.header: DUMMY to the Elasticsearch configuration section.
$ oc edit configmap/logging-elasticsearch (1)
1 | This approach requires that Elasticsearch’s configurations are within a ConfigMap. |
Scale Elasticsearch back up.
Scale up all Fluentd pods.
Fluentd sends logs to the value of the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables of the Elasticsearch deployment configuration. The application logs are directed to the ES_HOST destination, and operations logs to OPS_HOST.
Sending logs directly to an AWS Elasticsearch instance is not supported. Use
Fluentd Secure Forward to direct logs to
an instance of Fluentd that you control and that is configured with the
|
To direct logs to a specific Elasticsearch instance, edit the deployment configuration and replace the value of the above variables with the desired instance:
$ oc edit dc/<deployment_configuration>
For an external Elasticsearch instance to contain both application and operations logs, you can set ES_HOST and OPS_HOST to the same destination, while ensuring that ES_PORT and OPS_PORT also have the same value.
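As a non-interactive alternative to oc edit, oc set env can assign all four variables at once, which makes it harder to leave ES_PORT and OPS_PORT out of step. This is a bash sketch: the function name is hypothetical, and the resource argument is an assumption you must adapt to whichever object carries these variables in your deployment (for example, ds/logging-fluentd).

```shell
#!/usr/bin/env bash
# Hypothetical helper: send application and operations logs to one external
# Elasticsearch instance by giving ES_* and OPS_* identical values.
# RESOURCE is the object that carries these variables, e.g. ds/logging-fluentd.
use_single_es() {
  local resource="$1" host="$2" port="$3"
  case "$port" in
    ''|*[!0-9]*) echo "port must be numeric: ${port}" >&2; return 1 ;;
  esac
  oc set env "$resource" \
    "ES_HOST=${host}" "ES_PORT=${port}" \
    "OPS_HOST=${host}" "OPS_PORT=${port}"
}
```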
If your externally hosted Elasticsearch instance does not use TLS, update the _CLIENT_CERT, _CLIENT_KEY, and _CA variables to be empty. If it does use TLS, but not mutual TLS, update the _CLIENT_CERT and _CLIENT_KEY variables to be empty and patch or recreate the logging-fluentd secret with the appropriate _CA value for communicating with your Elasticsearch instance.
If it uses mutual TLS, as the provided Elasticsearch instance does, patch or recreate the logging-fluentd secret with your client key, client cert, and CA.
If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.
Use the fluent-plugin-remote-syslog plug-in on the host to send logs to an external syslog server.
Set environment variables in the logging-fluentd or logging-mux deployment configurations:
- name: REMOTE_SYSLOG_HOST (1)
  value: host1
- name: REMOTE_SYSLOG_HOST_BACKUP
  value: host2
- name: REMOTE_SYSLOG_PORT_BACKUP
  value: "5555"
1 | The desired remote syslog host. Required for each host. |
This builds two destinations. The syslog server on host1 receives messages on the default port of 514, while host2 receives the same messages on port 5555.
Alternatively, you can configure your own custom fluent.conf in the logging-fluentd or logging-mux ConfigMaps.
Fluentd Environment Variables
Parameter | Description |
---|---|
|
Defaults to |
|
(Required) Hostname or IP address of the remote syslog server. |
|
Port number to connect on. Defaults to |
|
Set the syslog severity level. Defaults to |
|
Set the syslog facility. Defaults to |
|
Defaults to |
|
Removes the prefix from the tag, defaults to |
|
If specified, uses this field as the key to look on the record, to set the tag on the syslog message. |
|
If specified, uses this field as the key to look on the record, to set the payload on the syslog message. |
This implementation is insecure, and should only be used in environments where you can guarantee no snooping on the connection.
Fluentd Logging Ansible Variables
Parameter | Description |
---|---|
|
The default is set to |
|
Hostname or IP address of the remote syslog server, this is mandatory. |
|
Port number to connect on, defaults to |
|
Set the syslog severity level, defaults to |
|
Set the syslog facility, defaults to |
|
The default is set to |
|
Removes the prefix from the tag, defaults to |
|
If string is specified, uses this field as the key to look on the record, to set the tag on the syslog message. |
|
If string is specified, uses this field as the key to look on the record, to set the payload on the syslog message. |
Mux Logging Ansible Variables
Parameter | Description |
---|---|
|
The default is set to |
|
Hostname or IP address of the remote syslog server, this is mandatory. |
|
Port number to connect on, defaults to |
|
Set the syslog severity level, defaults to |
|
Set the syslog facility, defaults to |
|
The default is set to |
|
Removes the prefix from the tag, defaults to |
|
If string is specified, uses this field as the key to look on the record, to set the tag on the syslog message. |
|
If string is specified, uses this field as the key to look on the record, to set the payload on the syslog message. |
As of logging version 3.2.0, an administrator certificate, key, and CA that can be used to communicate with and perform administrative operations on Elasticsearch are provided within the logging-elasticsearch secret.
To confirm whether or not your EFK installation provides these, run:
$ oc describe secret logging-elasticsearch
If they are not available, refer to Manual Upgrades to ensure you are on the latest version first.
Connect to an Elasticsearch pod that is in the cluster on which you are attempting to perform maintenance.
To find a pod in a cluster use either:
$ oc get pods -l component=es -o name | head -1
$ oc get pods -l component=es-ops -o name | head -1
Connect to a pod:
$ oc rsh <your_Elasticsearch_pod>
Once connected to an Elasticsearch container, you can use the certificates mounted from the secret to communicate with Elasticsearch per its Indices APIs documentation.
Fluentd sends its logs to Elasticsearch using the index format project.{project_name}.{project_uuid}.YYYY.MM.DD where YYYY.MM.DD is the date of the log record.
For example, to delete all logs for the logging project with uuid 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 from June 15, 2016, you can run:
$ curl --key /etc/elasticsearch/secret/admin-key \
    --cert /etc/elasticsearch/secret/admin-cert \
    --cacert /etc/elasticsearch/secret/admin-ca -XDELETE \
    "https://localhost:9200/project.logging.3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3.2016.06.15"
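Because the index name is assembled from the project name, project UUID, and date, a pair of helpers keeps the delete request from being typed by hand. This is a bash sketch meant to run inside an Elasticsearch container where the admin certificates are mounted at /etc/elasticsearch/secret, as in the command above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for use inside an Elasticsearch container.

# Build project.{project_name}.{project_uuid}.YYYY.MM.DD
project_index() {
  local name="$1" uuid="$2" date="$3"
  printf 'project.%s.%s.%s\n' "$name" "$uuid" "$date"
}

# Delete one index through the Elasticsearch REST API on localhost:9200,
# authenticating with the mounted admin key, certificate, and CA.
delete_index() {
  local index="$1" secret=/etc/elasticsearch/secret
  curl -s --key "${secret}/admin-key" \
    --cert "${secret}/admin-cert" \
    --cacert "${secret}/admin-ca" \
    -XDELETE "https://localhost:9200/${index}"
}
```

For example, delete_index "$(project_index logging 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 2016.06.15)" issues the same request as the command above.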
By default, aggregated logging uses the journald log driver unless json-file was specified during installation. You can change the log driver between journald and json-file as needed.
When using the |
Fluentd determines the driver Docker is using by checking the /etc/docker/daemon.json and /etc/sysconfig/docker files.
You can determine which driver Docker is using with the docker info command:
# docker info | grep Logging
Logging Driver: journald
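To check the configured driver without running docker info, you can read the same two files Fluentd inspects. This bash sketch is an illustration under assumptions, not Fluentd's actual detection logic: the function name is hypothetical, it checks daemon.json before /etc/sysconfig/docker, and it falls back to journald, the default described above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not Fluentd's implementation): report the Docker log
# driver set in daemon.json or in the OPTIONS line of /etc/sysconfig/docker.
detect_log_driver() {
  local daemon_json="${1:-/etc/docker/daemon.json}"
  local sysconfig="${2:-/etc/sysconfig/docker}"
  local driver=""
  if [ -r "$daemon_json" ]; then
    # Matches "log-driver": "<name>"
    driver="$(sed -n 's/.*"log-driver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$daemon_json" | head -1)"
  fi
  if [ -z "$driver" ] && [ -r "$sysconfig" ]; then
    # Matches --log-driver=<name> or --log-driver <name>
    driver="$(sed -n 's/.*--log-driver[= ]\([^ '"'"'"]*\).*/\1/p' "$sysconfig" | head -1)"
  fi
  echo "${driver:-journald}"
}
```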
To change between json-file and journald after installation:
Modify either the /etc/sysconfig/docker or /etc/docker/daemon.json files.
For example:
# cat /etc/sysconfig/docker
OPTIONS=' --selinux-enabled --log-driver=json-file --log-opt max-size=1M --log-opt max-file=3 --signature-verification=False'
# cat /etc/docker/daemon.json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "1M",
"max-file": "1"
}
}
Restart the Docker service:
# systemctl restart docker
Update the Fluentd log source.
If the Docker log driver has changed from json-file to journald and Fluentd was previously configured with USE_JOURNAL=False, then it will not be able to pick up any new logs that are created. When the Fluentd daemonset is configured with the default value for USE_JOURNAL, it detects the Docker log driver upon pod start-up and configures itself to pull from the appropriate source.
To update Fluentd to detect the correct source upon start-up:
Remove the label from nodes where Fluentd is deployed:
$ oc label node --all logging-infra-fluentd- (1)
1 | This example assumes use of the default Fluentd node selector and it being deployed on all nodes. |
Update the daemonset/logging-fluentd USE_JOURNAL value to be empty:
$ oc patch daemonset/logging-fluentd \
    -p '{"spec":{"template":{"spec":{"containers":[{"name":"fluentd-elasticsearch","env":[{"name": "USE_JOURNAL", "value":""}]}]}}}}'
Relabel your nodes to schedule Fluentd deployments:
$ oc label node --all logging-infra-fluentd=true (1)
1 | This example assumes use of the default Fluentd node selector and it being deployed on all nodes. |
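The three steps above can be wrapped in one function so the label is always restored after the patch. This is a bash sketch with a hypothetical function name; like the commands it wraps, it assumes the default Fluentd node selector (logging-infra-fluentd=true) on all nodes and the container name fluentd-elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: unschedule Fluentd, clear USE_JOURNAL so Fluentd
# detects the Docker log driver at start-up, then reschedule Fluentd.
reset_fluentd_log_source() {
  oc label node --all logging-infra-fluentd- || return 1
  oc patch daemonset/logging-fluentd -p \
    '{"spec":{"template":{"spec":{"containers":[{"name":"fluentd-elasticsearch","env":[{"name":"USE_JOURNAL","value":""}]}]}}}}' \
    || return 1
  oc label node --all logging-infra-fluentd=true
}
```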
As of OpenShift Container Platform 3.7, the Aggregated Logging stack updated the Elasticsearch Deployment Config object so that it no longer has a Config Change Trigger, meaning any changes to the dc will not result in an automatic rollout. This was to prevent unintended restarts in the Elasticsearch cluster, which could cause excessive shard rebalancing as cluster members restart.
This section presents two restart procedures: rolling restart and full restart. A rolling restart applies appropriate changes to the Elasticsearch cluster without downtime (provided three masters are configured), and a full restart safely applies major changes without risk to existing data.
A rolling restart is recommended when any of the following changes are made:
nodes on which Elasticsearch pods run require a reboot
logging-elasticsearch configmap
logging-es-* deployment configuration
new image deployment, or upgrade
This will be the recommended restart policy going forward.
Any action you do for an Elasticsearch cluster will need to be repeated for the ops cluster
if |
Prevent shard balancing when purposely bringing down nodes:
$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_cluster/settings' \
    -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'
Once complete, for each dc you have for an Elasticsearch cluster, run oc rollout latest to deploy the latest version of the dc object:
$ oc rollout latest <dc_name>
You will see a new pod deployed. Once the pod has two ready containers, you can move on to the next dc.
Once all DCs for the cluster have been rolled out, re-enable shard balancing:
$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_cluster/settings' \
    -d '{ "transient": { "cluster.routing.allocation.enable" : "all" } }'
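The rolling-restart steps can be scripted as shown in this bash sketch; the function names are hypothetical. It looks up a pod by the component label (es or es-ops, as in the pod lookup commands in this document) each time it changes the allocation setting, because the pod used at the start may be replaced by the rollout. It also assumes that oc rollout status returning successfully is a reasonable signal to move on; confirm that the new pod shows two ready containers before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical rolling-restart sketch for one Elasticsearch cluster.

# Print the first pod labeled component=COMPONENT.
es_pod() {
  oc get pods -l "component=$1" -o jsonpath='{.items[0].metadata.name}'
}

# Set cluster.routing.allocation.enable to VALUE ("none" or "all").
es_allocation() {
  local component="$1" value="$2" pod
  pod="$(es_pod "$component")" || return 1
  oc exec -c elasticsearch "$pod" -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_cluster/settings' \
    -d "{ \"transient\": { \"cluster.routing.allocation.enable\" : \"${value}\" } }"
}

# Usage: es_rolling_restart COMPONENT DC_NAME...
es_rolling_restart() {
  local component="$1"
  shift
  es_allocation "$component" none || return 1
  local dc
  for dc in "$@"; do
    oc rollout latest "dc/${dc}" || return 1
    # Wait for this dc's rollout to finish before moving to the next one.
    oc rollout status "dc/${dc}" || return 1
  done
  es_allocation "$component" all
}
```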
A full restart is recommended when changing major versions of Elasticsearch or making other changes that might put data integrity at risk during the change process.
Any action you do for an Elasticsearch cluster will need to be repeated for the ops cluster
if |
When making changes to the |
Disable all external communications to the Elasticsearch cluster while it is down. Edit your non-cluster logging service (for example, logging-es, logging-es-ops) to no longer match the Elasticsearch pods running:
$ oc patch svc/logging-es -p '{"spec":{"selector":{"component":"es-blocked","provider":"openshift"}}}'
Prevent shard balancing when purposely bringing down nodes:
$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_cluster/settings' \
    -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'
Perform a shard synced flush to ensure there are no pending operations waiting to be written to disk prior to shutting down:
$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_flush/synced'
Once complete, for each dc you have for an Elasticsearch cluster, run oc rollout latest to deploy the latest version of the dc object:
$ oc rollout latest <dc_name>
You will see a new pod deployed. Once the pod has two ready containers, you can move on to the next dc.
Once all DCs for the cluster have been rolled out, re-enable shard balancing:
$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_cluster/settings' \
    -d '{ "transient": { "cluster.routing.allocation.enable" : "all" } }'
Once the restart is complete, enable all external communications to the Elasticsearch cluster. Edit your non-cluster logging service (for example, logging-es, logging-es-ops) to match the Elasticsearch pods running again:
$ oc patch svc/logging-es -p '{"spec":{"selector":{"component":"es","provider":"openshift"}}}'
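The service selector patch is run twice in a full restart, once to block and once to restore, for each client-facing service. This bash sketch parameterizes it; the function name is hypothetical, and it assumes that the ops service uses the same provider=openshift selector with component=es-ops, matching the es-ops pod label used elsewhere in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point SERVICE at pods labeled component=COMPONENT.
# Use "es-blocked" to match no pods while the cluster is down, and "es"
# (or "es-ops" for the ops service) to restore normal routing.
es_service_selector() {
  local service="$1" component="$2"
  oc patch "svc/${service}" \
    -p "{\"spec\":{\"selector\":{\"component\":\"${component}\",\"provider\":\"openshift\"}}}"
}
```

For example, es_service_selector logging-es es-blocked before the restart and es_service_selector logging-es es afterwards.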