As an OpenShift Enterprise cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OpenShift Enterprise services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.
The EFK stack is a modified version of the ELK stack and consists of:
Elasticsearch: An object store where all logs are stored.
Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
Kibana: A web UI for Elasticsearch.
Once deployed in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.
Managing Docker Container Logs discusses the use of |
Ensure that you have deployed a router for the cluster.
Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch replica requires its own storage volume. See Elasticsearch for more information.
Ansible-based installations should create the logging-deployer-template template in the openshift project. Otherwise, you can create it with the following command:
$ oc create -n openshift -f \
    /usr/share/openshift/examples/infrastructure-templates/enterprise/logging-deployer.yaml
Create a new project. Once deployed in a single project, the EFK stack collects logs for every project within your OpenShift Enterprise cluster. The examples in this topic use logging as an example project:
$ oadm new-project logging --node-selector=""
$ oc project logging
Specifying a non-empty node selector on the project is not recommended, as this would restrict where Fluentd can be deployed. Instead, specify node selectors for the deployer to be applied to your other deployment configurations. |
Create a secret to provide security-related files to the deployer. The contents of the secret are optional; any object that is not supplied is randomly generated.
You can supply the following files when creating a new secret:
File Name | Description |
---|---|
kibana.crt | A browser-facing certificate for the Kibana server. |
kibana.key | A key to be used with the Kibana certificate. |
kibana-ops.crt | A browser-facing certificate for the Ops Kibana server. |
kibana-ops.key | A key to be used with the Ops Kibana certificate. |
server-tls.json | JSON TLS options to override the Kibana server defaults. Refer to the Node.js documentation for available options. |
ca.crt | A certificate for a CA that will be used to sign all certificates generated by the deployer. |
ca.key | A matching CA key. |
For example:
$ oc secrets new logging-deployer \
    kibana.crt=/path/to/cert kibana.key=/path/to/key
If a certificate file is not passed as a secret, the deployer will generate a self-signed certificate instead. However, a secret is still required for the deployer to run. In this case, you can create a "dummy" secret that does not specify a certificate value:
$ oc secrets new logging-deployer nothing=/dev/null
Create the deployer service account:
$ oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: logging-deployer
secrets:
- name: logging-deployer
API
The deployer creates a Fluentd service account that requires special privileges to operate Fluentd. Enable it by adding the service account user to the privileged security context constraint:
$ oadm policy add-scc-to-user \
    privileged system:serviceaccount:logging:aggregated-logging-fluentd (1)
1 | Use the new project you created earlier (e.g., logging) when specifying this service account. |
Give the Fluentd service account permission to read labels from all pods:
$ oadm policy add-cluster-role-to-user cluster-reader \
    system:serviceaccount:logging:aggregated-logging-fluentd (1)
1 | Use the new project you created earlier (e.g., logging) when specifying this service account. |
The EFK stack is deployed using a template.
Run the deployer, specifying at least the parameters in the following example (more are described in the table below):
$ oc new-app logging-deployer-template \
    --param KIBANA_HOSTNAME=kibana.example.com \
    --param ES_CLUSTER_SIZE=1 \
    --param PUBLIC_MASTER_URL=https://localhost:8443
Be sure to replace at least KIBANA_HOSTNAME and PUBLIC_MASTER_URL with values relevant to your deployment.
The available parameters are:
Variable Name | Description |
---|---|
|
(Required with the |
|
If set to |
|
(Required with the |
|
(Required with the |
|
Amount of RAM to reserve per Elasticsearch instance. The default is 8G (for 8GB), and it must be at least 512M. Possible suffixes are G,g,M,m. |
|
The quorum required to elect a new master. Should be more than half the intended cluster size. |
|
When restarting the cluster, require this many nodes to be present before starting recovery. Defaults to one less than the cluster size to allow for one missing node. |
|
When restarting the cluster, wait for this number of nodes to be present before starting recovery. By default, the same as the cluster size. |
|
When restarting the cluster, this is a timeout for waiting for the expected number of nodes to be present. Defaults to "5m". |
|
The prefix for logging component images. For example, setting the prefix to registry.access.redhat.com/openshift3/ose- creates registry.access.redhat.com/openshift3/ose-logging-deployer:latest. |
|
The version for logging component images. For example, setting the version to v3.2 creates registry.access.redhat.com/openshift3/ose-logging-deployer:v3.2. |
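Because the per-instance Elasticsearch RAM value has a minimum and a restricted set of suffixes, it can be checked before running the deployer. The following is a minimal POSIX shell sketch; ram_to_mb is a hypothetical helper name, and treating G as 1024 M is an assumption that the table does not state.

```shell
# ram_to_mb VALUE
# Convert a per-instance RAM value such as "8G" or "512m" to megabytes,
# enforcing the rules from the table: the suffix must be G, g, M, or m,
# and the result must be at least 512M.
# Assumption: G is treated as 1024 M.
ram_to_mb() {
  value=$1
  number=${value%[GgMm]}      # strip one trailing suffix character, if present
  suffix=${value#"$number"}   # the character that was stripped (may be empty)
  case $number in
    # Reject empty or non-numeric values, and leading zeros
    # (shell arithmetic would read "0512" as octal).
    ''|0?*|*[!0-9]*) echo "invalid RAM value: $value" >&2; return 1 ;;
  esac
  case $suffix in
    G|g) mb=$((number * 1024)) ;;
    M|m) mb=$number ;;
    *)   echo "missing or unsupported suffix: $value" >&2; return 1 ;;
  esac
  if [ "$mb" -lt 512 ]; then
    echo "RAM must be at least 512M, got: $value" >&2
    return 1
  fi
  echo "$mb"
}
```

For example, ram_to_mb 8G prints 8192, while ram_to_mb 256M fails because it is below the 512M minimum.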
Running the deployer creates a deployer pod and prints its name.
Wait until the pod is running. It may take several minutes for OpenShift Enterprise to retrieve the deployer image from the registry.
The logs for the openshift and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface. The project where you have deployed the EFK stack (logging, as documented here) is not aggregated into .operations and is found under its ID. |
You can watch its progress with:
$ oc get pod/<pod_name> -w
If it seems to be taking too long to start, you can retrieve more details about the pod and any associated events with:
$ oc describe pod/<pod_name>
When it runs, you can check the logs of the resulting pod to see if the deployment was successful:
$ oc logs -f <pod_name>
As a cluster administrator, deploy the logging-support-template template that the deployer created:
$ oc new-app logging-support-template
Deployment of logging components should begin automatically. However, deployment is triggered by tags being imported into the ImageStreams created in this step, and not all tags are imported automatically, so this mechanism has become unreliable as multiple versions are released. Manual importing may therefore be necessary. For each ImageStream, run:
$ oc import-image <name>:<version> --from <prefix><name>:<tag>
For example:
$ oc import-image logging-auth-proxy:3.2.0 \
    --from registry.access.redhat.com/openshift3/logging-auth-proxy:3.2.0
$ oc import-image logging-kibana:3.2.0 \
    --from registry.access.redhat.com/openshift3/logging-kibana:3.2.0
$ oc import-image logging-elasticsearch:3.2.0 \
    --from registry.access.redhat.com/openshift3/logging-elasticsearch:3.2.0
$ oc import-image logging-fluentd:3.2.0 \
    --from registry.access.redhat.com/openshift3/logging-fluentd:3.2.0 |
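When several ImageStreams need importing, the commands follow a fixed pattern and can be generated rather than typed. This sketch prints the commands for the four ImageStreams in the example; print_import_commands is a hypothetical helper, and it assumes the tag equals the version, as it does in the example.

```shell
# print_import_commands PREFIX VERSION
# Print one oc import-image command per logging ImageStream, following the
# <name>:<version> --from <prefix><name>:<tag> pattern. PREFIX should end
# with a slash. Assumption: the tag is the same as the version.
# Nothing is executed; the commands are only printed.
print_import_commands() {
  prefix=$1
  version=$2
  for name in logging-auth-proxy logging-kibana logging-elasticsearch logging-fluentd; do
    printf 'oc import-image %s:%s --from %s%s:%s\n' \
      "$name" "$version" "$prefix" "$name" "$version"
  done
}
```

Review the output before running it, for example by piping it to sh: print_import_commands registry.access.redhat.com/openshift3/ 3.2.0 | sh. Printing instead of executing keeps the helper safe to try on a machine without cluster access.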
A highly available environment requires at least three replicas of Elasticsearch, each on a different host. Elasticsearch replicas require their own storage, but an OpenShift Enterprise deployment configuration shares storage volumes between all its pods. So, when scaled up, the EFK deployer ensures that each replica of Elasticsearch has its own deployment configuration.
Viewing all Elasticsearch Deployments
To view all current Elasticsearch deployments:
$ oc get dc --selector logging-infra=elasticsearch
Persistent Elasticsearch Storage
The deployer creates an ephemeral deployment in which all of a pod’s data is lost upon restart. For production usage, add a persistent storage volume to each Elasticsearch deployment configuration.
The best-performing volumes are local disks, if it is possible to use them. Doing so requires some preparation as follows.
The relevant service account must be given the privilege to mount and edit a local volume, as follows:
$ oadm policy add-scc-to-user privileged \
    system:serviceaccount:logging:aggregated-logging-elasticsearch (1)
1 | Use the new project you created earlier (e.g., logging) when specifying this service account. |
Each Elasticsearch replica definition must be patched to claim that privilege, for example:
$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
      -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done
The Elasticsearch pods must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to the node where an administrator has allocated storage for it. See below for directions on setting a node selector.
Once these steps are taken, a local host mount can be applied to each replica as in this example (where we assume storage is mounted at the same path on each node):
$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc set volume $dc \
      --add --overwrite --name=elasticsearch-storage \
      --type=hostPath --path=/usr/local/es-storage
    oc scale $dc --replicas=1
  done
If using host mounts is impractical or undesirable, it may be necessary to attach block storage as a PersistentVolumeClaim as in the following example:
$ oc set volume dc/logging-es-<unique> \
    --add --overwrite --name=elasticsearch-storage \
    --type=persistentVolumeClaim --claim-name=logging-es-1
Using NFS storage directly or as a PersistentVolume (or via other NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on filesystem behavior that NFS does not supply. Data corruption and other problems can occur. If NFS storage is a requirement, you can allocate a large file on that storage to serve as a storage device and treat it as a host mount on each host. For example:
$ truncate -s 1T /nfs/storage/elasticsearch-1
$ mkfs.xfs /nfs/storage/elasticsearch-1
$ mount -o loop /nfs/storage/elasticsearch-1 /usr/local/es-storage
$ chown 1000:1000 /usr/local/es-storage
Then, use /usr/local/es-storage as a host mount as described above. Performance under this solution is significantly worse than using actual local drives. |
Node Selector
Because Elasticsearch can use a lot of resources, all members of a cluster should have low-latency network connections to each other. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.
To configure a node selector, edit each deployment configuration and add the nodeSelector parameter to specify the label of the desired nodes:
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        nodelabel: logging-es-node-1
Alternatively, you can use the oc patch command:
$ oc patch dc/logging-es-<unique_name> \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"nodelabel":"logging-es-node-1"}}}}}'
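With several Elasticsearch replicas, each deployment configuration needs its own label value, and the matching node must carry that label. The sketch below only prints commands; print_es_pinning_commands is a hypothetical helper, the nodelabel key and logging-es-node-N values reuse the example above, and applying the label with oc label node is an assumption about how you label your nodes.

```shell
# print_es_pinning_commands DC NODE [DC NODE ...]
# Pair each Elasticsearch deployment configuration with one node and print
# the commands that give the pair a unique label. Nothing is executed.
print_es_pinning_commands() {
  i=1
  while [ "$#" -ge 2 ]; do
    # Label the node that holds this replica's storage...
    printf 'oc label node %s nodelabel=logging-es-node-%d\n' "$2" "$i"
    # ...and point the deployment configuration at that label.
    printf "oc patch %s -p '{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"nodelabel\":\"logging-es-node-%d\"}}}}}'\n" "$1" "$i"
    i=$((i + 1))
    shift 2
  done
}
```

For example, print_es_pinning_commands dc/logging-es-<unique> <node_name> prints two commands for that pair; review them before piping the output to sh.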
Changing the Scale of Elasticsearch
If you need to scale up the number of Elasticsearch instances your cluster uses, it is not as simple as changing the number of Elasticsearch cluster nodes. This is due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster. Instead, you must create a deployment configuration for each Elasticsearch cluster node.
During installation, the deployer creates templates with the Elasticsearch configurations provided to it: logging-es-template and, if the deployer was run with ENABLE_OPS_CLUSTER=true, logging-es-ops-template.
The node quorum and recovery settings were initially set based on the CLUSTER_SIZE value provided to the deployer. Since the cluster size is changing, those values need to be updated.
Prior to changing the number of Elasticsearch cluster nodes, the EFK stack should first be scaled down to preserve log data as described in Upgrading the EFK Logging Stack.
Edit the cluster template you are scaling up and change the parameters to the desired value:
NODE_QUORUM is the intended cluster size / 2 (rounded down) + 1. For an intended cluster size of 5, the quorum would be 3.
RECOVER_EXPECTED_NODES is the same as the intended cluster size.
RECOVER_AFTER_NODES is the intended cluster size - 1.
$ oc edit template logging-es[-ops]-template
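The three rules above are simple arithmetic, so they can be computed rather than worked out by hand for each resize. A minimal POSIX shell sketch; es_cluster_settings is a hypothetical helper that only prints the values for you to enter in the template and the deployment configurations.

```shell
# es_cluster_settings SIZE
# Print the quorum and recovery values for an intended Elasticsearch
# cluster size, using the rules listed above.
es_cluster_settings() {
  size=$1
  echo "NODE_QUORUM=$((size / 2 + 1))"   # shell integer division rounds down
  echo "RECOVER_EXPECTED_NODES=$size"
  echo "RECOVER_AFTER_NODES=$((size - 1))"
}
```

For an intended cluster size of 5, es_cluster_settings 5 prints NODE_QUORUM=3, RECOVER_EXPECTED_NODES=5, and RECOVER_AFTER_NODES=4, matching the example in the list.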
In addition to updating the template, you must also update the three environment variable values above in every deployment configuration for that cluster. To edit each of the cluster's deployment configurations in turn, use the following command:
$ oc get dc -l component=es[-ops] -o name | xargs -r oc edit
To create an additional deployment configuration, run the following command against the template of the Elasticsearch cluster you want to scale up (logging-es-template or logging-es-ops-template):
$ oc new-app logging-es[-ops]-template
These deployments will be named differently, but all will have the logging-es prefix. Be aware of the cluster parameters (described in the deployer parameters) based on cluster size that may need corresponding adjustment in the template, as well as existing deployments.
After the intended number of deployment configurations are created, scale up your cluster, starting with Elasticsearch as described in Upgrading the EFK Logging Stack.
The
$ oc process logging-es-template | oc volume -f - \
    --add --overwrite --name=elasticsearch-storage \
    --type=persistentVolumeClaim --claim-name={your_pvc} |
Once Elasticsearch is running, scale Fluentd to every node to feed logs into Elasticsearch. The following example is for an OpenShift Enterprise instance with three nodes:
$ oc scale dc/logging-fluentd --replicas=3
You will need to scale Fluentd if nodes are added or removed.
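Keeping the Fluentd replica count equal to the node count can be scripted. This sketch assumes that every node listed by oc get nodes -o name should run Fluentd; fluentd_scale_command is a hypothetical helper that prints the scale command instead of running it.

```shell
# fluentd_scale_command
# Read node names on stdin, one per line, and print the oc scale command
# that sets one Fluentd replica per node. Nothing is executed.
fluentd_scale_command() {
  count=$(grep -c .)   # count non-empty input lines
  echo "oc scale dc/logging-fluentd --replicas=$count"
}
```

Usage: oc get nodes -o name | fluentd_scale_command. Rerun it whenever nodes are added or removed.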
When you make changes to any part of the EFK stack, specifically Elasticsearch or Fluentd, you should first scale Elasticsearch down to zero and change the Fluentd node selector so that it does not match any nodes. Then, make the changes and scale Elasticsearch and Fluentd back.
To scale Elasticsearch to zero:
$ oc scale --replicas=0 dc/<ELASTICSEARCH_DC>
Change nodeSelector in the daemonset configuration so that it matches zero nodes. Check the current selector:
$ oc get ds logging-fluentd -o yaml | grep -A 1 Selector
     nodeSelector:
       logging-infra-fluentd: "true"
Use the oc patch command to modify the daemonset nodeSelector:
$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'
Confirm the change:
$ oc get ds logging-fluentd -o yaml | grep -A 1 Selector
     nodeSelector:
       nonexistlabel: "true"
Scale Elasticsearch back up from zero:
$ oc scale --replicas=<replicas> dc/<ELASTICSEARCH_DC>
Change nodeSelector in the daemonset configuration back to logging-infra-fluentd: "true".
Use the oc patch command to modify the daemonset nodeSelector:
$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true"}}}}}'
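Hand-typed JSON patches are easy to get wrong, so the two selector patches can be kept in shell variables; the variable names below are hypothetical. One caveat, stated as an assumption: if oc patch merges map keys instead of replacing the whole nodeSelector map (as Kubernetes merge patches do), the nonexistlabel key can survive the restore step and Fluentd would still match no nodes. The restore patch in this sketch therefore also sets nonexistlabel to null, which removes the key if it is present and has no effect otherwise.

```shell
# Patch that points Fluentd at a label no node carries, so no pods run.
FLUENTD_OFF_PATCH='{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'

# Patch that restores the original selector. In a merge patch, a null value
# deletes the key, so any leftover nonexistlabel entry is removed.
FLUENTD_ON_PATCH='{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true","nonexistlabel":null}}}}}'

# Usage (not executed here):
#   oc patch ds logging-fluentd -p "$FLUENTD_OFF_PATCH"
#   oc patch ds logging-fluentd -p "$FLUENTD_ON_PATCH"
```

After restoring, the same oc get ds logging-fluentd -o yaml | grep -A 1 Selector check shown above can confirm the selector.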
To access the Kibana console from the OpenShift Enterprise web console, add the loggingPublicURL parameter in the /etc/origin/master/master-config.yaml file, with the URL of the Kibana console (the KIBANA_HOSTNAME parameter).
The value must be an HTTPS URL:
...
assetConfig:
  ...
  loggingPublicURL: "https://kibana.example.com"
...
Setting the loggingPublicURL parameter creates a View Archive button on the OpenShift Enterprise web console under the Browse → Pods → <pod_name> → Logs tab. This links to the Kibana console.
You can scale the Kibana deployment as usual for redundancy:
$ oc scale dc/logging-kibana --replicas=2
You can see the UI by visiting the site specified at the KIBANA_HOSTNAME variable.
See the Kibana documentation for more information on Kibana.
You can remove everything generated during the deployment while leaving other project contents intact:
$ oc delete all --selector logging-infra=kibana
$ oc delete all --selector logging-infra=fluentd
$ oc delete all --selector logging-infra=elasticsearch
$ oc delete all --selector logging-infra=curator
$ oc delete all,sa,oauthclient --selector logging-infra=support
$ oc delete secret logging-fluentd logging-elasticsearch \
    logging-es-proxy logging-kibana logging-kibana-proxy \
    logging-kibana-ops-proxy
To upgrade the EFK logging stack, see Manual Upgrades.
Using the Kibana console with OpenShift Enterprise can cause problems that are easily solved but are not accompanied by useful error messages. Check the following troubleshooting sections if you experience any problems when deploying Kibana on OpenShift Enterprise:
Login Loop
The OAuth2 proxy on the Kibana console must share a secret with the master host’s OAuth2 server. If the secret is not identical on both servers, it can cause a login loop where you are continuously redirected back to the Kibana login page.
To fix this issue, delete the current oauthclient, and create a new one, using the same template as before:
$ oc delete oauthclient/kibana-proxy
$ oc new-app logging-support-template
Cryptic Error When Viewing the Console
When attempting to visit the Kibana console, you may instead receive a browser error:
{"error":"invalid_request","error_description":"The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}
This can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in.
Fix this issue by replacing the OAuth client entry:
$ oc delete oauthclient/kibana-proxy
$ oc new-app logging-support-template
If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port. You can adjust the server whitelist by editing the OAuth client:
$ oc edit oauthclient/kibana-proxy
503 Error When Viewing the Console
If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.
First, Kibana may not be recognizing pods. If Elasticsearch is slow in starting up, Kibana may time out trying to reach it. Check whether the relevant service has any endpoints:
$ oc describe service logging-kibana
Name:                   logging-kibana
[...]
Endpoints:              <none>
If any Kibana pods are live, endpoints will be listed. If they are not, check the state of the Kibana pods and deployment. You may need to scale the deployment down and back up again.
The second possible cause is that the route for accessing the Kibana service is masked. This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router only routes to the one created first. Check the problematic route to see if it is defined in multiple places:
$ oc get route --all-namespaces --selector logging-infra=support
Fluentd sends logs to the destination given by the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables of the Fluentd deployment configuration. The application logs are directed to the ES_HOST destination, and operations logs to OPS_HOST.
To direct logs to a specific Elasticsearch instance, edit the deployment configuration and replace the value of the above variables with the desired instance:
$ oc edit dc/<deployment_configuration>
For an external Elasticsearch instance to contain both application and operations logs, you can set ES_HOST and OPS_HOST to the same destination, while ensuring that ES_PORT and OPS_PORT also have the same value.
If your externally hosted Elasticsearch instance does not use TLS, update the *_CLIENT_CERT, *_CLIENT_KEY, and *_CA variables to be empty. If it does use TLS, but not mutual TLS, update the *_CLIENT_CERT and *_CLIENT_KEY variables to be empty and patch or recreate the logging-fluentd secret with the appropriate *_CA value for communicating with your Elasticsearch instance.
If it uses mutual TLS, as the provided Elasticsearch instance does, patch or recreate the logging-fluentd secret with your client key, client cert, and CA.
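The three TLS cases above amount to a small decision table. This sketch prints which variables to empty for each case; fluentd_tls_vars_to_clear and the mode names none, tls, and mutual are hypothetical, and reading *_ as the ES_ and OPS_ prefixes is an assumption based on the ES_HOST and OPS_HOST naming.

```shell
# fluentd_tls_vars_to_clear MODE
# MODE is none (no TLS), tls (TLS but not mutual TLS), or mutual (mutual TLS).
# Prints, one line per prefix, the Fluentd variables to set to empty values.
# Assumption: the *_ wildcard stands for the ES_ and OPS_ prefixes.
fluentd_tls_vars_to_clear() {
  for prefix in ES OPS; do
    case $1 in
      none)   echo "${prefix}_CLIENT_CERT ${prefix}_CLIENT_KEY ${prefix}_CA" ;;
      tls)    echo "${prefix}_CLIENT_CERT ${prefix}_CLIENT_KEY" ;;  # keep *_CA; supply it via the logging-fluentd secret
      mutual) : ;;  # keep all three; supply them via the logging-fluentd secret
      *)      echo "unknown TLS mode: $1" >&2; return 1 ;;
    esac
  done
}
```

For example, fluentd_tls_vars_to_clear tls prints the client certificate and key variables for both prefixes and leaves the CA variables alone.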
You can use oc edit dc/logging-fluentd to update your Fluentd configuration, making sure to first scale down your number of replicas to zero before editing the deployment configuration.
If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project. |
As of Deployer version 3.2.0, an admin certificate, key, and CA that can be used to communicate with and perform administrative operations on Elasticsearch are provided within the logging-elasticsearch secret.
To confirm whether or not your EFK installation provides these, run:
$ oc describe secret logging-elasticsearch |
If they are not available, refer to Manual Upgrades to ensure you are on the latest version first.
Connect to an Elasticsearch pod that is in the cluster on which you are attempting to perform maintenance.
To find a pod in a cluster, use either:
$ oc get pods -l component=es -o name | head -1
$ oc get pods -l component=es-ops -o name | head -1
Then, connect to a pod:
$ oc rsh <your_Elasticsearch_pod>
Once connected to an Elasticsearch container, you can use the certificates mounted from the secret to communicate with Elasticsearch through its 1.5 Document APIs.
Fluentd sends its logs to Elasticsearch using the index format "{project_name}.{project_uuid}.YYYY.MM.DD" where YYYY.MM.DD is the date of the log record.
For example, to delete all logs for the logging project with UUID 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 from June 15, 2016, you can run:
$ curl --key /etc/elasticsearch/keys/admin-key --cert /etc/elasticsearch/keys/admin-cert \
    --cacert /etc/elasticsearch/keys/admin-ca -XDELETE \
    "https://localhost:9200/logging.3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3.2016.06.15"
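Because the index name is fully determined by the project name, project UUID, and date, it can be assembled rather than typed. A minimal sketch; es_index_name is a hypothetical helper, and its YYYY-MM-DD date argument is a convention chosen for this example.

```shell
# es_index_name PROJECT UUID YYYY-MM-DD
# Build the daily index name Fluentd writes to, in the
# "{project_name}.{project_uuid}.YYYY.MM.DD" format.
# Only the date argument has its dashes converted; the UUID keeps its own.
es_index_name() {
  date_part=$(printf '%s' "$3" | sed 's/-/./g')
  printf '%s.%s.%s\n' "$1" "$2" "$date_part"
}

# Example use inside an Elasticsearch pod (not executed here):
#   index=$(es_index_name logging 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 2016-06-15)
#   curl --key /etc/elasticsearch/keys/admin-key \
#     --cert /etc/elasticsearch/keys/admin-cert \
#     --cacert /etc/elasticsearch/keys/admin-ca -XDELETE \
#     "https://localhost:9200/$index"
```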
With Aggregated Logging version 3.2.1, Curator is available for use as Tech Preview. To start it, after completing an installation using the 3.2.1 Deployer, scale up the Curator deployment configuration that was created. (It defaults to zero replicas.) There should be one Curator pod running per Elasticsearch cluster. If you deployed aggregated logging with
$ oc scale dc/logging-curator --replicas=1
$ oc scale dc/logging-curator-ops --replicas=1 |
Curator allows administrators to configure scheduled Elasticsearch maintenance operations to be performed automatically on a per-project basis. It is scheduled to perform actions daily based on its configuration. Only one Curator pod is recommended per Elasticsearch cluster. Curator is configured via a mounted YAML configuration file with the following structure:
$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE
...
The available parameters are:
Variable Name | Description |
---|---|
|
The actual name of a project, such as |
|
The action to take, currently only |
|
One of |
|
An integer for the number of units. |
|
Use |
|
(Number) the hour of the day in 24-hour format at which to run the Curator jobs. For use with |
|
(Number) the minute of the hour at which to run the Curator jobs. For use with |
For example, to configure Curator to:
delete indices in the myapp-dev project older than 1 day
delete indices in the myapp-qe project older than 1 week
delete operations logs older than 8 weeks
delete all other projects' indices after they are 30 days old
run the Curator jobs at midnight every day
you would use:
myapp-dev:
  delete:
    days: 1

myapp-qe:
  delete:
    weeks: 1

.operations:
  delete:
    weeks: 8

.defaults:
  delete:
    days: 30
  runhour: 0
  runminute: 0
When you use |
To create the Curator configuration:
Create a YAML file with your configuration settings using your favorite editor.
Create a secret from your YAML file:
$ oc secrets new index-management settings=</path/to/your/yaml/file>
Mount your created secret as a volume in your Curator DC:
$ oc volumes dc/logging-curator \
    --add \
    --type=secret \
    --secret-name=index-management \
    --mount-path=/etc/curator \
    --name=index-management \
    --overwrite
The mount-path value (e.g. |
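When the configuration covers many projects, generating entries can be less error-prone than hand-indenting YAML. This sketch reproduces the example configuration shown earlier; curator_entry and example_curator_config are hypothetical helper names, and the file you redirect the output to is your choice.

```shell
# curator_entry PROJECT ACTION UNIT VALUE
# Print one entry in the $PROJECT_NAME / $ACTION / $UNIT: $VALUE structure.
curator_entry() {
  printf '%s:\n  %s:\n    %s: %s\n' "$1" "$2" "$3" "$4"
}

# Print the example configuration. The runhour and runminute lines are
# indented under .defaults, which must therefore be the last entry.
example_curator_config() {
  curator_entry myapp-dev delete days 1
  curator_entry myapp-qe delete weeks 1
  curator_entry .operations delete weeks 8
  curator_entry .defaults delete days 30
  printf '  runhour: 0\n  runminute: 0\n'
}
```

For example, example_curator_config > /path/to/your/yaml/file writes the file, which you can then pass to oc secrets new index-management settings=/path/to/your/yaml/file.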
You can also specify default values for the run hour, run minute, and age in days of the indices when processing the Curator template. Use CURATOR_RUN_HOUR and CURATOR_RUN_MINUTE to set the default runhour and runminute, and use CURATOR_DEFAULT_DAYS to set the default index age.