OpenShift Container Platform uses Elasticsearch (ES) to store and organize the log data.

You can add and remove nodes, configure storage for your Elasticsearch cluster, and define how shards are replicated across data nodes in the cluster, from full replication to no replication.

Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits, unless you specify otherwise in the ClusterLogging custom resource. The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. You must add additional nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory. Each Elasticsearch node can operate with a lower memory setting, though this is not recommended for production deployments.
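
For reference, the following is a minimal sketch of how that default would look if you wrote it out explicitly in the ClusterLogging custom resource. Only the memory values come from the paragraph above (written as 16Gi); all other settings are omitted:

spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      resources:
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"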

If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO reverts any changes you make to the EO, because the EO is managed by the CLO.

Configuring Elasticsearch CPU and memory limits

Each component specification allows for adjustments to both the CPU and memory limits. You should not have to manually adjust these values as the Elasticsearch Operator sets values sufficient for your environment.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • If needed, get the name of the Cluster Logging Custom Resource in the openshift-logging project:

    $ oc get ClusterLogging
    NAME       AGE
    instance   112m
Procedure

Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

$ oc edit ClusterLogging instance

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      resources: (1)
        limits:
          cpu: "4000m"
          memory: "4Gi"
        requests:
          cpu: "100m"
          memory: "1Gi"
1 Specify the CPU and memory limits as needed. If you leave these values blank, the Elasticsearch Operator sets default values that should be sufficient for most deployments.
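
After you save the CR and the Elasticsearch pods redeploy, you can spot-check that the new limits were applied. This is a sketch only; it assumes the Elasticsearch pods carry the component=elasticsearch label and that the Elasticsearch container is the first container in the pod:

$ oc get pods -n openshift-logging -l component=elasticsearch \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.limits.memory}{"\n"}{end}'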

Configuring Elasticsearch replication policy

You can define how Elasticsearch shards are replicated across data nodes in the cluster:

  • FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.

  • MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.

  • SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. This policy provides better performance than MultipleRedundancy when you use five or more nodes. You cannot apply this policy to deployments with a single Elasticsearch node.

  • ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
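
To see how the selected policy translates into per-index replica counts, you can query the Elasticsearch _cat API from one of the Elasticsearch pods, as in the following sketch. The pod name is a placeholder, ${token} is your OpenShift Container Platform token (oc whoami -t), and the in-cluster service address is assumed to be elasticsearch.openshift-logging.svc:

$ oc exec <elasticsearch_pod> -n openshift-logging -- curl --tlsv1.2 --insecure \
    -H "Authorization: Bearer ${token}" \
    "https://elasticsearch.openshift-logging.svc:9200/_cat/indices?v&h=index,rep"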

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • If needed, get the name of the Cluster Logging Custom Resource in the openshift-logging project:

    $ oc get ClusterLogging
    NAME       AGE
    instance   112m
Procedure

Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"

....

spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      redundancyPolicy: "SingleRedundancy" (1)
1 Specify a redundancy policy for the shards. The change is applied when you save the CR.
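
To confirm that the operator propagated the policy, you can read it back from the generated Elasticsearch custom resource. This is a sketch; it assumes the resource is named elasticsearch and exposes the policy at spec.redundancyPolicy:

$ oc get elasticsearch elasticsearch -n openshift-logging \
    -o jsonpath='{.spec.redundancyPolicy}'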

Configuring Elasticsearch storage

Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance is.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • If needed, get the name of the Cluster Logging Custom Resource in the openshift-logging project:

    $ oc get ClusterLogging
    NAME       AGE
    instance   112m
Procedure
  1. Edit the Cluster Logging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          nodeCount: 3
          storage:
            storageClassName: "gp2"
            size: "200G"

This example specifies that each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.
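
After you save the CR, you can confirm that a Persistent Volume Claim of the requested size was created for each data node, for example:

$ oc get pvc -n openshift-logging

The claims are created by the Elasticsearch Operator; their exact names depend on your deployment.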

Configuring Elasticsearch for emptyDir storage

You can use emptyDir with Elasticsearch, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.

When using emptyDir, you will lose data if Elasticsearch is restarted or redeployed.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • If needed, get the name of the Cluster Logging Custom Resource in the openshift-logging project:

    $ oc get ClusterLogging
    NAME       AGE
    instance   112m
Procedure
  1. Edit the Cluster Logging CR to specify emptyDir:

    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          nodeCount: 3
          storage: {}

Scaling your Elasticsearch cluster

You can scale the number of data nodes in Elasticsearch.

For example, if you want to increase redundancy and use the FullRedundancy or MultipleRedundancy policy, you can scale up the cluster to increase the number of shard replicas in your cluster.

The maximum number of Elasticsearch master nodes is three. If you specify a nodeCount greater than 3, OpenShift Container Platform creates three Elasticsearch nodes that are Master-eligible nodes, with the master, client, and data roles. The additional Elasticsearch nodes are created as Data-only nodes, with the client and data roles.

Master nodes perform cluster-wide actions such as creating or deleting an index, shard allocation, and tracking nodes. Data nodes hold the shards and perform data-related operations such as CRUD, search, and aggregations. Data-related operations are I/O-, memory-, and CPU-intensive, so it is important to monitor these resources and to add more Data nodes if the current nodes are overloaded.
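
To see which roles each Elasticsearch node has taken on, you can query the _cat/nodes API from one of the Elasticsearch pods. This is a sketch under the same assumptions as the earlier examples: the pod name is a placeholder, ${token} is your OpenShift Container Platform token, and the in-cluster service address is assumed:

$ oc exec <elasticsearch_pod> -n openshift-logging -- curl --tlsv1.2 --insecure \
    -H "Authorization: Bearer ${token}" \
    "https://elasticsearch.openshift-logging.svc:9200/_cat/nodes?v&h=name,node.role"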

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • If needed, get the name of the Cluster Logging Custom Resource in the openshift-logging project:

    $ oc get ClusterLogging
    NAME       AGE
    instance   112m
Procedure
  1. To scale up the cluster, edit the Cluster Logging Custom Resource (CR) to increase the number of Elasticsearch nodes:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    ...
    
    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          nodeCount: 5 (1)
          storage:
            storageClassName: "gp2"
            size: "200G"
          redundancyPolicy: "SingleRedundancy"
    1 Specify the number of Elasticsearch nodes. This example adds two nodes to the default of three. The new nodes are created as Data-only nodes.
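
Once the new nodes join the cluster, you can verify that five Elasticsearch pods are running. This sketch assumes the pods carry the component=elasticsearch label:

$ oc get pods -n openshift-logging -l component=elasticsearch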

Exposing Elasticsearch as a route

By default, Elasticsearch deployed with cluster logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to Elasticsearch for tools that need to access its data.

Externally, you can access Elasticsearch by creating a reencrypt route and using your OpenShift Container Platform token and the installed Elasticsearch CA certificate. The request must contain three HTTP headers:

Authorization: Bearer $token
X-Proxy-Remote-User: $username
X-Forwarded-For: $ip_address
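
For example, an external request carrying all three headers might look like the following sketch, where ${token} is your OpenShift Container Platform token, ${routeES} is the route host you create in the procedure below, and the X-Forwarded-For value is a placeholder:

$ curl --tlsv1.2 --insecure \
    -H "Authorization: Bearer ${token}" \
    -H "X-Proxy-Remote-User: $(oc whoami)" \
    -H "X-Forwarded-For: 127.0.0.1" \
    "https://${routeES}/.operations.*/_search?size=1"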

Internally, you can access Elasticsearch using the Elasticsearch cluster IP:

$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging
172.30.183.229

$ oc get service elasticsearch -n openshift-logging
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.183.229   <none>        9200/TCP   22h

$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -- curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    108      0 --:--:-- --:--:-- --:--:--   108
Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • You must have access to the project to be able to access its logs. For example:

    $ oc login <user1>
    $ oc new-project <user1project>
    $ oc new-app <httpd-example>
Procedure

To expose Elasticsearch externally:

  1. Change to the openshift-logging project:

    $ oc project openshift-logging
  2. Use the following command to extract the CA certificate from Elasticsearch and write it to the admin-ca file:

    $ oc extract secret/elasticsearch --to=. --keys=admin-ca
    
    admin-ca
  3. Create the route for the Elasticsearch service as a YAML file:

    1. Create a YAML file with the following:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: elasticsearch
        namespace: openshift-logging
      spec:
        host:
        to:
          kind: Service
          name: elasticsearch
        tls:
          termination: reencrypt
          destinationCACertificate: | (1)
      1 Add the Elasticsearch CA certificate or use the command in the next step. You do not have to set the spec.tls.key, spec.tls.certificate, and spec.tls.caCertificate parameters required by some reencrypt routes.
    2. Run the following command to add the Elasticsearch CA certificate to the route YAML you created:

      $ cat ./admin-ca | sed -e "s/^/      /" >> <file-name>.yaml
    3. Run the following command to create the route:

      $ oc create -f <file-name>.yaml
      
      route.route.openshift.io/elasticsearch created
  4. Check that the Elasticsearch service is exposed:

    1. Get the token of the current user to be used in the request:

      $ token=$(oc whoami -t)
    2. Set the elasticsearch route you created as an environment variable:

      $ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
    3. To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:

      $ curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/.operations.*/_search?size=1" | jq

      The response appears similar to the following:

        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   944  100   944    0     0     62      0  0:00:15  0:00:15 --:--:--   204
      {
        "took": 441,
        "timed_out": false,
        "_shards": {
          "total": 3,
          "successful": 3,
          "skipped": 0,
          "failed": 0
        },
        "hits": {
          "total": 89157,
          "max_score": 1,
          "hits": [
            {
              "_index": ".operations.2019.03.15",
              "_type": "com.example.viaq.common",
              "_id": "ODdiNWIyYzAtMjg5Ni0TAtNWE3MDY1MjMzNTc3",
              "_score": 1,
              "_source": {
                "_SOURCE_MONOTONIC_TIMESTAMP": "673396",
                "systemd": {
                  "t": {
                    "BOOT_ID": "246c34ee9cdeecb41a608e94",
                    "MACHINE_ID": "e904a0bb5efd3e36badee0c",
                    "TRANSPORT": "kernel"
                  },
                  "u": {
                    "SYSLOG_FACILITY": "0",
                    "SYSLOG_IDENTIFIER": "kernel"
                  }
                },
                "level": "info",
                "message": "acpiphp: Slot [30] registered",
                "hostname": "localhost.localdomain",
                "pipeline_metadata": {
                  "collector": {
                    "ipaddr4": "10.128.2.12",
                    "ipaddr6": "fe80::xx:xxxx:fe4c:5b09",
                    "inputname": "fluent-plugin-systemd",
                    "name": "fluentd",
                    "received_at": "2019-03-15T20:25:06.273017+00:00",
                    "version": "1.3.2 1.6.0"
                  }
                },
                "@timestamp": "2019-03-15T20:00:13.808226+00:00",
                "viaq_msg_id": "ODdiNWIyYzAtMYTAtNWE3MDY1MjMzNTc3"
              }
            }
          ]
        }
      }

About Elasticsearch alerting rules

You can view these alerting rules in Prometheus.

  • ElasticsearchClusterNotHealthy (severity: critical). Cluster health status has been RED for at least 2m. The cluster does not accept writes, shards may be missing, or the master node has not been elected yet.

  • ElasticsearchClusterNotHealthy (severity: warning). Cluster health status has been YELLOW for at least 20m. Some shard replicas are not allocated.

  • ElasticsearchBulkRequestsRejectionJumps (severity: warning). High bulk rejection ratio at a node in the cluster. This node may not be keeping up with the indexing speed.

  • ElasticsearchNodeDiskWatermarkReached (severity: alert). Disk low watermark reached at a node in the cluster. Shards cannot be allocated to this node anymore. You should consider adding more disk space to the node.

  • ElasticsearchNodeDiskWatermarkReached (severity: high). Disk high watermark reached at a node in the cluster. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node.

  • ElasticsearchJVMHeapUseHigh (severity: alert). JVM heap usage on the node in the cluster is <value>.

  • AggregatedLoggingSystemCPUHigh (severity: alert). System CPU usage on the node in the cluster is <value>.

  • ElasticsearchProcessCPUHigh (severity: alert). Elasticsearch process CPU usage on the node in the cluster is <value>.