Logging

OpenShift Dedicated uses an Elasticsearch, Fluentd, and Kibana (EFK) stack to collect logs from applications and present them to OpenShift Dedicated users. OpenShift Dedicated administrators can view all application and project logs, whereas application developers can view logs only for the projects they have permission to view.

The EFK stack consists of the following components:

  • Elasticsearch: An object store that holds the logs and makes them searchable.

  • Fluentd: Gathers logs and sends them to Elasticsearch. It runs on every node in the cluster.

  • Kibana: A web interface for interacting with Elasticsearch.

  • Curator: Schedules Elasticsearch maintenance operations automatically.

By default, OpenShift Dedicated retains application logs for a maximum of 14 days or up to 200 GB, whichever limit is reached first. Higher logging rates or larger log sizes can cause logs to be rotated before the 14 days have elapsed.
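The "whichever comes first" rule can be pictured with a short sketch. This is only an illustration of the two limits stated above, not the logic Curator or Red Hat actually runs; the `LogIndex` type, the `indices_to_delete` function, and the reading of 200 GB as decimal gigabytes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

MAX_AGE_DAYS = 14                 # age limit taken from the text
MAX_TOTAL_BYTES = 200 * 10**9     # 200 GB, read as decimal gigabytes (assumption)


@dataclass
class LogIndex:
    """Hypothetical record describing one day's worth of stored logs."""
    name: str
    age_days: int
    size_bytes: int


def indices_to_delete(indices: List[LogIndex]) -> List[str]:
    """Return the names of indices that exceed either limit, oldest first."""
    # Limit 1: anything older than the age limit is dropped.
    expired = [i for i in indices if i.age_days > MAX_AGE_DAYS]
    kept = [i for i in indices if i.age_days <= MAX_AGE_DAYS]

    # Limit 2: while the remaining data is over the size cap, drop the oldest.
    kept.sort(key=lambda i: i.age_days, reverse=True)
    total = sum(i.size_bytes for i in kept)
    over_cap = []
    for index in kept:
        if total <= MAX_TOTAL_BYTES:
            break
        over_cap.append(index)
        total -= index.size_bytes

    doomed = sorted(expired + over_cap, key=lambda i: i.age_days, reverse=True)
    return [i.name for i in doomed]


GB = 10**9
example = [
    LogIndex("day-20", 20, 10 * GB),   # too old
    LogIndex("day-10", 10, 150 * GB),  # pushed out by the size cap
    LogIndex("day-05", 5, 100 * GB),
    LogIndex("day-01", 1, 50 * GB),
]
to_delete = indices_to_delete(example)
```

In this example the ten-day-old index is removed even though it is younger than 14 days, because the three recent indices together exceed the size cap. That is the situation the paragraph above describes for clusters with high logging rates.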

Kibana can be accessed at:

https://logs.[cluster-name].openshift.com
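As a small illustration, the placeholder in the address above can be filled in programmatically. The helper name `kibana_url` and the cluster name `mycluster` are hypothetical; only the URL pattern comes from this document.

```python
def kibana_url(cluster_name: str) -> str:
    """Build the Kibana address by substituting the cluster name into the pattern."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster_name must not be empty")
    return f"https://logs.{name}.openshift.com"


url = kibana_url("mycluster")  # "mycluster" is a made-up cluster name
```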

Audit Logging

Audit logs are maintained by Red Hat and are kept for internal use only.

Metrics

The metrics component lets you view CPU, memory, and network-based metrics in the OpenShift Dedicated web console. These metrics also enable horizontal autoscaling of pods based on parameters provided by an OpenShift Dedicated user.
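To make the link between metrics and autoscaling concrete, the sketch below applies the basic replica calculation described in the upstream Kubernetes horizontal pod autoscaler documentation: scale the current replica count by the ratio of the observed metric to the target, then clamp the result to the user-provided minimum and maximum. The function and parameter names are illustrative, and the autoscaler in a given OpenShift Dedicated release may apply further rules (such as tolerances or cooldown periods) that are not modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified replica calculation; names and signature are illustrative."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Scale the replica count by how far the observed metric is from the target.
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Keep the result inside the bounds supplied by the user.
    return max(min_replicas, min(max_replicas, raw))


# Four pods averaging 90% CPU against a 60% target.
scaled_up = desired_replicas(4, 90, 60, min_replicas=1, max_replicas=10)
```

With these inputs the observed load is 1.5 times the target, so the sketch asks for six replicas instead of four.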

The metrics stack consists of the following components:

  • Hawkular Metrics backed by Cassandra: A metrics engine that stores the data persistently in a Cassandra database. When this is configured, CPU, memory, and network-based metrics are viewable from the OpenShift Dedicated web console and are available for use by horizontal pod autoscalers.

  • Heapster: A service that retrieves a list of all nodes from the master server, then contacts each node individually and scrapes the metrics for CPU, memory, and network usage. It then exports these metrics into Hawkular Metrics.
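The collection flow described for Heapster (list the nodes, scrape each one, export the results) can be sketched as a simple loop. This is not Heapster's code or API: `collect_once`, the three callables, and the in-memory stand-ins below are hypothetical and exist only to show the order of the steps.

```python
from typing import Callable, Dict, Iterable

Sample = Dict[str, float]  # one node's CPU, memory, and network readings


def collect_once(list_nodes: Callable[[], Iterable[str]],
                 scrape_node: Callable[[str], Sample],
                 export: Callable[[str, Sample], None]) -> int:
    """Run one collection pass and return the number of nodes exported."""
    count = 0
    for node in list_nodes():        # 1. get the node list from the master
        sample = scrape_node(node)   # 2. contact each node individually
        export(node, sample)         # 3. hand the sample to the metrics store
        count += 1
    return count


# In-memory stand-ins for the master, the nodes, and the metrics store.
fake_nodes: Dict[str, Sample] = {
    "node-a": {"cpu": 0.5, "memory": 2.0, "network": 10.0},
    "node-b": {"cpu": 0.25, "memory": 1.0, "network": 4.0},
}
store: Dict[str, Sample] = {}

exported = collect_once(
    list_nodes=lambda: list(fake_nodes),
    scrape_node=lambda node: fake_nodes[node],
    export=lambda node, sample: store.update({node: sample}),
)
```

Passing the three steps in as callables keeps the sketch independent of any real endpoint; in a running cluster they would correspond to the master server, the individual nodes, and Hawkular Metrics respectively.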

Monitoring and Testing

OpenShift Dedicated clusters come with an integrated Prometheus/Grafana stack for cluster monitoring. Monitoring service containers run on each application node and scrape data from the nodes, which is then presented on the Grafana dashboard. The aggregated data is also used by Red Hat Service Reliability Engineering (SRE) for services such as pruning, garbage collection, automated testing, and configuration management.