Enabling Cluster Metrics | Installation and Configuration

Overview
Before You Begin
Service Accounts
- Metrics Deployer Service Account
- Heapster Service Account
Metrics Data Storage
- Persistent Storage
- Non-Persistent Storage
Metrics Deployer
- Using Secrets
- Modifying the Deployer Template
Deploying the Metric Components
Using a Re-encrypting Route
Configuring OpenShift Enterprise
Scaling OpenShift Enterprise Metrics Pods
- Prerequisites
- Scaling the Cassandra Components
Cleanup

Overview

The kubelet exposes metrics that can be collected and stored in back-ends by Heapster.

As an OpenShift Enterprise administrator, you can view a cluster’s metrics from all containers and components in one user interface. These metrics are also used by horizontal pod autoscalers in order to determine when and how to scale.

This topic describes using Hawkular Metrics as a metrics engine which stores the data persistently in a Cassandra database. When this is configured, CPU and memory-based metrics are viewable from the OpenShift Enterprise web console and are available for use by horizontal pod autoscalers.

Heapster retrieves a list of all nodes from the master server, then contacts each node individually through the /stats endpoint. From there, Heapster scrapes the metrics for CPU and memory usage, then exports them into Hawkular Metrics.

Browsing individual pods in the web console displays separate sparkline charts for memory and CPU. The time range displayed is selectable, and these charts automatically update every 30 seconds. If there are multiple containers on the pod, then you can select a specific container to display its metrics.

If resource limits are defined for your project, then you can also see a donut chart for each pod. The donut chart displays usage against the resource limit. For example: 145 Available of 200 MiB, with the donut chart showing 55 MiB Used.

Before You Begin

The components for cluster metrics must be deployed to the openshift-infra project. This allows horizontal pod autoscalers to discover the Heapster service and use it to retrieve metrics that can be used for autoscaling.

All of the following commands in this topic must be executed under the openshift-infra project. To switch to the openshift-infra project:

$ oc project openshift-infra

To enable cluster metrics, you must next configure the following:

- Service Accounts
- Metrics Data Storage
- Metrics Deployer

Service Accounts

You must configure service accounts for:

- Metrics Deployer
- Heapster

Metrics Deployer Service Account

The Metrics Deployer will be discussed in a later step, but you must first set up a service account for it:

Create a metrics-deployer service account:

$ oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
API

Before it can deploy components, the metrics-deployer service account must also be granted the edit permission for the openshift-infra project:
```
$ oadm policy add-role-to-user \
    edit system:serviceaccount:openshift-infra:metrics-deployer
```

Heapster Service Account

The Heapster component requires access to the master server to list all available nodes and access the /stats endpoint for each node. Before it can do this, the Heapster service account requires the cluster-reader permission:

$ oadm policy add-cluster-role-to-user \
    cluster-reader system:serviceaccount:openshift-infra:heapster

The Heapster service account does not need to exist before you grant this role. It is created automatically during the Deploying the Metric Components step.
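The service-account steps above can be collected into one shell function, which is convenient if you rebuild the metrics stack. This is a minimal sketch that assumes bash, `oc` and `oadm` clients logged in with cluster-admin rights, and the `openshift-infra` project. The function name `setup_metrics_service_accounts` is hypothetical.

```shell
# Sketch only: setup_metrics_service_accounts is a hypothetical helper that
# runs the service-account steps from this topic in order.
# Assumes oc/oadm are logged in with cluster-admin rights.

METRICS_NS=openshift-infra

setup_metrics_service_accounts() {
  # 1. Service account used by the Metrics Deployer.
  oc create -n "$METRICS_NS" -f - <<'EOF' || return
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
EOF

  # 2. Let the deployer create and edit objects in openshift-infra.
  oadm policy add-role-to-user -n "$METRICS_NS" \
    edit "system:serviceaccount:$METRICS_NS:metrics-deployer" || return

  # 3. Let Heapster list nodes and read their /stats endpoints.
  #    The heapster service account itself is created later by the deployer.
  oadm policy add-cluster-role-to-user \
    cluster-reader "system:serviceaccount:$METRICS_NS:heapster"
}
```

Source the file in bash and call `setup_metrics_service_accounts`. The function stops at the first command that fails, so a missing permission is reported before any later step runs.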

Metrics Data Storage

You can store the metrics data in either persistent storage or a temporary pod volume.

Persistent Storage

Running OpenShift Enterprise cluster metrics with persistent storage means that your metrics are stored on a persistent volume and survive a pod being restarted or recreated. This is ideal if you require your metrics data to be guarded from data loss.

The size of the persistent volume can be specified with the CASSANDRA_PV_SIZE template parameter. By default, it is set to 10 GB, which may not be sufficient for the size of your cluster. If you require more space, for instance 100 GB, you can specify it as follows:

$ oc process -f metrics-deployer.yaml -v \
    HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,CASSANDRA_PV_SIZE=100Gi \
    | oc create -f -

The size requirement of the Cassandra storage depends on the cluster size. It is the administrator’s responsibility to ensure that the volume is large enough for their setup and to monitor usage to ensure that the disk does not become full.

Data loss will result if the Cassandra persistent volume runs out of space.

For cluster metrics to work with persistent storage, ensure that the persistent volume has the ReadWriteOnce access mode. If not, the persistent volume claim will not be able to find the persistent volume, and Cassandra will fail to start.

To use persistent storage with the metric components, ensure that a persistent volume of sufficient size is available. The creation of persistent volume claims is handled by the Metrics Deployer.
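If your cluster does not provision volumes dynamically, an administrator has to create a suitable persistent volume before deploying. The sketch below prints a PersistentVolume definition that uses the `ReadWriteOnce` access mode the Cassandra claim needs. It assumes an NFS export. The function name, volume name, server, and export path are hypothetical placeholders, and the size should match `CASSANDRA_PV_SIZE`.

```shell
# Sketch: print a PersistentVolume that the Cassandra claim created by the
# Metrics Deployer can bind to. NFS is an assumption; any volume type that
# supports ReadWriteOnce can be used the same way.
# Usage: cassandra_pv_yaml NAME SIZE NFS_SERVER NFS_PATH  (hypothetical helper)
cassandra_pv_yaml() {
  local name=$1 size=$2 nfs_server=$3 nfs_path=$4
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $name
spec:
  capacity:
    storage: $size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: $nfs_server
    path: $nfs_path
EOF
}
```

For example, as a cluster administrator:

$ cassandra_pv_yaml metrics-cassandra-1 100Gi nfs.example.com /exports/metrics | oc create -f -

The `Retain` reclaim policy keeps the data on the export if the claim is later deleted.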

Non-Persistent Storage

Running OpenShift Enterprise cluster metrics with non-persistent storage means that any stored metrics are deleted when the pod is deleted. Non-persistent storage is much easier to set up, but it carries the risk of permanent data loss. Metrics do, however, survive a container restart within the same pod.

In order to use non-persistent storage, you must set the USE_PERSISTENT_STORAGE template parameter to false for the Metrics Deployer.

Metrics Deployer

The Metrics Deployer deploys and configures all of the metrics components. You can configure it by passing in information from secrets and by passing parameters to the Metrics Deployer’s template.

Using Secrets

By default, the Metrics Deployer auto-generates self-signed certificates for use between components. Because these certificates are self-signed, web browsers do not trust them automatically. It is therefore recommended to keep the internal certificates for communication inside the cluster and to use a re-encrypting route to present your own custom certificates for anything accessed from outside the OpenShift Enterprise cluster. This is especially important for the Hawkular Metrics server, because it must be accessible from a browser for the web console to display metrics.

The Metrics Deployer requires that you manually create a metrics-deployer secret whether you are providing your own certificates or using generated self-signed certificates.

Providing Your Own Certificates

To provide your own certificates and replace the internally used ones, you can pass these values as secrets to the Metrics Deployer.

The preferred metrics deployment method is to pass the metrics secret with no certificates:

$ oc secrets new metrics-deployer nothing=/dev/null

Then, use a re-encrypting route to provide your custom certificates for Hawkular Metrics. This gives you greater control when modifying the certificates in the future.

Using a re-encrypting route allows the self-signed certificates to remain in use internally while your own certificates are used for external access. When using a re-encrypting route, do not set the certificates as a secret. However, a secret named metrics-deployer must still exist before the Metrics Deployer can complete.

Optionally, provide your own certificate that is configured to be trusted by your browser by pointing your secret to the certificate’s .pem and certificate authority certificate files:

$ oc secrets new metrics-deployer \
    hawkular-metrics.pem=/home/openshift/metrics/hm.pem \
    hawkular-metrics-ca.cert=/home/openshift/metrics/hm-ca.cert

Setting the value using secrets replaces the internally used certificates. Therefore, these certificates must be valid for both the internally used host names and the external host name. For hawkular-metrics, this means the certificate must contain the literal string hawkular-metrics as a host name as well as the value specified in HAWKULAR_METRICS_HOSTNAME.

If you are unable to add the internal host name to your certificate, then you can use the re-encrypting route method.
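Before creating the secret, you can confirm that a certificate lists both required host names. The sketch below assumes the names appear as DNS entries in the certificate's Subject Alternative Name extension. It reads the text dump produced by `openssl x509 -noout -text`. The function name `cert_covers_hosts` is hypothetical.

```shell
# Sketch (hypothetical helper): check that a certificate's Subject Alternative
# Name extension contains every host name passed as an argument.
# Reads the output of `openssl x509 -noout -text` on stdin.
cert_covers_hosts() {
  local sans host
  # The SAN list is on the line after its header, for example:
  #   DNS:hawkular-metrics, DNS:hawkular-metrics.example.com
  # Strip spaces and wrap in commas so each entry can be matched exactly.
  sans=",$(grep -A1 'X509v3 Subject Alternative Name' | tail -n 1 | tr -d ' '),"
  for host in "$@"; do
    case "$sans" in
      *",DNS:$host,"*) ;;
      *) echo "certificate does not list $host" >&2; return 1 ;;
    esac
  done
}
```

For example:

$ openssl x509 -in /home/openshift/metrics/hm.pem -noout -text \
    | cert_covers_hosts hawkular-metrics hawkular-metrics.example.com

The check compares names literally. A certificate that covers a host only through a wildcard entry is reported as missing it.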

The following table contains more advanced configuration options, detailing all the secrets which can be used by the deployer:

Secret Name	Description
`hawkular-metrics.pem`	The .pem file to use for the Hawkular Metrics certificate. This certificate must contain the literal string `hawkular-metrics` as a host name as well as the publicly available host name used by the route. This file is auto-generated if unspecified.
`hawkular-metrics-ca.cert`	The certificate for the CA used to sign `hawkular-metrics.pem`. This option is ignored if `hawkular-metrics.pem` is not specified.
`hawkular-cassandra.pem`	The .pem file to use for the Cassandra certificate. This certificate must contain the `hawkular-cassandra` host name. This file is auto-generated if unspecified.
`hawkular-cassandra-ca.cert`	The certificate for the CA used to sign `hawkular-cassandra.pem`. This option is ignored if `hawkular-cassandra.pem` is not specified.
`heapster.cert`	The certificate for Heapster to use. This is auto-generated if unspecified.
`heapster.key`	The key to use with the Heapster certificate. This is ignored if `heapster.cert` is not specified.
`heapster_client_ca.cert`	The certificate that generates `heapster.cert`. This is required if `heapster.cert` is specified. Otherwise, the main CA for the OpenShift Enterprise installation is used. In order for horizontal pod autoscaling to function properly, this should not be overridden.
`heapster_allowed_users`	A file containing a comma-separated list of common names (CNs) to accept from certificates signed with the specified CA. By default, this is set to allow the OpenShift Enterprise service proxy to connect. If you override this, make sure to add `system:master-proxy` to the list in order to allow horizontal pod autoscaling to function properly.
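If you override `heapster_allowed_users`, it is easy to forget `system:master-proxy`. The sketch below writes the file from a list of CNs and appends `system:master-proxy` unless it is already present. It assumes bash with `awk` and `paste`, and that the deployer reads the file as a single comma-separated line. The function name and the paths and CN in the usage example are hypothetical.

```shell
# Sketch (hypothetical helper): write a heapster_allowed_users file from the
# CNs given as arguments, always including system:master-proxy so that
# horizontal pod autoscaling keeps working.
# Usage: write_heapster_allowed_users OUTPUT_FILE CN...
write_heapster_allowed_users() {
  local out=$1
  shift
  # One CN per line, drop duplicates while keeping order, join with commas,
  # and write without a trailing newline.
  printf '%s\n' "$@" system:master-proxy \
    | awk '!seen[$0]++' \
    | paste -sd, - \
    | tr -d '\n' > "$out"
}
```

For example, add the file alongside any other secrets you pass, such as heapster.cert:

$ write_heapster_allowed_users /home/openshift/metrics/heapster-allowed-users my-client-cn
$ oc secrets new metrics-deployer \
    heapster_allowed_users=/home/openshift/metrics/heapster-allowed-users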

The Heapster component connects to Hawkular Metrics through the service's DNS name. The URL that Heapster uses is hard-coded in the metrics code. The resolver appends the search domain to that short name, which then resolves to the service IP.
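To illustrate how a short service name becomes a resolvable host name, the sketch below prints the names a pod's resolver tries for it, in order. It assumes the default Kubernetes pod search path (`<namespace>.svc.<cluster-domain>`, `svc.<cluster-domain>`, `<cluster-domain>`) with the cluster domain `cluster.local`. The function name is hypothetical. Check `/etc/resolv.conf` inside a pod on your cluster for the actual values.

```shell
# Sketch (hypothetical helper): list the fully qualified names a pod's
# resolver tries for a short service name, assuming the default Kubernetes
# search path and the cluster.local domain unless another is given.
# Usage: service_name_candidates NAME NAMESPACE [CLUSTER_DOMAIN]
service_name_candidates() {
  local name=$1 ns=$2 domain=${3:-cluster.local}
  local search
  for search in "$ns.svc.$domain" "svc.$domain" "$domain"; do
    printf '%s.%s\n' "$name" "$search"
  done
}
```

Heapster and the hawkular-metrics service both run in openshift-infra, so the first candidate, hawkular-metrics.openshift-infra.svc.cluster.local under these assumptions, is the one that resolves to the service IP. To see the real search path, run a command such as `oc rsh <heapster-pod> cat /etc/resolv.conf`, where `<heapster-pod>` is a placeholder for the pod name.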

Using Generated Self-Signed Certificates

The Metrics Deployer can accept multiple certificates using secrets. If a certificate is not passed as a secret, then the deployer generates a self-signed certificate instead, forcing users to accept the certificate as a security exception.

In order to use official certificates for the web console, you must use a re-encrypting route. This allows the self-signed certificates to remain in use internally, while allowing your own certificates to be used for external access. When using a re-encrypting route, do not set the certificates as a secret. A "dummy" secret named metrics-deployer must still exist for the Metrics Deployer to generate certificates.

To create a "dummy" secret that does not specify a certificate value:

$ oc secrets new metrics-deployer nothing=/dev/null

If you use generated self-signed certificates without a re-encrypting route, browsers will report certificate errors when accessing Hawkular Metrics.
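If you script the deployment, you may want to create the `metrics-deployer` secret only when it is missing, so that reruns do not fail. This is a minimal sketch that assumes bash and an `oc` client logged in to the `openshift-infra` project. The function name `ensure_metrics_deployer_secret` is hypothetical.

```shell
# Sketch (hypothetical helper): create the metrics-deployer secret only if it
# does not exist yet. With no arguments it creates the "dummy" secret used
# with generated certificates; with a .pem file and a CA file it passes your
# own Hawkular Metrics certificate instead.
# Usage: ensure_metrics_deployer_secret [PEM_FILE CA_CERT_FILE]
ensure_metrics_deployer_secret() {
  if oc get secret metrics-deployer >/dev/null 2>&1; then
    echo "secret metrics-deployer already exists" >&2
    return 0
  fi
  if [ $# -eq 2 ]; then
    oc secrets new metrics-deployer \
      "hawkular-metrics.pem=$1" \
      "hawkular-metrics-ca.cert=$2"
  elif [ $# -eq 0 ]; then
    oc secrets new metrics-deployer nothing=/dev/null
  else
    echo "usage: ensure_metrics_deployer_secret [PEM_FILE CA_CERT_FILE]" >&2
    return 2
  fi
}
```

Call it with no arguments when you plan to use a re-encrypting route, or with the .pem and CA files when you provide your own certificate.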

Modifying the Deployer Template

The OpenShift Enterprise installer uses a template to deploy the metrics components. The default template can be found at the following path:

/usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml

If you need to make changes to this file, copy it to another directory, keeping the file name metrics-deployer.yaml, and refer to the new location when using it in the following sections.

Deployer Template Parameters

The deployer template parameter options and their defaults are listed in the default metrics-deployer.yaml file. If required, you can override these values when creating the Metrics Deployer.

Table 1. Template Parameters
Parameter	Description
`METRIC_DURATION`	The number of days metrics should be stored.
`CASSANDRA_PV_SIZE`	The persistent volume size for each of the Cassandra nodes.
`USE_PERSISTENT_STORAGE`	Set to true for persistent storage; set to false to use non-persistent storage.
`REDEPLOY`	If set to true, the deployer will try to delete all the existing components before trying to redeploy.
`HAWKULAR_METRICS_HOSTNAME`	External host name where clients can reach Hawkular Metrics.
`MASTER_URL`	Internal URL for the master, for authentication retrieval.
`IMAGE_VERSION`	Specify version for metrics components. For example, for openshift/origin-metrics-deployer:latest, set version to latest.
`IMAGE_PREFIX`	Specify prefix for metrics components. For example, for openshift/origin-metrics-deployer:latest, set prefix to openshift/origin-.

The only required parameter is HAWKULAR_METRICS_HOSTNAME. This value is required when creating the deployer because it specifies the hostname for the Hawkular Metrics route. This value should correspond to a fully qualified domain name. You will need to know the value of HAWKULAR_METRICS_HOSTNAME when configuring the console for metrics access.

If you are using persistent storage with Cassandra, it is the administrator’s responsibility to set a sufficient disk size for the cluster using the CASSANDRA_PV_SIZE parameter. It is also the administrator’s responsibility to monitor disk usage to make sure that it does not become full.

Data loss will result if the Cassandra persistent volume runs out of space.

All of the other parameters are optional and allow for greater customization. For instance, if you have a custom install in which the Kubernetes master is not available under https://kubernetes.default.svc:443 you can specify the value to use instead with the MASTER_URL parameter. To deploy a specific version of the metrics components, use the IMAGE_VERSION parameter.
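To see which parameters a copy of `metrics-deployer.yaml` defines before you override them, you can scan the template's `parameters` list. The sketch below uses `awk` and assumes the template is YAML with a top-level `parameters:` key whose entries each have a `name:` field. It is a line-based scan rather than a YAML parser, and the function name is hypothetical.

```shell
# Sketch (hypothetical helper): list the parameter names defined in an
# OpenShift template file. This is a line-based scan, not a YAML parser: it
# assumes a top-level "parameters:" key whose entries contain "name:".
# Usage: list_template_params TEMPLATE_FILE
list_template_params() {
  awk '
    /^parameters:/ { in_params = 1; next }  # start of the parameters list
    in_params && /^[^ -]/ { in_params = 0 }  # next top-level key ends it
    in_params {
      line = $0
      sub(/^[ -]+/, "", line)                # drop indentation and list dashes
      if (line ~ /^name:/) {
        sub(/^name:[ ]*/, "", line)
        gsub(/"/, "", line)                  # drop optional quotes
        print line
      }
    }
  ' "$1"
}
```

For example:

$ list_template_params /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml

If your oc client supports it, `oc process --parameters -f metrics-deployer.yaml` prints the same list together with descriptions and default values.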

Deploying the Metric Components

Because deploying and configuring all the metric components is handled by the Metrics Deployer, you can simply deploy everything in one step.

The following examples show you how to deploy metrics with and without persistent storage using the default template parameters. Optionally, you can specify any of the template parameters when calling these commands.

In accordance with upstream Kubernetes rules, metrics can be collected only on the default interface, eth0.

Example 1. Deploying with Persistent Storage

The following command sets the Hawkular Metrics route to use hawkular-metrics.example.com and deploys the components using persistent storage.

You must have a persistent volume of sufficient size available.

$ oc new-app -f metrics-deployer.yaml \
    -p HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com

Example 2. Deploying without Persistent Storage

The following command sets the Hawkular Metrics route to use hawkular-metrics.example.com and deploys the components without persistent storage.

$ oc new-app -f metrics-deployer.yaml \
    -p HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \
    -p USE_PERSISTENT_STORAGE=false

Because this is being deployed without persistent storage, metric data loss can occur.
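Both examples can be wrapped in one function that applies a basic check to HAWKULAR_METRICS_HOSTNAME before calling the deployer template. This is a minimal sketch that assumes bash, an `oc` client that accepts `oc new-app -p`, as in the examples above, and a template file in the current directory. The function name and its `persistent`/`ephemeral` argument are hypothetical. Older clients use `oc process -v` instead, as shown in Persistent Storage.

```shell
# Sketch (hypothetical helper): deploy the metrics components with the
# Metrics Deployer template, with or without persistent storage.
# Usage: deploy_metrics HAWKULAR_METRICS_HOSTNAME [persistent|ephemeral] [TEMPLATE]
deploy_metrics() {
  local host=$1 storage=${2:-persistent} template=${3:-metrics-deployer.yaml}
  local -a args=(-p "HAWKULAR_METRICS_HOSTNAME=$host")

  # HAWKULAR_METRICS_HOSTNAME should be a fully qualified domain name; this
  # only checks for at least one dot, as a rough guard against typos.
  case "$host" in
    *.*) ;;
    *) echo "HAWKULAR_METRICS_HOSTNAME should be a FQDN: '$host'" >&2; return 2 ;;
  esac

  case "$storage" in
    persistent) ;;
    ephemeral) args+=(-p USE_PERSISTENT_STORAGE=false) ;;
    *) echo "storage must be 'persistent' or 'ephemeral'" >&2; return 2 ;;
  esac

  oc new-app -n openshift-infra -f "$template" "${args[@]}"
}
```

For example:

$ deploy_metrics hawkular-metrics.example.com ephemeral
$ oc get pods -n openshift-infra -w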

Using a Re-encrypting Route

The following section is not required if the hawkular-metrics.pem secret was specified as a deployer secret.

By default, the Hawkular Metrics server uses an internally signed certificate, which is not trusted by browsers or other external services. To provide your own trusted certificate to be used for external access, use a route with re-encryption termination.

Creating the new route requires deleting the default route, which passes traffic through to the internally signed certificate. First, delete the default route:
```
$ oc delete route hawkular-metrics
```

Next, create a new route with re-encryption termination:

$ oc create route reencrypt hawkular-metrics-reencrypt \
            --hostname hawkular-metrics.example.com \ (1)
            --key /path/to/key \ (2)
            --cert /path/to/cert \ (2)
            --ca-cert /path/to/ca.crt \ (2)
            --service hawkular-metrics \
            --dest-ca-cert /path/to/internal-ca.crt (3)

1	The value specified in the `HAWKULAR_METRICS_HOSTNAME` template parameter.
2	The key, certificate, and CA certificate of the custom certificate you want to provide.
3	This needs to correspond to the CA used to sign the internal Hawkular Metrics certificate.

The CA used to sign the internal Hawkular Metrics certificate can be extracted from the hawkular-metrics-certificate secret:

$ oc get -o yaml secrets hawkular-metrics-certificate \
    | grep -i hawkular-metrics-ca.certificate | awk '{print $2}' \
    | base64 -d > /path/to/internal-ca.crt
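The steps in this section can be combined into one function. The function extracts the internal CA first, so the default route is deleted only once the CA file needed by the new route exists. This is a minimal sketch that assumes bash, an `oc` client logged in to `openshift-infra`, and GNU `base64`. The function name is hypothetical, and the key, certificate, and CA paths are placeholders for your own files.

```shell
# Sketch (hypothetical helper): replace the default hawkular-metrics route
# with a re-encrypting route that presents your own certificate.
# Usage: reencrypt_hawkular_route HOSTNAME KEY CERT CA_CERT [WORKDIR]
reencrypt_hawkular_route() {
  local host=$1 key=$2 cert=$3 ca=$4 workdir=${5:-.}
  local internal_ca="$workdir/internal-ca.crt"

  # Extract the CA that signed the internal Hawkular Metrics certificate.
  oc get -o yaml secrets hawkular-metrics-certificate \
    | grep -i hawkular-metrics-ca.certificate | awk '{print $2}' \
    | base64 -d > "$internal_ca" || return
  [ -s "$internal_ca" ] || { echo "could not extract the internal CA" >&2; return 1; }

  # Only now delete the default route and create the re-encrypting one.
  oc delete route hawkular-metrics || return
  oc create route reencrypt hawkular-metrics-reencrypt \
    --hostname "$host" \
    --key "$key" \
    --cert "$cert" \
    --ca-cert "$ca" \
    --service hawkular-metrics \
    --dest-ca-cert "$internal_ca"
}
```

For example:

$ reencrypt_hawkular_route hawkular-metrics.example.com \
    /path/to/key /path/to/cert /path/to/ca.crt /path/to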

Configuring OpenShift Enterprise

The OpenShift Enterprise web console uses the data coming from the Hawkular Metrics service to display its graphs. The URL for accessing the Hawkular Metrics service must be configured via the metricsPublicURL option in the master configuration file (/etc/origin/master/master-config.yaml). This URL corresponds to the route created with the HAWKULAR_METRICS_HOSTNAME template parameter during the deployment of the metrics components.

You must be able to resolve the HAWKULAR_METRICS_HOSTNAME from the browser accessing the console.

For example, if your HAWKULAR_METRICS_HOSTNAME corresponds to hawkular-metrics.example.com, then you must make the following change in the master-config.yaml file:

  assetConfig:
    ...
    metricsPublicURL: "https://hawkular-metrics.example.com/hawkular/metrics"
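Before restarting, you can check that the setting is in place. The sketch below compares the `metricsPublicURL` value in the master configuration with the URL derived from HAWKULAR_METRICS_HOSTNAME. It assumes bash and the default configuration path, and the function name is hypothetical.

```shell
# Sketch (hypothetical helper): confirm that master-config.yaml points
# metricsPublicURL at the Hawkular Metrics route before restarting.
# Usage: check_metrics_public_url HAWKULAR_METRICS_HOSTNAME [CONFIG_FILE]
check_metrics_public_url() {
  local host=$1
  local config=${2:-/etc/origin/master/master-config.yaml}
  local want="https://$host/hawkular/metrics"

  # Compare the value as a plain string (quotes removed) instead of using a
  # regular expression, so dots in the host name are matched literally.
  if awk -v want="$want" '
       $1 == "metricsPublicURL:" { v = $2; gsub(/"/, "", v); if (v == want) found = 1 }
       END { exit !found }
     ' "$config"; then
    echo "metricsPublicURL is set to $want"
  else
    echo "metricsPublicURL is not set to $want in $config" >&2
    return 1
  fi
}
```

For example, run `check_metrics_public_url hawkular-metrics.example.com` after editing the file. On an RPM-based single-master installation, the master is typically restarted with `systemctl restart atomic-openshift-master`. Your service name may differ.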

Once you have updated and saved the master-config.yaml file, you must restart the OpenShift Enterprise master for the change to take effect.

When your OpenShift Enterprise server is back up and running, metrics will be displayed on the pod overview pages.

If you are using self-signed certificates, remember that the Hawkular Metrics service is hosted under a different host name and uses different certificates than the console. You may need to explicitly open a browser tab to the value specified in metricsPublicURL and accept that certificate.

To avoid this issue, use certificates that your browser is configured to trust.

Scaling OpenShift Enterprise Metrics Pods

One set of metrics pods (Cassandra/Hawkular/Heapster) is able to monitor at least 10,000 pods.

Pay attention to system load on nodes where OpenShift Enterprise metrics pods run. Use that information to determine whether it is necessary to scale out the number of OpenShift Enterprise metrics pods and spread the load across multiple OpenShift Enterprise nodes. Scaling the OpenShift Enterprise metrics Heapster pods is not recommended.

Autoscaling the metrics components, such as Hawkular and Heapster, is not supported by OpenShift Enterprise.

Prerequisites

If persistent storage was used to deploy OpenShift Enterprise metrics, then you must create a persistent volume (PV) for the new Cassandra pod to use before you can scale out the number of OpenShift Enterprise metrics Cassandra pods. However, if Cassandra was deployed with dynamically provisioned PVs, then this step is not necessary.

Scaling the Cassandra Components

The Cassandra nodes use persistent storage; therefore, scaling up or down is not possible with replication controllers.

Scaling a Cassandra cluster requires you to use the hawkular-cassandra-node template. By default, the Cassandra cluster is a single-node cluster.

To scale out the number of OpenShift Enterprise Hawkular Metrics pods to two replicas, run:

# oc scale -n openshift-infra --replicas=2 rc hawkular-metrics
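To confirm that the replication controller has picked up the new count before you move on, you can wrap the `oc scale` command above in a function that polls the controller's status. This is a minimal sketch that assumes bash and an `oc` client that supports `-o template`. The function name and the polling interval are hypothetical choices.

```shell
# Sketch (hypothetical helper): scale the hawkular-metrics replication
# controller and wait until it reports the requested number of replicas.
# Usage: scale_hawkular_metrics REPLICAS [MAX_POLLS]
scale_hawkular_metrics() {
  local replicas=$1 polls=${2:-30} current

  # Reject anything that is not a non-negative integer.
  case "$replicas" in
    ''|*[!0-9]*) echo "replica count must be a non-negative integer" >&2; return 2 ;;
  esac

  oc scale -n openshift-infra --replicas="$replicas" rc hawkular-metrics || return

  # .status.replicas counts pods the controller has created; they may not be
  # ready yet, so check readiness separately with `oc get pods`.
  while [ "$polls" -gt 0 ]; do
    current=$(oc get rc hawkular-metrics -n openshift-infra \
      -o template --template='{{.status.replicas}}')
    [ "$current" = "$replicas" ] && return 0
    polls=$((polls - 1))
    sleep 2
  done
  echo "timed out waiting for $replicas replicas" >&2
  return 1
}
```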

If you add a new node to a Cassandra cluster, the data stored in the cluster rebalances across the cluster. The same happens if you remove a node from the cluster.

Cleanup

You can remove everything deployed by the Metrics Deployer by running the following command:

$ oc delete all,sa,templates,secrets,pvc --selector="metrics-infra"

To remove the deployer components, run:

$ oc delete sa,secret metrics-deployer
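The two cleanup commands can be combined into one function with an option to list what the label selector matches before anything is deleted. Deleting the `pvc` objects releases the Cassandra persistent volume, and whether its data is kept depends on the volume's reclaim policy. This is a minimal sketch that assumes bash and an `oc` client with access to `openshift-infra`. The function name and its `--list` option are hypothetical.

```shell
# Sketch (hypothetical helper): remove the metrics components and the
# deployer objects from openshift-infra. Pass --list to only show what the
# metrics-infra label selector matches.
# Usage: cleanup_metrics [--list]
cleanup_metrics() {
  local ns=openshift-infra
  local kinds=all,sa,templates,secrets,pvc

  if [ "${1:-}" = --list ]; then
    oc get -n "$ns" "$kinds" --selector=metrics-infra
    return
  fi

  # Objects created by the Metrics Deployer carry the metrics-infra label.
  oc delete -n "$ns" "$kinds" --selector=metrics-infra || return
  # The deployer's own service account and secret were created by hand.
  oc delete -n "$ns" sa,secret metrics-deployer
}
```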