You can use OpenShift Monitoring for your own services in addition to monitoring the cluster. This way, you do not need an additional monitoring solution, which helps keep monitoring centralized. You can also extend access to the metrics of your services beyond cluster administrators, so that developers and arbitrary users can access these metrics.

Custom Prometheus instances and the Prometheus Operator installed through Operator Lifecycle Manager (OLM) can cause issues with user-defined workload monitoring if it is enabled. Custom Prometheus instances are not supported in OpenShift Container Platform.

Monitoring your own services is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Enabling monitoring of your own services

You can enable monitoring your own services by setting the techPreviewUserWorkload/enabled flag in the cluster monitoring ConfigMap.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

  • You have created the cluster-monitoring-config ConfigMap object.

Procedure
  1. Start editing the cluster-monitoring-config ConfigMap:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Set enabled to true under techPreviewUserWorkload in data/config.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        techPreviewUserWorkload:
          enabled: true
  3. Save the file to apply the changes. Monitoring your own services is enabled automatically.

  4. Optional: You can check that the prometheus-user-workload Pods were created:

    $ oc -n openshift-user-workload-monitoring get pod
    Example output
    NAME                                   READY   STATUS        RESTARTS   AGE
    prometheus-operator-6f7b748d5b-t7nbg   2/2     Running       0          3h
    prometheus-user-workload-0             5/5     Running       1          3h
    prometheus-user-workload-1             5/5     Running       1          3h
    thanos-ruler-user-workload-0           3/3     Running       0          3h
    thanos-ruler-user-workload-1           3/3     Running       0          3h

Deploying a sample service

To test monitoring your own services, you can deploy a sample service.

Procedure
  1. Create a YAML file for the service configuration. In this example, it is called prometheus-example-app.yaml.

  2. Fill the file with the configuration for deploying the service:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: ns1
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-example-app
      name: prometheus-example-app
      namespace: ns1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-example-app
      template:
        metadata:
          labels:
            app: prometheus-example-app
        spec:
          containers:
          - image: quay.io/brancz/prometheus-example-app:v0.2.0
            imagePullPolicy: IfNotPresent
            name: prometheus-example-app
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: prometheus-example-app
      name: prometheus-example-app
      namespace: ns1
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
        name: web
      selector:
        app: prometheus-example-app
      type: ClusterIP

    This configuration deploys a service named prometheus-example-app in the ns1 project. This service exposes the custom version metric.

  3. Apply the configuration file to the cluster:

    $ oc apply -f prometheus-example-app.yaml

    It will take some time to deploy the service.

  4. You can check that the Pod for the service is running:

    $ oc -n ns1 get pod
    Example output
    NAME                                      READY     STATUS    RESTARTS   AGE
    prometheus-example-app-7857545cb7-sbgwq   1/1       Running   0          81m

Granting user permissions using web console

This procedure shows how to grant users permissions for monitoring their own services using the web console.

Prerequisites
  • Have a user created.

  • Log in to the web console as a cluster administrator.

Procedure
  1. In the web console, navigate to User Management → Role Bindings → Create Binding.

  2. In Binding Type, select the "Namespace Role Binding" type.

  3. In Name, enter a name for the binding.

  4. In Namespace, select the namespace where you want to grant the access. For example, select ns1.

  5. In Role Name, enter monitoring-rules-view, monitoring-rules-edit, or monitoring-edit.

    • monitoring-rules-view allows reading PrometheusRule custom resources within the namespace.

    • monitoring-rules-edit allows creating, modifying, and deleting PrometheusRule custom resources matching the permitted namespace.

    • monitoring-edit gives the same permissions as monitoring-rules-edit. Additionally, it allows creating new scraping targets for services or Pods. It also allows creating, modifying, and deleting ServiceMonitors and PodMonitors.

    Whichever role you choose, you must bind it against a specific namespace as a cluster administrator.

    For example, enter monitoring-edit.

  6. In Subject, select User.

  7. In Subject Name, enter the name of the user. For example, enter johnsmith.

  8. Confirm the role binding. If you followed the example, then user johnsmith has been assigned the permissions for setting up metrics collection and creating alerting rules in the ns1 namespace.

Granting user permissions using CLI

This procedure shows how to grant users permissions for monitoring their own services using the CLI.

Prerequisites
  • Have a user created.

  • Log in using the oc command.

Procedure
  • Run this command to assign <role> to <user> in <namespace>:

    $ oc policy add-role-to-user <role> <user> -n <namespace>

    Substitute <role> with monitoring-rules-view, monitoring-rules-edit, or monitoring-edit.

    • monitoring-rules-view allows reading PrometheusRule custom resources within the namespace.

    • monitoring-rules-edit allows creating, modifying, and deleting PrometheusRule custom resources matching the permitted namespace.

    • monitoring-edit gives the same permissions as monitoring-rules-edit. Additionally, it allows creating new scraping targets for services or Pods. It also allows creating, modifying, and deleting ServiceMonitors and PodMonitors.

    Whichever role you choose, you must bind it against a specific namespace as a cluster administrator.

    As an example, substitute <role> with monitoring-edit, <user> with johnsmith, and <namespace> with ns1. This assigns to user johnsmith the permissions for setting up metrics collection and creating alerting rules in the ns1 namespace.
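The binding from the example above can also be written declaratively. The following is a minimal sketch rather than output captured from a cluster. It assumes that monitoring-edit is provided as a cluster role, and the binding name monitoring-edit-johnsmith is hypothetical.

```yaml
# Sketch only: binds the monitoring-edit role to user johnsmith in ns1.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: monitoring-edit-johnsmith   # hypothetical name, choose your own
  namespace: ns1                    # the binding applies to this namespace only
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole                 # assumption: the role is defined cluster-wide
  name: monitoring-edit
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: johnsmith
```

If you save this to a file, a cluster administrator can apply it with oc apply -f, in the same way as the other configuration files in this document.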

Setting up metrics collection

To use the metrics exposed by your service, you must configure OpenShift Monitoring to scrape metrics from the /metrics endpoint. You can do this using a ServiceMonitor, which is a custom resource definition (CRD) that specifies how a service should be monitored, or a PodMonitor, which is a CRD that specifies how a Pod should be monitored. The former requires a Service object, while the latter does not, allowing Prometheus to directly scrape metrics from the metrics endpoint exposed by a Pod.
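For comparison, a PodMonitor for the same sample application might look like the following minimal sketch. The name prometheus-example-pod-monitor is hypothetical. The sketch also assumes that the container declares a port named web, which the sample Deployment in "Deploying a sample service" does not do as written, so you would need to add a named container port first.

```yaml
# Sketch only: scrapes Pods directly, without going through a Service object.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: prometheus-example-pod-monitor   # hypothetical name
  namespace: ns1
spec:
  podMetricsEndpoints:
  - interval: 30s
    port: web        # assumption: a container port with this name exists
    scheme: http
  selector:
    matchLabels:
      app: prometheus-example-app        # matches the Pod labels, not a Service
```

Because the selector matches Pod labels, this form is useful for workloads that expose metrics but have no Service in front of them.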

This procedure shows how to create a ServiceMonitor for the service.

Prerequisites
  • Log in as a cluster administrator or a user with the monitoring-edit role.

Procedure
  1. Create a YAML file for the ServiceMonitor configuration. In this example, the file is called example-app-service-monitor.yaml.

  2. Fill the file with the configuration for creating the ServiceMonitor:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: prometheus-example-monitor
      name: prometheus-example-monitor
      namespace: ns1
    spec:
      endpoints:
      - interval: 30s
        port: web
        scheme: http
      selector:
        matchLabels:
          app: prometheus-example-app

    This configuration makes OpenShift Monitoring scrape the metrics exposed by the sample service deployed in "Deploying a sample service", which includes the single version metric.

  3. Apply the configuration file to the cluster:

    $ oc apply -f example-app-service-monitor.yaml

    It will take some time to deploy the ServiceMonitor.

  4. You can check that the ServiceMonitor was created:

    $ oc -n ns1 get servicemonitor
    Example output
    NAME                         AGE
    prometheus-example-monitor   81m
Additional resources

See the Prometheus Operator API documentation for more information on ServiceMonitors and PodMonitors.

Creating alerting rules

You can create alerting rules, which will fire alerts based on values of chosen metrics.

Viewing and managing your rules and alerts is not yet integrated into the web console. A cluster administrator can instead use the Alertmanager UI or the Thanos Ruler. See the respective sections for instructions.

Prerequisites
  • Log in as a user that has the monitoring-rules-edit role for the namespace where you want to create the alerting rule.

Procedure
  1. Create a YAML file for alerting rules. In this example, it is called example-app-alerting-rule.yaml.

  2. Fill the file with the configuration for the alerting rules:

    When you create an alerting rule, a namespace label is enforced on it if a rule with the same name exists in another namespace.

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: example-alert
      namespace: ns1
    spec:
      groups:
      - name: example
        rules:
        - alert: VersionAlert
          expr: version{job="prometheus-example-app"} == 0

    This configuration creates an alerting rule named example-alert, which fires an alert when the version metric exposed by the sample service becomes 0.

    For every namespace, you can use metrics of that namespace and cluster metrics, but not metrics of another namespace.

    For example, an alerting rule for ns1 can have metrics from ns1 and cluster metrics, such as the CPU and memory metrics. However, the rule cannot include metrics from ns2.

    Additionally, you cannot create alerting rules for the openshift-* core OpenShift namespaces. OpenShift Container Platform Monitoring by default provides a set of alerting rules for these namespaces.

  3. Apply the configuration file to the cluster:

    $ oc apply -f example-app-alerting-rule.yaml

    It will take some time to create the alerting rules.
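Alerting rules usually carry more than an expression. The following minimal sketch extends the example rule with fields from the standard Prometheus alerting rule format. The 5m duration, the severity value, and the annotation text are illustrative assumptions, not values taken from this procedure.

```yaml
# Sketch only: the example rule with a hold duration, a label, and an annotation.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert
  namespace: ns1
spec:
  groups:
  - name: example
    rules:
    - alert: VersionAlert
      expr: version{job="prometheus-example-app"} == 0
      for: 5m                  # assumption: fire only after the condition holds for 5 minutes
      labels:
        severity: warning      # illustrative label, useful for routing
      annotations:
        summary: The version metric of prometheus-example-app is 0.
```

The for field delays firing until the expression has been true for the whole duration, which avoids alerts on brief fluctuations.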

Removing alerting rules

You can remove an alerting rule.

Prerequisites
  • Log in as a user that has the monitoring-rules-edit role for the namespace where you want to remove an alerting rule.

Procedure
  • To remove rule <foo> in <namespace>, run:

    $ oc -n <namespace> delete prometheusrule <foo>

Accessing alerting rules for your project

You can list existing alerting rules for your project.

Prerequisites
  • Log in as a user with the monitoring-rules-view role against your project.

Procedure
  1. To list alerting rules in <project>, run:

    $ oc -n <project> get prometheusrule
  2. To list the configuration of an alerting rule, run:

    $ oc -n <project> get prometheusrule <rule> -oyaml

Accessing alerting rules for all namespaces

As a cluster administrator, you can access alerting rules from all namespaces together in a single view.

In a future release, the route to the Thanos Ruler UI will be deprecated in favor of the web console.

Prerequisites
  • Have the oc command installed.

  • Log in as a cluster administrator.

Procedure
  1. List routes for the openshift-user-workload-monitoring namespace:

    $ oc -n openshift-user-workload-monitoring get routes

    The output shows the URL for the Thanos Ruler UI:

    NAME           HOST/PORT
    ...
    thanos-ruler   thanos-ruler-openshift-user-workload-monitoring.apps.example.devcluster.openshift.com
  2. Navigate to the listed URL. Here you can see user alerting rules from all namespaces.

Accessing the metrics of your service as a developer

After you have enabled monitoring your own services, deployed a service, and set up metrics collection for the service, you can access the metrics of the service as a developer or as a user with view permissions for the project.

The Grafana instance shipped within OpenShift Container Platform Monitoring is read-only and displays only infrastructure-related dashboards.

Prerequisites
  • Deploy the service that you want to monitor.

  • Enable monitoring of your own services.

  • Have metrics scraping set up for the service.

  • Log in as a developer or as a user with view permissions for the project.

Procedure
  1. Go to the OpenShift Container Platform web console, switch to the Developer Perspective, then click Advanced → Metrics. Select the project you want to see the metrics for.

    Developers can only use the Developer Perspective and not the Administrator Perspective. They can only query metrics from a single project. They cannot access the third-party UIs provided with OpenShift Container Platform Monitoring.

  2. Use the PromQL interface to run queries for your services.

Accessing metrics of all services as a cluster administrator

If you are a cluster administrator or a user with view permissions for all namespaces, you can access metrics of all services from all namespaces together in a single view.

Prerequisites
  • Log in to the web console as a cluster administrator or a user with view permissions for all namespaces.

  • Optionally, log in with the oc command as well.

Procedure
  • Using the Metrics web interface:

    1. Go to the OpenShift Container Platform web console, switch to the Administrator Perspective, and click Monitoring → Metrics.

      Cluster administrators, when using the Administrator Perspective, have access to all cluster metrics and to custom service metrics from all projects.

      Only cluster administrators have access to the third-party UIs provided with OpenShift Container Platform Monitoring.

    2. Use the PromQL interface to run queries for your services.

  • Using the Thanos Querier UI:

    In a future release, the route to the Thanos Querier UI will be deprecated in favor of the web console.

    1. List routes for the openshift-monitoring namespace:

      $ oc -n openshift-monitoring get routes

      The output shows the URL for the Thanos Querier UI:

      NAME                HOST/PORT
      ...
      thanos-querier      thanos-querier-openshift-monitoring.apps.example.devcluster.openshift.com
    2. Navigate to the listed URL. Here you can see all metrics from all namespaces.
