
In the logging subsystem documentation, LokiStack refers to the logging subsystem-supported combination of Loki and a web proxy with Red Hat OpenShift Service on AWS authentication integration. LokiStack’s proxy uses Red Hat OpenShift Service on AWS authentication to enforce multi-tenancy. Loki refers to the log store as either the individual component or an external store.

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system currently offered as an alternative to Elasticsearch as a log store for the logging subsystem. Elasticsearch indexes incoming log records completely during ingestion. Loki only indexes a few fixed labels during ingestion and defers more complex parsing until after the logs have been stored. This means Loki can collect logs more quickly. You can query Loki by using the LogQL log query language.
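
For example, a basic LogQL query that selects application logs and filters for lines containing a given string looks similar to the following sketch. The log_type label is one of the labels the logging subsystem attaches to log streams (it also appears in the retention examples later in this document); the filter string is illustrative:

{log_type="application"} |= "error"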

Loki deployment sizing

Sizing for Loki follows the format of <N>x.<size>, where the value <N> is the number of instances and <size> specifies performance capabilities.

Table 1. Loki sizing
                                           1x.extra-small      1x.small            1x.medium
Data transfer                              100GB/day           500GB/day           2TB/day
Queries per second (QPS)                   1-25 QPS at 200ms   25-50 QPS at 200ms  25-75 QPS at 200ms
Replication factor                         2                   2                   2
Total CPU requests                         14 vCPUs            34 vCPUs            54 vCPUs
Total CPU requests if using the ruler      16 vCPUs            42 vCPUs            70 vCPUs
Total memory requests                      31Gi                67Gi                139Gi
Total memory requests if using the ruler   35Gi                83Gi                171Gi
Total disk requests                        430Gi               430Gi               590Gi
Total disk requests if using the ruler     650Gi               650Gi               910Gi

Creating a new group for the cluster-admin user role

Querying application logs for multiple namespaces as a cluster-admin user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error Parse error: input size too long (XXXX > 5120). For better control over access to logs in LokiStack, make the cluster-admin user a member of the cluster-admin group. If the cluster-admin group does not exist, create it and add the desired users to it.

Use the following procedure to create a new group for users with cluster-admin permissions.

Procedure
  1. Enter the following command to create a new group:

    $ oc adm groups new cluster-admin
  2. Enter the following command to add the desired user to the cluster-admin group:

    $ oc adm groups add-users cluster-admin <username>
  3. Enter the following command to add cluster-admin user role to the group:

    $ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin

Installing Loki Operator using the Red Hat OpenShift Service on AWS web console

To install and configure logging on your Red Hat OpenShift Service on AWS cluster, additional Operators must be installed. This can be done from the Operator Hub within the web console.

Red Hat OpenShift Service on AWS Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.

Prerequisites
  • Supported object store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)

Procedure
  1. To install the Loki Operator in the Red Hat OpenShift Service on AWS web console, click Operators → OperatorHub.

  2. Type Loki Operator in the filter by keyword box. Choose Loki Operator from the list of available Operators and click Install.

    The Community Loki Operator is not supported by Red Hat.

  3. On the Install Operator page, for Update channel select stable.

    The stable channel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel to stable-X where X is the version of logging you have installed.

    As the Loki Operator must be deployed to the global operator group namespace openshift-operators-redhat, Installation mode and Installed Namespace are already selected. If this namespace does not already exist, it is created for you.

  4. Select Enable operator-recommended cluster monitoring on this namespace.

    This option sets the openshift.io/cluster-monitoring: "true" label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes the openshift-operators-redhat namespace.

  5. For Update approval select Automatic, then click Install.

    If the approval strategy in the subscription is set to Automatic, the update process initiates as soon as a new Operator version is available in the selected channel. If the approval strategy is set to Manual, you must manually approve pending updates.

The Operators should now be available to all users and projects that use this cluster.

Verification
  1. Navigate to Operators → Installed Operators.

  2. Make sure the openshift-logging project is selected.

  3. In the Status column, verify that you see green checkmarks with InstallSucceeded and the text Up to date.

An Operator might display a Failed status before the installation finishes. If the Operator install completes with an InstallSucceeded message, refresh the page.

Installing the Red Hat OpenShift Logging Operator using the Red Hat OpenShift Service on AWS web console

To install and configure logging on your Red Hat OpenShift Service on AWS cluster, additional Operators must be installed. This can be done from the Operator Hub within the web console.

Red Hat OpenShift Service on AWS Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.

Prerequisites
  • Supported object store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)

Procedure
  1. To install the Red Hat OpenShift Logging Operator in the Red Hat OpenShift Service on AWS web console, click Operators → OperatorHub.

    1. Type OpenShift Logging in the filter by keyword box. Choose Red Hat OpenShift Logging from the list of available Operators and click Install.

    2. On the Install Operator page, under Update channel select stable.

      The stable channel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel to stable-X where X is the version of logging you have installed.

      As the Red Hat OpenShift Logging Operator is only deployable to the openshift-logging namespace, Installation mode and Installed Namespace are already selected. If this namespace does not already exist, it is created for you.

    3. If you are creating the openshift-logging namespace, select the option to Enable Operator recommended cluster monitoring on this Namespace.

      If the openshift-logging namespace already exists, you must add the namespace label, openshift.io/cluster-monitoring: "true", to enable metrics service discovery. An example command is shown after this procedure.

    4. Under Update approval select Automatic.

      If the approval strategy in the subscription is set to Automatic, the update process initiates as soon as a new Operator version is available in the selected channel. If the approval strategy is set to Manual, you must manually approve pending updates.

    5. For Console plugin select Enable, then click Install.

The Operators should now be available to all users and projects that use this cluster.
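
If you need to add the monitoring label to an existing openshift-logging namespace, as noted in the procedure above, one way to do so is with a command similar to the following:

$ oc label namespace openshift-logging openshift.io/cluster-monitoring="true"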

Verification
  1. Navigate to Operators → Installed Operators.

  2. Make sure the openshift-logging project is selected.

  3. In the Status column, verify that you see green checkmarks with InstallSucceeded and the text Up to date.

An Operator might display a Failed status before the installation finishes. If the Operator install completes with an InstallSucceeded message, refresh the page.

Installing Loki Operator using the Red Hat OpenShift Service on AWS CLI

To install and configure logging on your Red Hat OpenShift Service on AWS cluster, additional Operators must be installed. This can be done from the Red Hat OpenShift Service on AWS CLI.

Red Hat OpenShift Service on AWS Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.

Prerequisites
  • Supported object store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)

Procedure
  1. Install the Loki Operator by creating the following objects:

    1. Create a Subscription object YAML file (for example, olo-sub.yaml) to subscribe a namespace to the Loki Operator using the template below:

      $ oc create -f <file-name>.yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: loki-operator
        namespace: openshift-operators-redhat (1)
      spec:
        channel: stable (2)
        name: loki-operator
        source: redhat-operators (3)
        sourceNamespace: openshift-marketplace
      1 You must specify the openshift-operators-redhat namespace.
      2 Specify stable, or stable-5.<y> as the channel.
      3 Specify redhat-operators. If your Red Hat OpenShift Service on AWS cluster is installed on a restricted network, also known as a disconnected cluster, specify the name of the CatalogSource object you created when you configured the Operator Lifecycle Manager (OLM).
  2. Create a LokiStack instance:

    1. Create an instance object YAML file (for example, logging-loki.yaml) using the template below:

      $ oc create -f <file-name>.yaml
      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: logging-loki
        namespace: openshift-logging
      spec:
        size: 1x.small (1)
        storage:
          schemas:
          - version: v12
            effectiveDate: "2022-06-01"
          secret:
            name: logging-loki-s3 (2)
            type: s3 (3)
        storageClassName: <storage_class_name> (4)
        tenants:
          mode: openshift-logging
      1 Supported size options for production instances of Loki are 1x.small and 1x.medium.
      2 Enter the name of your log store secret.
      3 Enter the type of your log store secret.
      4 Enter the name of an existing storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed using oc get storageclasses.
Verification

Verify the installation by listing the pods in the openshift-logging project by running the following command:

$ oc get pods -n openshift-logging

You should see several pods for components of the Logging subsystem, similar to the following list:

Example output
$ oc get pods -n openshift-logging
NAME                                               READY   STATUS    RESTARTS   AGE
cluster-logging-operator-fb7f7cf69-8jsbq           1/1     Running   0          98m
collector-222js                                    2/2     Running   0          18m
collector-g9ddv                                    2/2     Running   0          18m
collector-hfqq8                                    2/2     Running   0          18m
collector-sphwg                                    2/2     Running   0          18m
collector-vv7zn                                    2/2     Running   0          18m
collector-wk5zz                                    2/2     Running   0          18m
logging-view-plugin-6f76fbb78f-n2n4n               1/1     Running   0          18m
lokistack-sample-compactor-0                       1/1     Running   0          42m
lokistack-sample-distributor-7d7688bcb9-dvcj8      1/1     Running   0          42m
lokistack-sample-gateway-5f6c75f879-bl7k9          2/2     Running   0          42m
lokistack-sample-gateway-5f6c75f879-xhq98          2/2     Running   0          42m
lokistack-sample-index-gateway-0                   1/1     Running   0          42m
lokistack-sample-ingester-0                        1/1     Running   0          42m
lokistack-sample-querier-6b7b56bccc-2v9q4          1/1     Running   0          42m
lokistack-sample-query-frontend-84fb57c578-gq2f7   1/1     Running   0          42m

Installing Red Hat OpenShift Logging Operator using the Red Hat OpenShift Service on AWS CLI

To install and configure logging on your Red Hat OpenShift Service on AWS cluster, additional Operators must be installed. This can be done from the Red Hat OpenShift Service on AWS CLI.

Red Hat OpenShift Service on AWS Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.

Prerequisites
  • Supported object store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)

Procedure
  1. Install the Red Hat OpenShift Logging Operator by creating the following objects:

    1. Create an Operator Group object YAML file (for example, olo-og.yaml) using the template below:

      $ oc create -f <file-name>.yaml
      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: cluster-logging
        namespace: openshift-logging (1)
      spec:
        targetNamespaces:
        - openshift-logging
      1 You must specify the openshift-logging namespace.
    2. Create a Subscription object YAML file (for example, olo-sub.yaml) to subscribe a namespace to the Red Hat OpenShift Logging Operator using the template below:

      $ oc create -f <file-name>.yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: cluster-logging
        namespace: openshift-logging (1)
      spec:
        channel: stable (2)
        name: cluster-logging
        source: redhat-operators (3)
        sourceNamespace: openshift-marketplace
      1 You must specify the openshift-logging namespace.
      2 Specify stable, or stable-5.<y> as the channel.
      3 Specify redhat-operators. If your Red Hat OpenShift Service on AWS cluster is installed on a restricted network, also known as a disconnected cluster, specify the name of the CatalogSource object you created when you configured the Operator Lifecycle Manager (OLM).
    3. Verify the Operator installation.

      There should be a Red Hat OpenShift Logging Operator in the openshift-logging namespace. The version number might be different than what is shown.

      $ oc get csv -n openshift-logging
      Example output
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.7.4          Red Hat OpenShift Logging          5.7.4     cluster-logging.v5.7.3          Succeeded
  2. Create a Red Hat OpenShift Logging Operator instance:

    1. Create an instance object YAML file (for example, olo-instance.yaml) using the template below:

      $ oc create -f <file-name>.yaml
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        logStore:
          type: lokistack
          lokistack:
            name: logging-loki
        collection:
          type: vector
Verification

Verify the installation by listing the pods in the openshift-logging project by running the following command:

$ oc get pods -n openshift-logging

You should see several pods for components of the Logging subsystem, similar to the following list:

Example output
$ oc get pods -n openshift-logging
NAME                                               READY   STATUS    RESTARTS   AGE
cluster-logging-operator-fb7f7cf69-8jsbq           1/1     Running   0          98m
collector-222js                                    2/2     Running   0          18m
collector-g9ddv                                    2/2     Running   0          18m
collector-hfqq8                                    2/2     Running   0          18m
collector-sphwg                                    2/2     Running   0          18m
collector-vv7zn                                    2/2     Running   0          18m
collector-wk5zz                                    2/2     Running   0          18m
logging-view-plugin-6f76fbb78f-n2n4n               1/1     Running   0          18m
lokistack-sample-compactor-0                       1/1     Running   0          42m
lokistack-sample-distributor-7d7688bcb9-dvcj8      1/1     Running   0          42m
lokistack-sample-gateway-5f6c75f879-bl7k9          2/2     Running   0          42m
lokistack-sample-gateway-5f6c75f879-xhq98          2/2     Running   0          42m
lokistack-sample-index-gateway-0                   1/1     Running   0          42m
lokistack-sample-ingester-0                        1/1     Running   0          42m
lokistack-sample-querier-6b7b56bccc-2v9q4          1/1     Running   0          42m
lokistack-sample-query-frontend-84fb57c578-gq2f7   1/1     Running   0          42m

Configuring log storage

You can configure which log storage type your logging subsystem uses by modifying the ClusterLogging custom resource (CR).

Prerequisites
  • You have administrator permissions.

  • You have installed the OpenShift CLI (oc).

  • You have installed the Red Hat OpenShift Logging Operator and an internal log store that is either the LokiStack or Elasticsearch.

  • You have created a ClusterLogging CR.

The Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.

Procedure
  1. Modify the ClusterLogging CR logStore spec:

    ClusterLogging CR example
    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
    # ...
    spec:
    # ...
      logStore:
        type: <log_store_type> (1)
        elasticsearch: (2)
          nodeCount: <integer>
          resources: {}
          storage: {}
          redundancyPolicy: <redundancy_type> (3)
        lokistack: (4)
          name: {}
    # ...
    1 Specify the log store type. This can be either lokistack or elasticsearch.
    2 Optional configuration options for the Elasticsearch log store.
    3 Specify the redundancy type. This value can be ZeroRedundancy, SingleRedundancy, MultipleRedundancy, or FullRedundancy.
    4 Optional configuration options for LokiStack.
  2. Apply the ClusterLogging CR by running the following command:

    $ oc apply -f <filename>.yaml

Loki object storage

The Loki Operator supports AWS S3, as well as other S3 compatible object stores such as Minio and OpenShift Data Foundation. Azure, GCS, and Swift are also supported.

The recommended nomenclature for Loki storage is logging-loki-<your_storage_provider>.

You can create a secret in the directory that contains your certificate and key files by using the following command:

$ oc create secret generic -n openshift-logging <your_secret_name> \
 --from-file=tls.key=<your_key_file> \
 --from-file=tls.crt=<your_crt_file> \
 --from-file=ca-bundle.crt=<your_bundle_file> \
 --from-literal=username=<your_username> \
 --from-literal=password=<your_password>

Use generic or opaque secrets for best results.

You can verify a secret has been created by running the following command:

$ oc get secrets -n openshift-logging

The following table shows the type values within the LokiStack custom resource (CR) for each storage provider. For more information, see the section on your storage provider.

Table 2. Secret type quick reference
Storage provider             Secret type value
AWS                          s3
Azure                        azure
Google Cloud                 gcs
Minio                        s3
OpenShift Data Foundation    s3
Swift                        swift
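
The type value from this table is referenced in the storage.secret section of the LokiStack custom resource. The following is a minimal sketch, assuming a secret named according to the recommended nomenclature above:

  storage:
    secret:
      name: logging-loki-<your_storage_provider>
      type: <secret_type_value>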

AWS storage

Prerequisites
  • You have deployed Loki Operator.

  • You have created a bucket on AWS.

Procedure
  • Create an object storage secret with the name logging-loki-aws by running the following command:

$ oc create secret generic logging-loki-aws \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="<aws_bucket_endpoint>" \
  --from-literal=access_key_id="<aws_access_key_id>" \
  --from-literal=access_key_secret="<aws_access_key_secret>" \
  --from-literal=region="<aws_region_of_your_bucket>"

Azure storage

Prerequisites
  • You have deployed Loki Operator.

  • You have created a bucket on Azure.

Procedure
  • Create an object storage secret with the name logging-loki-azure by running the following command:

$ oc create secret generic logging-loki-azure \
  --from-literal=container="<azure_container_name>" \
  --from-literal=environment="<azure_environment>" \ (1)
  --from-literal=account_name="<azure_account_name>" \
  --from-literal=account_key="<azure_account_key>"
1 Supported environment values are: AzureGlobal, AzureChinaCloud, AzureGermanCloud, AzureUSGovernment.

GCP storage

Prerequisites
  • You have deployed Loki Operator.

  • You have created a project on Google Cloud Platform.

  • You have created a bucket in the same project.

  • You have created a service account in the same project for GCP authentication.

Procedure
  1. Copy the service account credentials received from GCP into a file called key.json.

  2. Create an object storage secret with the name logging-loki-gcs by running the following command:

    $ oc create secret generic logging-loki-gcs \
      --from-literal=bucketname="<bucket_name>" \
      --from-file=key.json="<path/to/key.json>"

Minio storage

Prerequisites
  • You have deployed Loki Operator.

  • You have Minio deployed on your Cluster.

  • You have created a bucket on Minio.

Procedure
  • Create an object storage secret with the name logging-loki-minio by running the following command:

$ oc create secret generic logging-loki-minio \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="<minio_bucket_endpoint>" \
  --from-literal=access_key_id="<minio_access_key_id>" \
  --from-literal=access_key_secret="<minio_access_key_secret>"

ODF storage

Prerequisites
  • You have deployed Loki Operator.

  • You have deployed OpenShift Data Foundation on your cluster.

Procedure
  1. Create an ObjectBucketClaim custom resource in the openshift-logging namespace:

    apiVersion: objectbucket.io/v1alpha1
    kind: ObjectBucketClaim
    metadata:
      name: loki-bucket-odf
      namespace: openshift-logging
    spec:
      generateBucketName: loki-bucket-odf
      storageClassName: openshift-storage.noobaa.io
  2. Get bucket properties from the associated ConfigMap object by running the following command:

    BUCKET_HOST=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_HOST}')
    BUCKET_NAME=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_NAME}')
    BUCKET_PORT=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_PORT}')
  3. Get bucket access key from the associated secret by running the following command:

    ACCESS_KEY_ID=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
    SECRET_ACCESS_KEY=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
  4. Create an object storage secret with the name logging-loki-odf by running the following command:

    $ oc create -n openshift-logging secret generic logging-loki-odf \
      --from-literal=access_key_id="<access_key_id>" \
      --from-literal=access_key_secret="<secret_access_key>" \
      --from-literal=bucketnames="<bucket_name>" \
      --from-literal=endpoint="https://<bucket_host>:<bucket_port>"
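
Because the previous steps store the bucket properties and credentials in shell variables, you can also pass those variables directly when creating the secret. This is a sketch that assumes you run it in the same shell session as the earlier commands:

$ oc create -n openshift-logging secret generic logging-loki-odf \
  --from-literal=access_key_id="${ACCESS_KEY_ID}" \
  --from-literal=access_key_secret="${SECRET_ACCESS_KEY}" \
  --from-literal=bucketnames="${BUCKET_NAME}" \
  --from-literal=endpoint="https://${BUCKET_HOST}:${BUCKET_PORT}"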

Swift storage

Prerequisites
  • You have deployed Loki Operator.

  • You have created a bucket on Swift.

Procedure
  • Create an object storage secret with the name logging-loki-swift by running the following command:

$ oc create secret generic logging-loki-swift \
  --from-literal=auth_url="<swift_auth_url>" \
  --from-literal=username="<swift_usernameclaim>" \
  --from-literal=user_domain_name="<swift_user_domain_name>" \
  --from-literal=user_domain_id="<swift_user_domain_id>" \
  --from-literal=user_id="<swift_user_id>" \
  --from-literal=password="<swift_password>" \
  --from-literal=domain_id="<swift_domain_id>" \
  --from-literal=domain_name="<swift_domain_name>" \
  --from-literal=container_name="<swift_container_name>"
  • You can optionally provide project-specific data, region, or both by running the following command:

$ oc create secret generic logging-loki-swift \
  --from-literal=auth_url="<swift_auth_url>" \
  --from-literal=username="<swift_usernameclaim>" \
  --from-literal=user_domain_name="<swift_user_domain_name>" \
  --from-literal=user_domain_id="<swift_user_domain_id>" \
  --from-literal=user_id="<swift_user_id>" \
  --from-literal=password="<swift_password>" \
  --from-literal=domain_id="<swift_domain_id>" \
  --from-literal=domain_name="<swift_domain_name>" \
  --from-literal=container_name="<swift_container_name>" \
  --from-literal=project_id="<swift_project_id>" \
  --from-literal=project_name="<swift_project_name>" \
  --from-literal=project_domain_id="<swift_project_domain_id>" \
  --from-literal=project_domain_name="<swift_project_domain_name>" \
  --from-literal=region="<swift_region>"

LokiStack behavior during cluster restarts

In logging version 5.8 and newer versions, when a Red Hat OpenShift Service on AWS cluster is restarted, LokiStack ingestion and the query path continue to operate within the CPU and memory resources available to the node. This means that there is no downtime for the LokiStack during Red Hat OpenShift Service on AWS cluster updates. This behavior is achieved by using PodDisruptionBudget resources. The Loki Operator provisions PodDisruptionBudget resources for Loki, which determine the minimum number of pods that must be available per component to ensure normal operations under certain conditions.
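
You can inspect the PodDisruptionBudget resources that the Loki Operator provisions by running a command such as the following; the exact resource names depend on your LokiStack deployment:

$ oc get poddisruptionbudget -n openshift-logging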

Configuring Loki to tolerate node failure

In the logging subsystem 5.8 and later versions, the Loki Operator supports setting pod anti-affinity rules to request that pods of the same component are scheduled on different available nodes in the cluster.

Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.

In Red Hat OpenShift Service on AWS, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.

The Operator sets default, preferred podAntiAffinity rules for all Loki components, which includes the compactor, distributor, gateway, indexGateway, ingester, querier, queryFrontend, and ruler components.

You can override the preferred podAntiAffinity settings for Loki components by configuring required settings in the requiredDuringSchedulingIgnoredDuringExecution field:

Example user settings for the ingester component
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    ingester:
      podAntiAffinity:
      # ...
        requiredDuringSchedulingIgnoredDuringExecution: (1)
        - labelSelector:
            matchLabels: (2)
              app.kubernetes.io/component: ingester
          topologyKey: kubernetes.io/hostname
# ...
1 The stanza to define a required rule.
2 The key-value pair (label) that must be matched to apply the rule.

Zone aware data replication

In the logging subsystem 5.8 and later versions, the Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as 1x.extra-small, 1x.small, or 1x.medium, the replication.factor field is automatically set to 2.

To ensure proper replication, you need to have at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.

Example LokiStack CR with zone replication enabled
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  replicationFactor: 2 (1)
  replication:
    factor: 2 (2)
    zones:
    - maxSkew: 1 (3)
      topologyKey: topology.kubernetes.io/zone (4)
1 Deprecated field, values entered are overwritten by replication.factor.
2 This value is automatically set when deployment size is selected at setup.
3 The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
4 Defines zones in the form of a topology key that corresponds to a node label.
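
Before enabling zone-aware replication, you can check how many availability zones your nodes span by listing the zone label referenced in the topologyKey. This is one way to confirm that the number of zones is at least equal to the replication factor:

$ oc get nodes -L topology.kubernetes.io/zone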

Recovering Loki pods from failed zones

In Red Hat OpenShift Service on AWS a zone failure happens when specific availability zone resources become inaccessible. Availability zones are isolated areas within a cloud provider’s data center, aimed at enhancing redundancy and fault tolerance. If your Red Hat OpenShift Service on AWS cluster isn’t configured to handle this, a zone failure can lead to service or data loss.

Loki pods are part of a StatefulSet, and they come with Persistent Volume Claims (PVCs) provisioned by a StorageClass object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.

The following procedure deletes the PVCs in the failed zone, and all data contained therein. To avoid complete data loss, the replication factor field of the LokiStack CR should always be set to a value greater than 1 to ensure that Loki is replicating.

Prerequisites
  • Logging version 5.8 or later.

  • Verify your LokiStack CR has a replication factor greater than 1.

  • Zone failure detected by the control plane, and nodes in the failed zone are marked by cloud provider integration.

The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki Pod and its provisioned PVC in the new zone.

Procedure
  1. List the pods in Pending status by running the following command:

    $ oc get pods --field-selector status.phase==Pending -n openshift-logging
    Example oc get pods output
    NAME                           READY   STATUS    RESTARTS   AGE (1)
    logging-loki-index-gateway-1   0/1     Pending   0          17m
    logging-loki-ingester-1        0/1     Pending   0          16m
    logging-loki-ruler-1           0/1     Pending   0          16m
    1 These pods are in Pending status because their corresponding PVCs are in the failed zone.
  2. List the PVCs in Pending status by running the following command:

    $ oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r
    Example oc get pvc output
    storage-logging-loki-index-gateway-1
    storage-logging-loki-ingester-1
    wal-logging-loki-ingester-1
    storage-logging-loki-ruler-1
    wal-logging-loki-ruler-1
  3. Delete the PVC(s) for a pod by running the following command:

    $ oc delete pvc <pvc_name> -n openshift-logging
  4. Then delete the pod(s) by running the following command:

    $ oc delete pod <pod_name> -n openshift-logging

Once these objects have been successfully deleted, they should automatically be rescheduled in an available zone.
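
To confirm that the pods have been rescheduled, you can repeat the command from step 1 and verify that no Loki pods remain in Pending status:

$ oc get pods --field-selector status.phase==Pending -n openshift-logging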

Troubleshooting PVC in a terminating state

The PVCs might hang in the terminating state without being deleted, if PVC metadata finalizers are set to kubernetes.io/pv-protection. Removing the finalizers should allow the PVCs to delete successfully.

  1. Remove the finalizer for each PVC by running the command below, then retry deletion.

    $ oc patch pvc <pvc_name> -p '{"metadata":{"finalizers":null}}' -n openshift-logging

Fine-grained access for Loki logs

In logging subsystem 5.8 and later, the ClusterLogging Operator does not grant all users access to logs by default. As an administrator, you must configure your users' access unless the Operator was upgraded and prior configurations are in place. Depending on your configuration and needs, you can configure fine-grained access to logs using the following:

  • Cluster wide policies

  • Namespace scoped policies

  • Creation of custom admin groups

As an administrator, you need to create the role bindings and cluster role bindings appropriate for your deployment. The ClusterLogging Operator provides the following cluster roles:

  • cluster-logging-application-view grants permission to read application logs.

  • cluster-logging-infrastructure-view grants permission to read infrastructure logs.

  • cluster-logging-audit-view grants permission to read audit logs.

If you have upgraded from a prior version, an additional cluster role logging-application-logs-reader and associated cluster role binding logging-all-authenticated-application-logs-reader provide backward compatibility, allowing any authenticated user read access in their namespaces.

Users with access by namespace must provide a namespace when querying application logs.
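
For example, a user whose access is granted by a namespace-scoped role binding must include a namespace selector in the query; the namespace shown here is illustrative:

{kubernetes_namespace_name="log-test-0"}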

Cluster wide access

Cluster role binding resources reference cluster roles, and set permissions cluster wide.

Example ClusterRoleBinding
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: logging-all-application-logs-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view (1)
subjects: (2)
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
1 Additional ClusterRoles are cluster-logging-infrastructure-view, and cluster-logging-audit-view.
2 Specifies the users or groups this object applies to.

Namespaced access

You can use RoleBinding resources with ClusterRole objects to define the namespace for which a user or group can access logs.

Example RoleBinding
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: allow-read-logs
  namespace: log-test-0 (1)
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: testuser-0
1 Specifies the namespace this RoleBinding applies to.

Custom admin group access

If you have a large deployment with a number of users who require broader permissions, you can create a custom group using the adminGroups field. Users who are members of any group specified in the adminGroups field of the LokiStack CR are considered admins. Admin users have access to all application logs in all namespaces, if they are also assigned the cluster-logging-application-view role.

Example LokiStack CR
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  tenants:
    mode: openshift-logging (1)
    openshift:
      adminGroups: (2)
      - cluster-admin
      - custom-admin-group (3)
1 Custom admin groups are only available in this mode.
2 Entering an empty list [] value for this field disables admin groups.
3 Overrides the default groups (system:cluster-admins, cluster-admin, dedicated-admin)

Enabling stream-based retention with Loki


With Logging version 5.6 and higher, you can configure retention policies based on log streams. Rules for these may be set globally, per tenant, or both. If you configure both, tenant rules apply before global rules.

  1. To enable stream-based retention, create a LokiStack custom resource (CR):

    Example global stream-based retention
    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global: (1)
          retention: (2)
            days: 20
            streams:
            - days: 4
              priority: 1
              selector: '{kubernetes_namespace_name=~"test.+"}' (3)
            - days: 1
              priority: 1
              selector: '{log_type="infrastructure"}'
      managementState: Managed
      replicationFactor: 1
      size: 1x.small
      storage:
        schemas:
        - effectiveDate: "2020-10-11"
          version: v11
        secret:
          name: logging-loki-s3
          type: aws
      storageClassName: standard
      tenants:
        mode: openshift-logging
    1 Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
    2 Retention is enabled in the cluster when this block is added to the CR.
    3 Contains the LogQL query used to define the log stream.
    Example per-tenant stream-based retention
    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global:
          retention:
            days: 20
        tenants: (1)
          application:
            retention:
              days: 1
              streams:
                - days: 4
                  selector: '{kubernetes_namespace_name=~"test.+"}' (2)
          infrastructure:
            retention:
              days: 5
              streams:
                - days: 1
                  selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
      managementState: Managed
      replicationFactor: 1
      size: 1x.small
      storage:
        schemas:
        - effectiveDate: "2020-10-11"
          version: v11
        secret:
          name: logging-loki-s3
          type: aws
      storageClassName: standard
      tenants:
        mode: openshift-logging
    1 Sets retention policy by tenant. Valid tenant types are application, audit, and infrastructure.
    2 Contains the LogQL query used to define the log stream.
  2. Apply the LokiStack CR:

    $ oc apply -f <filename>.yaml

This does not manage the retention of stored logs. Global retention periods for stored logs, up to a supported maximum of 30 days, are configured with your object storage.

Troubleshooting Loki rate limit errors

If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.

These errors can occur during normal operation. For example, when adding the logging subsystem to a cluster that already has some logs, rate limit errors might occur while the logging subsystem tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.

In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).

The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.

Conditions
  • The Log Forwarder API is configured to forward logs to Loki.

  • Your system sends a block of messages that is larger than 2 MB to Loki. For example:

    "values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
    .......
    ......
    ......
    ......
    \"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
  • After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:

    429 Too Many Requests Ingestion rate limit exceeded
    Example Vector error message
    2023-08-25T16:08:49.301780Z  WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
    Example Fluentd error message
    2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"

    The error is also visible on the receiving end. For example, in the LokiStack ingester pod:

    Example Loki ingester error message
    level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
Procedure
  • Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global:
          ingestion:
            ingestionBurstSize: 16 (1)
            ingestionRate: 8 (2)
    # ...
    1 The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum logs size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
    2 The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
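
After updating these fields, apply the modified LokiStack CR, consistent with the earlier configuration steps; the filename here is a placeholder for your CR file:

$ oc apply -f <filename>.yaml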