Using tolerations to control logging pod placement - Logging | Observability

Understanding taints and tolerations
Loki pod placement
Using tolerations to control log collector pod placement
Configuring resources and scheduling for logging collectors
Viewing logging collector pods
Additional resources

Taints and tolerations allow the node to control which pods should (or should not) be scheduled on them.

Understanding taints and tolerations

A taint allows a node to refuse a pod to be scheduled unless that pod has a matching toleration.

You apply taints to a node through the Node specification (NodeSpec) and apply tolerations to a pod through the Pod specification (PodSpec). When you apply a taint a node, the scheduler cannot place a pod on that node unless the pod can tolerate the taint.

Example taint in a node specification

apiVersion: v1
kind: Node
metadata:
  name: my-node
#...
spec:
  taints:
  - effect: NoExecute
    key: key1
    value: value1
#...

Example toleration in a Pod spec

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
#...
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600
#...

Taints and tolerations consist of a key, value, and effect.

Parameter Description

key

The key is any string, up to 253 characters. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.

value

The value is any string, up to 63 characters. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.

effect

The effect is one of the following:

NoSchedule ^[1]

New pods that do not match the taint are not scheduled onto that node.
Existing pods on the node remain.

PreferNoSchedule

New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to.
Existing pods on the node remain.

NoExecute

New pods that do not match the taint cannot be scheduled onto that node.
Existing pods on the node that do not have a matching toleration are removed.

operator

Equal

The key/value/effect parameters must match. This is the default.

Exists

The key/effect parameters must match. You must leave a blank value parameter, which matches any.

If you add a NoSchedule taint to a control plane node, the node must have the node-role.kubernetes.io/master=:NoSchedule taint, which is added by default.

For example:

apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/ci-ln-62s7gtb-f76d1-v8jxv-master-0
    machineconfiguration.openshift.io/currentConfig: rendered-master-cdc1ab7da414629332cc4c3926e6e59c
  name: my-node
#...
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
#...

A toleration matches a taint:

If the operator parameter is set to Equal:
- the key parameters are the same;
- the value parameters are the same;
- the effect parameters are the same.
If the operator parameter is set to Exists:
- the key parameters are the same;
- the effect parameters are the same.

The following taints are built into Red Hat OpenShift Service on AWS:

node.kubernetes.io/not-ready: The node is not ready. This corresponds to the node condition Ready=False.
node.kubernetes.io/unreachable: The node is unreachable from the node controller. This corresponds to the node condition Ready=Unknown.
node.kubernetes.io/memory-pressure: The node has memory pressure issues. This corresponds to the node condition MemoryPressure=True.
node.kubernetes.io/disk-pressure: The node has disk pressure issues. This corresponds to the node condition DiskPressure=True.
node.kubernetes.io/network-unavailable: The node network is unavailable.
node.kubernetes.io/unschedulable: The node is unschedulable.
node.cloudprovider.kubernetes.io/uninitialized: When the node controller is started with an external cloud provider, this taint is set on a node to mark it as unusable. After a controller from the cloud-controller-manager initializes this node, the kubelet removes this taint.
node.kubernetes.io/pid-pressure: The node has pid pressure. This corresponds to the node condition PIDPressure=True.

Red Hat OpenShift Service on AWS does not set a default pid.available evictionHard.

edit

Loki pod placement

You can control which nodes the Loki pods run on, and prevent other workloads from using those nodes, by using tolerations or node selectors on the pods.

You can apply tolerations to the log store pods with the LokiStack custom resource (CR) and apply taints to a node with the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not allow the taint. Using a specific key:value pair that is not on other pods ensures that only the log store pods can run on that node.

Example LokiStack with node selectors

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor: (1)
      nodeSelector:
        node-role.kubernetes.io/infra: "" (2)
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
# ...

1	Specifies the component pod type that applies to the node selector.
2	Specifies the pods that are moved to nodes containing the defined label.

In the previous example configuration, all Loki pods are moved to nodes containing the node-role.kubernetes.io/infra: "" label.

Example LokiStack CR with node selectors and tolerations

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
# ...

To configure the nodeSelector and tolerations fields of the LokiStack (CR), you can use the oc explain command to view the description and fields for a particular resource:

$ oc explain lokistack.spec.template

Example output

KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: template <Object>

DESCRIPTION:
     Template defines the resource/limits/tolerations/nodeselectors per
     component

FIELDS:
   compactor	<Object>
     Compactor defines the compaction component spec.

   distributor	<Object>
     Distributor defines the distributor component spec.
...

For more detailed information, you can add a specific field:

$ oc explain lokistack.spec.template.compactor

Example output

KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: compactor <Object>

DESCRIPTION:
     Compactor defines the compaction component spec.

FIELDS:
   nodeSelector	<map[string]string>
     NodeSelector defines the labels required by a node to schedule the
     component onto it.
...

edit

Using tolerations to control log collector pod placement

By default, log collector pods have the following tolerations configuration:

apiVersion: v1
kind: Pod
metadata:
  name: collector-example
  namespace: openshift-logging
spec:
# ...
  collection:
    type: vector
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/disk-pressure
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/pid-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/unschedulable
      operator: Exists
# ...

Prerequisites

You have installed the Red Hat OpenShift Logging Operator and OpenShift CLI (oc).

Procedure

Add a taint to a node where you want logging collector pods to schedule logging collector pods by running the following command:
```
$ oc adm taint nodes <node_name> <key>=<value>:<effect>
```
Example command
```
$ oc adm taint nodes node1 collector=node:NoExecute
```
This example places a taint on node1 that has key collector, value node, and taint effect NoExecute. You must use the NoExecute taint effect. NoExecute schedules only pods that match the taint and removes existing pods that do not match.

Edit the collection stanza of the ClusterLogging custom resource (CR) to configure a toleration for the logging collector pods:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
# ...
spec:
# ...
  collection:
    type: vector
    tolerations:
    - key: collector (1)
      operator: Exists (2)
      effect: NoExecute (3)
      tolerationSeconds: 6000 (4)
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 1Gi
# ...

1	Specify the key that you added to the node.
2	Specify the `Exists` operator to require the `key`/`value`/`effect` parameters to match.
3	Specify the `NoExecute` effect.
4	Optionally, specify the `tolerationSeconds` parameter to set how long a pod can remain bound to a node before being evicted.

This toleration matches the taint created by the oc adm taint command. A pod with this toleration can be scheduled onto node1.

edit

Configuring resources and scheduling for logging collectors

Administrators can modify the resources or scheduling of the collector by creating a ClusterLogging custom resource (CR) that is in the same namespace and has the same name as the ClusterLogForwarder CR that it supports.

The applicable stanzas for the ClusterLogging CR when using multiple log forwarders in a deployment are managementState and collection. All other stanzas are ignored.

Prerequisites

You have administrator permissions.
You have installed the Red Hat OpenShift Logging Operator version 5.8 or newer.
You have created a ClusterLogForwarder CR.

Procedure

Create a ClusterLogging CR that supports your existing ClusterLogForwarder CR:

Example ClusterLogging CR YAML

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name:  <name> (1)
  namespace: <namespace> (2)
spec:
  managementState: "Managed"
  collection:
    type: "vector"
    tolerations:
    - key: "logging"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 6000
    resources:
      limits:
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 1Gi
    nodeSelector:
      collector: needed
# ...

1	The name must be the same name as the `ClusterLogForwarder` CR.
2	The namespace must be the same namespace as the `ClusterLogForwarder` CR.

Apply the ClusterLogging CR by running the following command:
```
$ oc apply -f <filename>.yaml
```

edit

Viewing logging collector pods

You can view the logging collector pods and the corresponding nodes that they are running on.

Procedure

Run the following command in a project to view the logging collector pods and their details:

$ oc get pods --selector component=collector -o wide -n <project_name>

Example output

NAME           READY  STATUS    RESTARTS   AGE     IP            NODE                  NOMINATED NODE   READINESS GATES
collector-8d69v  1/1    Running   0          134m    10.130.2.30   master1.example.com   <none>           <none>
collector-bd225  1/1    Running   0          134m    10.131.1.11   master2.example.com   <none>           <none>
collector-cvrzs  1/1    Running   0          134m    10.130.0.21   master3.example.com   <none>           <none>
collector-gpqg2  1/1    Running   0          134m    10.128.2.27   worker1.example.com   <none>           <none>
collector-l9j7j  1/1    Running   0          134m    10.129.2.31   worker2.example.com   <none>           <none>

edit

Using taints and tolerations to control logging pod placement

Understanding taints and tolerations

Loki pod placement

Using tolerations to control log collector pod placement

Configuring resources and scheduling for logging collectors

Viewing logging collector pods

Additional resources