×

There are times when you need to make changes to the operating systems running on OpenShift Container Platform nodes. This can include changing settings for network time service, adding kernel arguments, or configuring journaling in a specific way.

Aside from a few specialized features, most changes to operating systems on OpenShift Container Platform nodes can be done by creating what are referred to as MachineConfig objects that are managed by the Machine Config Operator.

Tasks in this section describe how to use features of the Machine Config Operator to configure operating system features on OpenShift Container Platform nodes.

Understanding the Machine Config Operator

Machine Config Operator

Purpose

The Machine Config Operator manages and applies configuration and updates of the base operating system and container runtime, including everything between the kernel and kubelet.

There are four components:

  • machine-config-server: Provides Ignition configuration to new machines joining the cluster.

  • machine-config-controller: Coordinates the upgrade of machines to the desired configurations defined by a MachineConfig object. Options are provided to control the upgrade for sets of machines individually.

  • machine-config-daemon: Applies new machine configuration during update. Validates and verifies the state of the machine to the requested machine configuration.

  • machine-config: Provides a complete source of machine configuration at installation, first start up, and updates for a machine.

Project

Machine config overview

The Machine Config Operator (MCO) manages updates to systemd, CRI-O and Kubelet, the kernel, Network Manager and other system features. It also offers a MachineConfig CRD that can write configuration files onto the host (see machine-config-operator). Understanding what MCO does and how it interacts with other components is critical to making advanced, system-level changes to an OpenShift Container Platform cluster. Here are some things you should know about MCO, machine configs, and how they are used:

  • A machine config can make a specific change to a file or service on the operating system of each system representing a pool of OpenShift Container Platform nodes.

  • MCO applies changes to operating systems in pools of machines. All OpenShift Container Platform clusters start with worker and control plane node pools. By adding more role labels, you can configure custom pools of nodes. For example, you can set up a custom pool of worker nodes that includes particular hardware features needed by an application. However, examples in this section focus on changes to the default pool types.

    A node can have multiple labels applied that indicate its type, such as master or worker, however it can be a member of only a single machine config pool.

  • Some machine configuration must be in place before OpenShift Container Platform is installed to disk. In most cases, this can be accomplished by creating a machine config that is injected directly into the OpenShift Container Platform installer process, instead of running as a post-installation machine config. In other cases, you might need to do bare metal installation where you pass kernel arguments at OpenShift Container Platform installer startup, to do such things as setting per-node individual IP addresses or advanced disk partitioning.

  • MCO manages items that are set in machine configs. Manual changes you do to your systems will not be overwritten by MCO, unless MCO is explicitly told to manage a conflicting file. In other words, MCO only makes specific updates you request, it does not claim control over the whole node.

  • Manual changes to nodes are strongly discouraged. If you need to decommission a node and start a new one, those direct changes would be lost.

  • MCO is only supported for writing to files in /etc and /var directories, although there are symbolic links to some directories that can be writeable by being symbolically linked to one of those areas. The /opt and /usr/local directories are examples.

  • Ignition is the configuration format used in MachineConfigs. See the Ignition Configuration Specification v3.2.0 for details.

  • Although Ignition config settings can be delivered directly at OpenShift Container Platform installation time, and are formatted in the same way that MCO delivers Ignition configs, MCO has no way of seeing what those original Ignition configs are. Therefore, you should wrap Ignition config settings into a machine config before deploying them.

  • When a file managed by MCO changes outside of MCO, the Machine Config Daemon (MCD) sets the node as degraded. It will not overwrite the offending file, however, and should continue to operate in a degraded state.

  • A key reason for using a machine config is that it will be applied when you spin up new nodes for a pool in your OpenShift Container Platform cluster. The machine-api-operator provisions a new machine and MCO configures it.

MCO uses Ignition as the configuration format. OpenShift Container Platform 4.6 moved from Ignition config specification version 2 to version 3.

What can you change with machine configs?

The kinds of components that MCO can change include:

  • config: Create Ignition config objects (see the Ignition configuration specification) to do things like modify files, systemd services, and other features on OpenShift Container Platform machines, including:

    • Configuration files: Create or overwrite files in the /var or /etc directory.

    • systemd units: Create and set the status of a systemd service or add to an existing systemd service by dropping in additional settings.

    • users and groups: Change SSH keys in the passwd section post-installation.

Changing SSH keys via machine configs is only supported for the core user.

  • kernelArguments: Add arguments to the kernel command line when OpenShift Container Platform nodes boot.

  • kernelType: Optionally identify a non-standard kernel to use instead of the standard kernel. Use realtime to use the RT kernel (for RAN). This is only supported on select platforms.

  • fips: Enable FIPS mode. FIPS should be set at installation-time setting and not a post-installation procedure.

The use of FIPS Validated / Modules in Process cryptographic libraries is only supported on OpenShift Container Platform deployments on the x86_64 architecture.

  • extensions: Extend RHCOS features by adding selected pre-packaged software. For this feature, available extensions include usbguard and kernel modules.

  • Custom resources (for ContainerRuntime and Kubelet): Outside of machine configs, MCO manages two special custom resources for modifying CRI-O container runtime settings (ContainerRuntime CR) and the Kubelet service (Kubelet CR).

The MCO is not the only Operator that can change operating system components on OpenShift Container Platform nodes. Other Operators can modify operating system-level features as well. One example is the Node Tuning Operator, which allows you to do node-level tuning through Tuned daemon profiles.

Tasks for the MCO configuration that can be done post-installation are included in the following procedures. See descriptions of RHCOS bare metal installation for system configuration tasks that must be done during or before OpenShift Container Platform installation.

There might be situations where the configuration on a node does not fully match what the currently-applied machine config specifies. This state is called configuration drift. The Machine Config Daemon (MCD) regularly checks the nodes for configuration drift. If the MCD detects configuration drift, the MCO marks the node degraded until an administrator corrects the node configuration. A degraded node is online and operational, but, it cannot be updated. For more information on configuration drift, see Understanding configuration drift detection.

Project

See the openshift-machine-config-operator GitHub site for details.

Understanding configuration drift detection

There might be situations when the on-disk state of a node differs from what is configured in the machine config. This is known as configuration drift. For example, a cluster admin might manually modify a file, a systemd unit file, or a file permission that was configured through a machine config. This causes configuration drift. Configuration drift can cause problems between nodes in a Machine Config Pool or when the machine configs are updated.

The Machine Config Operator (MCO) uses the Machine Config Daemon (MCD) to check nodes for configuration drift on a regular basis. If detected, the MCO sets the node and the machine config pool (MCP) to Degraded and reports the error. A degraded node is online and operational, but, it cannot be updated.

The MCD performs configuration drift detection upon each of the following conditions:

  • When a node boots.

  • After any of the files (Ignition files and systemd drop-in units) specified in the machine config are modified outside of the machine config.

  • Before a new machine config is applied.

    If you apply a new machine config to the nodes, the MCD temporarily shuts down configuration drift detection. This shutdown is needed because the new machine config necessarily differs from the machine config on the nodes. After the new machine config is applied, the MCD restarts detecting configuration drift using the new machine config.

When performing configuration drift detection, the MCD validates that the file contents and permissions fully match what the currently-applied machine config specifies. Typically, the MCD detects configuration drift in less than a second after the detection is triggered.

If the MCD detects configuration drift, the MCD performs the following tasks:

  • Emits an error to the console logs

  • Emits a Kubernetes event

  • Stops further detection on the node

  • Sets the node and MCP to degraded

You can check if you have a degraded node by listing the MCPs:

$ oc get mcp worker

If you have a degraded MCP, the DEGRADEDMACHINECOUNT field is non-zero, similar to the following output:

Example output
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-404caf3180818d8ac1f50c32f14b57c3   False     True       True       2              1                   1                     1                      5h51m

You can determine if the problem is caused by configuration drift by examining the machine config pool:

$ oc describe mcp worker
Example output
 ...
    Last Transition Time:  2021-12-20T18:54:00Z
    Message:               Node ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4 is reporting: "content mismatch for file \"/etc/mco-test-file\"" (1)
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded (2)
 ...
1 This message shows that a node’s /etc/mco-test-file file, which was added by the machine config, has changed outside of the machine config.
2 The state of the node is NodeDegraded.

Or, if you know which node is degraded, examine that node:

$ oc describe node/ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4
Example output
 ...

Annotations:        cloud.network.openshift.io/egress-ipconfig: [{"interface":"nic0","ifaddr":{"ipv4":"10.0.128.0/17"},"capacity":{"ip":10}}]
                    csi.volume.kubernetes.io/nodeid:
                      {"pd.csi.storage.gke.io":"projects/openshift-gce-devel-ci/zones/us-central1-a/instances/ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4"}
                    machine.openshift.io/machine: openshift-machine-api/ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-67bd55d0b02b0f659aef33680693a9f9
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-67bd55d0b02b0f659aef33680693a9f9
                    machineconfiguration.openshift.io/reason: content mismatch for file "/etc/mco-test-file" (1)
                    machineconfiguration.openshift.io/state: Degraded (2)
 ...
1 The error message indicating that configuration drift was detected between the node and the listed machine config. Here the error message indicates that the contents of the /etc/mco-test-file, which was added by the machine config, has changed outside of the machine config.
2 The state of the node is Degraded.

You can correct configuration drift and return the node to the Ready state by performing one of the following remediations:

  • Ensure that the contents and file permissions of the files on the node match what is configured in the machine config. You can manually rewrite the file contents or change the file permissions.

  • Generate a force file on the degraded node. The force file causes the MCD to bypass the usual configuration drift detection and reapplies the current machine config.

    Generating a force file on a node causes that node to reboot.

Checking machine config pool status

To see the status of the Machine Config Operator, its sub-components, and the resources it manages, use the following oc commands:

Procedure
  1. To see the number of MCO-managed nodes available on your cluster for each pool, type:

    $ oc get machineconfigpool
    NAME      CONFIG                  UPDATED  UPDATING   DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT  AGE
    master    rendered-master-dd…     True     False      False     3             3                  3                                0                     4h42m
    worker    rendered-worker-fde…    True     False      False     3             3                  3                                0                     4h42m

    In the previous output, there are three master and three worker nodes. All machines are updated and none are currently updating. Because all nodes are Updated and Ready and none are Degraded, you can tell that there are no issues.

  2. To see each existing machineconfig, type:

    $ oc get machineconfigs
    NAME                             GENERATEDBYCONTROLLER          IGNITIONVERSION  AGE
    00-master                        2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    00-worker                        2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    01-master-container-runtime      2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    01-master-kubelet                2c9371fbb673b97a6fe8b1c52…     3.2.0            5h18m
    ...
    rendered-master-dde...           2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    rendered-worker-fde...           2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m

    Note that the machineconfigs listed as rendered are not meant to be changed or deleted. Expect them to be hidden at some point in the future.

  3. Check the status of worker (or change to master) to see the status of that pool of nodes:

    $ oc describe mcp worker
    ...
      Degraded Machine Count:     0
      Machine Count:              3
      Observed Generation:        2
      Ready Machine Count:        3
      Unavailable Machine Count:  0
      Updated Machine Count:      3
    Events:                       <none>
  4. You can view the contents of a particular machine config (in this case, 01-master-kubelet). The trimmed output from the following oc describe command shows that this machineconfig contains both configuration files (cloud.conf and kubelet.conf) and a systemd service (Kubernetes Kubelet):

    $ oc describe machineconfigs 01-master-kubelet
    Name:         01-master-kubelet
    ...
    Spec:
      Config:
        Ignition:
          Version:  3.2.0
        Storage:
          Files:
            Contents:
              Source:   data:,
            Mode:       420
            Overwrite:  true
            Path:       /etc/kubernetes/cloud.conf
            Contents:
              Source:   data:,kind%3A%20KubeletConfiguration%0AapiVersion%3A%20kubelet.config.k8s.io%2Fv1beta1%0Aauthentication%3A%0A%20%20x509%3A%0A%20%20%20%20clientCAFile%3A%20%2Fetc%2Fkubernetes%2Fkubelet-ca.crt%0A%20%20anonymous...
            Mode:       420
            Overwrite:  true
            Path:       /etc/kubernetes/kubelet.conf
        Systemd:
          Units:
            Contents:  [Unit]
    Description=Kubernetes Kubelet
    Wants=rpc-statd.service network-online.target crio.service
    After=network-online.target crio.service
    
    ExecStart=/usr/bin/hyperkube \
        kubelet \
          --config=/etc/kubernetes/kubelet.conf \ ...

If something goes wrong with a machine config that you apply, you can always back out that change. For example, if you had run oc create -f ./myconfig.yaml to apply a machine config, you could remove that machine config by typing:

$ oc delete -f ./myconfig.yaml

If that was the only problem, the nodes in the affected pool should return to a non-degraded state. This actually causes the rendered configuration to roll back to its previously rendered state.

If you add your own machine configs to your cluster, you can use the commands shown in the previous example to check their status and the related status of the pool to which they are applied.

Using MachineConfig objects to configure nodes

You can use the tasks in this section to create MachineConfig objects that modify files, systemd unit files, and other operating system features running on OpenShift Container Platform nodes. For more ideas on working with machine configs, see content related to adding or updating SSH authorized keys, verifying image signatures, enabling SCTP, and configuring iSCSI initiatornames for OpenShift Container Platform.

OpenShift Container Platform supports Ignition specification version 3.2. All new machine configs you create going forward should be based on Ignition specification version 3.2. If you are upgrading your OpenShift Container Platform cluster, any existing Ignition specification version 2.x machine configs will be translated automatically to specification version 3.2.

There might be situations where the configuration on a node does not fully match what the currently-applied machine config specifies. This state is called configuration drift. The Machine Config Daemon (MCD) regularly checks the nodes for configuration drift. If the MCD detects configuration drift, the MCO marks the node degraded until an administrator corrects the node configuration. A degraded node is online and operational, but, it cannot be updated. For more information on configuration drift, see Understanding configuration drift detection.

Use the following "Configuring chrony time service" procedure as a model for how to go about adding other configuration files to OpenShift Container Platform nodes.

Configuring chrony time service

You can set the time server and related settings used by the chrony time service (chronyd) by modifying the contents of the chrony.conf file and passing those contents to your nodes as a machine config.

Procedure
  1. Create a Butane config including the contents of the chrony.conf file. For example, to configure chrony on worker nodes, create a 99-worker-chrony.bu file.

    See "Creating machine configs with Butane" for information about Butane.

    variant: openshift
    version: 4.10.0
    metadata:
      name: 99-worker-chrony (1)
      labels:
        machineconfiguration.openshift.io/role: worker (1)
    storage:
      files:
      - path: /etc/chrony.conf
        mode: 0644 (2)
        overwrite: true
        contents:
          inline: |
            pool 0.rhel.pool.ntp.org iburst (3)
            driftfile /var/lib/chrony/drift
            makestep 1.0 3
            rtcsync
            logdir /var/log/chrony
    1 On control plane nodes, substitute master for worker in both of these locations.
    2 Specify an octal value mode for the mode field in the machine config file. After creating the file and applying the changes, the mode is converted to a decimal value. You can check the YAML file with the command oc get mc <mc-name> -o yaml.
    3 Specify any valid, reachable time source, such as the one provided by your DHCP server. Alternately, you can specify any of the following NTP servers: 1.rhel.pool.ntp.org, 2.rhel.pool.ntp.org, or 3.rhel.pool.ntp.org.
  2. Use Butane to generate a MachineConfig object file, 99-worker-chrony.yaml, containing the configuration to be delivered to the nodes:

    $ butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
  3. Apply the configurations in one of two ways:

    • If the cluster is not running yet, after you generate manifest files, add the MachineConfig object file to the <installation_directory>/openshift directory, and then continue to create the cluster.

    • If the cluster is already running, apply the file:

      $ oc apply -f ./99-worker-chrony.yaml

Disabling the chrony time service

You can disable the chrony time service (chronyd) for nodes with a specific role by using a MachineConfig custom resource (CR).

Prerequisites
  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create the MachineConfig CR that disables chronyd for the specified node role.

    1. Save the following YAML in the disable-chronyd.yaml file:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: <node_role> (1)
        name: disable-chronyd
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
              - contents: |
                  [Unit]
                  Description=NTP client/server
                  Documentation=man:chronyd(8) man:chrony.conf(5)
                  After=ntpdate.service sntp.service ntpd.service
                  Conflicts=ntpd.service systemd-timesyncd.service
                  ConditionCapability=CAP_SYS_TIME
                  [Service]
                  Type=forking
                  PIDFile=/run/chrony/chronyd.pid
                  EnvironmentFile=-/etc/sysconfig/chronyd
                  ExecStart=/usr/sbin/chronyd $OPTIONS
                  ExecStartPost=/usr/libexec/chrony-helper update-daemon
                  PrivateTmp=yes
                  ProtectHome=yes
                  ProtectSystem=full
                  [Install]
                  WantedBy=multi-user.target
                enabled: false
                name: "chronyd.service"
      1 Node role where you want to disable chronyd, for example, master.
    2. Create the MachineConfig CR by running the following command:

      $ oc create -f disable-chronyd.yaml

Adding kernel arguments to nodes

In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.

Improper use of kernel arguments can result in your systems becoming unbootable.

Examples of kernel arguments you could set include:

  • enforcing=0: Configures Security Enhanced Linux (SELinux) to run in permissive mode. In permissive mode, the system acts as if SELinux is enforcing the loaded security policy, including labeling objects and emitting access denial entries in the logs, but it does not actually deny any operations. While not recommended for production systems, permissive mode can be helpful for debugging.

  • nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.

  • systemd.unified_cgroup_hierarchy: Enables Linux control groups version 2 (cgroups v2). Cgroup v2 is the next version of the kernel control groups and offers multiple improvements.

See Kernel.org kernel parameters for a list and descriptions of kernel arguments.

In the following procedure, you create a MachineConfig object that identifies:

  • A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.

  • Kernel arguments that are appended to the end of the existing kernel arguments.

  • A label that indicates where in the list of machine configs the change is applied.

Prerequisites
  • Have administrative privilege to a working OpenShift Container Platform cluster.

Procedure
  1. List existing MachineConfig objects for your OpenShift Container Platform cluster to determine how to label your machine config:

    $ oc get MachineConfig
    Example output
    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
  2. Create a MachineConfig object file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxpermissive.yaml)

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker(1)
      name: 05-worker-kernelarg-selinuxpermissive(2)
    spec:
      config:
        ignition:
          version: 3.2.0
      kernelArguments:
        - enforcing=0(3)
    1 Applies the new kernel argument only to worker nodes.
    2 Named to identify where it fits among the machine configs (05) and what it does (adds a kernel argument to configure SELinux permissive mode).
    3 Identifies the exact kernel argument as enforcing=0.
  3. Create the new machine config:

    $ oc create -f 05-worker-kernelarg-selinuxpermissive.yaml
  4. Check the machine configs to see that the new one was added:

    $ oc get MachineConfig
    Example output
    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    05-worker-kernelarg-selinuxpermissive                                                         3.2.0             105s
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
  5. Check the nodes:

    $ oc get nodes
    Example output
    NAME                           STATUS                     ROLES    AGE   VERSION
    ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.23.0
    ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.23.0
    ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.23.0
    ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.23.0
    ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.23.0
    ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.23.0

    You can see that scheduling on each worker node is disabled as the change is being applied.

  6. Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):

    $ oc debug node/ip-10-0-141-105.ec2.internal
    Example output
    Starting pod/ip-10-0-141-105ec2internal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.2# cat /host/proc/cmdline
    BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
    rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
    coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 enforcing=0
    
    sh-4.2# exit

    You should see the enforcing=0 argument added to the other kernel arguments.

Enabling multipathing with kernel arguments on RHCOS

Red Hat Enterprise Linux CoreOS (RHCOS) supports multipathing on the primary disk, allowing stronger resilience to hardware failure to achieve higher host availability. Post-installation support is available by activating multipathing via the machine config.

Enabling multipathing during installation is supported and recommended for nodes provisioned in OpenShift Container Platform 4.8 or higher. In setups where any I/O to non-optimized paths results in I/O system errors, you must enable multipathing at installation time. For more information about enabling multipathing during installation time, see "Enabling multipathing with kernel arguments on RHCOS" in the Installing on bare metal documentation.

On IBM Z and LinuxONE, you can enable multipathing only if you configured your cluster for it during installation. For more information, see "Installing RHCOS and starting the OpenShift Container Platform bootstrap process" in Installing a cluster with z/VM on IBM Z and LinuxONE.

Prerequisites
  • You have a running OpenShift Container Platform cluster that uses version 4.7 or later.

  • You are logged in to the cluster as a user with administrative privileges.

Procedure
  1. To enable multipathing post-installation on control plane nodes:

    • Create a machine config file, such as 99-master-kargs-mpath.yaml, that instructs the cluster to add the master label and that identifies the multipath kernel argument, for example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "master"
        name: 99-master-kargs-mpath
      spec:
        kernelArguments:
          - 'rd.multipath=default'
          - 'root=/dev/disk/by-label/dm-mpath-root'
  2. To enable multipathing post-installation on worker nodes:

    • Create a machine config file, such as 99-worker-kargs-mpath.yaml, that instructs the cluster to add the worker label and that identifies the multipath kernel argument, for example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "worker"
        name: 99-worker-kargs-mpath
      spec:
        kernelArguments:
          - 'rd.multipath=default'
          - 'root=/dev/disk/by-label/dm-mpath-root'
  3. Create the new machine config by using either the master or worker YAML file you previously created:

    $ oc create -f ./99-master-kargs-mpath.yaml
  4. Check the machine configs to see that the new one was added:

    $ oc get MachineConfig
    Example output
    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-kargs-mpath                              52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             105s
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
  5. Check the nodes:

    $ oc get nodes
    Example output
    NAME                           STATUS                     ROLES    AGE   VERSION
    ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.23.0
    ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.23.0
    ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.23.0
    ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.23.0
    ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.23.0
    ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.23.0

    You can see that scheduling on each worker node is disabled as the change is being applied.

  6. Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):

    $ oc debug node/ip-10-0-141-105.ec2.internal
    Example output
    Starting pod/ip-10-0-141-105ec2internal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.2# cat /host/proc/cmdline
    ...
    rd.multipath=default root=/dev/disk/by-label/dm-mpath-root
    ...
    
    sh-4.2# exit

    You should see the added kernel arguments.

Additional resources

Enabling Linux control groups version 2 (cgroups v2)

You can enable Linux control groups version 2 (cgroups v2) on specific nodes in your cluster by using a machine config. The OpenShift Container Platform process for enabling cgroups v2 disables all cgroups version 1 controllers and hierarchies.

The OpenShift Container Platform cgroups version 2 feature is in Developer Preview and is not supported by Red Hat at this time.

Prerequisites
  • You have a running OpenShift Container Platform cluster that uses version 4.10 or later.

  • You are logged in to the cluster as a user with administrative privileges.

  • You have the node-role.kubernetes.io value for the node(s) you want to configure.

    $ oc describe node <node-name>
    Example output
    Name:               ci-ln-v05w5m2-72292-5s9ht-worker-a-r6fpg
    Roles:              worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/instance-type=n1-standard-4
                        beta.kubernetes.io/os=linux
                        failure-domain.beta.kubernetes.io/region=us-central1
                        failure-domain.beta.kubernetes.io/zone=us-central1-a
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=ci-ln-v05w5m2-72292-5s9ht-worker-a-r6fpg
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/worker= (1)
    #...
    1 This value is the node role you need.
Procedure
  1. Enable cgroups v2 on nodes:

    • Create a machine config file YAML, such as worker-cgroups-v2.yaml:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "worker" (1)
        name: worker-enable-cgroups-v2
      spec:
        kernelArguments:
          - systemd.unified_cgroup_hierarchy=1 (2)
          - cgroup_no_v1="all" (3)
      1 Specifies the node-role.kubernetes.io value for the nodes you want to configure, such as master, worker, or infra.
      2 Enables cgroups v2 in systemd.
      3 Disables cgroups v1.
    • Create the new machine config:

      $ oc create -f worker-enable-cgroups-v2.yaml
  2. Check the machine configs to see that the new one was added:

    $ oc get MachineConfig
    Example output
    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    worker-enable-cgroups-v2                                                                      3.2.0             10s
  3. Check the nodes to see that scheduling on each affected node is disabled. This indicates that the change is being applied:

    $ oc get nodes
    Example output
    NAME                                       STATUS                     ROLES    AGE   VERSION
    ci-ln-fm1qnwt-72292-99kt6-master-0         Ready                      master   58m   v1.23.0
    ci-ln-fm1qnwt-72292-99kt6-master-1         Ready                      master   58m   v