Managing your cluster with multi-architecture compute machines - Configuring multi-architecture compute machines on an OpenShift cluster | Postinstallation configuration

Scheduling workloads on clusters with multi-architecture compute machines
- Sample multi-architecture node workload deployments
Enabling 64k pages on the Red Hat Enterprise Linux CoreOS (RHCOS) kernel
Importing manifest lists in image streams on your multi-architecture compute machines

Scheduling workloads on clusters with multi-architecture compute machines

Deploying a workload on a cluster with compute nodes of different architectures requires attention and monitoring of your cluster. There might be further actions you need to take in order to successfully place pods in the nodes of your cluster.

For more detailed information on node affinity, scheduling, taints and tolerlations, see the following documentatinon:

Sample multi-architecture node workload deployments

Before you schedule workloads on a cluster with compute nodes of different architectures, consider the following use cases:

Using node affinity to schedule workloads on a node

You can allow a workload to be scheduled on only a set of nodes with architectures supported by its images, you can set the spec.affinity.nodeAffinity field in your pod’s template specification.

Example deployment with the nodeAffinity set to certain architectures

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
   # ...
  template:
     # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: (1)
                - amd64
                - arm64

1	Specify the supported architectures. Valid values include `amd64`,`arm64`, or both values.

Tainting every node for a specific architecture

You can taint a node to avoid workloads that are not compatible with its architecture to be scheduled on that node. In the case where your cluster is using a MachineSet object, you can add parameters to the .spec.template.spec.taints field to avoid workloads being scheduled on nodes with non-supported architectures.

Before you can taint a node, you must scale down the MachineSet object or remove available machines. You can scale down the machine set by using one of following commands:
```
$ oc scale --replicas=0 machineset <machineset> -n openshift-machine-api
```
Or:
```
$ oc edit machineset <machineset> -n openshift-machine-api
```
For more information on scaling machine sets, see "Modifying a compute machine set".

Example MachineSet with a taint set

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      taints:
      - effect: NoSchedule
        key: multi-arch.openshift.io/arch
        value: arm64

You can also set a taint on a specific node by running the following command:

$ oc adm taint nodes <node-name> multi-arch.openshift.io/arch=arm64:NoSchedule

Creating a default toleration

You can annotate a namespace so all of the workloads get the same default toleration by running the following command:

$ oc annotate namespace my-namespace \
  'scheduler.alpha.kubernetes.io/defaultTolerations'='[{"operator": "Exists", "effect": "NoSchedule", "key": "multi-arch.openshift.io/arch"}]'

Tolerating architecture taints in workloads

On a node with a defined taint, workloads will not be scheduled on that node. However, you can allow them to be scheduled by setting a toleration in the pod’s specification.

Example deployment with a toleration

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      tolerations:
      - key: "multi-arch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"

This example deployment can also be allowed on nodes with the multi-arch.openshift.io/arch=arm64 taint specified.

Using node affinity with taints and tolerations

When a scheduler computes the set of nodes to schedule a pod, tolerations can broaden the set while node affinity restricts the set. If you set a taint to the nodes of a specific architecture, the following example toleration is required for scheduling pods.

Example deployment with a node affinity and toleration set.

apiVersion: apps/v1
kind: Deployment
metadata: # ...
spec:
  # ...
  template:
    # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      tolerations:
      - key: "multi-arch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"

Additional resources

Modifying a compute machine set

Enabling 64k pages on the Red Hat Enterprise Linux CoreOS (RHCOS) kernel

You can enable the 64k memory page in the Red Hat Enterprise Linux CoreOS (RHCOS) kernel on the 64-bit ARM compute machines in your cluster. The 64k page size kernel specification can be used for large GPU or high memory workloads. This is done using the Machine Config Operator (MCO) which uses a machine config pool to update the kernel. To enable 64k page sizes, you must dedicate a machine config pool for ARM64 to enable on the kernel.

Using 64k pages is exclusive to 64-bit ARM architecture compute nodes or clusters installed on 64-bit ARM machines. If you configure the 64k pages kernel on a machine config pool using 64-bit x86 machines, the machine config pool and MCO will degrade.

Prerequisites:

You installed the OpenShift CLI (oc).
You created a cluster with compute nodes of different architecture on one of the supported platforms.

Procedure:

Label the nodes where you want to run the 64k page size kernel:

$ oc label node <node_name> <label>

Example command

$ oc label node worker-arm64-01 node-role.kubernetes.io/worker-64k-pages=

Create a machine config pool that contains the worker role that uses the ARM64 architecture and the worker-64k-pages role:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-64k-pages
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
        - worker
        - worker-64k-pages
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-64k-pages: ""
      kubernetes.io/arch: arm64

Create a machine config on your compute node to enable 64k-pages with the 64k-pages parameter.

$ oc create -f <filename>.yaml

Example MachineConfig

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker-64k-pages" (1)
  name: 99-worker-64kpages
spec:
  kernelType: 64k-pages (2)

1	Specify the value of the `machineconfiguration.openshift.io/role` label in the custom machine config pool. The example MachineConfig uses the `worker-64k-pages` label to enable 64k pages in the `worker-64k-pages` pool.
2	Specify your desired kernel type. Valid values are `64k-pages` and `default`

The 64k-pages type is supported on only 64-bit ARM architecture based compute nodes. The realtime type is supported on only 64-bit x86 architecture based compute nodes.

Verification

To view your new worker-64k-pages machine config pool, run the following command:

$ oc get mcp

Example output

NAME     CONFIG                                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9d55ac9a91127c36314e1efe7d77fbf8                      True      False      False      3              3                   3                     0                      361d
worker   rendered-worker-e7b61751c4a5b7ff995d64b967c421ff                      True      False      False      7              7                   7                     0                      361d
worker-64k-pages  rendered-worker-64k-pages-e7b61751c4a5b7ff995d64b967c421ff   True      False      False      2              2                   2                     0                      35m

Importing manifest lists in image streams on your multi-architecture compute machines

On an OpenShift Container Platform 4.15 cluster with multi-architecture compute machines, the image streams in the cluster do not import manifest lists automatically. You must manually change the default importMode option to the PreserveOriginal option in order to import the manifest list.

Prerequisites

You installed the OpenShift Container Platform CLI (oc).

Procedure

The following example command shows how to patch the ImageStream cli-artifacts so that the cli-artifacts:latest image stream tag is imported as a manifest list.
```
$ oc patch is/cli-artifacts -n openshift -p '{"spec":{"tags":[{"name":"latest","importPolicy":{"importMode":"PreserveOriginal"}}]}}'
```

Verification

You can check that the manifest lists imported properly by inspecting the image stream tag. The following command will list the individual architecture manifests for a particular tag.

$ oc get istag cli-artifacts:latest -n openshift -oyaml

If the dockerImageManifests object is present, then the manifest list import was successful.

Example output of the dockerImageManifests object

dockerImageManifests:
  - architecture: amd64
    digest: sha256:16d4c96c52923a9968fbfa69425ec703aff711f1db822e4e9788bf5d2bee5d77
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: arm64
    digest: sha256:6ec8ad0d897bcdf727531f7d0b716931728999492709d19d8b09f0d90d57f626
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: ppc64le
    digest: sha256:65949e3a80349cdc42acd8c5b34cde6ebc3241eae8daaeea458498fedb359a6a
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: s390x
    digest: sha256:75f4fa21224b5d5d511bea8f92dfa8e1c00231e5c81ab95e83c3013d245d1719
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux