Creating a cluster with multi-architecture compute machines on IBM Z and IBM LinuxONE with RHEL KVM

Verifying cluster compatibility
Creating RHCOS machines using virt-install
Approving the certificate signing requests for your machines

To create a cluster with multi-architecture compute machines on IBM Z® and IBM® LinuxONE (s390x) with RHEL KVM, you must have an existing single-architecture x86_64 cluster. You can then add s390x compute machines to your OpenShift Container Platform cluster.

Before you can add s390x nodes to your cluster, you must upgrade your cluster to one that uses the multi-architecture payload. For more information on migrating to the multi-architecture payload, see Migrating to a cluster with multi-architecture compute machines.

The following procedures explain how to create a RHCOS compute machine using a RHEL KVM instance. This will allow you to add s390x nodes to your cluster and deploy a cluster with multi-architecture compute machines.

To create an IBM Z® or IBM® LinuxONE (s390x) cluster with multi-architecture compute machines on x86_64, follow the instructions for Installing a cluster on IBM Z® and IBM® LinuxONE. You can then add x86_64 compute machines as described in Creating a cluster with multi-architecture compute machines on bare metal, IBM Power, or IBM Z.

Verifying cluster compatibility

Before you can start adding compute nodes of different architectures to your cluster, you must verify that your cluster is multi-architecture compatible.

Prerequisites

You installed the OpenShift CLI (oc)

Procedure

You can check that your cluster uses the architecture payload by running the following command:
```
$ oc adm release info -o jsonpath="{ .metadata.metadata}"
```

Verification

If you see the following output, then your cluster is using the multi-architecture payload:
```
{
 "release.openshift.io/architecture": "multi",
 "url": "https://access.redhat.com/errata/<errata_version>"
}
```
You can then begin adding multi-arch compute nodes to your cluster.
If you see the following output, then your cluster is not using the multi-architecture payload:
```
{
 "url": "https://access.redhat.com/errata/<errata_version>"
}
```
To migrate your cluster so the cluster supports multi-architecture compute machines, follow the procedure in Migrating to a cluster with multi-architecture compute machines.

Creating RHCOS machines using `virt-install`

You can create more Red Hat Enterprise Linux CoreOS (RHCOS) compute machines for your cluster by using virt-install.

Prerequisites

You have at least one LPAR running on RHEL 8.7 or later with KVM, referred to as RHEL KVM host in this procedure.
The KVM/QEMU hypervisor is installed on the RHEL KVM host.
You have a domain name server (DNS) that can perform hostname and reverse lookup for the nodes.
An HTTP or HTTPS server is set up.

Procedure

Disable UDP aggregation.

Currently, UDP aggregation is not supported on IBM Z® and is not automatically deactivated on multi-architecture compute clusters with an x86_64 control plane and additional s390x compute machines. To ensure that the addtional compute nodes are added to the cluster correctly, you must manually disable UDP aggregation.
1. Create a YAML file udp-aggregation-config.yaml with the following content:
  apiVersion: v1 kind: ConfigMap data: disable-udp-aggregation: "true" metadata: name: udp-aggregation-config namespace: openshift-network-operator
2. Create the ConfigMap resource by running the following command:
  $ oc create -f udp-aggregation-config.yaml

Extract the Ignition config file from the cluster by running the following command:

$ oc extract -n openshift-machine-api secret/worker-user-data-managed --keys=userData --to=- > worker.ign

Upload the worker.ign Ignition config file you exported from your cluster to your HTTP server. Note the URL of this file.
You can validate that the Ignition file is available on the URL. The following example gets the Ignition config file for the compute node:
```
$ curl -k http://<HTTP_server>/worker.ign
```

Download the RHEL live kernel, initramfs, and rootfs files by running the following commands:

 $ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
| jq -r '.architectures.s390x.artifacts.metal.formats.pxe.kernel.location')

$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
| jq -r '.architectures.s390x.artifacts.metal.formats.pxe.initramfs.location')

$ curl -LO $(oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o jsonpath='{.data.stream}' \
| jq -r '.architectures.s390x.artifacts.metal.formats.pxe.rootfs.location')

Move the downloaded RHEL live kernel, initramfs and rootfs files to an HTTP or HTTPS server before you launch virt-install.

Create the new KVM guest nodes using the RHEL kernel, initramfs, and Ignition files; the new disk image; and adjusted parm line arguments.

$ virt-install \
   --connect qemu:///system \
   --name <vm_name> \
   --autostart \
   --os-variant rhel9.2 \ (1)
   --cpu host \
   --vcpus <vcpus> \
   --memory <memory_mb> \
   --disk <vm_name>.qcow2,size=<image_size> \
   --network network=<virt_network_parm> \
   --location <media_location>,kernel=<rhcos_kernel>,initrd=<rhcos_initrd> \ (2)
   --extra-args "rd.neednet=1" \
   --extra-args "coreos.inst.install_dev=/dev/vda" \
   --extra-args "coreos.inst.ignition_url=<worker_ign>" \ (3)
   --extra-args "coreos.live.rootfs_url=<rhcos_rootfs>" \ (4)
   --extra-args "ip=<ip>::<default_gateway>:<subnet_mask_length>:<hostname>::none:<MTU>" \ (5)
   --extra-args "nameserver=<dns>" \
   --extra-args "console=ttysclp0" \
   --noautoconsole \
   --wait

For os-variant, specify the RHEL version for the RHCOS compute machine. rhel9.2 is the recommended version. To query the supported RHEL version of your operating system, run the following command:

$ osinfo-query os -f short-id

The os-variant is case sensitive.

2 For --location, specify the location of the kernel/initrd on the HTTP or HTTPS server.

3 For coreos.inst.ignition_url=, specify the worker.ign Ignition file for the machine role. Only HTTP and HTTPS protocols are supported.

4 For coreos.live.rootfs_url=, specify the matching rootfs artifact for the kernel and initramfs you are booting. Only HTTP and HTTPS protocols are supported.

5 Optional: For hostname, specify the fully qualified hostname of the client machine.

If you are using HAProxy as a load balancer, update your HAProxy rules for ingress-router-443 and ingress-router-80 in the /etc/haproxy/haproxy.cfg configuration file.

Continue to create more compute machines for your cluster.

Approving the certificate signing requests for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself. The client requests must be approved first, followed by the server requests.

Prerequisites

You added machines to your cluster.

Procedure

Confirm that the cluster recognizes the machines:
```
$ oc get nodes
```
Example output
```
NAME      STATUS    ROLES   AGE  VERSION
master-0  Ready     master  63m  v1.28.5
master-1  Ready     master  63m  v1.28.5
master-2  Ready     master  64m  v1.28.5
```
The output lists all of the machines that you created.

The preceding output might not include the compute nodes, also known as worker nodes, until some CSRs are approved.

Review the pending CSRs and ensure that you see the client requests with the Pending or Approved status for each machine that you added to the cluster:

$ oc get csr

Example output

NAME        AGE     REQUESTOR                                                                   CONDITION
csr-8b2br   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8vnps   15m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
...

In this example, two machines are joining the cluster. You might see more approved CSRs in the list.

If the CSRs were not approved, after all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines:

Because the CSRs rotate automatically, approve your CSRs within an hour of adding the machines to the cluster. If you do not approve them within an hour, the certificates will rotate, and more than two certificates will be present for each node. You must approve all of these certificates. After the client CSR is approved, the Kubelet creates a secondary CSR for the serving certificate, which requires manual approval. Then, subsequent serving certificate renewal requests are automatically approved by the machine-approver if the Kubelet requests a new certificate with identical parameters.

For clusters running on platforms that are not machine API enabled, such as bare metal and other user-provisioned infrastructure, you must implement a method of automatically approving the kubelet serving certificate requests (CSRs). If a request is not approved, then the oc exec, oc rsh, and oc logs commands cannot succeed, because a serving certificate is required when the API server connects to the kubelet. Any operation that contacts the Kubelet endpoint requires this certificate approval to be in place. The method must watch for new CSRs, confirm that the CSR was submitted by the node-bootstrapper service account in the system:node or system:admin groups, and confirm the identity of the node.

To approve them individually, run the following command for each valid CSR:
```
$ oc adm certificate approve <csr_name> (1)
```
1 <csr_name> is the name of a CSR from the list of current CSRs.

To approve all pending CSRs, run the following command:

$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

Some Operators might not become available until some CSRs are approved.

Now that your client requests are approved, you must review the server requests for each machine that you added to the cluster:

$ oc get csr

Example output

NAME        AGE     REQUESTOR                                                                   CONDITION
csr-bfd72   5m26s   system:node:ip-10-0-50-126.us-east-2.compute.internal                       Pending
csr-c57lv   5m26s   system:node:ip-10-0-95-157.us-east-2.compute.internal                       Pending
...

If the remaining CSRs are not approved, and are in the Pending status, approve the CSRs for your cluster machines:
- To approve them individually, run the following command for each valid CSR:
  $ oc adm certificate approve <csr_name> (1)
  1 <csr_name> is the name of a CSR from the list of current CSRs.
- To approve all pending CSRs, run the following command:
  $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

After all client and server CSRs have been approved, the machines have the Ready status. Verify this by running the following command:

$ oc get nodes

Example output

NAME      STATUS    ROLES   AGE  VERSION
master-0  Ready     master  73m  v1.28.5
master-1  Ready     master  73m  v1.28.5
master-2  Ready     master  74m  v1.28.5
worker-0  Ready     worker  11m  v1.28.5
worker-1  Ready     worker  11m  v1.28.5

It can take a few minutes after approval of the server CSRs for the machines to transition to the Ready status.

Additional information

For more information on CSRs, see Certificate Signing Requests.

Verifying cluster compatibility

Creating RHCOS machines using virt-install

Approving the certificate signing requests for your machines

Creating RHCOS machines using `virt-install`