Installing a cluster with the Agent-based Installer

Prerequisites
Installing OpenShift Container Platform with the Agent-based Installer
Sample GitOps ZTP custom resources
Gathering log data from a failed Agent-based installation

Use the following procedures to install an OpenShift Container Platform cluster using the Agent-based Installer.

Prerequisites

You reviewed details about the OpenShift Container Platform installation and update processes.
You read the documentation on selecting a cluster installation method and preparing it for users.
If you use a firewall or proxy, you configured it to allow the sites that your cluster requires access to.

Installing OpenShift Container Platform with the Agent-based Installer

The following procedures deploy a single-node OpenShift Container Platform cluster in a disconnected environment. You can use these procedures as a basis and modify them according to your requirements.

Downloading the Agent-based Installer

Use this procedure to download the Agent-based Installer and the CLI needed for your installation.

Currently, downloading the Agent-based Installer is not supported on the IBM Z® (s390x) architecture. For that architecture, the recommended method is to create PXE assets.

Procedure

Log in to the OpenShift Container Platform web console using your login credentials.
Navigate to Datacenter.
Click Run Agent-based Installer locally.
Select the operating system and architecture for the OpenShift Installer and Command line interface.
Click Download Installer to download and extract the install program.
Click Download pull secret or Copy pull secret to download or copy the pull secret.
Click Download command-line tools and place the openshift-install binary in a directory that is on your PATH.
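
Before moving on, it can help to confirm that the binary is reachable from your shell. The following is a minimal sketch rather than part of the official procedure; the `check_on_path` helper name is hypothetical.

```shell
# Hypothetical helper: succeed only if the named command resolves in the
# current shell (normally through a directory listed in PATH).
check_on_path() {
    command -v "$1" >/dev/null 2>&1
}

if check_on_path openshift-install; then
    echo "openshift-install found at: $(command -v openshift-install)"
else
    echo "openshift-install is not on PATH; move the binary into a PATH directory" >&2
fi
```

If the check fails, moving the binary into a directory such as one already listed in `echo "$PATH"` is usually enough.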

Creating the preferred configuration inputs

Use this procedure to create the preferred configuration inputs used to create the agent image.

Procedure

Install the nmstate dependency by running the following command:
```
$ sudo dnf install /usr/bin/nmstatectl -y
```
Place the openshift-install binary in a directory that is on your PATH.
Create a directory to store the install configuration by running the following command:
```
$ mkdir ~/<directory_name>
```
This is the preferred method for the Agent-based installation. Using GitOps ZTP manifests is optional.

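Create the install-config.yaml file in the directory that you created. This example assumes that the directory is named my-cluster and is located in the current working directory: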

$ cat << EOF > ./my-cluster/install-config.yaml
apiVersion: v1
baseDomain: test.example.com
compute:
- architecture: amd64 (1)
  hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 1
metadata:
  name: sno-cluster (2)
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.0.0/16
  networkType: OVNKubernetes (3)
  serviceNetwork:
  - 172.30.0.0/16
platform: (4)
  none: {}
pullSecret: '<pull_secret>' (5)
sshKey: '<ssh_pub_key>' (6)
EOF

1 Specify the system architecture. Valid values are amd64, arm64, ppc64le, and s390x.

2 Required. Specify your cluster name.

3 The cluster network plugin to install. The default value OVNKubernetes is the only supported value.

4 Specify your platform.

For bare metal platforms, host settings made in the platform section of the install-config.yaml file are used by default, unless they are overridden by configurations made in the agent-config.yaml file.

5 Specify your pull secret.

6 Specify your SSH public key.

If you set the platform to vSphere or baremetal, you can configure IP address endpoints for cluster nodes in three ways:

IPv4
IPv6
IPv4 and IPv6 in parallel (dual-stack)

IPv6 is supported only on bare metal platforms.

Example of dual-stack networking

networking:
  clusterNetwork:
    - cidr: 172.21.0.0/16
      hostPrefix: 23
    - cidr: fd02::/48
      hostPrefix: 64
  machineNetwork:
    - cidr: 192.168.11.0/16
    - cidr: 2001:DB8::/32
  serviceNetwork:
    - 172.22.0.0/16
    - fd03::/112
  networkType: OVNKubernetes
platform:
  baremetal:
    apiVIPs:
    - 192.168.11.3
    - 2001:DB8::4
    ingressVIPs:
    - 192.168.11.4
    - 2001:DB8::5

Create the agent-config.yaml file:

$ cat > agent-config.yaml << EOF
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: sno-cluster
rendezvousIP: 192.168.111.80 (1)
hosts: (2)
  - hostname: master-0 (3)
    interfaces:
      - name: eno1
        macAddress: 00:ef:44:21:e6:a5
    rootDeviceHints: (4)
      deviceName: /dev/sdb
    networkConfig: (5)
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          mac-address: 00:ef:44:21:e6:a5
          ipv4:
            enabled: true
            address:
              - ip: 192.168.111.80
                prefix-length: 23
            dhcp: false
      dns-resolver:
        config:
          server:
            - 192.168.111.1
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: 192.168.111.2
            next-hop-interface: eno1
            table-id: 254
EOF

1	This IP address is used to determine which node performs the bootstrapping process as well as running the `assisted-service` component. You must provide the rendezvous IP address when you do not specify at least one host’s IP address in the `networkConfig` parameter. If this address is not provided, one IP address is selected from the provided hosts' `networkConfig`.
2	Optional: Host configuration. The number of hosts defined must not exceed the total number of hosts defined in the `install-config.yaml` file, which is the sum of the values of the `compute.replicas` and `controlPlane.replicas` parameters.
3	Optional: Overrides the hostname obtained from either the Dynamic Host Configuration Protocol (DHCP) or a reverse DNS lookup. Each host must have a unique hostname supplied by one of these methods.
4	Enables provisioning of the Red Hat Enterprise Linux CoreOS (RHCOS) image to a particular device. The installation program examines the devices in the order it discovers them, and compares the discovered values with the hint values. It uses the first discovered device that matches the hint value.
5	Optional: Configures the network interface of a host in NMState format.
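
Because the rendezvous IP, the static host address, and the default route all have to agree with each other, a quick arithmetic check of the sample values can catch typos before you build the image. The following sketch is illustrative only: the `ip_to_int` and `ip_in_cidr` helpers are hypothetical, handle IPv4 only, and are not checks that the installer itself performs.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IPv4 address $1 lies inside CIDR $2 (for example 192.168.0.0/16).
ip_in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    # Build the netmask from the prefix length; a /0 prefix matches everything.
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    fi
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Values taken from the sample install-config.yaml and agent-config.yaml files.
if ip_in_cidr 192.168.111.80 192.168.0.0/16; then
    echo "rendezvousIP is inside machineNetwork"
fi
if ip_in_cidr 192.168.111.2 192.168.111.80/23; then
    echo "default gateway is on the host subnet"
fi
```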

Additional resources

Configuring regions and zones for a VMware vCenter

Optional: Creating additional manifest files

You can create additional manifests to further configure your cluster beyond the configurations available in the install-config.yaml and agent-config.yaml files.

Disk partitioning

In general, you should use the default disk partitioning that is created during the RHCOS installation. However, there are cases where you might want to create a separate partition for a directory that you expect to grow.

OpenShift Container Platform supports the addition of a single partition to attach storage to either the /var directory or a subdirectory of /var. For example:

/var/lib/containers: Holds container-related content that can grow as more images and containers are added to a system.
/var/lib/etcd: Holds data that you might want to keep separate for purposes such as performance optimization of etcd storage.
/var: Holds data that you might want to keep separate for purposes such as auditing.

For disk sizes larger than 100 GB, and especially larger than 1 TB, create a separate /var partition.

Storing the contents of a /var directory separately makes it easier to grow storage for those areas as needed and reinstall OpenShift Container Platform at a later date and keep that data intact. With this method, you will not have to pull all your containers again, nor will you have to copy massive log files when you update systems.

The use of a separate partition for the /var directory or a subdirectory of /var also prevents data growth in the partitioned directory from filling up the root file system.

The following procedure sets up a separate /var partition by adding a machine config manifest that is wrapped into the Ignition config file for a node type during the preparation phase of an installation.

Procedure

On your installation host, create the openshift subdirectory within the installation directory:
```
$ mkdir <installation_directory>/openshift
```

Create a Butane config that configures the additional partition. For example, name the file $HOME/clusterconfig/98-var-partition.bu, change the disk device name to the name of the storage device on the worker systems, and set the storage size as appropriate. This example places the /var directory on a separate partition:

variant: openshift
version: 4.15.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 98-var-partition
storage:
  disks:
  - device: /dev/disk/by-id/<device_name> (1)
    partitions:
    - label: var
      start_mib: <partition_start_offset> (2)
      size_mib: <partition_size> (3)
      number: 5
  filesystems:
    - device: /dev/disk/by-partlabel/var
      path: /var
      format: xfs
      mount_options: [defaults, prjquota] (4)
      with_mount_unit: true

1	The storage device name of the disk that you want to partition.
2	When adding a data partition to the boot disk, a minimum offset value of 25000 mebibytes is recommended. The root file system is automatically resized to fill all available space up to the specified offset. If no offset value is specified, or if the specified value is smaller than the recommended minimum, the resulting root file system will be too small, and future reinstalls of RHCOS might overwrite the beginning of the data partition.
3	The size of the data partition in mebibytes.
4	The `prjquota` mount option must be enabled for filesystems used for container storage.
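
The `start_mib` and `size_mib` fields are expressed in mebibytes, so it can be convenient to derive them from sizes in gibibytes and to compare the offset against the recommended minimum of 25000 MiB. This is a rough sketch; the helper names and the 25 GiB and 100 GiB figures are example assumptions, not recommended values.

```shell
# Minimum offset, in MiB, recommended above for a data partition on the boot disk.
MIN_OFFSET_MIB=25000

# Convert a size in GiB to MiB (1 GiB = 1024 MiB).
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

# Succeed only if the proposed start_mib meets the recommended minimum.
offset_ok() {
    [ "$1" -ge "$MIN_OFFSET_MIB" ]
}

start_mib=$(gib_to_mib 25)     # space left for the root file system (example value)
size_mib=$(gib_to_mib 100)     # size of the data partition (example value)

if offset_ok "$start_mib"; then
    echo "start_mib: $start_mib, size_mib: $size_mib"
else
    echo "start_mib $start_mib is below the recommended minimum of $MIN_OFFSET_MIB" >&2
fi
```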

When you create a separate /var partition, you cannot use different instance types for compute nodes if those instance types do not share the same device name.

Create a manifest from the Butane config and save it to the clusterconfig/openshift directory. For example, run the following command:
```
$ butane $HOME/clusterconfig/98-var-partition.bu -o $HOME/clusterconfig/openshift/98-var-partition.yaml
```
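
If you script this step, the output path can be derived from the Butane file name so that the manifest always lands in the openshift subdirectory. The following is a sketch under that assumption; `manifest_path_for` is a hypothetical helper, and the `butane` call is only attempted when the tool and the input file are present.

```shell
# Hypothetical helper: map <dir>/<name>.bu to <dir>/openshift/<name>.yaml.
manifest_path_for() {
    bu_path=$1
    echo "$(dirname "$bu_path")/openshift/$(basename "$bu_path" .bu).yaml"
}

bu_file="$HOME/clusterconfig/98-var-partition.bu"
out_file=$(manifest_path_for "$bu_file")

# Only run Butane when the tool and the input file are actually present.
if command -v butane >/dev/null 2>&1 && [ -f "$bu_file" ]; then
    mkdir -p "$(dirname "$out_file")"
    butane "$bu_file" -o "$out_file"
else
    echo "would write: $out_file"
fi
```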

Optional: Using ZTP manifests

You can use GitOps Zero Touch Provisioning (ZTP) manifests to configure your installation beyond the options available through the install-config.yaml and agent-config.yaml files.

GitOps ZTP manifests can be generated with or without configuring the install-config.yaml and agent-config.yaml files beforehand. If you choose to configure the install-config.yaml and agent-config.yaml files, the configurations are imported to the ZTP cluster manifests when they are generated.

Prerequisites

You have placed the openshift-install binary in a directory that is on your PATH.
Optional: You have created and configured the install-config.yaml and agent-config.yaml files.

Procedure

Use the following command to generate ZTP cluster manifests:

$ openshift-install agent create cluster-manifests --dir <installation_directory>

If you have created the install-config.yaml and agent-config.yaml files, those files are deleted and replaced by the cluster manifests generated through this command.

Any configurations made to the install-config.yaml and agent-config.yaml files are imported to the ZTP cluster manifests when you run the openshift-install agent create cluster-manifests command.

Navigate to the cluster-manifests directory:

$ cd <installation_directory>/cluster-manifests

Configure the manifest files in the cluster-manifests directory. For sample files, see the "Sample GitOps ZTP custom resources" section.
Disconnected clusters: If you did not define mirror configuration in the install-config.yaml file before generating the ZTP manifests, perform the following steps:
1. Navigate to the mirror directory:
  $ cd ../mirror
2. Configure the manifest files in the mirror directory.
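
Because generating the cluster manifests deletes the install-config.yaml and agent-config.yaml files, you might want to keep copies of them first. The following sketch shows one way to do that; the `backup_configs` helper and the directory names in the usage comment are hypothetical.

```shell
# Hypothetical helper: copy install-config.yaml and agent-config.yaml, when
# present, from the installation directory $1 into the backup directory $2.
backup_configs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in install-config.yaml agent-config.yaml; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dest/$f"
        fi
    done
}

# Example usage with placeholder paths; adjust them to your environment:
#   backup_configs "$HOME/my-cluster" "$HOME/my-cluster-backup"
#   openshift-install agent create cluster-manifests --dir "$HOME/my-cluster"
```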

Additional resources

Sample GitOps ZTP custom resources.
See Challenges of the network far edge to learn more about GitOps Zero Touch Provisioning (ZTP).

Optional: Encrypting the disk

Use this procedure to encrypt your disk or partition while installing OpenShift Container Platform with the Agent-based Installer.

Prerequisites

You have created and configured the install-config.yaml and agent-config.yaml files, unless you are using ZTP manifests.
You have placed the openshift-install binary in a directory that is on your PATH.

Procedure

Use the following command to generate ZTP cluster manifests:

$ openshift-install agent create cluster-manifests --dir <installation_directory>

If you have created the install-config.yaml and agent-config.yaml files, those files are deleted and replaced by the cluster manifests generated through this command.

If you have already generated ZTP manifests, skip this step.

Navigate to the cluster-manifests directory:

$ cd <installation_directory>/cluster-manifests

Add the following section to the agent-cluster-install.yaml file:

diskEncryption:
    enableOn: all (1)
    mode: tang (2)
    tangServers: "server1": "http://tang-server-1.example.com:7500" (3)

1	Specify which nodes to enable disk encryption on. Valid values are 'none', 'all', 'master', and 'worker'.
2	Specify which disk encryption mode to use. Valid values are 'tpmv2' and 'tang'.
3	Optional: If you are using Tang, specify the Tang servers.
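
A typo in `enableOn` or `mode` is easy to miss when editing the manifest by hand. As a small illustration, the listed values can be checked with a shell `case` statement before you generate the image. The helper names are hypothetical, and the accepted values are only those named in the callouts above.

```shell
# Succeed only for the enableOn values listed above.
valid_enable_on() {
    case "$1" in
        none|all|master|worker) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed only for the mode values listed above.
valid_mode() {
    case "$1" in
        tpmv2|tang) return 0 ;;
        *) return 1 ;;
    esac
}

enable_on=all
mode=tang
if valid_enable_on "$enable_on" && valid_mode "$mode"; then
    echo "diskEncryption values look valid: enableOn=$enable_on mode=$mode"
else
    echo "unexpected diskEncryption value" >&2
fi
```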

Additional resources

About disk encryption

Creating and booting the agent image

Use this procedure to boot the agent image on your machines.

Procedure

Create the agent image by running the following command:

$ openshift-install --dir <install_directory> agent create image

Red Hat Enterprise Linux CoreOS (RHCOS) supports multipathing on the primary disk, allowing stronger resilience to hardware failure to achieve higher host availability. Multipathing is enabled by default in the agent ISO image, with a default /etc/multipath.conf configuration.

Boot the agent.x86_64.iso or agent.aarch64.iso image on the bare metal machines.
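
The ISO file name depends on the architecture of the target hosts. The sketch below maps an architecture name to the two ISO names mentioned in this step; `agent_iso_name` is a hypothetical helper, and treating amd64 and arm64 (the spellings used in install-config.yaml) as aliases for x86_64 and aarch64 is an assumption made for convenience. Other architectures are deliberately reported as unknown because this step does not name ISO files for them.

```shell
# Hypothetical helper: print the agent ISO file name for a target architecture.
agent_iso_name() {
    case "$1" in
        x86_64|amd64)  echo "agent.x86_64.iso" ;;
        aarch64|arm64) echo "agent.aarch64.iso" ;;
        *)
            echo "no ISO name is listed in this step for architecture: $1" >&2
            return 1
            ;;
    esac
}

# Example: the sample install-config.yaml file sets the architecture to amd64.
agent_iso_name amd64
```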

Verifying that the current installation host can pull release images

After you boot the agent image and network services are made available to the host, the agent console application performs a pull check to verify that the current host can retrieve release images.

If the primary pull check passes, you can quit the application to continue with the installation. If the pull check fails, the application performs additional checks, as seen in the Additional checks section of the TUI, to help you troubleshoot the problem. A failure for any of the additional checks is not necessarily critical as long as the primary pull check succeeds.

If there are host network configuration issues that might cause an installation to fail, you can use the console application to make adjustments to your network configurations.

If the agent console application detects host network configuration issues, the installation workflow will be halted until the user manually stops the console application and signals the intention to proceed.

Procedure

Wait for the agent console application to check whether or not the configured release image can be pulled from a registry.

If the agent console application states that the installer connectivity checks have passed, wait for the prompt to time out to continue with the installation.

You can still choose to view or change network configuration settings even if the connectivity checks have passed.

However, if you choose to interact with the agent console application rather than letting it time out, you must manually quit the TUI to proceed with the installation.

If the agent console application checks have failed, which is indicated by a red icon beside the Release image URL pull check, use the following steps to reconfigure the host’s network settings:
1. Read the Check Errors section of the TUI. This section displays error messages specific to the failed checks.
2. Select Configure network to launch the NetworkManager TUI.
3. Select Edit a connection and select the connection you want to reconfigure.
4. Edit the configuration and select OK to save your changes.
5. Select Back to return to the main screen of the NetworkManager TUI.
6. Select Activate a Connection.
7. Select the reconfigured network to deactivate it.
8. Select the reconfigured network again to reactivate it.
9. Select Back and then select Quit to return to the agent console application.
10. Wait at least five seconds for the continuous network checks to restart using the new network configuration.
11. If the Release image URL pull check succeeds and displays a green icon beside the URL, select Quit to exit the agent console application and continue with the installation.

Tracking and verifying installation progress

Use the following procedure to track installation progress and to verify a successful installation.

Prerequisites

You have configured a DNS record for the Kubernetes API server.

Procedure

Optional: To know when the bootstrap host (rendezvous host) reboots, run the following command:
```
$ ./openshift-install --dir <install_directory> agent wait-for bootstrap-complete \ (1)
    --log-level=info (2)
```
1 For <install_directory>, specify the path to the directory where the agent ISO was generated.

2 To view different installation details, specify warn, debug, or error instead of info.
Example output
```
...................................................................
...................................................................
INFO Bootstrap configMap status is complete
INFO cluster bootstrap is complete
```
The command succeeds when the Kubernetes API server signals that it has been bootstrapped on the control plane machines.

To track the progress and verify successful installation, run the following command:

$ openshift-install --dir <install_directory> agent wait-for install-complete (1)

1	For `<install_directory>`, specify the path to the directory where the agent ISO was generated.

Example output

...................................................................
...................................................................
INFO Cluster is installed
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run
INFO     export KUBECONFIG=/home/core/installer/auth/kubeconfig
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.sno-cluster.test.example.com
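
If you save the command output to a file, the kubeconfig path and the web console URL can be extracted for later steps. The following sketch assumes the log lines have exactly the form shown in the example output, which might differ between releases; the helper names are hypothetical.

```shell
# Print the kubeconfig path from the "export KUBECONFIG=" line on standard input.
kubeconfig_from_log() {
    sed -n 's/^INFO *export KUBECONFIG=//p'
}

# Print the web console URL from the wait-for output on standard input.
console_url_from_log() {
    sed -n 's/^INFO Access the OpenShift web-console here: //p'
}

# Sample lines copied from the example output above.
sample_log='INFO Install complete!
INFO     export KUBECONFIG=/home/core/installer/auth/kubeconfig
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.sno-cluster.test.example.com'

printf '%s\n' "$sample_log" | kubeconfig_from_log
printf '%s\n' "$sample_log" | console_url_from_log
```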

If you are using the optional method of GitOps ZTP manifests, you can configure IP address endpoints for cluster nodes through the agent-cluster-install.yaml file in three ways:

IPv4
IPv6
IPv4 and IPv6 in parallel (dual-stack)

IPv6 is supported only on bare metal platforms.

Example of dual-stack networking

apiVIP: 192.168.11.3
ingressVIP: 192.168.11.4
clusterDeploymentRef:
  name: mycluster
imageSetRef:
  name: openshift-4.15
networking:
  clusterNetwork:
  - cidr: 172.21.0.0/16
    hostPrefix: 23
  - cidr: fd02::/48
    hostPrefix: 64
  machineNetwork:
  - cidr: 192.168.11.0/16
  - cidr: 2001:DB8::/32
  serviceNetwork:
  - 172.22.0.0/16
  - fd03::/112
  networkType: OVNKubernetes

Additional resources

See Deploying with dual-stack networking.
See Configuring the install-config yaml file.
See Configuring a three-node cluster to deploy three-node clusters in bare metal environments.
See About root device hints.
See NMState state examples.

Sample GitOps ZTP custom resources

Optional: You can use GitOps Zero Touch Provisioning (ZTP) custom resource (CR) objects to install an OpenShift Container Platform cluster with the Agent-based Installer.

You can customize the following GitOps ZTP custom resources to specify more details about your OpenShift Container Platform cluster. The following sample GitOps ZTP custom resources are for a single-node cluster.

agent-cluster-install.yaml

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: test-agent-cluster-install
  namespace: cluster0
spec:
  clusterDeploymentRef:
    name: ostest
  imageSetRef:
    name: openshift-4.15
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    serviceNetwork:
    - 172.30.0.0/16
  provisionRequirements:
    controlPlaneAgents: 1
    workerAgents: 0
  sshPublicKey: <YOUR_SSH_PUBLIC_KEY>

cluster-deployment.yaml

apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: ostest
  namespace: cluster0
spec:
  baseDomain: test.metalkube.org
  clusterInstallRef:
    group: extensions.hive.openshift.io
    kind: AgentClusterInstall
    name: test-agent-cluster-install
    version: v1beta1
  clusterName: ostest
  controlPlaneConfig:
    servingCertificates: {}
  platform:
    agentBareMetal:
      agentSelector:
        matchLabels:
          bla: aaa
  pullSecretRef:
    name: pull-secret

cluster-image-set.yaml

apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-4.15
spec:
  releaseImage: registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2022-06-06-025509

infra-env.yaml

apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: myinfraenv
  namespace: cluster0
spec:
  clusterRef:
    name: ostest
    namespace: cluster0
  cpuArchitecture: aarch64
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: <YOUR_SSH_PUBLIC_KEY>
  nmStateConfigLabelSelector:
    matchLabels:
      cluster0-nmstate-label-name: cluster0-nmstate-label-value

nmstateconfig.yaml

apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
  name: master-0
  namespace: openshift-machine-api
  labels:
    cluster0-nmstate-label-name: cluster0-nmstate-label-value
spec:
  config:
    interfaces:
      - name: eth0
        type: ethernet
        state: up
        mac-address: 52:54:01:aa:aa:a1
        ipv4:
          enabled: true
          address:
            - ip: 192.168.122.2
              prefix-length: 23
          dhcp: false
    dns-resolver:
      config:
        server:
          - 192.168.122.1
    routes:
      config:
        - destination: 0.0.0.0/0
          next-hop-address: 192.168.122.1
          next-hop-interface: eth0
          table-id: 254
  interfaces:
    - name: "eth0"
      macAddress: 52:54:01:aa:aa:a1

pull-secret.yaml

apiVersion: v1
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
  name: pull-secret
  namespace: cluster0
stringData:
  .dockerconfigjson: 'YOUR_PULL_SECRET'

Additional resources

See Challenges of the network far edge to learn more about GitOps Zero Touch Provisioning (ZTP).

Gathering log data from a failed Agent-based installation

Use the following procedure to gather log data about a failed Agent-based installation to provide for a support case.

Prerequisites

You have configured a DNS record for the Kubernetes API server.

Procedure

Run the following command and collect the output:

$ ./openshift-install --dir <install_directory> agent wait-for bootstrap-complete --log-level=debug

Example error message

...
ERROR Bootstrap failed to complete: : bootstrap process timed out: context deadline exceeded

If the output from the previous command indicates a failure, or if the bootstrap is not progressing, run the following command to connect to the rendezvous host and collect the output:
```
$ ssh core@<node-ip> agent-gather -O >agent-gather.tar.xz
```
Red Hat Support can diagnose most issues using the data gathered from the rendezvous host, but if some hosts are not able to register, gathering this data from every host might be helpful.
If the bootstrap completes and the cluster nodes reboot, run the following command and collect the output:
```
$ ./openshift-install --dir <install_directory> agent wait-for install-complete --log-level=debug
```
If the output from the previous command indicates a failure, perform the following steps:
1. Export the kubeconfig file to your environment by running the following command:
  $ export KUBECONFIG=<install_directory>/auth/kubeconfig
2. To gather information for debugging, run the following command:
  $ oc adm must-gather
3. Create a compressed file from the must-gather directory that was just created in your working directory by running the following command:
  $ tar cvaf must-gather.tar.gz <must_gather_directory>
Excluding the /auth subdirectory, attach the installation directory used during the deployment to your support case on the Red Hat Customer Portal.
Attach all other data gathered from this procedure to your support case.
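
The installation directory has to be attached without its auth subdirectory, which contains cluster credentials. One way to do this is to build the archive with an exclude pattern, as in the following sketch. The `archive_install_dir` helper and the archive name in the usage comment are hypothetical, and the sketch assumes a `tar` implementation that supports `--exclude`, such as GNU tar.

```shell
# Hypothetical helper: archive the installation directory $1 into the file $2,
# skipping the top-level auth subdirectory and everything beneath it.
archive_install_dir() {
    tar -czf "$2" --exclude=./auth -C "$1" .
}

# Example usage with placeholder paths; write the archive outside the
# installation directory so that it does not include itself:
#   archive_install_dir "$HOME/my-cluster" "$HOME/my-cluster-install-dir.tar.gz"
#   tar -tzf "$HOME/my-cluster-install-dir.tar.gz"    # review the contents
```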

In the preceding commands, `<install_directory>` is the path to the directory where the agent ISO was generated.
The commands use `--log-level=debug`. To view different installation details, specify `info`, `warn`, or `error` instead.
