×

Use zero touch provisioning (ZTP) to provision distributed units at new edge sites in a disconnected environment. The workflow starts when the site is connected to the network and ends with the CNF workload deployed and running on the site nodes.

ZTP for RAN deployments is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Provisioning edge sites at scale

Telco edge computing presents extraordinary challenges with managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully-automated management solutions with, as closely as possible, zero human interaction.

Zero touch provisioning (ZTP) allows you to provision new edge sites with declarative configurations of bare-metal equipment at remote sites. Template or overlay configurations install OpenShift Container Platform features that are required for CNF workloads. End-to-end functional test suites are used to verify CNF related features. All configurations are declarative in nature.

You start the workflow by creating declarative configurations for ISO images that are delivered to the edge nodes to begin the installation process. The images are used to repeatedly provision large numbers of nodes efficiently and quickly, allowing you keep up with requirements from the field for far edge nodes.

Service providers are deploying a more distributed mobile network architecture allowed by the modular functional framework defined for 5G. This allows service providers to move from appliance-based radio access networks (RAN) to open cloud RAN architecture, gaining flexibility and agility in delivering services to end users.

The following diagram shows how ZTP works within a far edge framework.

ZTP in a far edge framework

The GitOps approach

ZTP uses the GitOps deployment set of practices for infrastructure deployment that allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager for multisite deployment.

One of the motivators for a GitOps approach is the requirement for reliability at scale. This is a significant challenge that GitOps helps solve.

GitOps addresses the reliability issue by providing traceability, RBAC, and a single source of truth for the desired state of each site. Scale issues are addressed by GitOps providing structure, tooling, and event driven operations through webhooks.

About ZTP and distributed units on single nodes

You can install a distributed unit (DU) on a single node at scale with Red Hat Advanced Cluster Management (RHACM) (ACM) using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.

ACM manages clusters in a hub and spoke architecture, where a single hub cluster manages many spoke clusters. ACM applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of OpenShift Container Platform on a single node.

The AI service handles provisioning of OpenShift Container Platform on single nodes running on bare metal. ACM ships with and deploys the assisted installer when the MultiClusterHub custom resource is installed.

With ZTP and AI, you can provision OpenShift Container Platform single nodes to run your DUs at scale. A high level overview of ZTP for distributed units in a disconnected environment is as follows:

  • A hub cluster running ACM manages a disconnected internal registry that mirrors the OpenShift Container Platform release images. The internal registry is used to provision the spoke single nodes.

  • You manage the bare-metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.

  • You install the DU bare-metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare-metal host:

    • Network connectivity - including DNS for your network. Hosts should be reachable through the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install your hub cluster.

    • Baseboard Management Controller (BMC) details for each host - ZTP uses BMC details to connect the URL and credentials for accessing the BMC. Create spoke cluster definition CRs. These define the relevant elements for the managed clusters. Required CRs are as follows:

      Custom Resource Description

      Namespace

      Namespace for the managed single-node cluster.

      BMCSecret CR

      Credentials for the host BMC.

      Image Pull Secret CR

      Pull secret for the disconnected registry.

      AgentClusterInstall

      Specifies the single-node cluster’s configuration such as networking, number of supervisor (control plane) nodes, and so on.

      ClusterDeployment

      Defines the cluster name, domain, and other details.

      KlusterletAddonConfig

      Manages installation and termination of add-ons on the ManagedCluster for ACM.

      ManagedCluster

      Describes the managed cluster for ACM.

      InfraEnv

      Describes the installation ISO to be mounted on the destination node that the assisted installer service creates. This is the final step of the manifest creation phase.

      BareMetalHost

      Describes the details of the bare-metal host, including BMC and credentials details.

  • When a change is detected in the host inventory repository, a host management event is triggered to provision the new or updated host.

  • The host is provisioned. When the host is provisioned and successfully rebooted, the host agent reports Ready status to the hub cluster.

Zero touch provisioning building blocks

ACM deploys single-node OpenShift, which is OpenShift Container Platform installed on single nodes, leveraging zero touch provisioning (ZTP). The initial site plan is broken down into smaller components and initial configuration data is stored in a Git repository. Zero touch provisioning uses a declarative GitOps approach to deploy these nodes. The deployment of the nodes includes:

  • Installing the host operating system (RHCOS) on a blank server.

  • Deploying OpenShift Container Platform on single nodes.

  • Creating cluster policies and site subscriptions.

  • Leveraging a GitOps deployment topology for a develop once, deploy anywhere model.

  • Making the necessary network configurations to the server operating system.

  • Deploying profile Operators and performing any needed software-related configuration, such as performance profile, PTP, and SR-IOV.

  • Downloading images needed to run workloads (CNFs).

Single-node clusters

You use zero touch provisioning (ZTP) to deploy single-node OpenShift clusters to run distributed units (DUs) on small hardware footprints at disconnected far edge sites. A single-node cluster runs OpenShift Container Platform on top of one bare-metal host, hence the single node. Edge servers contain a single node with supervisor functions and worker functions on the same host that are deployed at low bandwidth or disconnected edge sites.

OpenShift Container Platform is configured on the single node to use workload partitioning. Workload partitioning separates cluster management workloads from user workloads and can run the cluster management workloads on a reserved set of CPUs. Workload partitioning is useful for resource-constrained environments, such as single-node production deployments, where you want to reserve most of the CPU resources for user workloads and configure OpenShift Container Platform to use fewer CPU resources within the host.

A single-node cluster hosting a DU application on a node is divided into the following configuration categories:

  • Common - Values are the same for all single-node cluster sites managed by a hub cluster.

  • Pools of sites - Common across a pool of sites where a pool size can be 1 to n.

  • Site specific - Likely specific to a site with no overlap with other sites, for example, a vlan.

Site planning considerations for distributed unit deployments

Site planning for distributed units (DU) deployments is complex. The following is an overview of the tasks that you complete before the DU hosts are brought online in the production environment.

  • Develop a network model. The network model depends on various factors such as the size of the area of coverage, number of hosts, projected traffic load, DNS, and DHCP requirements.

  • Decide how many DU radio nodes are required to provide sufficient coverage and redundancy for your network.

  • Develop mechanical and electrical specifications for the DU host hardware.

  • Develop a construction plan for individual DU site installations.

  • Tune host BIOS settings for production, and deploy the BIOS configuration to the hosts.

  • Install the equipment on-site, connect hosts to the network, and apply power.

  • Configure on-site switches and routers.

  • Perform basic connectivity tests for the host machines.

  • Establish production network connectivity, and verify host connections to the network.

  • Provision and deploy on-site DU hosts at scale.

  • Test and verify on-site operations, performing load and scale testing of the DU hosts before finally bringing the DU infrastructure online in the live production environment.

Low latency for distributed units (DUs)

Low latency is an integral part of the development of 5G networks. Telecommunications networks require as little signal delay as possible to ensure quality of service in a variety of critical use cases.

Low latency processing is essential for any communication with timing constraints that affect functionality and security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data.

Low latency systems are about guarantees with regards to response and processing times. This includes keeping a communication protocol running smoothly, ensuring device security with fast responses to error conditions, or just making sure a system is not lagging behind when receiving a lot of data. Low latency is key for optimal synchronization of radio transmissions.

OpenShift Container Platform enables low latency processing for DUs running on COTS hardware by using a number of technologies and specialized hardware devices:

Real-time kernel for RHCOS

Ensures workloads are handled with a high degree of process determinism.

CPU isolation

Avoids CPU scheduling delays and ensures CPU capacity is available consistently.

NUMA awareness

Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the NUMA node. This decreases latency and improves performance of the node.

Huge pages memory management

Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.

Precision timing synchronization using PTP

Allows synchronization between nodes in the network with sub-microsecond accuracy.

Configuring BIOS for distributed unit bare-metal hosts

Distributed unit (DU) hosts require the BIOS to be configured before the host can be provisioned. The BIOS configuration is dependent on the specific hardware that runs your DUs and the particular requirements of your installation.

In this Developer Preview release, configuration and tuning of BIOS for DU bare-metal host machines is the responsibility of the customer. Automatic setting of BIOS is not handled by the zero touch provisioning workflow.

Procedure
  1. Set the UEFI/BIOS Boot Mode to UEFI.

  2. In the host boot sequence order, set Hard drive first.

  3. Apply the specific BIOS configuration for your hardware. The following table describes a representative BIOS configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.

    The exact BIOS configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

    Table 1. Sample BIOS configuration for an Intel Xeon Skylake or Cascade Lake server
    BIOS Setting Configuration

    CPU Power and Performance Policy

    Performance

    Uncore Frequency Scaling

    Disabled

    Performance P-limit

    Disabled

    Enhanced Intel SpeedStep ® Tech

    Enabled

    Intel Configurable TDP

    Enabled

    Configurable TDP Level

    Level 2

    Intel® Turbo Boost Technology

    Enabled

    Energy Efficient Turbo

    Disabled

    Hardware P-States

    Disabled

    Package C-State

    C0/C1 state

    C1E

    Disabled

    Processor C6

    Disabled

Enable global SR-IOV and VT-d settings in the BIOS for the host. These settings are relevant to bare-metal environments.

Preparing the disconnected environment

Before you can provision distributed units (DU) at scale, you must install Red Hat Advanced Cluster Management (RHACM), which handles the provisioning of the DUs.

RHACM is deployed as an Operator on the OpenShift Container Platform hub cluster. It controls clusters and applications from a single console with built-in security policies. RHACM provisions and manage your DU hosts. To install RHACM in a disconnected environment, you create a mirror registry that mirrors the Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster.

You also use a disconnected mirror host to serve the RHCOS ISO and RootFS disk images that provision the DU bare-metal host operating system.

Before you install a cluster on infrastructure that you provision in a restricted network, you must mirror the required container images into that environment. You can also use this procedure in unrestricted networks to ensure your clusters only use container images that have satisfied your organizational controls on external content.

You must have access to the internet to obtain the necessary container images. In this procedure, you place the mirror registry on a mirror host that has access to both your network and the internet. If you do not have access to a mirror host, use the disconnected procedure to copy images to a device that you can move across network boundaries.

Disconnected environment prerequisites

You must have a container image registry that supports Docker v2-2 in the location that will host the OpenShift Container Platform cluster, such as one of the following registries:

If you have an entitlement to Red Hat Quay, see the documentation on deploying Red Hat Quay for proof-of-concept purposes or by using the Quay Operator. If you need additional assistance selecting and installing a registry, contact your sales representative or Red Hat support.

Red Hat does not test third party registries with OpenShift Container Platform.

About the mirror registry

You can mirror the images that are required for OpenShift Container Platform installation and subsequent product updates to a container mirror registry such as Red Hat Quay, JFrog Artifactory, Sonatype Nexus Repository, or Harbor. If you do not have access to a large-scale container registry, you can use the mirror registry for Red Hat OpenShift, a small-scale container registry included with OpenShift Container Platform subscriptions.

You can use any container registry that supports Docker v2-2, such as Red Hat Quay, the mirror registry for Red Hat OpenShift, Artifactory, Sonatype Nexus Repository, or Harbor. Regardless of your chosen registry, the procedure to mirror content from Red Hat hosted sites on the internet to an isolated image registry is the same. After you mirror the content, you configure each cluster to retrieve this content from your mirror registry.

The internal registry of the OpenShift Container Platform cluster cannot be used as the target registry because it does not support pushing without a tag, which is required during the mirroring process.

If choosing a container registry that is not the mirror registry for Red Hat OpenShift, it must be reachable by every machine in the clusters that you provision. If the registry is unreachable, installation, updating, or normal operations such as workload relocation might fail. For that reason, you must run mirror registries in a highly available way, and the mirror registries must at least match the production availability of your OpenShift Container Platform clusters.

When you populate your mirror registry with OpenShift Container Platform images, you can follow two scenarios. If you have a host that can access both the internet and your mirror registry, but not your cluster nodes, you can directly mirror the content from that machine. This process is referred to as connected mirroring. If you have no such host, you must mirror the images to a file system and then bring that host or removable media into your restricted environment. This process is referred to as disconnected mirroring.

For mirrored registries, to view the source of pulled images, you must review the Trying to access log entry in the CRI-O logs. Other methods to view the image pull source, such as using the crictl images command on a node, show the non-mirrored image name, even though the image is pulled from the mirrored location.

Red Hat does not test third party registries with OpenShift Container Platform.

Additional resources

Preparing your mirror host

Before you perform the mirror procedure, you must prepare the host to retrieve content and push it to the remote location.

Installing the OpenShift CLI by downloading the binary

You can install the OpenShift CLI (oc) to interact with OpenShift Container Platform from a command-line interface. You can install oc on Linux, Windows, or macOS.

If you installed an earlier version of oc, you cannot use it to complete all of the commands in OpenShift Container Platform 4.9. Download and install the new version of oc.

Installing the OpenShift CLI on Linux

You can install the OpenShift CLI (oc) binary on Linux by using the following procedure.

Procedure
  1. Navigate to the OpenShift Container Platform downloads page on the Red Hat Customer Portal.

  2. Select the appropriate version in the Version drop-down menu.

  3. Click Download Now next to the OpenShift v4.9 Linux Client entry and save the file.

  4. Unpack the archive:

    $ tar xvf <file>
  5. Place the oc binary in a directory that is on your PATH.

    To check your PATH, execute the following command:

    $ echo $PATH

After you install the OpenShift CLI, it is available using the oc command:

$ oc <command>
Installing the OpenShift CLI on Windows

You can install the OpenShift CLI (oc) binary on Windows by using the following procedure.

Procedure
  1. Navigate to the OpenShift Container Platform downloads page on the Red Hat Customer Portal.

  2. Select the appropriate version in the Version drop-down menu.

  3. Click Download Now next to the OpenShift v4.9 Windows Client entry and save the file.

  4. Unzip the archive with a ZIP program.

  5. Move the oc binary to a directory that is on your PATH.

    To check your PATH, open the command prompt and execute the following command:

    C:\> path

After you install the OpenShift CLI, it is available using the oc command:

C:\> oc <command>
Installing the OpenShift CLI on macOS

You can install the OpenShift CLI (oc) binary on macOS by using the following procedure.

Procedure
  1. Navigate to the OpenShift Container Platform downloads page on the Red Hat Customer Portal.

  2. Select the appropriate version in the Version drop-down menu.

  3. Click Download Now next to the OpenShift v4.9 MacOSX Client entry and save the file.

  4. Unpack and unzip the archive.

  5. Move the oc binary to a directory on your PATH.

    To check your PATH, open a terminal and execute the following command:

    $ echo $PATH

After you install the OpenShift CLI, it is available using the oc command:

$ oc <command>

Configuring credentials that allow images to be mirrored

Create a container image registry credentials file that allows mirroring images from Red Hat to your mirror.

Prerequisites
  • You configured a mirror registry to use in your disconnected environment.

Procedure

Complete the following steps on the installation host:

  1. Download your registry.redhat.io pull secret from the Red Hat OpenShift Cluster Manager and save it to a .json file.

  2. Generate the base64-encoded user name and password or token for your mirror registry:

    $ echo -n '<user_name>:<password>' | base64 -w0 (1)
    BGVtbYk3ZHAtqXs=
    1 For <user_name> and <password>, specify the user name and password that you configured for your registry.
  3. Make a copy of your pull secret in JSON format:

    $ cat ./pull-secret.text | jq .  > <path>/<pull_secret_file_in_json>(1)
    1 Specify the path to the folder to store the pull secret in and a name for the JSON file that you create.
  4. Save the file either as ~/.docker/config.json or $XDG_RUNTIME_DIR/containers/auth.json.

    The contents of the file resemble the following example:

    {
      "auths": {
        "cloud.openshift.com": {
          "auth": "b3BlbnNo...",
          "email": "you@example.com"
        },
        "quay.io": {
          "auth": "b3BlbnNo...",
          "email": "you@example.com"
        },
        "registry.connect.redhat.com": {
          "auth": "NTE3Njg5Nj...",
          "email": "you@example.com"
        },
        "registry.redhat.io": {
          "auth": "NTE3Njg5Nj...",
          "email": "you@example.com"
        }
      }
    }
  5. Edit the new file and add a section that describes your registry to it:

      "auths": {
        "<mirror_registry>": { (1)
          "auth": "<credentials>", (2)
          "email": "you@example.com"
        }
      },
    1 For <mirror_registry>, specify the registry domain name, and optionally the port, that your mirror registry uses to serve content. For example, registry.example.com or registry.example.com:8443
    2 For <credentials>, specify the base64-encoded user name and password for the mirror registry.

    The file resembles the following example:

    {
      "auths": {
        "registry.example.com": {
          "auth": "BGVtbYk3ZHAtqXs=",
          "email": "you@example.com"
        },
        "cloud.openshift.com": {
          "auth": "b3BlbnNo...",
          "email": "you@example.com"
        },
        "quay.io": {
          "auth": "b3BlbnNo...",
          "email": "you@example.com"
        },
        "registry.connect.redhat.com": {
          "auth": "NTE3Njg5Nj...",
          "email": "you@example.com"
        },
        "registry.redhat.io": {
          "auth": "NTE3Njg5Nj...",
          "email": "you@example.com"
        }
      }
    }

Mirroring the OpenShift Container Platform image repository

Mirror the OpenShift Container Platform image repository to your registry to use during cluster installation or upgrade.

Prerequisites
  • Your mirror host has access to the internet.

  • You configured a mirror registry to use in your restricted network and can access the certificate and credentials that you configured.

  • You downloaded the pull secret from the Red Hat OpenShift Cluster Manager and modified it to include authentication to your mirror repository.

  • If you use self-signed certificates that do not set a Subject Alternative Name, you must precede the oc commands in this procedure with GODEBUG=x509ignoreCN=0. If you do not set this variable, the oc commands will fail with the following error:

    x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
Procedure

Complete the following steps on the mirror host:

  1. Review the OpenShift Container Platform downloads page to determine the version of OpenShift Container Platform that you want to install and determine the corresponding tag on the Repository Tags page.

  2. Set the required environment variables:

    1. Export the release version:

      $ OCP_RELEASE=<release_version>

      For <release_version>, specify the tag that corresponds to the version of OpenShift Container Platform to install, such as 4.5.4.

    2. Export the local registry name and host port:

      $ LOCAL_REGISTRY='<local_registry_host_name>:<local_registry_host_port>'

      For <local_registry_host_name>, specify the registry domain name for your mirror repository, and for <local_registry_host_port>, specify the port that it serves content on.

    3. Export the local repository name:

      $ LOCAL_REPOSITORY='<local_repository_name>'

      For <local_repository_name>, specify the name of the repository to create in your registry, such as ocp4/openshift4.

    4. Export the name of the repository to mirror:

      $ PRODUCT_REPO='openshift-release-dev'

      For a production release, you must specify openshift-release-dev.

    5. Export the path to your registry pull secret:

      $ LOCAL_SECRET_JSON='<path_to_pull_secret>'

      For <path_to_pull_secret>, specify the absolute path to and file name of the pull secret for your mirror registry that you created.

    6. Export the release mirror:

      $ RELEASE_NAME="ocp-release"

      For a production release, you must specify ocp-release.

    7. Export the type of architecture for your server, such as x86_64:

      $ ARCHITECTURE=<server_architecture>
    8. Export the path to the directory to host the mirrored images:

      $ REMOVABLE_MEDIA_PATH=<path> (1)
      1 Specify the full path, including the initial forward slash (/) character.
  3. Mirror the version images to the mirror registry:

    • If your mirror host does not have internet access, take the following actions:

      1. Connect the removable media to a system that is connected to the internet.

      2. Review the images and configuration manifests to mirror:

        $ oc adm release mirror -a ${LOCAL_SECRET_JSON}  \
             --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-${ARCHITECTURE} \
             --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
             --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE} --dry-run
      3. Record the entire imageContentSources section from the output of the previous command. The information about your mirrors is unique to your mirrored repository, and you must add the imageContentSources section to the install-config.yaml file during installation.

      4. Mirror the images to a directory on the removable media:

        $ oc adm release mirror -a ${LOCAL_SECRET_JSON} --to-dir=${REMOVABLE_MEDIA_PATH}/mirror quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-${ARCHITECTURE}
      5. Take the media to the restricted network environment and upload the images to the local container registry.

        $ oc image mirror -a ${LOCAL_SECRET_JSON} --from-dir=${REMOVABLE_MEDIA_PATH}/mirror "file://openshift/release:${OCP_RELEASE}*" ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} (1)
        1 For REMOVABLE_MEDIA_PATH, you must use the same path that you specified when you mirrored the images.
    • If the local container registry is connected to the mirror host, take the following actions:

      1. Directly push the release images to the local registry by using following command:

        $ oc adm release mirror -a ${LOCAL_SECRET_JSON}  \
             --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-${ARCHITECTURE} \
             --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
             --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE}

        This command pulls the release information as a digest, and its output includes the imageContentSources data that you require when you install your cluster.

      2. Record the entire imageContentSources section from the output of the previous command. The information about your mirrors is unique to your mirrored repository, and you must add the imageContentSources section to the install-config.yaml file during installation.

        The image name gets patched to Quay.io during the mirroring process, and the podman images will show Quay.io in the registry on the bootstrap virtual machine.

  4. To create the installation program that is based on the content that you mirrored, extract it and pin it to the release:

    • If your mirror host does not have internet access, run the following command:

      $ oc adm release extract -a ${LOCAL_SECRET_JSON} --command=openshift-install "${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}"
    • If the local container registry is connected to the mirror host, run the following command:

      $ oc adm release extract -a ${LOCAL_SECRET_JSON} --command=openshift-install "${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}-${ARCHITECTURE}"

      To ensure that you use the correct images for the version of OpenShift Container Platform that you selected, you must extract the installation program from the mirrored content.

      You must perform this step on a machine with an active internet connection.

      If you are in a disconnected environment, use the --image flag as part of must-gather and point to the payload image.

  5. For clusters using installer-provisioned infrastructure, run the following command:

    $ openshift-install

Adding RHCOS ISO and RootFS images to a disconnected mirror host

Before you install a cluster on infrastructure that you provision, you must create Red Hat Enterprise Linux CoreOS (RHCOS) machines for it to use. Use a disconnected mirror to host the RHCOS images you require to provision your distributed unit (DU) bare-metal hosts.

Prerequisites
  • Deploy and configure an HTTP server to host the RHCOS image resources on the network. You must be able to access the HTTP server from your computer, and from the machines that you create.

The RHCOS images might not change with every release of OpenShift Container Platform. You must download images with the highest version that is less than or equal to the OpenShift Container Platform version that you install. Use the image versions that match your OpenShift Container Platform version if they are available. You require ISO and RootFS images to install RHCOS on the DU hosts. RHCOS qcow2 images are not supported for this installation type.

Procedure
  1. Log in to the mirror host.

  2. Obtain the RHCOS ISO and RootFS images from mirror.openshift.com, for example:

    1. Export the required image names and OpenShift Container Platform version as environment variables:

      $ export ISO_IMAGE_NAME=<iso_image_name> (1)
      $ export ROOTFS_IMAGE_NAME=<rootfs_image_name> (2)
      $ export OCP_VERSION=<ocp_version> (3)
      1 ISO image name, for example, rhcos-4.9.0-fc.1-x86_64-live.x86_64.iso
      2 RootFS image name, for example, rhcos-4.9.0-fc.1-x86_64-live-rootfs.x86_64.img
      3 OpenShift Container Platform version, for example, latest-4.9
    2. Download the required images:

      $ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
      $ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}
Verification steps
  • Verify that the images downloaded successfully and are being served on the disconnected mirror host, for example:

    $ wget http://$(hostname)/${ISO_IMAGE_NAME}
    Expected output
    ...
    Saving to: rhcos-4.9.0-fc.1-x86_64-live.x86_64.iso
    rhcos-4.9.0-fc.1-x86_64-  11%[====>    ]  10.01M  4.71MB/s
    ...

Installing Red Hat Advanced Cluster Management in a disconnected environment

You use Red Hat Advanced Cluster Management (RHACM) on a hub cluster in the disconnected environment to manage the deployment of distributed unit (DU) profiles on multiple managed spoke clusters.

Prerequisites
  • Install the OpenShift Container Platform CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Configure a disconnected mirror registry for use in the cluster.

    If you want to deploy Operators to the spoke clusters, you must also add them to this registry. See Mirroring an Operator catalog for more information.

Procedure

Enabling assisted installer service on bare metal

The Assisted Installer Service (AIS) deploys OpenShift Container Platform clusters. Red Hat Advanced Cluster Management (RHACM) ships with AIS. AIS is deployed when you enable the MultiClusterHub Operator on the RHACM hub cluster.

For distributed units (DUs), RHACM supports OpenShift Container Platform deployments that run on a single bare-metal host. The single-node cluster acts as both a control plane and a worker node.

Prerequisites
  • Install OpenShift Container Platform 4.9 on a hub cluster.

  • Install RHACM and create the MultiClusterHub resource.

  • Create persistent volume custom resources (CR) for database and file system storage.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Modify the HiveConfig resource to enable the feature gate for Assisted Installer:

     $ oc patch hiveconfig hive --type merge -p '{"spec":{"targetNamespace":"hive","logLevel":"debug","featureGates":{"custom":{"enabled":["AlphaAgentInstallStrategy"]},"featureSet":"Custom"}}}'
  2. Modify the Provisioning resource to allow the Bare Metal Operator to watch all namespaces:

     $ oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'
  3. Create the AgentServiceConfig CR.

    1. Save the following YAML in the agent_service_config.yaml file:

      apiVersion: agent-install.openshift.io/v1beta1
      kind: AgentServiceConfig
      metadata:
       name: agent
      spec:
        databaseStorage:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: <db_volume_size> (1)
        filesystemStorage:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: <fs_volume_size> (2)
        osImages: (3)
          - openshiftVersion: "<ocp_version>" (4)
            version: "<ocp_release_version>" (5)
            url: "<iso_url>" (6)
            rootFSUrl: "<root_fs_url>" (7)
            cpuArchitecture: "x86_64"
      1 Volume size for the databaseStorage field, for example 10Gi.
      2 Volume size for the filesystemStorage field, for example 20Gi.
      3 List of OS image details. Example describes a single OpenShift Container Platform OS version.
      4 OpenShift Container Platform version to install, for example, 4.8.
      5 Specific install version, for example, 47.83.202103251640-0.
      6 ISO url, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-4.7.7-x86_64-live.x86_64.iso.
      7 Root FS image URL, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-live-rootfs.x86_64.img
    2. Create the AgentServiceConfig CR by running the following command:

      $ oc create -f agent_service_config.yaml
      Example output
      agentserviceconfig.agent-install.openshift.io/agent created

ZTP custom resources

Zero touch provisioning (ZTP) uses custom resource (CR) objects to extend the Kubernetes API or introduce your own API into a project or a cluster. These CRs contain the site-specific data required to install and configure a cluster for RAN applications.

A custom resource definition (CRD) file defines your own object kinds. Deploying a CRD into the managed cluster causes the Kubernetes API server to begin serving the specified CR for the entire lifecycle.

For each CR in the <site>.yaml file on the managed cluster, ZTP uses the data to create installation CRs in a directory named for the cluster.

ZTP provides two ways for defining and installing CRs on managed clusters: a manual approach when you are provisioning a single cluster and an automated approach when provisioning multiple clusters.

Manual CR creation for single clusters

Use this method when you are creating CRs for a single cluster. This is a good way to test your CRs before deploying on a larger scale.

Automated CR creation for multiple managed clusters

Use the automated SiteConfig method when you are installing multiple managed clusters, for example, in batches of up to 100 clusters. SiteConfig uses ArgoCD as the engine for the GitOps method of site deployment. After completing a site plan that contains all of the required parameters for deployment, a policy generator creates the manifests and applies them to the hub cluster.

Both methods create the CRs shown in the following table. On the cluster site, an automated Discovery image ISO file creates a directory with the site name and a file with the cluster name. Every cluster has its own namespace, and all of the CRs are under that namespace. The namespace and the CR names match the cluster name.

Resource Description Usage

BareMetalHost

Contains the connection information for the Baseboard Management Controller (BMC) of the target bare-metal host.

Provides access to the BMC in order to load and boot the Discovery image ISO on the target server by using the Redfish protocol.

InfraEnv

Contains information for pulling OpenShift Container Platform onto the target bare-metal host.

Used with ClusterDeployment to generate the Discovery ISO for the managed cluster.

AgentClusterInstall

Specifies the managed cluster’s configuration such as networking and the number of supervisor (control plane) nodes. Shows the kubeconfig and credentials when the installation is complete.

Specifies the managed cluster configuration information and provides status during the installation of the cluster.

ClusterDeployment

References the AgentClusterInstall to use.

Used with InfraEnv to generate the Discovery ISO for the managed cluster.

NMStateConfig

Provides network configuration information such as MAC to IP mapping, DNS server, default route, and other network settings. This is not needed if DHCP is used.

Sets up a static IP address for the managed cluster’s Kube API server.

Agent

Contains hardware information about the target bare-metal host.

Created automatically on the hub when the target machine’s Discovery image ISO boots.

ManagedCluster

When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface.

The hub uses this resource to manage and show the status of managed clusters.

KlusterletAddonConfig

Contains the list of services provided by the hub to be deployed to a ManagedCluster.

Tells the hub which addon services to deploy to a ManagedCluster.

Namespace

Logical space for ManagedCluster resources existing on the hub. Unique per site.

Propagates resources to the ManagedCluster.

Secret

Two custom resources are created: BMC Secret and Image Pull Secret.

  • BMC Secret authenticates into the target bare-metal host using its username and password.

  • Image Pull Secret contains authentication information for the OpenShift Container Platform image installed on the target bare-metal host.

ClusterImageSet

Contains OpenShift Container Platform image information such as the repository and image name.

Passed into resources to provide OpenShift Container Platform images.

Creating custom resources to install a single managed cluster

This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the SiteConfig method described in “Creating ZTP custom resources for multiple managed clusters”.

Prerequisites
  • Enable Assisted Installer Service.

  • Ensure network connectivity:

    • The container within the hub must be able to reach the Baseboard Management Controller (BMC) address of the target bare-metal host.

    • The managed cluster must be able to resolve and reach the hub’s API hostname and *.app hostname. Example of the hub’s API and *.app hostname:

      console-openshift-console.apps.hub-cluster.internal.domain.com
      api.hub-cluster.internal.domain.com
    • The hub must be able to resolve and reach the API and *.app hostname of the managed cluster. Here is an example of the managed cluster’s API and *.app hostname:

      console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
      api.sno-managed-cluster-1.internal.domain.com
    • A DNS Server that is IP reachable from the target bare-metal host.

  • A target bare-metal host for the managed cluster with the following hardware minimums:

    • 4 CPU or 8 vCPU

    • 32 GiB RAM

    • 120 GiB Disk for root filesystem

  • When working in a disconnected environment, the release image needs to be mirrored. Use this command to mirror the release image:

    oc adm release mirror -a <pull_secret.json>
    --from=quay.io/openshift-release-dev/ocp-release:{{ mirror_version_spoke_release }}
    --to={{ provisioner_cluster_registry }}/ocp4 --to-release-image={{
    provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }}
  • You mirrored the ISO and rootfs used to generate the spoke cluster ISO to an HTTP server and configured the settings to pull images from there.

    The images must match the version of the ClusterImageSet. To deploy a 4.9.0 version, the rootfs and ISO need to be set at 4.9.0.

Procedure
  1. Create a ClusterImageSet for each specific cluster version that needs to be deployed. A ClusterImageSet has the following format:

    apiVersion: hive.openshift.io/v1
    kind: ClusterImageSet
    metadata:
      name: openshift-4.9.0-rc.0 (1)
    spec:
       releaseImage: quay.io/openshift-release-dev/ocp-release:4.9.0-x86_64 (2)
    1 The descriptive version that you want to deploy.
    2 Points to the specific release image to deploy.
  2. Create the Namespace definition for the managed cluster:

    apiVersion: v1
    kind: Namespace
    metadata:
         name: <cluster_name> (1)
         labels:
            name: <cluster_name> (1)
    1 The name of the managed cluster to provision.
  3. Create the BMC Secret custom resource:

    apiVersion: v1
    data:
      password: <bmc_password> (1)
      username: <bmc_username> (2)
    kind: Secret
    metadata:
      name: <cluster_name>-bmc-secret
      namespace: <cluster_name>
    type: Opaque
    1 The password to the target bare-metal host. Must be base-64 encoded.
    2 The username to the target bare-metal host. Must be base-64 encoded.
  4. Create the Image Pull Secret custom resource:

    apiVersion: v1
    data:
      .dockerconfigjson: <pull_secret> (1)
    kind: Secret
    metadata:
      name: assisted-deployment-pull-secret
      namespace: <cluster_name>
    type: kubernetes.io/dockerconfigjson
    1 The OpenShift Container Platform pull secret. Must be base-64 encoded.
  5. Create the AgentClusterInstall custom resource:

    apiVersion: extensions.hive.openshift.io/v1beta1
    kind: AgentClusterInstall
    metadata:
      # Only include the annotation if using OVN, otherwise omit the annotation
      annotations:
        agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
      name: <cluster_name>
      namespace: <cluster_name>
    spec:
      clusterDeploymentRef:
        name: <cluster_name>
      imageSetRef:
        name: <cluster_image_set> (1)
      networking:
        clusterNetwork:
        - cidr: <cluster_network_cidr> (2)
          hostPrefix: 23
        machineNetwork:
        - cidr: <machine_network_cidr> (3)
        serviceNetwork:
        - <service_network_cidr> (4)
      provisionRequirements:
        controlPlaneAgents: 1
        workerAgents: 0
      sshPublicKey: <public_key> (5)
    1 The name of the ClusterImageSet custom resource used to install OpenShift Container Platform on the bare-metal host.
    2 A block of IPv4 or IPv6 addresses in CIDR notation used for communication among cluster nodes.
    3 A block of IPv4 or IPv6 addresses in CIDR notation used for the target bare-metal host external communication. Also used to determine the API and Ingress VIP addresses when provisioning DU single-node clusters.
    4 A block of IPv4 or IPv6 addresses in CIDR notation used for cluster services internal communication.
    5 Entered as plain text. You can use the public key to SSH into the node after it has finished installing.

    If you want to configure a static IP for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters.

  6. Create the ClusterDeployment custom resource:

    apiVersion: hive.openshift.io/v1
    kind: ClusterDeployment
    metadata:
      name: <cluster_name>
      namespace: <cluster_name>
    spec:
      baseDomain: <base_domain> (1)
      clusterInstallRef:
        group: extensions.hive.openshift.io
        kind: AgentClusterInstall
        name: <cluster_name>
        version: v1beta1
      clusterName: <cluster_name>
      platform:
        agentBareMetal:
          agentSelector:
            matchLabels:
              cluster-name: <cluster_name>
      pullSecretRef:
        name: assisted-deployment-pull-secret
    1 The managed cluster’s base domain.
  7. Create the KlusterletAddonConfig custom resource:

    apiVersion: agent.open-cluster-management.io/v1
    kind: KlusterletAddonConfig
    metadata:
      name: <cluster_name>
      namespace: <cluster_name>
    spec:
      clusterName: <cluster_name>
      clusterNamespace: <cluster_name>
      clusterLabels:
        cloud: auto-detect
        vendor: auto-detect
      applicationManager:
        enabled: true
      certPolicyController:
        enabled: false
      iamPolicyController:
        enabled: false
      policyController:
        enabled: true
      searchCollector:
        enabled: false (1)
    1 Set to true to enable KlusterletAddonConfig or false to disable the KlusterletAddonConfig. Keep searchCollector disabled.
  8. Create the ManagedCluster custom resource:

    apiVersion: cluster.open-cluster-management.io/v1
    kind: ManagedCluster
    metadata:
      name: <cluster_name>
    spec:
      hubAcceptsClient: true
  9. Create the InfraEnv custom resource:

    apiVersion: agent-install.openshift.io/v1beta1
    kind: InfraEnv
    metadata:
      name: <cluster_name>
      namespace: <cluster_name>
    spec:
      clusterRef:
        name: <cluster_name>
        namespace: <cluster_name>
      sshAuthorizedKey: <public_key> (1)
      agentLabels: (2)
        location: "<label-name>"
      pullSecretRef:
        name: assisted-deployment-pull-secret
    1 Entered as plain text. You can use the public key to SSH into the target bare-metal host when it boots from the ISO.
    2 Sets a label to match. The labels apply when the agents boot.
  10. Create the BareMetalHost custom resource:

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      name: <cluster_name>
      namespace: <cluster_name>
      annotations:
        inspect.metal3.io: disabled
      labels:
        infraenvs.agent-install.openshift.io: "<cluster_name>"
    spec:
      bootMode: "UEFI"
      bmc:
        address: <bmc_address> (1)
        disableCertificateVerification: true
        credentialsName: <cluster_name>-bmc-secret
      bootMACAddress: <mac_address> (2)
      automatedCleaningMode: disabled
      online: true
    1 The baseboard management console address of the installation ISO on the target bare-metal host.
    2 The MAC address of the target bare-metal host.

    Optionally, you can add bmac.agent-install.openshift.io/hostname: <host-name> as an annotation to set the managed cluster’s hostname. If you don’t add the annotation, the hostname will default to either a hostname from the DHCP server or local host.

  11. After you have created the custom resources, push the entire directory of generated custom resources to the Git repository you created for storing the custom resources.

Next step

To provision additional clusters, repeat this procedure for each cluster.

Configuring static IP addresses for managed clusters

Optionally, after creating the AgentClusterInstall custom resource, you can configure static IP addresses for the managed clusters.

You must create this custom resource before creating the ClusterDeployment custom resource.

Prerequisites
  • Deploy and configure the AgentClusterInstall custom resource.

Procedure
  1. Create a NMStateConfig custom resource:

    apiVersion: agent-install.openshift.io/v1beta1
    kind: NMStateConfig
    metadata:
     name: <cluster_name>
     namespace: <cluster_name>
     labels:
       sno-cluster-<cluster-name>: <cluster_name>
    spec:
     config:
       interfaces:
         - name: eth0
           type: ethernet
           state: up
           ipv4:
             enabled: true
             address:
               - ip: <ip_address> (1)
                 prefix-length: <public_network_prefix> (2)
             dhcp: false
       dns-resolver:
         config:
           server:
             - <dns_resolver> (3)
       routes:
         config:
           - destination: 0.0.0.0/0
             next-hop-address: <gateway> (4)
             next-hop-interface: eth0
             table-id: 254
     interfaces:
       - name: "eth0" (5)
         macAddress: <mac_address> (6)
    1 The static IP address of the target bare-metal host.
    2 The static IP address’s subnet prefix for the target bare-metal host.
    3 The DNS server for the target bare-metal host.
    4 The gateway for the target bare-metal host.
    5 Must match the name specified in the interfaces section.
    6 The mac address of the interface.
  2. When creating the BareMetalHost custom resource, ensure that one of its mac addresses matches a mac address in the NMStateConfig target bare-metal host.

  3. When creating the InfraEnv custom resource, reference the label from the NMStateConfig custom resource in the InfraEnv custom resource:

    apiVersion: agent-install.openshift.io/v1beta1
    kind: InfraEnv
    metadata:
      name: <cluster_name>
      namespace: <cluster_name>
    spec:
      clusterRef:
        name: <cluster_name>
        namespace: <cluster_name>
      sshAuthorizedKey: <public_key>
      agentLabels: (1)
        location: "<label-name>"
      pullSecretRef:
        name: assisted-deployment-pull-secret
      nmStateConfigLabelSelector:
        matchLabels:
          sno-cluster-<cluster-name>: <cluster_name> # Match this label
    1 Sets a label to match. The labels apply when the agents boot.

Automated Discovery image ISO process for provisioning clusters

After you create the custom resources, the following actions happen automatically:

  1. A Discovery image ISO file is generated and booted on the target machine.

  2. When the ISO file successfully boots on the target machine it reports the hardware information of the target machine.

  3. After all hosts are discovered, OpenShift Container Platform is installed.

  4. When OpenShift Container Platform finishes installing, the hub installs the klusterlet service on the target cluster.

  5. The requested add-on services are installed on the target cluster.

The Discovery image ISO process finishes when the Agent custom resource is created on the hub for the managed cluster.

Checking the managed cluster status

Ensure that cluster provisioning was successful by checking the cluster status.

Prerequisites
  • All of the custom resources have been configured and provisioned, and the Agent custom resource is created on the hub for the managed cluster.

Procedure
  1. Check the status of the managed cluster:

    $ oc get managedcluster

    True indicates the managed cluster is ready.

  2. Check the agent status:

    $ oc get agent -n <cluster_name>
  3. Use the describe command to provide an in-depth description of the agent’s condition. Statuses to be aware of include BackendError, InputError, ValidationsFailing, InstallationFailed, and AgentIsConnected. These statuses are relevant to the Agent and AgentClusterInstall custom resources.

    $ oc describe agent -n <cluster_name>
  4. Check the cluster provisioning status:

    $ oc get agentclusterinstall -n <cluster_name>
  5. Use the describe command to provide an in-depth description of the cluster provisioning status:

    $ oc describe agentclusterinstall -n <cluster_name>
  6. Check the status of the managed cluster’s add-on services:

    $ oc get managedclusteraddon -n <cluster_name>
  7. Retrieve the authentication information of the kubeconfig file for the managed cluster:

    $ oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster_name>-kubeconfig

Configuring a managed cluster for a disconnected environment

After you have completed the preceding procedure, follow these steps to configure the managed cluster for a disconnected environment.

Prerequisites
  • A disconnected installation of Red Hat Advanced Cluster Management (RHACM) 2.3.

  • Host the rootfs and iso images on an HTTPD server.

Procedure
  1. Create a ConfigMap containing the mirror registry config:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: assisted-installer-mirror-config
      namespace: assisted-installer
      labels:
        app: assisted-service
    data:
      ca-bundle.crt: <certificate> (1)
      registries.conf: |  (2)
        unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
    
        [[registry]]
          location = <mirror_registry_url>  (3)
          insecure = false
          mirror-by-digest-only = true
    1 The mirror registry’s certificate used when creating the mirror registry.
    2 The configuration for the mirror registry.
    3 The URL of the mirror registry.

    This updates mirrorRegistryRef in the AgentServiceConfig custom resource, as shown below:

    Example output
    apiVersion: agent-install.openshift.io/v1beta1
    kind: AgentServiceConfig
    metadata:
      name: agent
      namespace: assisted-installer
    spec:
      databaseStorage:
        volumeName: <db_pv_name>
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: <db_storage_size>
      filesystemStorage:
        volumeName: <fs_pv_name>
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: <fs_storage_size>
      mirrorRegistryRef:
        name: 'assisted-installer-mirror-config'
      osImages:
        - openshiftVersion: <ocp_version>
          rootfs: <rootfs_url> (1)
          url: <iso_url> (1)
    1 Must match the URLs of the HTTPD server.
  2. For disconnected installations, you must deploy an NTP clock that is reachable through the disconnected network. You can do this by configuring chrony to act as server, editing the /etc/chrony.conf file, and adding the following allowed IPv6 range:

    # Allow NTP client access from local network.
    #allow 192.168.0.0/16
    local stratum 10
    bindcmdaddress ::
    allow 2620:52:0:1310::/64

Configuring IPv6 addresses for a disconnected environment

Optionally, when you are creating the AgentClusterInstall custom resource, you can configure IPv6 addresses for the managed clusters.

Procedure
  1. In the AgentClusterInstall custom resource, modify the IP addresses in clusterNetwork and serviceNetwork for IPv6 addresses:

    apiVersion: extensions.hive.openshift.io/v1beta1
    kind: AgentClusterInstall
    metadata:
      # Only include the annotation if using OVN, otherwise omit the annotation
      annotations:
        agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
      name: <cluster_name>
      namespace: <cluster_name>
    spec:
      clusterDeploymentRef:
        name: <cluster_name>
      imageSetRef:
        name: <cluster_image_set>
      networking:
        clusterNetwork:
        - cidr: "fd01::/48"
          hostPrefix: 64
        machineNetwork:
        - cidr: <machine_network_cidr>
        serviceNetwork:
        - "fd02::/112"
      provisionRequirements:
        controlPlaneAgents: 1
        workerAgents: 0
      sshPublicKey: <public_key>
  2. Update the NMStateConfig custom resource with the IPv6 addresses you defined.

Troubleshooting the managed cluster

Use this procedure to diagnose any installation issues that might occur with the managed clusters.

Procedure
  1. Check the status of the managed cluster:

    $ oc get managedcluster
    Example output
    NAME            HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
    SNO-cluster     true                                   True     True      2d19h

    If the status in the AVAILABLE column is True, the managed cluster is being managed by the hub.

    If the status in the AVAILABLE column is Unknown, the managed cluster is not being managed by the hub. Use the following steps to continue checking to get more information.

  2. Check the AgentClusterInstall install status:

    $ oc get clusterdeployment -n <cluster_name>
    Example output
    NAME        PLATFORM            REGION   CLUSTERTYPE   INSTALLED    INFRAID    VERSION  POWERSTATE AGE
    Sno0026    agent-baremetal                               false                          Initialized
    2d14h

    If the status in the INSTALLED column is false, the installation was unsuccessful.

  3. If the installation failed, enter the following command to review the status of the AgentClusterInstall resource:

    $ oc describe agentclusterinstall -n <cluster_name> <cluster_name>
  4. Resolve the errors and reset the cluster:

    1. Remove the cluster’s managed cluster resource:

      $ oc delete managedcluster <cluster_name>
    2. Remove the cluster’s namespace:

      $ oc delete namespace <cluster_name>

      This deletes all of the namespace-scoped custom resources created for this cluster. You must wait for the ManagedCluster CR deletion to complete before proceeding.

    3. Recreate the custom resources for the managed cluster.

Applying the RAN policies for monitoring cluster activity

Zero touch provisioning (ZTP) uses Red Hat Advanced Cluster Management (RHACM) to apply the radio access network (RAN) policies using a policy-based governance approach to automatically monitor cluster activity.

The policy generator (PolicyGen) is a Kustomize plugin that facilitates creating ACM policies from predefined custom resources. There are three main items: Policy Categorization, Source CR policy, and PolicyGenTemplate. PolicyGen relies on these to generate the policies and their placement bindings and rules.

The following diagram shows how the RAN policy generator interacts with GitOps and ACM.

RAN policy generator

RAN policies are categorized into three main groups:

Common

A policy that exists in the Common category is applied to all clusters to be represented by the site plan.

Groups

A policy that exists in the Groups category is applied to a group of clusters. Every group of clusters could have their own policies that exist under the Groups category. For example, Groups/group1 could have its own policies that are applied to the clusters belonging to group1.

Sites

A policy that exists in the Sites category is applied to a specific cluster. Any cluster could have its own policies that exist in the Sites category. For example, Sites/cluster1 will have its own policies applied to cluster1.

The following diagram shows how policies are generated.

Generating policies

Applying source custom resource policies

Source custom resource policies include the following:

  • SR-IOV policies

  • PTP policies

  • Performance Add-on Operator policies

  • MachineConfigPool policies

  • SCTP policies

You need to define the source custom resource that generates the ACM policy with consideration of possible overlay to its metadata or spec/data. For example, a common-namespace-policy contains a Namespace definition that exists in all managed clusters. This namespace is placed under the Common category and there are no changes for its spec or data across all clusters.

Namespace policy example

The following example shows the source custom resource for this namespace:

apiVersion: v1
kind: Namespace
metadata:
 name: openshift-sriov-network-operator
 labels:
   openshift.io/run-level: "1"
Example output

The generated policy that applies this namespace includes the namespace as it is defined above without any change, as shown in this example:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
   name: common-sriov-sub-ns-policy
   namespace: common-sub
   annotations:
       policy.open-cluster-management.io/categories: CM Configuration Management
       policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
       policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
   remediationAction: enforce
   disabled: false
   policy-templates:
       - objectDefinition:
           apiVersion: policy.open-cluster-management.io/v1
           kind: ConfigurationPolicy
           metadata:
               name: common-sriov-sub-ns-policy-config
           spec:
               remediationAction: enforce
               severity: low
               namespaceselector:
                   exclude:
                       - kube-*
                   include:
                       - '*'
               object-templates:
                   - complianceType: musthave
                     objectDefinition:
                       apiVersion: v1
                       kind: Namespace
                       metadata:
                           labels:
                               openshift.io/run-level: "1"
                           name: openshift-sriov-network-operator
SRIOV policy example

The following example shows a SriovNetworkNodePolicy definition that exists in different clusters with a different specification for each cluster. The example also shows the source custom resource for the SriovNetworkNodePolicy:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nnp
  namespace: openshift-sriov-network-operator
spec:
  # The $ tells the policy generator to overlay/remove the spec.item in the generated policy.
  deviceType: $deviceType
  isRdma: false
  nicSelector:
    pfNames: [$pfNames]
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: $numVfs
  priority: $priority
  resourceName: $resourceName
Example output

The SriovNetworkNodePolicy name and namespace are the same for all clusters, so both are defined in the source SriovNetworkNodePolicy. However, the generated policy requires the $deviceType, $numVfs, as input parameters in order to adjust the policy for each cluster. The generated policy is shown in this example:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
    name: site-du-sno-1-sriov-nnp-mh-policy
    namespace: sites-sub
    annotations:
        policy.open-cluster-management.io/categories: CM Configuration Management
        policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
        policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
    remediationAction: enforce
    disabled: false
    policy-templates:
        - objectDefinition:
            apiVersion: policy.open-cluster-management.io/v1
            kind: ConfigurationPolicy
            metadata:
                name: site-du-sno-1-sriov-nnp-mh-policy-config
            spec:
                remediationAction: enforce
                severity: low
                namespaceselector:
                    exclude:
                        - kube-*
                    include:
                        - '*'
                object-templates:
                    - complianceType: musthave
                      objectDefinition:
                        apiVersion: sriovnetwork.openshift.io/v1
                        kind: SriovNetworkNodePolicy
                        metadata:
                            name: sriov-nnp-du-mh
                            namespace: openshift-sriov-network-operator
                        spec:
                            deviceType: vfio-pci
                            isRdma: false
                            nicSelector:
                                pfNames:
                                    - ens7f0
                            nodeSelector:
                                node-role.kubernetes.io/worker: ""
                            numVfs: 8
                            resourceName: du_mh

Defining the required input parameters as $value, for example $deviceType, is not mandatory. The $ tells the policy generator to overlay or remove the item from the generated policy. Otherwise, the value does not change.

The PolicyGenTemplate

The PolicyGenTemplate.yaml file is a Custom Resource Definition (CRD) that tells PolicyGen where to categorize the generated policies and which items need to be overlaid.

The following example shows the PolicyGenTemplate.yaml file:

apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "group-du-sno"
  namespace: "group-du-sno"
spec:
  bindingRules:
    group-du-sno: ""
  mcp: "master"
  sourceFiles:
    - fileName: ConsoleOperatorDisable.yaml
      policyName: "console-policy"
    - fileName: ClusterLogging.yaml
      policyName: "cluster-log-policy"
      spec:
        curation:
          curator:
            schedule: "30 3 * * *"
          collection:
            logs:
              type: "fluentd"
              fluentd: {}

The group-du-ranGen.yaml file defines a group of policies under a group named group-du. This file defines a MachineConfigPool worker-du that is used as the node selector for any other policy defined in sourceFiles. An ACM policy is generated for every source file that exists in sourceFiles. And, a single placement binding and placement rule is generated to apply the cluster selection rule for group-du policies.

Using the source file PtpConfigSlave.yaml as an example, the PtpConfigSlave has a definition of a PtpConfig custom resource (CR). The generated policy for the PtpConfigSlave example is named group-du-ptp-config-policy. The PtpConfig CR defined in the generated group-du-ptp-config-policy is named du-ptp-slave. The spec defined in PtpConfigSlave.yaml is placed under du-ptp-slave along with the other spec items defined under the source file.

The following example shows the group-du-ptp-config-policy:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: group-du-ptp-config-policy
  namespace: groups-sub
  annotations:
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
    policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
    remediationAction: enforce
    disabled: false
    policy-templates:
        - objectDefinition:
            apiVersion: policy.open-cluster-management.io/v1
            kind: ConfigurationPolicy
            metadata:
                name: group-du-ptp-config-policy-config
            spec:
                remediationAction: enforce
                severity: low
                namespaceselector:
                    exclude:
                        - kube-*
                    include:
                        - '*'
                object-templates:
                    - complianceType: musthave
                      objectDefinition:
                        apiVersion: ptp.openshift.io/v1
                        kind: PtpConfig
                        metadata:
                            name: slave
                            namespace: openshift-ptp
                        spec:
                            recommend:
                                - match:
                                - nodeLabel: node-role.kubernetes.io/worker-du
                                  priority: 4
                                  profile: slave
                            profile:
                                - interface: ens5f0
                                  name: slave
                                  phc2sysOpts: -a -r -n 24
                                  ptp4lConf: |
                                    [global]
                                    #
                                    # Default Data Set
                                    #
                                    twoStepFlag 1
                                    slaveOnly 0
                                    priority1 128
                                    priority2 128
                                    domainNumber 24
                                    .....

Considerations when creating custom resource policies

  • The custom resources used to create the ACM policies should be defined with consideration of possible overlay to its metadata and spec/data. For example, if the custom resource metadata.name does not change between clusters then you should set the metadata.name value in the custom resource file. If the custom resource will have multiple instances in the same cluster, then the custom resource metadata.name must be defined in the policy template file.

  • In order to apply the node selector for a specific machine config pool, you have to set the node selector value as $mcp in order to let the policy generator overlay the $mcp value with the defined mcp in the policy template.

  • Subscription source files do not change.

Generating RAN policies

Prerequisites
Procedure
  1. Configure the kustomization.yaml file to reference the policyGenerator.yaml file. The following example shows the PolicyGenerator definition:

    apiVersion: policyGenerator/v1
    kind: PolicyGenerator
    metadata:
      name: acm-policy
      namespace: acm-policy-generator
    # The arguments should be given and defined as below with same order --policyGenTempPath= --sourcePath= --outPath= --stdout --customResources
    argsOneLiner: ./ranPolicyGenTempExamples ./sourcePolicies ./out true false

    Where:

    • policyGenTempPath is the path to the policyGenTemp files.

    • sourcePath: is the path to the source policies.

    • outPath: is the path to save the generated ACM policies.

    • stdout: If true, prints the generated policies to the console.

    • customResources: If true generates the CRs from the sourcePolicies files without ACM policies.

  2. Test PolicyGen by running the following commands:

    $ cd cnf-features-deploy/ztp/ztp-policy-generator/
    $ XDG_CONFIG_HOME=./ kustomize build --enable-alpha-plugins

    An out directory is created with the expected policies, as shown in this example:

    out
    ├── common
    │   ├── common-log-sub-ns-policy.yaml
    │   ├── common-log-sub-oper-policy.yaml
    │   ├── common-log-sub-policy.yaml
    │   ├── common-pao-sub-catalog-policy.yaml
    │   ├── common-pao-sub-ns-policy.yaml
    │   ├── common-pao-sub-oper-policy.yaml
    │   ├── common-pao-sub-policy.yaml
    │   ├── common-policies-placementbinding.yaml
    │   ├── common-policies-placementrule.yaml
    │   ├── common-ptp-sub-ns-policy.yaml
    │   ├── common-ptp-sub-oper-policy.yaml
    │   ├── common-ptp-sub-policy.yaml
    │   ├── common-sriov-sub-ns-policy.yaml
    │   ├── common-sriov-sub-oper-policy.yaml
    │   └── common-sriov-sub-policy.yaml
    ├── groups
    │   ├── group-du
    │   │   ├── group-du-mc-chronyd-policy.yaml
    │   │   ├── group-du-mc-mount-ns-policy.yaml
    │   │   ├── group-du-mcp-du-policy.yaml
    │   │   ├── group-du-mc-sctp-policy.yaml
    │   │   ├── group-du-policies-placementbinding.yaml
    │   │   ├── group-du-policies-placementrule.yaml
    │   │   ├── group-du-ptp-config-policy.yaml
    │   │   └── group-du-sriov-operconfig-policy.yaml
    │   └── group-sno-du
    │       ├── group-du-sno-policies-placementbinding.yaml
    │       ├── group-du-sno-policies-placementrule.yaml
    │       ├── group-sno-du-console-policy.yaml
    │       ├── group-sno-du-log-forwarder-policy.yaml
    │       └── group-sno-du-log-policy.yaml
    └── sites
        └── site-du-sno-1
            ├── site-du-sno-1-policies-placementbinding.yaml
            ├── site-du-sno-1-policies-placementrule.yaml
            ├── site-du-sno-1-sriov-nn-fh-policy.yaml
            ├── site-du-sno-1-sriov-nnp-mh-policy.yaml
            ├── site-du-sno-1-sriov-nw-fh-policy.yaml
            ├── site-du-sno-1-sriov-nw-mh-policy.yaml
            └── site-du-sno-1-.yaml

    The common policies are flat because they will be applied to all clusters. However, the groups and sites have subdirectories for each group and site as they will be applied to different clusters.

Cluster provisioning

Zero touch provisioning (ZTP) provisions clusters using a layered approach. The base components consist of Red Hat Enterprise Linux CoreOS (RHCOS), the basic operating system for the cluster, and OpenShift Container Platform. After these components are installed, the worker node can join the existing cluster. When the node has joined the existing cluster, the 5G RAN profile Operators are applied.

The following diagram illustrates this architecture.

Cluster provisioning

The following RAN Operators are deployed on every cluster:

  • Machine Config

  • Precision Time Protocol (PTP)

  • Performance Addon Operator

  • SR-IOV

  • Local Storage Operator

  • Logging Operator

Machine Config Operator

The Machine Config Operator enables system definitions and low-level system settings such as workload partitioning, NTP, and SCTP. This Operator is installed with OpenShift Container Platform.

A performance profile and its created products are applied to a node according to an associated machine config pool (MCP). The MCP holds valuable information about the progress of applying the machine configurations created by performance addons that encompass kernel args, kube config, huge pages allocation, and deployment of the realtime kernel (rt-kernel). The performance addons controller monitors changes in the MCP and updates the performance profile status accordingly.

Performance Addon Operator

The Performance Addon Operator provides the ability to enable advanced node performance tunings on a set of nodes.

OpenShift Container Platform provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance for OpenShift Container Platform applications. The cluster administrator uses this performance profile configuration that makes it easier to make these changes in a more reliable way.

The administrator can specify updating the kernel to rt-kernel, reserving CPUs for management workloads, and using CPUs for running the workloads.

SR-IOV Operator

The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.

The SR-IOV Operator allows network interfaces to be virtual and shared at a device level with networking functions running within the cluster.

The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource named default in the openshift-sriov-network-operator namespace. The default custom resource contains the SR-IOV Network Operator configuration for your cluster.

Precision Time Protocol Operator

The Precision Time Protocol (PTP) Operator is a protocol used to synchronize clocks in a network. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy. PTP support is divided between the kernel and user space.

The clocks synchronized by PTP are organized in a master-worker hierarchy. The workers are synchronized to their masters, which may be workers to their own masters. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. When a clock has only one port, it can be master or worker, such a clock is called an ordinary clock (OC). A clock with multiple ports can be master on one port and worker on another, such a clock is called a boundary clock (BC). The top-level master is called the grandmaster clock, which can be synchronized by using a Global Positioning System (GPS) time source. By using a GPS-based time source, disparate networks can be synchronized with a high-degree of accuracy.

Creating ZTP custom resources for multiple managed clusters

If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and SiteConfig to manage the processes that create the custom resources (CR) and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach.

Installing and deploying the clusters is a two stage process, as shown here:

GitOps approach for Installing and deploying the clusters

Prerequisites for deploying the ZTP pipeline

  • OpenShift Container Platform cluster version 4.8 or higher and Red Hat GitOps Operator is installed.

  • Red Hat Advanced Cluster Management (RHACM) version 2.3 or above is installed.

  • For disconnected environments, make sure your source data Git repository and ztp-site-generator container image are accessible from the hub cluster.

  • If you want additional custom content, such as extra install manifests or custom resources (CR) for policies, add them to the /usr/src/hook/ztp/source-crs/extra-manifest/ directory. Similarly, you can add additional configuration CRs, as referenced from a PolicyGenTemplate, to the /usr/src/hook/ztp/source-crs/ directory.

    • Create a Containerfile that adds your additional manifests to the Red Hat provided image, for example:

      FROM <registry fqdn>/ztp-site-generator:latest (1)
      COPY myInstallManifest.yaml /usr/src/hook/ztp/source-crs/extra-manifest/
      COPY mySourceCR.yaml /usr/src/hook/ztp/source-crs/
      1 <registry fqdn> must point to a registry containing the ztp-site-generator container image provided by Red Hat.
    • Build a new container image that includes these additional files:

      $> podman build Containerfile.example

Installing the GitOps ZTP pipeline

The procedures in this section tell you how to complete the following tasks:

  • Prepare the Git repository you need to host site configuration data.

  • Configure the hub cluster for generating the required installation and policy custom resources (CR).

  • Deploy the managed clusters using zero touch provisioning (ZTP).

Preparing the ZTP Git repository

Create a Git repository for hosting site configuration data. The zero touch provisioning (ZTP) pipeline requires read access to this repository.

Procedure
  1. Create a directory structure with separate paths for the SiteConfig and PolicyGenTemplate custom resources (CR).

  2. Add pre-sync.yaml and post-sync.yaml from resource-hook-example/<policygentemplates>/ to the path for the PolicyGenTemplate CRs.

  3. Add pre-sync.yaml and post-sync.yaml from resource-hook-example/<siteconfig>/ to the path for the SiteConfig CRs.

    If your hub cluster operates in a disconnected environment, you must update the image for all four pre and post sync hook CRs.

  4. Apply the policygentemplates.ran.openshift.io and siteconfigs.ran.openshift.io CR definitions.

Preparing the hub cluster for ZTP

You can configure your hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CR) for each site based on a zero touch provisioning (ZTP) GitOps flow.

Procedure
  1. Install the Red Hat OpenShift GitOps Operator on your hub cluster.

  2. Extract the administrator password for ArgoCD:

    $ oc get secret openshift-gitops-cluster -n openshift-gitops -o jsonpath='{.data.admin\.password}' | base64 -d
  3. Prepare the ArgoCD pipeline configuration:

    1. Extract the ArgoCD deployment CRs from the ZTP site generator container using the latest container image version:

      $ mkdir ztp
      $ podman run --rm -v `pwd`/ztp:/mnt/ztp:Z registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.9.0-1 /bin/bash -c "cp -ar /usr/src/hook/ztp/* /mnt/ztp/"

      The remaining steps in this section relate to the ztp/gitops-subscriptions/argocd/ directory.

    2. Modify the source values of the two ArgoCD applications, deployment/clusters-app.yaml and deployment/policies-app.yaml with appropriate URL, targetRevision branch, and path values. The path values must match those used in your Git repository.

      Modify deployment/clusters-app.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
           name: clusters-sub
      ---
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
           name: clusters
           namespace: openshift-gitops
      spec:
          destination:
               server: https://kubernetes.default.svc
               namespace: clusters-sub
        project: default
        source:
            path: ztp/gitops-subscriptions/argocd/resource-hook-example/siteconfig (1)
            repoURL: https://github.com/openshift-kni/cnf-features-deploy (2)
            targetRevision: master (3)
        syncPolicy:
            automated:
                prune: true
                selfHeal: true
             syncOptions:
             - CreateNamespace=true
      1 The ztp/gitops-subscriptions/argocd/ file path that contains the siteconfig CRs for the clusters.
      2 The URL of the Git repository that contains the siteconfig custom resources that define site configuration for installing clusters.
      3 The branch on the Git repository that contains the relevant site configuration data.
    3. Modify deployment/policies-app.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
                name: policies-sub
      ---
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
            name: policies
            namespace: openshift-gitops
      spec:
        destination:
              server: https://kubernetes.default.svc
              namespace: policies-sub
        project: default
        source:
             directory:
                 recurse: true
              path: ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates (1)
              repoURL: https://github.com/openshift-kni/cnf-features-deploy (2)
              targetRevision: master (3)
        syncPolicy:
            automated:
                prune: true
                selfHeal: true
             syncOptions:
             - CreateNamespace=true
      1 The ztp/gitops-subscriptions/argocd/ file path that contains the policygentemplates CRs for the clusters.
      2 The URL of the Git repository that contains the policygentemplates custom resources that specify configuration data for the site.
      3 The branch on the Git repository that contains the relevant configuration data.
  4. To apply the pipeline configuration to your hub cluster, enter this command:

    $ oc apply -k ./deployment

Creating the site secrets

Add the required secrets for the site to the hub cluster. These resources must be in a namespace with a name that matches the cluster name.

Procedure
  1. Create a secret for authenticating to the site Baseboard Management Controller (BMC). Ensure the secret name matches the name used in the SiteConfig. In this example, the secret name is test-sno-bmh-secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: test-sno-bmh-secret
      namespace: test-sno
    data:
      password: dGVtcA==
      username: cm9vdA==
    type: Opaque
  2. Create the pull secret for the site. The pull secret must contain all credentials necessary for installing OpenShift and all add-on Operators. In this example, the secret name is assisted-deployment-pull-secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: assisted-deployment-pull-secret
      namespace: test-sno
    type: kubernetes.io/dockerconfigjson
    data:
      .dockerconfigjson: <Your pull secret base64 encoded>

The secrets are referenced from the SiteConfig custom resource (CR) by name. The namespace must match the SiteConfig namespace.

Creating the SiteConfig custom resources

ArgoCD acts as the engine for the GitOps method of site deployment. After completing a site plan that contains the required custom resources for the site installation, a policy generator creates the manifests and applies them to the hub cluster.

Procedure
  1. Create one or more SiteConfig custom resources, site-config.yaml files, that contains the site-plan data for the clusters. For example:

    apiVersion: ran.openshift.io/v1
    kind: SiteConfig
    metadata:
      name: "test-sno"
      namespace: "test-sno"
    spec:
      baseDomain: "clus2.t5g.lab.eng.bos.redhat.com"
      pullSecretRef:
        name: "assisted-deployment-pull-secret"
      clusterImageSetNameRef: "openshift-4.9"
      sshPublicKey: "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDB3dwhI5X0ZxGBb9VK7wclcPHLc8n7WAyKjTNInFjYNP9J+Zoc/ii+l3YbGUTuqilDwZN5rVIwBux2nUyVXDfaM5kPd9kACmxWtfEWTyVRootbrNWwRfKuC2h6cOd1IlcRBM1q6IzJ4d7+JVoltAxsabqLoCbK3svxaZoKAaK7jdGG030yvJzZaNM4PiTy39VQXXkCiMDmicxEBwZx1UsA8yWQsiOQ5brod9KQRXWAAST779gbvtgXR2L+MnVNROEHf1nEjZJwjwaHxoDQYHYKERxKRHlWFtmy5dNT6BbvOpJ2e5osDFPMEd41d2mUJTfxXiC1nvyjk9Irf8YJYnqJgBIxi0IxEllUKH7mTdKykHiPrDH5D2pRlp+Donl4n+sw6qoDc/3571O93+RQ6kUSAgAsvWiXrEfB/7kGgAa/BD5FeipkFrbSEpKPVu+gue1AQeJcz9BuLqdyPUQj2VUySkSg0FuGbG7fxkKeF1h3Sga7nuDOzRxck4I/8Z7FxMF/e8DmaBpgHAUIfxXnRqAImY9TyAZUEMT5ZPSvBRZNNmLbfex1n3NLcov/GEpQOqEYcjG5y57gJ60/av4oqjcVmgtaSOOAS0kZ3y9YDhjsaOcpmRYYijJn8URAH7NrW8EZsvAoF6GUt6xHq5T258c6xSYUm5L0iKvBqrOW9EjbLw== root@cnfdc2.clus2.t5g.lab.eng.bos.redhat.com"
      clusters:
      - clusterName: "test-sno"
        clusterType: "sno"
        clusterProfile: "du"
        clusterLabels:
          group-du-sno: ""
          common: true
          sites : "test-sno"
        clusterNetwork:
          - cidr: 1001:db9::/48
            hostPrefix: 64
        machineNetwork:
          - cidr: 2620:52:0:10e7::/64
        serviceNetwork:
          - 1001:db7::/112
        additionalNTPSources:
          - 2620:52:0:1310::1f6
        nodes:
          - hostName: "test-sno.clus2.t5g.lab.eng.bos.redhat.com"
            bmcAddress: "idrac-virtualmedia+https://[2620:52::10e7:f602:70ff:fee4:f4e2]/redfish/v1/Systems/System.Embedded.1"
            bmcCredentialsName:
              name: "test-sno-bmh-secret"
            bmcDisableCertificateVerification: true (1)
            bootMACAddress: "0C:42:A1:8A:74:EC"
            bootMode: "UEFI"
            rootDeviceHints:
              hctl: '0:1:0'
            cpuset: "0-1,52-53"
            nodeNetwork:
              interfaces:
                - name: eno1
                  macAddress: "0C:42:A1:8A:74:EC"
              config:
                interfaces:
                  - name: eno1
                    type: ethernet
                    state: up
                    macAddress: "0C:42:A1:8A:74:EC"
                    ipv4:
                      enabled: false
                    ipv6:
                      enabled: true
                      address:
                      - ip: 2620:52::10e7:e42:a1ff:fe8a:900
                        prefix-length: 64
                dns-resolver:
                  config:
                    search:
                    - clus2.t5g.lab.eng.bos.redhat.com
                    server:
                    - 2620:52:0:1310::1f6
                routes:
                  config:
                  - destination: ::/0
                    next-hop-interface: eno1
                    next-hop-address: 2620:52:0:10e7::fc
                    table-id: 254
    1 If you are using UEFI SecureBoot, add this line to prevent failures due to invalid or local certificates.
  2. Save the files and push them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application.

ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD synchronizes the PolicyGenTemplate to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy wrapped configuration CRs that apply to the spoke cluster. The resource hooks convert the site definitions to installation custom resources and applies them to the hub cluster:

  • Namespace - Unique per site

  • AgentClusterInstall

  • BareMetalHost

  • ClusterDeployment

  • InfraEnv

  • NMStateConfig

  • ExtraManifestsConfigMap - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more.

  • ManagedCluster

  • KlusterletAddonConfig

Red Hat Advanced Cluster Management (RHACM) (ACM) deploys the hub cluster.

Creating the PolicyGenTemplates

Use the following procedure to create the PolicyGenTemplates you will need for generating policies in your Git repository for the hub cluster.

Procedure
  1. Create the PolicyGenTemplates and save them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application.

  2. ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD applies the new PolicyGenTemplate to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy wrapped configuration CRs that apply to the spoke cluster and perform the following actions:

    1. Create the Red Hat Advanced Cluster Management (RHACM) (ACM) policies according to the basic distributed unit (DU) profile and required customizations.

    2. Apply the generated policies to the hub cluster.

The ZTP process creates policies that direct ACM to apply the desired configuration to the cluster nodes.

Checking the installation status

The ArgoCD pipeline detects the SiteConfig and PolicyGenTemplate custom resources (CRs) in the Git repository and syncs them to the hub cluster. In the process, it generates installation and policy CRs and applies them to the hub cluster. You can monitor the progress of this synchronization in the ArgoCD dashboard.

Procedure
  1. Monitor the progress of cluster installation using the following commands:

    $ export CLUSTER=<cluster_name>
    $ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
    $ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
  2. Use the Red Hat Advanced Cluster Management (RHACM) (ACM) dashboard to monitor the progress of policy reconciliation.

Site cleanup

To remove a site and the associated installation and policy custom resources (CRs), remove the SiteConfig and site-specific PolicyGenTemplate CRs from the Git repository. The pipeline hooks remove the generated CRs.

Before removing a SiteConfig CR you must detach the cluster from ACM.

Removing the ArgoCD pipeline

Use the following procedure if you want to remove the ArgoCD pipeline and all generated artifacts.

Procedure
  1. Detach all clusters from ACM.

  2. Delete all SiteConfig and PolicyGenTemplate custom resources (CRs) from your Git repository.

  3. Delete the following namespaces:

    • All policy namespaces:

       $ oc get policy -A
    • clusters-sub

    • policies-sub

  4. Process the directory using the Kustomize tool:

     $ oc delete -k cnf-features-deploy/ztp/gitops-subscriptions/argocd/deployment

Troubleshooting GitOps ZTP

As noted, the ArgoCD pipeline synchronizes the SiteConfig and PolicyGenTemplate custom resources (CR) from the Git repository to the hub cluster. During this process, post-sync hooks create the installation and policy CRs that are also applied to the hub cluster. Use the following procedures to troubleshoot issues that might occur in this process.

Validating the generation of installation CRs

SiteConfig applies Installation custom resources (CR) to the hub cluster in a namespace with the name matching the site name. To check the status, enter the following command:

$ oc get AgentClusterInstall -n <cluster_name>

If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from SiteConfig to the installation CRs.

Procedure
  1. Check the synchronization of the SiteConfig to the hub cluster using either of the following commands:

    $ oc get siteconfig -A

    or

    $ oc get siteconfig -n clusters-sub

    If the SiteConfig is missing, one of the following situations has occurred:

    • The clusters application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this:

      $ oc describe -n openshift-gitops application clusters

      Check for Status: Synced and that the Revision: is the SHA of the commit you pushed to the subscribed repository.

    • The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the clusters application.

  2. Verify the post hook job ran:

    $ oc describe job -n clusters-sub siteconfig-post
    • If successful, the returned output indicates succeeded: 1.

    • If the job fails, ArgoCD retries it. In some cases, the first pass will fail and the second pass will indicate that the job passed.

  3. Check for errors in the post hook job:

    $ oc get pod -n clusters-sub

    Note the name of the siteconfig-post-xxxxx pod:

    $ oc logs -n clusters-sub siteconfig-post-xxxxx

    If the logs indicate errors, correct the conditions and push the corrected SiteConfig or PolicyGenTemplate to the Git repository.

Validating the generation of policy CRs

ArgoCD generates the policy custom resources (CRs) in the same namespace as the PolicyGenTemplate from which they were created. The same troubleshooting flow applies to all policy CRs generated from PolicyGenTemplates regardless of whether they are common, group, or site based.

To check the status of the policy CRs, enter the following commands:

$ export NS=<namespace>
$ oc get policy -n $NS

The returned output displays the expected set of policy wrapped CRs. If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from SiteConfig to the policy CRs.

Procedure
  1. Check the synchronization of the PolicyGenTemplate to the hub cluster:

    $ oc get policygentemplate -A

    or

    $ oc get policygentemplate -n $NS

    If the PolicyGenTemplate is not synchronized, one of the following situations has occurred:

    • The clusters application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this:

      $ oc describe -n openshift-gitops application clusters

      Check for Status: Synced and that the Revision: is the SHA of the commit you pushed to the subscribed repository.

    • The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the clusters application.

  2. Ensure the policies were copied to the cluster namespace. When ACM recognizes that policies apply to a ManagedCluster, ACM applies the policy CR objects to the cluster namespace:

    $ oc get policy -n <cluster_name>

    ACM copies all applicable common, group, and site policies here. The policy names are <policyNamespace> and <policyName>.

  3. Check the placement rule for any policies not copied to the cluster namespace. The matchSelector in the PlacementRule for those policies should match the labels on the ManagedCluster:

    $ oc get placementrule -n $NS
  4. Make a note of the PlacementRule name for the missing common, group, or site policy:

     oc get placementrule -n $NS <placmentRuleName> -o yaml
    • The status decisions value should include your cluster name.

    • The key value of the matchSelector in the spec should match the labels on your managed cluster. Check the labels on ManagedCluster:

       oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
      Example
      apiVersion: apps.open-cluster-management.io/v1
      kind: PlacementRule
      metadata:
        name: group-test1-policies-placementrules
        namespace: group-test1-policies
      spec:
        clusterSelector:
          matchExpressions:
          - key: group-test1
            operator: In
            values:
            - ""
      status:
        decisions:
        - clusterName: <cluster_name>
          clusterNamespace: <cluster_name>
  5. Ensure all policies are compliant:

     oc get policy -n $CLUSTER

    If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not it is likely that the Operators did not install.