×

You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple single-node OpenShift clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.

Topology Aware Lifecycle Manager is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

About the Topology Aware Lifecycle Manager configuration

The Topology Aware Lifecycle Manager (TALM) manages the deployment of Red Hat Advanced Cluster Management (RHACM) policies for one or more OpenShift Container Platform clusters. Using TALM in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With TALM, you can control the following actions:

  • The timing of the update

  • The number of RHACM-managed clusters

  • The subset of managed clusters to apply the policies to

  • The update order of the clusters

  • The set of policies remediated to the cluster

  • The order of policies remediated to the cluster

TALM supports the orchestration of the OpenShift Container Platform y-stream and z-stream updates, and day-two operations on y-streams and z-streams.

About managed policies used with Topology Aware Lifecycle Manager

The Topology Aware Lifecycle Manager (TALM) uses RHACM policies for cluster updates.

TALM can be used to manage the rollout of any policy CR where the remediationAction field is set to inform. Supported use cases include the following:

  • Manual user creation of policy CRs

  • Automatically generated policies from the PolicyGenTemplate custom resource definition (CRD)

For policies that update an Operator subscription with manual approval, TALM provides additional functionality that approves the installation of the updated Operator.

For more information about managed policies, see Policy Overview in the RHACM documentation.

For more information about the PolicyGenTemplate CRD, see the "About the PolicyGenTemplate" section in "Deploying distributed units at scale in a disconnected environment".

Installing the Topology Aware Lifecycle Manager by using the web console

You can use the OpenShift Container Platform web console to install the Topology Aware Lifecycle Manager.

Prerequisites
  • Install the latest version of the RHACM Operator.

  • Set up a hub cluster with disconnected regitry.

  • Log in as a user with cluster-admin privileges.

Procedure
  1. In the OpenShift Container Platform web console, navigate to OperatorsOperatorHub.

  2. Search for the Topology Aware Lifecycle Manager from the list of available Operators, and then click Install.

  3. Keep the default selection of Installation mode ["All namespaces on the cluster (default)"] and Installed Namespace ("openshift-operators") to ensure that the Operator is installed properly.

  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the OperatorsInstalled Operators page.

  2. Check that the Operator is installed in the All Namespaces namespace and its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the OperatorsInstalled Operators page and inspect the Status column for any errors or failures.

  2. Navigate to the WorkloadsPods page and check the logs in any containers in the cluster-group-upgrades-controller-manager pod that are reporting issues.

Installing the Topology Aware Lifecycle Manager by using the CLI

You can use the OpenShift CLI (oc) to install the Topology Aware Lifecycle Manager (TALM).

Prerequisites
  • Install the OpenShift CLI (oc).

  • Install the latest version of the RHACM Operator.

  • Set up a hub cluster with disconnected registry.

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, talm-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-topology-aware-lifecycle-manager-subscription
        namespace: openshift-operators
      spec:
        channel: "stable"
        name: topology-aware-lifecycle-manager
        source: redhat-operators
        sourceNamespace: openshift-marketplace
    2. Create the Subscription CR by running the following command:

      $ oc create -f talm-subscription.yaml
Verification
  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-operators
    Example output
    NAME                                                      DISPLAY                            VERSION               REPLACES   PHASE
    topology-aware-lifecycle-manager.4.10.0-202206301927      Topology Aware Lifecycle Manager   4.10.0-202206301927              Succeeded
  2. Verify that the TALM is up and running:

    $ oc get deploy -n openshift-operators
    Example output
    NAMESPACE                                          NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
    openshift-operators                                cluster-group-upgrades-controller-manager        1/1     1            1           14s

About the ClusterGroupUpgrade CR

The Topology Aware Lifecycle Manager (TALM) builds the remediation plan from the ClusterGroupUpgrade CR for a group of clusters. You can define the following specifications in a ClusterGroupUpgrade CR:

  • Clusters in the group

  • Blocking ClusterGroupUpgrade CRs

  • Applicable list of managed policies

  • Number of concurrent updates

  • Applicable canary updates

  • Actions to perform before and after the update

  • Update timing

As TALM works through remediation of the policies to the specified clusters, the ClusterGroupUpgrade CR can have the following states:

  • UpgradeNotStarted

  • UpgradeCannotStart

  • UpgradeNotComplete

  • UpgradeTimedOut

  • UpgradeCompleted

  • PrecachingRequired

After TALM completes a cluster update, the cluster does not update again under the control of the same ClusterGroupUpgrade CR. You must create a new ClusterGroupUpgrade CR in the following cases:

  • When you need to update the cluster again

  • When the cluster changes to non-compliant with the inform policy after being updated

The UpgradeNotStarted state

The initial state of the ClusterGroupUpgrade CR is UpgradeNotStarted.

TALM builds a remediation plan based on the following fields:

  • The clusterSelector field specifies the labels of the clusters that you want to update.

  • The clusters field specifies a list of clusters to update.

  • The canaries field specifies the clusters for canary updates.

  • The maxConcurrency field specifies the number of clusters to update in a batch.

You can use the clusters and the clusterSelector fields together to create a combined list of clusters.

The remediation plan starts with the clusters listed in the canaries field. Each canary cluster forms a single-cluster batch.

Any failures during the update of a canary cluster stops the update process.

The ClusterGroupUpgrade CR transitions to the UpgradeNotCompleted state after the remediation plan is successfully created and after the enable field is set to true. At this point, TALM starts to update the non-compliant clusters with the specified managed policies.

You can only make changes to the spec fields if the ClusterGroupUpgrade CR is either in the UpgradeNotStarted or the UpgradeCannotStart state.

Sample ClusterGroupUpgrade CR in the UpgradeNotStarted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  clusters: (1)
  - spoke1
  enable: false
  managedPolicies: (2)
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  remediationStrategy: (3)
    canaries: (4)
      - spoke1
    maxConcurrency: 1 (5)
    timeout: 240
status: (6)
  conditions:
  - message: The ClusterGroupUpgrade CR is not enabled
    reason: UpgradeNotStarted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-pao-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  placementBindings:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-pao-sub-policy
  placementRules:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-pao-sub-policy
  remediationPlan:
  - - spoke1
1 Defines the list of clusters to update.
2 Lists the user-defined set of policies to remediate.
3 Defines the specifics of the cluster updates.
4 Defines the clusters for canary updates.
5 Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the maxConcurrency value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
6 Displays information about the status of the updates.

The UpgradeCannotStart state

In the UpgradeCannotStart state, the update cannot start because of the following reasons:

  • Blocking CRs are missing from the system

  • Blocking CRs have not yet finished

The UpgradeNotCompleted state

In the UpgradeNotCompleted state, TALM enforces the policies following the remediation plan defined in the UpgradeNotStarted state.

Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, TALM moves on to the next batch. The timeout value of a batch is the spec.timeout field divided by the number of batches in the remediation plan.

The managed policies apply in the order that they are listed in the managedPolicies field in the ClusterGroupUpgrade CR. One managed policy is applied to the specified clusters at a time. After the specified clusters comply with the current policy, the next managed policy is applied to the next non-compliant cluster.

Sample ClusterGroupUpgrade CR in the UpgradeNotCompleted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  clusters:
  - spoke1
  enable: true (1)
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status: (2)
  conditions:
  - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant
    reason: UpgradeNotCompleted
    status: "False"
    type: Ready
  copiedPolicies:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-pao-sub-policy
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  placementBindings:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-pao-sub-policy
  placementRules:
  - cgu-upgrade-complete-policy1-common-cluster-version-policy
  - cgu-upgrade-complete-policy2-common-pao-sub-policy
  remediationPlan:
  - - spoke1
  status:
    currentBatch: 1
    remediationPlanForBatch: (3)
      spoke1: 0
1 The update starts when the value of the spec.enable field is true.
2 The status fields change accordingly when the update begins.
3 Lists the clusters in the batch and the index of the policy that is being currently applied to each cluster. The index of the policies starts with 0 and the index follows the order of the status.managedPoliciesForUpgrade list.

The UpgradeTimedOut state

In the UpgradeTimedOut state, TALM checks every hour if all the policies for the ClusterGroupUpgrade CR are compliant. The checks continue until the ClusterGroupUpgrade CR is deleted or the updates are completed. The periodic checks allow the updates to complete if they get prolonged due to network, CPU, or other issues.

TALM transitions to the UpgradeTimedOut state in two cases:

  • When the current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.

  • When the clusters do not comply with the managed policies within the timeout value specified in the remediationStrategy field.

If the policies are compliant, TALM transitions to the UpgradeCompleted state.

The UpgradeCompleted state

In the UpgradeCompleted state, the cluster updates are complete.

Sample ClusterGroupUpgrade CR in the UpgradeCompleted state
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-upgrade-complete
  namespace: default
spec:
  actions:
    afterCompletion:
      deleteObjects: true (1)
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status: (2)
  conditions:
  - message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies
    reason: UpgradeCompleted
    status: "True"
    type: Ready
  managedPoliciesForUpgrade:
  - name: policy1-common-cluster-version-policy
    namespace: default
  - name: policy2-common-pao-sub-policy
    namespace: default
  remediationPlan:
  - - spoke1
  status:
    remediationPlanForBatch:
      spoke1: -2 (3)
1 The value of spec.action.afterCompletion.deleteObjects field is true by default. After the update is completed, TALM deletes the underlying RHACM objects that were created during the update. This option is to prevent the RHACM hub from continuously checking for compliance after a successful update.
2 The status fields show that the updates completed successfully.
3 Displays that all the policies are applied to the cluster.

The PrecachingRequired state

In the PrecachingRequired state, the clusters need to have images pre-cached before the update can start. For more information about pre-caching, see the "Using the container image pre-cache feature" section.

Blocking ClusterGroupUpgrade CRs

You can create multiple ClusterGroupUpgrade CRs and control their order of application.

For example, if you create ClusterGroupUpgrade CR C that blocks the start of ClusterGroupUpgrade CR A, then ClusterGroupUpgrade CR A cannot start until the status of ClusterGroupUpgrade CR C becomes UpgradeComplete.

One ClusterGroupUpgrade CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.

Prerequisites
  • Install the Topology Aware Lifecycle Manager (TALM).

  • Provision one or more managed clusters.

  • Log in as a user with cluster-admin privileges.

  • Create RHACM policies in the hub cluster.

Procedure
  1. Save the content of the ClusterGroupUpgrade CRs in the cgu-a.yaml, cgu-b.yaml, and cgu-c.yaml files.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-a
      namespace: default
    spec:
      blockingCRs: (1)
      - name: cgu-c
        namespace: default
      clusters:
      - spoke1
      - spoke2
      - spoke3
      enable: false
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      remediationStrategy:
        canaries:
        - spoke1
        maxConcurrency: 2
        timeout: 240
    status:
      conditions:
      - message: The ClusterGroupUpgrade CR is not enabled
        reason: UpgradeNotStarted
        status: "False"
        type: Ready
      copiedPolicies:
      - cgu-a-policy1-common-cluster-version-policy
      - cgu-a-policy2-common-pao-sub-policy
      - cgu-a-policy3-common-ptp-sub-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy2-common-pao-sub-policy
        namespace: default
      - name: policy3-common-ptp-sub-policy
        namespace: default
      placementBindings:
      - cgu-a-policy1-common-cluster-version-policy
      - cgu-a-policy2-common-pao-sub-policy
      - cgu-a-policy3-common-ptp-sub-policy
      placementRules:
      - cgu-a-policy1-common-cluster-version-policy
      - cgu-a-policy2-common-pao-sub-policy
      - cgu-a-policy3-common-ptp-sub-policy
      remediationPlan:
      - - spoke1
      - - spoke2
    1 Defines the blocking CRs. The cgu-a update cannot start until cgu-c is complete.
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-b
      namespace: default
    spec:
      blockingCRs: (1)
      - name: cgu-a
        namespace: default
      clusters:
      - spoke4
      - spoke5
      enable: false
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      - policy4-common-sriov-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240
    status:
      conditions:
      - message: The ClusterGroupUpgrade CR is not enabled
        reason: UpgradeNotStarted
        status: "False"
        type: Ready
      copiedPolicies:
      - cgu-b-policy1-common-cluster-version-policy
      - cgu-b-policy2-common-pao-sub-policy
      - cgu-b-policy3-common-ptp-sub-policy
      - cgu-b-policy4-common-sriov-sub-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy2-common-pao-sub-policy
        namespace: default
      - name: policy3-common-ptp-sub-policy
        namespace: default
      - name: policy4-common-sriov-sub-policy
        namespace: default
      placementBindings:
      - cgu-b-policy1-common-cluster-version-policy
      - cgu-b-policy2-common-pao-sub-policy
      - cgu-b-policy3-common-ptp-sub-policy
      - cgu-b-policy4-common-sriov-sub-policy
      placementRules:
      - cgu-b-policy1-common-cluster-version-policy
      - cgu-b-policy2-common-pao-sub-policy
      - cgu-b-policy3-common-ptp-sub-policy
      - cgu-b-policy4-common-sriov-sub-policy
      remediationPlan:
      - - spoke4
      - - spoke5
      status: {}
    1 The cgu-b update cannot start until cgu-a is complete.
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-c
      namespace: default
    spec: (1)
      clusters:
      - spoke6
      enable: false
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      - policy4-common-sriov-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240
    status:
      conditions:
      - message: The ClusterGroupUpgrade CR is not enabled
        reason: UpgradeNotStarted
        status: "False"
        type: Ready
      copiedPolicies:
      - cgu-c-policy1-common-cluster-version-policy
      - cgu-c-policy4-common-sriov-sub-policy
      managedPoliciesCompliantBeforeUpgrade:
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy4-common-sriov-sub-policy
        namespace: default
      placementBindings:
      - cgu-c-policy1-common-cluster-version-policy
      - cgu-c-policy4-common-sriov-sub-policy
      placementRules:
      - cgu-c-policy1-common-cluster-version-policy
      - cgu-c-policy4-common-sriov-sub-policy
      remediationPlan:
      - - spoke6
      status: {}
    1 The cgu-c update does not have any blocking CRs. TALM starts the cgu-c update when the enable field is set to true.
  2. Create the ClusterGroupUpgrade CRs by running the following command for each relevant CR:

    $ oc apply -f <name>.yaml
  3. Start the update process by running the following command for each relevant CR:

    $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \
    --type merge -p '{"spec":{"enable":true}}'

    The following examples show ClusterGroupUpgrade CRs where the enable field is set to true:

    Example for cgu-a with blocking CRs
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-a
      namespace: default
    spec:
      blockingCRs:
      - name: cgu-c
        namespace: default
      clusters:
      - spoke1
      - spoke2
      - spoke3
      enable: true
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      remediationStrategy:
        canaries:
        - spoke1
        maxConcurrency: 2
        timeout: 240
    status:
      conditions:
      - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet
          completed: [cgu-c]' (1)
        reason: UpgradeCannotStart
        status: "False"
        type: Ready
      copiedPolicies:
      - cgu-a-policy1-common-cluster-version-policy
      - cgu-a-policy2-common-pao-sub-policy
      - cgu-a-policy3-common-ptp-sub-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy2-common-pao-sub-policy
        namespace: default
      - name: policy3-common-ptp-sub-policy
        namespace: default
      placementBindings:
      - cgu-a-policy1-common-cluster-version-policy
      - cgu-a-policy2-common-pao-sub-policy
      - cgu-a-policy3-common-ptp-sub-policy
      placementRules:
      - cgu-a-policy1-common-cluster-version-policy
      - cgu-a-policy2-common-pao-sub-policy
      - cgu-a-policy3-common-ptp-sub-policy
      remediationPlan:
      - - spoke1
      - - spoke2
      status: {}
    1 Shows the list of blocking CRs.
    Example for cgu-b with blocking CRs
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-b
      namespace: default
    spec:
      blockingCRs:
      - name: cgu-a
        namespace: default
      clusters:
      - spoke4
      - spoke5
      enable: true
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      - policy4-common-sriov-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240
    status:
      conditions:
      - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet
          completed: [cgu-a]' (1)
        reason: UpgradeCannotStart
        status: "False"
        type: Ready
      copiedPolicies:
      - cgu-b-policy1-common-cluster-version-policy
      - cgu-b-policy2-common-pao-sub-policy
      - cgu-b-policy3-common-ptp-sub-policy
      - cgu-b-policy4-common-sriov-sub-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy2-common-pao-sub-policy
        namespace: default
      - name: policy3-common-ptp-sub-policy
        namespace: default
      - name: policy4-common-sriov-sub-policy
        namespace: default
      placementBindings:
      - cgu-b-policy1-common-cluster-version-policy
      - cgu-b-policy2-common-pao-sub-policy
      - cgu-b-policy3-common-ptp-sub-policy
      - cgu-b-policy4-common-sriov-sub-policy
      placementRules:
      - cgu-b-policy1-common-cluster-version-policy
      - cgu-b-policy2-common-pao-sub-policy
      - cgu-b-policy3-common-ptp-sub-policy
      - cgu-b-policy4-common-sriov-sub-policy
      remediationPlan:
      - - spoke4
      - - spoke5
      status: {}
    1 Shows the list of blocking CRs.
    Example for cgu-c with blocking CRs
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-c
      namespace: default
    spec:
      clusters:
      - spoke6
      enable: true
      managedPolicies:
      - policy1-common-cluster-version-policy
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      - policy4-common-sriov-sub-policy
      remediationStrategy:
        maxConcurrency: 1
        timeout: 240
    status:
      conditions:
      - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant (1)
        reason: UpgradeNotCompleted
        status: "False"
        type: Ready
      copiedPolicies:
      - cgu-c-policy1-common-cluster-version-policy
      - cgu-c-policy4-common-sriov-sub-policy
      managedPoliciesCompliantBeforeUpgrade:
      - policy2-common-pao-sub-policy
      - policy3-common-ptp-sub-policy
      managedPoliciesForUpgrade:
      - name: policy1-common-cluster-version-policy
        namespace: default
      - name: policy4-common-sriov-sub-policy
        namespace: default
      placementBindings:
      - cgu-c-policy1-common-cluster-version-policy
      - cgu-c-policy4-common-sriov-sub-policy
      placementRules:
      - cgu-c-policy1-common-cluster-version-policy
      - cgu-c-policy4-common-sriov-sub-policy
      remediationPlan:
      - - spoke6
      status:
        currentBatch: 1
        remediationPlanForBatch:
          spoke6: 0
    1 The cgu-c update does not have any blocking CRs.

Update policies on managed clusters

The Topology Aware Lifecycle Manager (TALM) remediates a set of inform policies for the clusters specified in the ClusterGroupUpgrade CR. TALM remediates inform policies by making enforce copies of the managed RHACM policies. Each copied policy has its own corresponding RHACM placement rule and RHACM placement binding.

One by one, TALM adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, TALM skips applying that policy on the compliant cluster. TALM then moves on to applying the next policy to the non-compliant cluster. After TALM completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.

If a spoke cluster does not report any compliant state to RHACM, the managed policies on the hub cluster can be missing status information that TALM needs. TALM handles these cases in the following ways:

  • If a policy’s status.compliant field is missing, TALM ignores the policy and adds a log entry. Then, TALM continues looking at the policy’s status.status field.

  • If a policy’s status.status is missing, TALM produces an error.

  • If a cluster’s compliance status is missing in the policy’s status.status field, TALM considers that cluster to be non-compliant with that policy.

For more information about RHACM policies, see Policy overview.

Additional resources

For more information about PolicyGenTemplate CRD, see About the PolicyGenTemplate.

Applying update policies to managed clusters

You can update your managed clusters by applying your policies.

Prerequisites
  • Install the Topology Aware Lifecycle Manager (TALM).

  • Provision one or more managed clusters.

  • Log in as a user with cluster-admin privileges.

  • Create RHACM policies in the hub cluster.

Procedure
  1. Save the contents of the ClusterGroupUpgrade CR in the cgu-1.yaml file.

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-1
      namespace: default
    spec:
      managedPolicies: (1)
        - policy1-common-cluster-version-policy
        - policy2-common-pao-sub-policy
        - policy3-common-ptp-sub-policy
        - policy4-common-sriov-sub-policy
      enable: false
      clusters: (2)
      - spoke1
      - spoke2
      - spoke5
      - spoke6
      remediationStrategy:
        maxConcurrency: 2 (3)
        timeout: 240 (4)
    1 The name of the policies to apply.
    2 The list of clusters to update.
    3 The maxConcurrency field signifies the number of clusters updated at the same time.
    4 The update timeout in minutes.