Upgrading using the Operator | Upgrading | Red Hat Advanced Cluster Security for Kubernetes 4.4

Preparing to upgrade
- Changing the collection method
- Setting the forceCollection parameter
Modifying Central custom resource
Modifying Central custom resource for external database
Changing subscription channel
Remove Central-attached PV
- Remove Central-attached PV using the RHACS Operator
Rolling back an Operator upgrade
- Rolling back an Operator upgrade by using the CLI
- Rolling back an Operator upgrade by using the web console
Troubleshooting Operator upgrade issues
- Central DB cannot be scheduled
- Central or Secured cluster fails to deploy

Upgrades through the Red Hat Advanced Cluster Security for Kubernetes (RHACS) Operator are performed automatically or manually, depending on the Update approval option you chose at installation.

RHACS 4.0 includes a significant architectural change: Central's database moved to PostgreSQL. Because of this change, the RHACS 4.0 Operator is published on a new subscription channel. Therefore, to upgrade from RHACS 3.74 to RHACS 4.0, you must manually change the subscription channel as part of the upgrade instructions.

Because of the database-related changes introduced in RHACS 4.0, even if you have selected Automatic in the Update approval field, you must manually upgrade to RHACS 4.0.
You must be using RHACS 3.74 to upgrade to RHACS 4.0. If you are using a version older than 3.74, you must first upgrade to RHACS 3.74 and then upgrade to RHACS 4.0.
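Before you change anything, you can confirm that the installed release is on the 3.74 line. The following POSIX shell sketch is illustrative only: the function name `can_upgrade_to_4_0` is hypothetical, and the commented usage assumes that the version field of the Operator's cluster service version (CSV) matches the RHACS release, such as `3.74.2`.

```shell
# Hypothetical helper: succeeds only when the given RHACS version is a 3.74
# release, which is the only release line that can upgrade directly to 4.0.
can_upgrade_to_4_0() {
  case "$1" in
    3.74|3.74.*) return 0 ;;
    *) echo "RHACS $1 is not a 3.74 release" >&2; return 1 ;;
  esac
}

# Usage (requires cluster access; assumes the CSV version matches RHACS):
#   version=$(oc -n rhacs-operator get csv \
#     -l operators.coreos.com/rhacs-operator.rhacs-operator \
#     -o jsonpath='{.items[0].spec.version}')
#   can_upgrade_to_4_0 "$version" && echo "ready to change the channel"
```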

Preparing to upgrade

Before you upgrade your Red Hat Advanced Cluster Security for Kubernetes (RHACS) version, you must complete the following tasks:

If you are upgrading from version 3.74, verify that you are running the latest patch release version of the RHACS Operator 3.74.
Back up your existing Central database.
If the cluster you are upgrading contains the SecuredCluster custom resource (CR), change the collection method to EBPF or CORE_BPF.

Changing the collection method

If the cluster that you are upgrading contains the SecuredCluster CR and you are upgrading from RHACS 4.1 or later, you must set the per-node collection method to CORE_BPF before you upgrade. Otherwise, set the collection method to EBPF. If you use EBPF, you must also set the forceCollection parameter to true after the upgrade and verify that the collection method is still EBPF.

Procedure

In the OpenShift Container Platform web console, go to the RHACS Operator page.
In the top navigation menu, select Secured Cluster.
Click the instance name, for example, stackrox-secured-cluster-services.
Use one of the following methods to change the setting:
- In the Form view, under Per Node Settings → Collector Settings → Collection, select CORE_BPF.
- Click YAML to open the YAML editor and locate the spec.perNode.collector.collection attribute. If the value is KernelModule, then change it to CORE_BPF.
  
  Only use EBPF if you are upgrading from a version earlier than 4.1 or if there is a specific reason to use it.
Click Save.
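If you prefer the command line to the web console, the same change can be expressed as a JSON merge patch. This is a minimal sketch, assuming the example instance name stackrox-secured-cluster-services; the helper `collection_patch` is hypothetical and only builds the patch text, and the `oc patch` call is shown as a commented usage example.

```shell
# Hypothetical helper: prints a JSON merge patch that sets
# spec.perNode.collector.collection on a SecuredCluster CR.
collection_patch() {
  case "$1" in
    CORE_BPF|EBPF) ;;
    *) echo "unsupported collection method: $1" >&2; return 1 ;;
  esac
  printf '{"spec":{"perNode":{"collector":{"collection":"%s"}}}}\n' "$1"
}

# Usage (requires cluster access):
#   oc -n <secured cluster namespace> patch \
#     securedclusters.platform.stackrox.io stackrox-secured-cluster-services \
#     --type=merge --patch="$(collection_patch CORE_BPF)"
```

The helper rejects KernelModule on purpose, because that is the value this procedure replaces.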

Setting the forceCollection parameter

When upgrading secured clusters, if you set the collection method to EBPF, you must set the forceCollection parameter to true after the upgrade. Then, make sure that the spec.perNode.collector.collection is still set to EBPF in the YAML editor.

Procedure

In the OpenShift Container Platform web console, go to the RHACS Operator page.
In the top navigation menu, select Secured Cluster.
Click the instance name, for example, stackrox-secured-cluster-services.
Click YAML to open the YAML editor.
Locate the spec.perNode.collector.forceCollection parameter and set it to true.
Click Save.
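A command-line sketch of the same change follows, assuming the example instance name stackrox-secured-cluster-services. The helper `force_collection_patch` is hypothetical; it keeps the collection method at EBPF in the same patch so that both values are set together, and the commented `-o jsonpath` query reads the collection method back for verification.

```shell
# Hypothetical helper: prints a JSON merge patch that keeps the collection
# method at EBPF and sets spec.perNode.collector.forceCollection.
force_collection_patch() {
  case "$1" in
    true|false) ;;
    *) echo "forceCollection must be true or false, got: $1" >&2; return 1 ;;
  esac
  printf '{"spec":{"perNode":{"collector":{"collection":"EBPF","forceCollection":%s}}}}\n' "$1"
}

# Usage (requires cluster access):
#   oc -n <secured cluster namespace> patch \
#     securedclusters.platform.stackrox.io stackrox-secured-cluster-services \
#     --type=merge --patch="$(force_collection_patch true)"
#   # Confirm that the collection method is still EBPF:
#   oc -n <secured cluster namespace> get \
#     securedclusters.platform.stackrox.io stackrox-secured-cluster-services \
#     -o jsonpath='{.spec.perNode.collector.collection}'
```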


Modifying Central custom resource

The Central DB service requires persistent storage. If the cluster where Central runs does not have a default storage class that is backed by SSDs or other high-performance storage, you must update the Central custom resource to configure the storage class for the Central DB persistent volume claim (PVC).

Skip this section if you have already configured a default storage class for Central.

Procedure

Update the Central custom resource with the following configuration:

spec:
  central:
    db:
      isEnabled: Default (1)
      persistence:
        persistentVolumeClaim: (2)
          claimName: central-db
          size: 100Gi
          storageClassName: <storage-class-name>

1	You must not change the value of `isEnabled` to `Enabled`.
2	If this claim exists, your cluster uses the existing claim; otherwise, it creates a new claim.
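As a command-line alternative, the following sketch builds the same configuration as a JSON merge patch. The helper `central_db_storage_patch` is hypothetical, and the Central CR name `stackrox-central-services` is an assumption; list your instances with `oc get centrals.platform.stackrox.io` to find the actual name.

```shell
# Hypothetical helper: prints a JSON merge patch for the Central CR that
# configures the central-db PVC with the given storage class.
# Arguments: storage class name, optional size (defaults to 100Gi).
central_db_storage_patch() {
  if [ -z "$1" ]; then
    echo "usage: central_db_storage_patch <storage-class-name> [size]" >&2
    return 1
  fi
  # isEnabled stays at Default, as required for this configuration.
  printf '{"spec":{"central":{"db":{"isEnabled":"Default","persistence":{"persistentVolumeClaim":{"claimName":"central-db","size":"%s","storageClassName":"%s"}}}}}}\n' \
    "${2:-100Gi}" "$1"
}

# Usage (requires cluster access):
#   oc get storageclass    # the default class is marked "(default)"
#   oc -n <central namespace> patch centrals.platform.stackrox.io \
#     stackrox-central-services --type=merge \
#     --patch="$(central_db_storage_patch <storage-class-name>)"
```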

Modifying Central custom resource for external database

Prerequisites

You have a database in a database instance that supports PostgreSQL 13, and a database user with the following permissions:
- Connection rights to the database.
- Usage and Create on the schema.
- Select, Insert, Update, and Delete on all tables in the schema.
- Usage on all sequences in the schema.

Procedure

Create a password secret in the deployed namespace by using the OpenShift Container Platform web console or the terminal.
- In the OpenShift Container Platform web console, go to the Workloads → Secrets page. Create a Key/Value secret with the key password and, as the value, the contents of a plain text file that contains the password for the superuser of the provisioned database.
- Or, run the following command in your terminal:
  $ oc create secret generic external-db-password \ (1)
    --from-file=password=<password.txt> (2)
  1 If you use Kubernetes, enter kubectl instead of oc.
  2 Replace password.txt with the path of the file that contains the plain text password.
Go to the Red Hat Advanced Cluster Security for Kubernetes Operator page in the OpenShift Container Platform web console. Select Central in the top navigation bar and select the instance that you want to connect to the database.
Go to the YAML editor view.
For db.passwordSecret.name, specify the name of the secret that you created in the earlier step, for example, external-db-password.
For db.connectionString, specify the connection string in keyword=value format, for example, host=<host> port=5432 database=stackrox user=stackrox sslmode=verify-ca
Delete the entire db.persistence block.
If necessary, you can specify a Certificate Authority for Central to trust the database certificate by adding a tls block under the top-level spec, as shown in the following example:
- Update the Central custom resource with the following configuration:
  spec:
    tls:
      additionalCAs:
      - name: db-ca
        content: |
          <certificate>
    central:
      db:
        isEnabled: Default (1)
        connectionString: "host=<host> port=5432 user=<user> sslmode=verify-ca"
        passwordSecret:
          name: external-db-password
  1 You must not change the value of isEnabled to Enabled.
Click Save.
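The connection string and the YAML edits in this procedure can also be prepared from a terminal. In this sketch, `build_db_connection_string` is a hypothetical helper whose defaults mirror the example values in this section; the password is intentionally not part of the string because Central reads it from the referenced secret. The commented `oc patch` example assumes the CR name stackrox-central-services and uses a JSON merge patch, in which setting `persistence` to `null` removes that block.

```shell
# Hypothetical helper: builds a PostgreSQL keyword=value connection string
# for db.connectionString.
# Arguments: host, user, [port=5432], [database=stackrox], [sslmode=verify-ca].
build_db_connection_string() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: build_db_connection_string <host> <user> [port] [database] [sslmode]" >&2
    return 1
  fi
  printf 'host=%s port=%s database=%s user=%s sslmode=%s\n' \
    "$1" "${3:-5432}" "${4:-stackrox}" "$2" "${5:-verify-ca}"
}

# Usage (requires cluster access):
#   conn=$(build_db_connection_string <host> stackrox)
#   oc -n <central namespace> patch centrals.platform.stackrox.io \
#     stackrox-central-services --type=merge --patch="$(printf \
#     '{"spec":{"central":{"db":{"connectionString":"%s","passwordSecret":{"name":"external-db-password"},"persistence":null}}}}' \
#     "$conn")"
```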

Additional resources

Provisioning a database in your PostgreSQL instance

Changing subscription channel

You can change the update channel for the RHACS Operator by using the OpenShift Container Platform web console or by using the command line. For upgrading to RHACS 4.0 from RHACS 3.74, you must change the update channel.

You must change the subscription channel for all clusters where you have installed RHACS Operator, including Central and all Secured clusters.

Prerequisites

You must verify that you are using the latest RHACS 3.74 Operator and there are no pending manual Operator upgrades.
You must verify that you have backed up your existing Central database.
You have access to an OpenShift Container Platform cluster web console using an account with cluster-admin permissions.

Changing the subscription channel by using the web console

Use the following instructions for changing the subscription channel by using the web console:

Procedure

In the Administrator perspective of the OpenShift Container Platform web console, go to Operators → Installed Operators.
Locate the RHACS Operator and click on it.
Click the Subscription tab.
Click the name of the update channel under Update Channel.
Select stable, then click Save.
For subscriptions with an Automatic approval strategy, the update begins automatically. Go back to the Operators → Installed Operators page to monitor the progress of the update. When complete, the status changes to Succeeded and Up to date.

For subscriptions with a Manual approval strategy, you can manually approve the update from the Subscription tab.

Changing the subscription channel by using command line

Use the following instructions for changing the subscription channel by using command line:

Procedure

Run the following command to change the subscription channel to stable:

$ oc -n rhacs-operator \ (1)
  patch subscriptions.operators.coreos.com rhacs-operator \
  --type=merge --patch='{ "spec": { "channel": "stable" }}'

1	If you use Kubernetes, enter `kubectl` instead of `oc`.

During the update, the RHACS Operator provisions a new deployment called central-db, and your data begins migrating. The migration takes around 30 minutes and happens only once, when you upgrade.
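To follow the update from a terminal rather than the Installed Operators page, you can poll until a command prints an expected value. In this sketch, `wait_for_output` is a hypothetical helper. The commented usage assumes that the CSV carries the `operators.coreos.com/rhacs-operator.rhacs-operator` label used elsewhere in this document and that the central-db deployment runs in the Central namespace. Because the CSV phase reflects only the Operator installation, the example also waits for the central-db rollout.

```shell
# Hypothetical helper: runs a command until its output equals the expected
# value, or gives up after the given number of attempts.
# Arguments: expected value, max attempts, seconds between attempts, command...
wait_for_output() {
  local expected="$1" attempts="$2" interval="$3" i=1
  shift 3
  while [ "$i" -le "$attempts" ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "timed out waiting for '$expected' from: $*" >&2
  return 1
}

# Usage (requires cluster access):
#   csv_phase() {
#     oc -n rhacs-operator get csv \
#       -l operators.coreos.com/rhacs-operator.rhacs-operator \
#       -o jsonpath='{.items[0].status.phase}'
#   }
#   wait_for_output Succeeded 60 30 csv_phase &&
#     oc -n <central namespace> rollout status deploy/central-db --timeout=40m
```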

Remove Central-attached PV

Kubernetes and OpenShift Container Platform do not delete persistent volumes (PV) automatically. When you upgrade RHACS from earlier versions, the Central PV called stackrox-db remains mounted. However, starting with RHACS 4.1, Central no longer needs the previously attached PV.

The PV has data and persistent files used by earlier RHACS versions. You can use the PV to roll back to an earlier version before RHACS 4.1. Or, if you have a large RocksDB backup bundle for Central, you can use the PV to restore that data.

If you do not plan to roll back or restore from earlier RocksDB backups, you can remove the Central-attached persistent volume claim (PVC) to free up the storage.

After you remove the PVC, you cannot roll back Central to a version earlier than RHACS 4.1, or restore large backups created with RocksDB.

Remove Central-attached PV using the RHACS Operator

Remove the Central-attached persistent volume claim (PVC) stackrox-db to free up storage space.

Procedure

Add the following annotation to the Central custom resource:

annotations:
  platform.stackrox.io/obsolete-central-pvc: "true"

Verification

Run the following command:

$ oc -n stackrox describe pvc stackrox-db | grep -i 'Used By'
Used By: <none> (1)

1	Wait until you see `Used By: <none>`. It might take a few minutes.
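The annotation and the verification loop can be combined in a short script. In this sketch, `pvc_is_unused` is a hypothetical helper that parses `oc describe pvc` output, and the Central CR name `stackrox-central-services` in the commented usage is an assumption.

```shell
# Hypothetical helper: reads `oc describe pvc` output on stdin and succeeds
# only when the "Used By:" field is <none>.
pvc_is_unused() {
  awk '/^Used By:/ {
         found = 1
         value = $0
         sub(/^Used By:[ \t]*/, "", value)
         sub(/[ \t]+$/, "", value)
         if (value == "<none>") unused = 1
       }
       END { exit !(found && unused) }'
}

# Usage (requires cluster access):
#   oc -n stackrox annotate centrals.platform.stackrox.io stackrox-central-services \
#     platform.stackrox.io/obsolete-central-pvc=true
#   until oc -n stackrox describe pvc stackrox-db | pvc_is_unused; do
#     sleep 15
#   done
```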

Rolling back an Operator upgrade

To roll back an Operator upgrade, you must perform the steps described in one of the following sections. You can roll back an Operator upgrade by using the CLI or the OpenShift Container Platform web console.

If you are rolling back from RHACS 4.0, you can only roll back to the latest patch release version of RHACS 3.74.

Rolling back an Operator upgrade by using the CLI

You can roll back the Operator version by using CLI commands.

Procedure

Delete the OLM subscription by running the following command:
- For OpenShift Container Platform, run the following command:
  $ oc -n rhacs-operator delete subscription rhacs-operator
- For Kubernetes, run the following command:
  $ kubectl -n rhacs-operator delete subscription rhacs-operator

Delete the cluster service version (CSV) by running the following command:

For OpenShift Container Platform, run the following command:

$ oc -n rhacs-operator delete csv -l operators.coreos.com/rhacs-operator.rhacs-operator

For Kubernetes, run the following command:

$ kubectl -n rhacs-operator delete csv -l operators.coreos.com/rhacs-operator.rhacs-operator

Determine the previous version you want to roll back to by choosing one of the following options:

If the current Central instance is running, query the RHACS API to get the rollback version by running the following command:

$ curl -k -s -u <user>:<password> https://<central hostname>/v1/centralhealth/upgradestatus | jq -r .upgradeStatus.forceRollbackTo

If the current Central instance is not running, perform the following steps:

You can use this procedure only for RHACS release 3.74 and earlier, when the RocksDB database is installed.

Ensure the Central deployment is scaled down by running the following command:
- For OpenShift Container Platform, run the following command:
  $ oc scale -n <central namespace> --replicas=0 deploy/central
- For Kubernetes, run the following command:
  $ kubectl scale -n <central namespace> --replicas=0 deploy/central

Save the following pod spec as a YAML file, for example, pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: get-previous-db-version
spec:
  containers:
  - name: get-previous-db-version
    image: registry.redhat.io/advanced-cluster-security/rhacs-main-rhel8:<rollback version>
    command:
    - sh
    args:
    - '-c'
    - "cat /var/lib/stackrox/.previous/migration_version.yaml | grep '^image:' | cut -f 2 -d : | tr -d ' '"
    volumeMounts:
    - name: stackrox-db
      mountPath: /var/lib/stackrox
  volumes:
  - name: stackrox-db
    persistentVolumeClaim:
      claimName: stackrox-db

Create a pod in your Central namespace by running the following command using the YAML file that you saved:
- For OpenShift Container Platform, run the following command:
  $ oc create -n <central namespace> -f pod.yaml
- For Kubernetes, run the following command:
  $ kubectl create -n <central namespace> -f pod.yaml
After pod creation is complete, get the version by running the following command:
- For OpenShift Container Platform, run the following command:
  $ oc logs -n <central namespace> get-previous-db-version
- For Kubernetes, run the following command:
  $ kubectl logs -n <central namespace> get-previous-db-version
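The pod only prints the result of a short text pipeline over migration_version.yaml. The following sketch applies the same pipeline to input that you supply, so you can see what the log output should look like. The helpers `previous_db_version` and `looks_like_version` are hypothetical, and the sample file content in the comments is an assumption about the file format.

```shell
# Hypothetical helper: applies the same pipeline that the pod runs to
# migration_version.yaml content read from stdin.
previous_db_version() {
  grep '^image:' | cut -f 2 -d : | tr -d ' '
}

# Hypothetical helper: succeeds when the argument looks like
# <major>.<minor>.<patch>, such as 3.74.2.
looks_like_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example with assumed file content:
#   printf 'image: 3.74.2\n' | previous_db_version    # prints 3.74.2
# Usage (requires cluster access):
#   version=$(oc logs -n <central namespace> get-previous-db-version)
#   looks_like_version "$version" || echo "unexpected output: '$version'" >&2
```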

Edit the central-config ConfigMap to set the maintenance.forceRollbackVersion parameter to <version> by running the following command:

For OpenShift Container Platform, run the following command:

$ oc get configmap -n <central namespace> central-config -o yaml | sed -e "s/forceRollbackVersion: none/forceRollbackVersion: <version>/" | oc -n <central namespace> apply -f -

For Kubernetes, run the following command:

$ kubectl get configmap -n <central namespace> central-config -o yaml | sed -e "s/forceRollbackVersion: none/forceRollbackVersion: <version>/" | kubectl -n <central namespace> apply -f -
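The sed expression silently does nothing if the line is not exactly `forceRollbackVersion: none`, and the apply step then reapplies an unchanged ConfigMap. The following sketch adds a check for that case. `set_force_rollback_version` is a hypothetical helper, and the commented usage writes intermediate files instead of using one pipeline so that a failed substitution stops before anything is applied.

```shell
# Hypothetical helper: replaces "forceRollbackVersion: none" with the given
# version in ConfigMap YAML read from stdin, and fails without printing
# anything if the substitution did not change the input.
set_force_rollback_version() {
  local input output
  if [ -z "$1" ]; then
    echo "usage: set_force_rollback_version <version>" >&2
    return 1
  fi
  input=$(cat)
  output=$(printf '%s\n' "$input" |
    sed -e "s/forceRollbackVersion: none/forceRollbackVersion: $1/")
  if [ "$input" = "$output" ]; then
    echo "'forceRollbackVersion: none' not found; nothing changed" >&2
    return 1
  fi
  printf '%s\n' "$output"
}

# Usage (requires cluster access):
#   oc get configmap -n <central namespace> central-config -o yaml > central-config.yaml
#   set_force_rollback_version <version> < central-config.yaml > central-config-rollback.yaml &&
#     oc -n <central namespace> apply -f central-config-rollback.yaml
```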

Set the image for the Central deployment by using the version string that you determined earlier as the image tag. For example, run the following command:

For OpenShift Container Platform, run the following command:

$ oc set image -n <central namespace> deploy/central central=registry.redhat.io/advanced-cluster-security/rhacs-main-rhel8:<version>

For Kubernetes, run the following command:

$ kubectl set image -n <central namespace> deploy/central central=registry.redhat.io/advanced-cluster-security/rhacs-main-rhel8:<version>
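A typo in the tag only shows up later as an image pull failure, so it can help to validate the version and then wait for the rollout. In this sketch, `central_image_for` is a hypothetical helper, and the 10-minute rollout timeout in the commented usage is an arbitrary example value.

```shell
# Hypothetical helper: prints the Central image reference for a rollback
# version, refusing anything that does not look like <major>.<minor>.<patch>.
central_image_for() {
  if ! printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "not a version string: '$1'" >&2
    return 1
  fi
  printf 'registry.redhat.io/advanced-cluster-security/rhacs-main-rhel8:%s\n' "$1"
}

# Usage (requires cluster access):
#   image=$(central_image_for <version>) &&
#     oc set image -n <central namespace> deploy/central central="$image" &&
#     oc rollout status -n <central namespace> deploy/central --timeout=10m
```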

Verification

Ensure that the Central pod starts and has a ready status. If the pod crashes, check the logs to see if the backup was restored. A successful log message appears similar to the following example:
```
Clone to Migrate ".previous", ""
```
Reinstall the Operator on the rolled back channel. For example, 3.74.2 is installed on the rhacs-3.74 channel.
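One way to reinstall the Operator on the earlier channel from the command line is to apply an Operator Lifecycle Manager (OLM) Subscription manifest. This is a sketch under several assumptions: the helper `rhacs_subscription` is hypothetical; the catalog source `redhat-operators` in `openshift-marketplace` is the usual source for Red Hat Operators on OpenShift Container Platform but might differ in your cluster; and the existing OperatorGroup in the rhacs-operator namespace is assumed to still be present. Setting `installPlanApproval: Manual` keeps OLM from immediately upgrading the Operator again.

```shell
# Hypothetical helper: prints an OLM Subscription manifest for the RHACS
# Operator on the given channel, for example, rhacs-3.74.
# Arguments: channel, optional startingCSV (for example, rhacs-operator.v3.74.2,
# an assumed CSV name).
rhacs_subscription() {
  if [ -z "$1" ]; then
    echo "usage: rhacs_subscription <channel> [startingCSV]" >&2
    return 1
  fi
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhacs-operator
  namespace: rhacs-operator
spec:
  channel: $1
  name: rhacs-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
EOF
  if [ -n "$2" ]; then
    printf '  startingCSV: %s\n' "$2"
  fi
}

# Usage (requires cluster access):
#   rhacs_subscription rhacs-3.74 | oc apply -f -
#   oc -n rhacs-operator get installplan
#   oc -n rhacs-operator patch installplan <install plan name> \
#     --type=merge --patch='{"spec":{"approved":true}}'
```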

Rolling back an Operator upgrade by using the web console

You can roll back the Operator version by using the OpenShift Container Platform web console.

Prerequisites

You have access to an OpenShift Container Platform cluster web console using an account with cluster-admin permissions.

Procedure

Go to the Operators → Installed Operators page.
Locate the RHACS Operator and click on it.
On the Operator Details page, select Uninstall Operator from the Actions list. Following this action, the Operator stops running and no longer receives updates.
Determine the previous version you want to roll back to by choosing one of the following options:
- If the current Central instance is running, you can query the RHACS API to get the rollback version by running the following command from a terminal window:
  $ curl -k -s -u <user>:<password> https://<central hostname>/v1/centralhealth/upgradestatus | jq -r .upgradeStatus.forceRollbackTo
- You can create a pod and extract the previous version by performing the following steps:
  
  You can use this procedure only for RHACS release 3.74 and earlier, when the RocksDB database is installed.
  1. Go to Workloads → Deployments → central.
  2. Under Deployment details, click the down arrow next to the pod count to scale the deployment down to 0 pods.
  3. Go to Workloads → Pods → Create Pod and paste the contents of the pod spec as shown in the following example into the editor:
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: get-previous-db-version
    spec:
      containers:
      - name: get-previous-db-version
        image: registry.redhat.io/advanced-cluster-security/rhacs-main-rhel8:<rollback version>
        command:
        - sh
        args:
        - '-c'
        - "cat /var/lib/stackrox/.previous/migration_version.yaml | grep '^image:' | cut -f 2 -d : | tr -d ' '"
        volumeMounts:
        - name: stackrox-db
          mountPath: /var/lib/stackrox
      volumes:
      - name: stackrox-db
        persistentVolumeClaim:
          claimName: stackrox-db
  4. Click Create.
  5. After the pod is created, click the Logs tab to get the version string.
Update the rollback configuration by performing the following steps:
1. Go to Workloads → ConfigMaps → central-config and select Edit ConfigMap from the Actions list.
2. Find the forceRollbackVersion line in the value of the central-config.yaml key.
3. Replace none with the version string that you determined in the previous step, for example, 3.73.3, and then save the file.
Update Central to the earlier version by performing the following steps:
1. Go to Workloads → Deployments → central and select Edit Deployment from the Actions list.
2. Update the image tag to the version string that you determined earlier, and then save the changes.

Verification

Ensure that the Central pod starts and has a ready status. If the pod crashes, check the logs to see if the backup was restored. A successful log message appears similar to the following example:
```
Clone to Migrate ".previous", ""
```
Reinstall the Operator on the rolled back channel. For example, 3.74.2 is installed on the rhacs-3.74 channel.


Troubleshooting Operator upgrade issues

Follow the instructions in this section to investigate and resolve upgrade-related issues for the RHACS Operator.

Central DB cannot be scheduled

Follow the instructions here to troubleshoot a failing Central DB pod during an upgrade:

Check the status of the central-db pod:
```
$ oc -n <namespace> get pod -l app=central-db (1)
```
1 If you use Kubernetes, enter kubectl instead of oc.
If the status of the pod is Pending, use the describe command to get more details:
```
$ oc -n <namespace> describe po/<central-db-pod-name> (1)
```
1 If you use Kubernetes, enter kubectl instead of oc.

You might see the FailedScheduling warning message:

Type     Reason            Age   From               Message
----     ------            ----  ----               -------
Warning  FailedScheduling  54s   default-scheduler  0/7 nodes are available: 1 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 Insufficient cpu. preemption: 0/7 nodes are available: 3 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod.

This warning message indicates that no node had enough free CPU or memory to accommodate the pod's resource requests, and that the remaining nodes have taints that the pod does not tolerate. If you have a small environment, consider increasing resources on the nodes or adding a larger node that can support the database.

Otherwise, consider decreasing the resource requirements for the central-db pod in the Central custom resource under central → db → resources. However, running Central with fewer resources than the recommended minimum might degrade RHACS performance.
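The resource change can be applied from the command line as a JSON merge patch. This is a minimal sketch: `central_db_requests_patch` is a hypothetical helper, the CR name `stackrox-central-services` is an assumption, and the example values `2` and `8Gi` are placeholders, not sizing recommendations.

```shell
# Hypothetical helper: prints a JSON merge patch that sets the central-db
# resource requests under spec.central.db.resources in the Central CR.
# Arguments: CPU request and memory request.
central_db_requests_patch() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: central_db_requests_patch <cpu> <memory>" >&2
    return 1
  fi
  printf '{"spec":{"central":{"db":{"resources":{"requests":{"cpu":"%s","memory":"%s"}}}}}}\n' \
    "$1" "$2"
}

# Usage (requires cluster access). Requests must not exceed the configured
# limits, or the Operator reports an error like the one in the next section:
#   oc -n <central namespace> patch centrals.platform.stackrox.io \
#     stackrox-central-services --type=merge \
#     --patch="$(central_db_requests_patch 2 8Gi)"
```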

Central or Secured cluster fails to deploy

If the RHACS Operator fails to deploy Central or a secured cluster, or fails to apply changes in the custom resource (CR) to the actual resources, check the CR conditions to find the issue.

For Central, run the following command to check the conditions:
```
$ oc -n rhacs-operator describe centrals.platform.stackrox.io (1)
```
1 If you use Kubernetes, enter kubectl instead of oc.
For Secured clusters, run the following command to check the conditions:
```
$ oc -n rhacs-operator describe securedclusters.platform.stackrox.io (1)
```
1 If you use Kubernetes, enter kubectl instead of oc.

You can identify configuration errors from the conditions output:

Example output

 Conditions:
    Last Transition Time:  2023-04-19T10:49:57Z
    Status:                False
    Type:                  Deployed
    Last Transition Time:  2023-04-19T10:49:57Z
    Status:                True
    Type:                  Initialized
    Last Transition Time:  2023-04-19T10:59:10Z
    Message:               Deployment.apps "central" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "50": must be less than or equal to cpu limit
    Reason:                ReconcileError
    Status:                True
    Type:                  Irreconcilable
    Last Transition Time:  2023-04-19T10:49:57Z
    Message:               No proxy configuration is desired
    Reason:                NoProxyConfig
    Status:                False
    Type:                  ProxyConfigFailed
    Last Transition Time:  2023-04-19T10:49:57Z
    Message:               Deployment.apps "central" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "50": must be less than or equal to cpu limit
    Reason:                InstallError
    Status:                True
    Type:                  ReleaseFailed
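In a long `describe` output, the failing conditions are easy to miss. The following sketch filters them out. `failed_conditions` is a hypothetical helper, and it assumes that each condition lists its Message and Status fields before its Type field, as in the example output; multi-line messages are reduced to their first line.

```shell
# Hypothetical helper: reads `oc describe` output for a Central or
# SecuredCluster CR on stdin and prints each Irreconcilable or ReleaseFailed
# condition whose Status is True, together with its message.
failed_conditions() {
  awk '
    /^[ \t]*Message:/ { msg = $0; sub(/^[ \t]*Message:[ \t]*/, "", msg) }
    /^[ \t]*Status:/  { status = $2 }
    /^[ \t]*Type:/ {
      if (status == "True" && ($2 == "Irreconcilable" || $2 == "ReleaseFailed"))
        print $2 ": " msg
      msg = ""; status = ""
    }'
}

# Usage (requires cluster access):
#   oc -n rhacs-operator describe centrals.platform.stackrox.io | failed_conditions
```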

You can also view the RHACS Operator logs to find more information about the issue. Run the following command to view the logs:

$ oc -n rhacs-operator logs deploy/rhacs-operator-controller-manager manager (1)

1	If you use Kubernetes, enter `kubectl` instead of `oc`.

