$ oc --kubeconfig <management_cluster_kubeconfig_file> \
patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'
You can use the OpenShift API for Data Protection (OADP) Operator to perform disaster recovery on Amazon Web Services (AWS) and bare metal.
The disaster recovery process with OpenShift API for Data Protection (OADP) involves the following steps:
Preparing your platform, such as Amazon Web Services or bare metal, to use OADP
Backing up the data plane workload
Backing up the control plane workload
Restoring a hosted cluster by using OADP
You must meet the following prerequisites on the management cluster:
You created a storage class.
You have access to the cluster with cluster-admin
privileges.
You have access to the OADP subscription through a catalog source.
You have access to a cloud storage provider that is compatible with OADP, such as S3, Microsoft Azure, Google Cloud Platform, or MinIO.
In a disconnected environment, you have access to a self-hosted storage provider, for example Red Hat OpenShift Data Foundation or MinIO, that is compatible with OADP.
Your hosted control planes pods are up and running.
To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on Amazon Web Services (AWS) S3 compatible storage. After creating the DataProtectionApplication
object, new velero
deployment and node-agent
pods are created in the openshift-adp
namespace.
To prepare AWS to use OADP, see "Configuring the OpenShift API for Data Protection with Multicloud Object Gateway".
Backing up the data plane workload
Backing up the control plane workload
To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on bare metal. After creating the DataProtectionApplication
object, new velero
deployment and node-agent
pods are created in the openshift-adp
namespace.
To prepare bare metal to use OADP, see "Configuring the OpenShift API for Data Protection with AWS S3 compatible storage".
Backing up the data plane workload
Backing up the control plane workload
If the data plane workload is not important, you can skip this procedure. To back up the data plane workload by using the OADP Operator, see "Backing up applications".
Restoring a hosted cluster by using OADP
You can back up the control plane workload by creating the Backup
custom resource (CR).
To monitor and observe the backup process, see "Observing the backup and restore process".
Pause the reconciliation of the HostedCluster
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'
Pause the reconciliation of the NodePool
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'
Pause the reconciliation of the AgentCluster
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
annotate agentcluster -n <hosted_control_plane_namespace> \
cluster.x-k8s.io/paused=true --all'
Pause the reconciliation of the AgentMachine
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
annotate agentmachine -n <hosted_control_plane_namespace> \
cluster.x-k8s.io/paused=true --all'
Annotate the HostedCluster
resource to prevent the deletion of the hosted control plane namespace by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
hypershift.openshift.io/skip-delete-hosted-controlplane-namespace=true
Create a YAML file that defines the Backup
CR:
backup-control-plane.yaml
fileapiVersion: velero.io/v1
kind: Backup
metadata:
name: <backup_resource_name> (1)
namespace: openshift-adp
labels:
velero.io/storage-location: default
spec:
hooks: {}
includedNamespaces: (2)
- <hosted_cluster_namespace> (3)
- <hosted_control_plane_namespace> (4)
includedResources:
- sa
- role
- rolebinding
- pod
- pvc
- pv
- bmh
- configmap
- infraenv (5)
- priorityclasses
- pdb
- agents
- hostedcluster
- nodepool
- secrets
- services
- deployments
- hostedcontrolplane
- cluster
- agentcluster
- agentmachinetemplate
- agentmachine
- machinedeployment
- machineset
- machine
excludedResources: []
storageLocation: default
ttl: 2h0m0s
snapshotMoveData: true (6)
datamover: "velero" (6)
defaultVolumesToFsBackup: true (7)
1 | Replace backup_resource_name with the name of your Backup resource. |
2 | Selects specific namespaces to back up objects from them. You must include your hosted cluster namespace and the hosted control plane namespace. |
3 | Replace <hosted_cluster_namespace> with the name of the hosted cluster namespace, for example, clusters . |
4 | Replace <hosted_control_plane_namespace> with the name of the hosted control plane namespace, for example, clusters-hosted . |
5 | You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the backup process. |
6 | Enables the CSI volume snapshots and uploads the control plane workload automatically to the cloud storage. |
7 | Sets the fs-backup backing up method for persistent volumes (PVs) as default. This setting is useful when you use a combination of Container Storage Interface (CSI) volume snapshots and the fs-backup method. |
If you want to use CSI volume snapshots, you must add the |
Apply the Backup
CR by running the following command:
$ oc apply -f backup-control-plane.yaml
Verify if the value of the status.phase
is Completed
by running the following command:
$ oc get backup <backup_resource_name> -n openshift-adp -o jsonpath='{.status.phase}'
Restoring a hosted cluster by using OADP
You can restore the hosted cluster by creating the Restore
custom resource (CR).
If you are using an in-place update, InfraEnv does not need spare nodes. You need to re-provision the worker nodes from the new management cluster.
If you are using a replace update, you need some spare nodes for InfraEnv to deploy the worker nodes.
After you back up your hosted cluster, you must destroy it to initiate the restoring process. To initiate node provisioning, you must back up workloads in the data plane before deleting the hosted cluster. |
You completed the steps in Removing a cluster by using the console to delete your hosted cluster.
You completed the steps in Removing remaining resources after removing a cluster.
To monitor and observe the backup process, see "Observing the backup and restore process".
Verify that no pods and persistent volume claims (PVCs) are present in the hosted control plane namespace by running the following command:
$ oc get pod pvc -n <hosted_control_plane_namespace>
No resources found
Create a YAML file that defines the Restore
CR:
restore-hosted-cluster.yaml
fileapiVersion: velero.io/v1
kind: Restore
metadata:
name: <restore_resource_name> (1)
namespace: openshift-adp
spec:
backupName: <backup_resource_name> (2)
restorePVs: true (3)
existingResourcePolicy: update (4)
excludedResources:
- nodes
- events
- events.events.k8s.io
- backups.velero.io
- restores.velero.io
- resticrepositories.velero.io
1 | Replace <restore_resource_name> with the name of your Restore resource. |
2 | Replace <backup_resource_name> with the name of your Backup resource. |
3 | Initiates the recovery of persistent volumes (PVs) and its pods. |
4 | Ensures that the existing objects are overwritten with the backed up content. |
You must create the |
Apply the Restore
CR by running the following command:
$ oc apply -f restore-hosted-cluster.yaml
Verify if the value of the status.phase
is Completed
by running the following command:
$ oc get hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> -o jsonpath='{.status.phase}'
After the restore process is complete, start the reconciliation of the HostedCluster
and NodePool
resources that you paused during backing up of the control plane workload:
Start the reconciliation of the HostedCluster
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'
Start the reconciliation of the NodePool
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
--type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'
Start the reconciliation of the Agent provider resources that you paused during backing up of the control plane workload:
Start the reconciliation of the AgentCluster
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
annotate agentcluster -n <hosted_control_plane_namespace> \
cluster.x-k8s.io/paused- --overwrite=true --all
Start the reconciliation of the AgentMachine
resource by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
annotate agentmachine -n <hosted_control_plane_namespace> \
cluster.x-k8s.io/paused- --overwrite=true --all
Remove the hypershift.openshift.io/skip-delete-hosted-controlplane-namespace-
annotation in the HostedCluster
resource to avoid manually deleting the hosted control plane namespace by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
hypershift.openshift.io/skip-delete-hosted-controlplane-namespace- \
--overwrite=true --all
Scale the NodePool
resource to the desired number of replicas by running the following command:
$ oc --kubeconfig <management_cluster_kubeconfig_file> \
scale nodepool -n <hosted_cluster_namespace> <node_pool_name> \
--replicas <replica_count> (1)
1 | Replace <replica_count> by an integer value, for example, 3 . |
When using OpenShift API for Data Protection (OADP) to backup and restore a hosted cluster, you can monitor and observe the process.
Observe the backup process by running the following command:
$ watch "oc get backup -n openshift-adp <backup_resource_name> -o jsonpath='{.status}'"
Observe the restore process by running the following command:
$ watch "oc get restore -n openshift-adp <backup_resource_name> -o jsonpath='{.status}'"
Observe the Velero logs by running the following command:
$ oc logs -n openshift-adp -ldeploy=velero -f
Observe the progress of all of the OADP objects by running the following command:
$ watch "echo BackupRepositories:;echo;oc get backuprepositories.velero.io -A;echo; echo BackupStorageLocations: ;echo; oc get backupstoragelocations.velero.io -A;echo;echo DataUploads: ;echo;oc get datauploads.velero.io -A;echo;echo DataDownloads: ;echo;oc get datadownloads.velero.io -n openshift-adp; echo;echo VolumeSnapshotLocations: ;echo;oc get volumesnapshotlocations.velero.io -A;echo;echo Backups:;echo;oc get backup -A; echo;echo Restores:;echo;oc get restore -A"
When using OpenShift API for Data Protection, you can get more details of the Backup
and Restore
resources by using the velero
command-line interface (CLI).
Create an alias to use the velero
CLI from a container by running the following command:
$ alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'
Get details of your Restore
custom resource (CR) by running the following command:
$ velero restore describe <restore_resource_name> --details (1)
1 | Replace <restore_resource_name> with the name of your Restore resource. |
Get details of your Backup
CR by running the following command:
$ velero restore describe <backup_resource_name> --details (1)
1 | Replace <backup_resource_name> with the name of your Backup resource. |