You can view the migration Custom Resources (CRs) and download logs to troubleshoot a failed migration.
If the application was stopped during the failed migration, you must roll it back manually in order to prevent data corruption.
Manual rollback is not required if the application was not stopped during migration, because the original application is still running on the source cluster. |
The Cluster Application Migration (CAM) tool creates the following Custom Resources (CRs):
MigCluster (configuration, CAM cluster): Cluster definition
MigStorage (configuration, CAM cluster): Storage definition
MigPlan (configuration, CAM cluster): Migration plan
The MigPlan CR describes the source and target clusters, repository, and namespace(s) being migrated. It is associated with 0, 1, or many MigMigration CRs.
Deleting a MigPlan CR deletes the associated MigMigration CRs. |
BackupStorageLocation (configuration, CAM cluster): Location of Velero backup objects
VolumeSnapshotLocation (configuration, CAM cluster): Location of Velero volume snapshots
MigMigration (action, CAM cluster): Migration, created during migration
A MigMigration CR is created every time you stage or migrate data. Each MigMigration CR is associated with a MigPlan CR.
Backup (action, source cluster): When you run a migration plan, the MigMigration CR creates two Velero backup CRs on each source cluster:
Backup CR #1 for Kubernetes objects
Backup CR #2 for PV data
Restore (action, target cluster): When you run a migration plan, the MigMigration CR creates two Velero restore CRs on the target cluster:
Restore CR #1 (using Backup CR #2) for PV data
Restore CR #2 (using Backup CR #1) for Kubernetes objects
Get the CR name:
$ oc get <migration_cr> -n openshift-migration (1)
1 | Specify the migration CR, for example, migmigration . |
The output is similar to the following:
NAME AGE 88435fe0-c9f8-11e9-85e6-5d593ce65e10 6m42s
View the CR:
$ oc describe <migration_cr> <88435fe0-c9f8-11e9-85e6-5d593ce65e10> -n openshift-migration
The output is similar to the following examples.
name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10 namespace: openshift-migration labels: <none> annotations: touch: 3b48b543-b53e-4e44-9d34-33563f0f8147 apiVersion: migration.openshift.io/v1alpha1 kind: MigMigration metadata: creationTimestamp: 2019-08-29T01:01:29Z generation: 20 resourceVersion: 88179 selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/88435fe0-c9f8-11e9-85e6-5d593ce65e10 uid: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6 spec: migPlanRef: name: socks-shop-mig-plan namespace: openshift-migration quiescePods: true stage: false status: conditions: category: Advisory durable: True lastTransitionTime: 2019-08-29T01:03:40Z message: The migration has completed successfully. reason: Completed status: True type: Succeeded phase: Completed startTimestamp: 2019-08-29T01:01:29Z events: <none>
apiVersion: velero.io/v1 kind: Backup metadata: annotations: openshift.io/migrate-copy-phase: final openshift.io/migrate-quiesce-pods: "true" openshift.io/migration-registry: 172.30.105.179:5000 openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-44dd3bd5-c9f8-11e9-95ad-0205fe66cbb6 creationTimestamp: "2019-08-29T01:03:15Z" generateName: 88435fe0-c9f8-11e9-85e6-5d593ce65e10- generation: 1 labels: app.kubernetes.io/part-of: migration migmigration: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6 migration-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6 velero.io/storage-location: myrepo-vpzq9 name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7 namespace: openshift-migration resourceVersion: "87313" selfLink: /apis/velero.io/v1/namespaces/openshift-migration/backups/88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7 uid: c80dbbc0-c9f8-11e9-95ad-0205fe66cbb6 spec: excludedNamespaces: [] excludedResources: [] hooks: resources: [] includeClusterResources: null includedNamespaces: - sock-shop includedResources: - persistentvolumes - persistentvolumeclaims - namespaces - imagestreams - imagestreamtags - secrets - configmaps - pods labelSelector: matchLabels: migration-included-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6 storageLocation: myrepo-vpzq9 ttl: 720h0m0s volumeSnapshotLocations: - myrepo-wv6fx status: completionTimestamp: "2019-08-29T01:02:36Z" errors: 0 expiration: "2019-09-28T01:02:35Z" phase: Completed startTimestamp: "2019-08-29T01:02:35Z" validationErrors: null version: 1 volumeSnapshotsAttempted: 0 volumeSnapshotsCompleted: 0 warnings: 0
apiVersion: velero.io/v1 kind: Restore metadata: annotations: openshift.io/migrate-copy-phase: final openshift.io/migrate-quiesce-pods: "true" openshift.io/migration-registry: 172.30.90.187:5000 openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-36f54ca7-c925-11e9-825a-06fa9fb68c88 creationTimestamp: "2019-08-28T00:09:49Z" generateName: e13a1b60-c927-11e9-9555-d129df7f3b96- generation: 3 labels: app.kubernetes.io/part-of: migration migmigration: e18252c9-c927-11e9-825a-06fa9fb68c88 migration-final-restore: e18252c9-c927-11e9-825a-06fa9fb68c88 name: e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx namespace: openshift-migration resourceVersion: "82329" selfLink: /apis/velero.io/v1/namespaces/openshift-migration/restores/e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx uid: 26983ec0-c928-11e9-825a-06fa9fb68c88 spec: backupName: e13a1b60-c927-11e9-9555-d129df7f3b96-sz24f excludedNamespaces: null excludedResources: - nodes - events - events.events.k8s.io - backups.velero.io - restores.velero.io - resticrepositories.velero.io includedNamespaces: null includedResources: null namespaceMapping: null restorePVs: true status: errors: 0 failureReason: "" phase: Completed validationErrors: null warnings: 15
You can download the Velero, Restic, and Migration controller logs in the CAM web console to troubleshoot a failed migration.
Log in to the CAM console.
Click Plans to view the list of migration plans.
Click the Options menu of a specific migration plan and select Logs.
Click Download Logs to download the logs of the Migration controller, Velero, and Restic for all clusters.
To download a specific log:
Specify the log options:
Cluster: Select the source, target, or CAM host cluster.
Log source: Select Velero, Restic, or Controller.
Pod source: Select the Pod name, for example, controller-manager-78c469849c-v6wcf
The selected log is displayed.
You can clear the log selection settings by changing your selection.
Click Download Selected to download the selected log.
Optionally, you can access the logs by using the CLI, as in the following example:
$ oc get pods -n openshift-migration | grep controller controller-manager-78c469849c-v6wcf 1/1 Running 0 4h49m $ oc logs controller-manager-78c469849c-v6wcf -f -n openshift-migration
If a migration fails because Restic times out, the following error appears in the Velero Pod log:
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1
The default value of restic_timeout
is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.
In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
Click Cluster Application Migration Operator.
In the MigrationController tab, click migration-controller.
In the YAML tab, update the following parameter value:
spec:
restic_timeout: 1h (1)
1 | Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s . |
Click Save.
ResticVerifyErrors
in the MigMigration Custom ResourceIf data verification fails when migrating a PV with the filesystem data copy method, the following error appears in the MigMigration Custom Resource (CR):
status: conditions: - category: Warn durable: true lastTransitionTime: 2020-04-16T20:35:16Z message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>` for details (1) status: "True" type: ResticVerifyErrors (2)
1 | The error message identifies the Restore CR name. |
2 | ResticErrors also appears. ResticErrors is a general error warning that includes verification errors. |
A data verification error does not cause the migration process to fail. |
You can check the target cluster’s Restore CR to identify the source of the data verification error.
Log in to the target cluster.
View the Restore CR:
$ oc describe <registry-example-migration-rvwcm> -n openshift-migration
The output identifies the PV with PodVolumeRestore
errors:
status: phase: Completed podVolumeRestoreErrors: - kind: PodVolumeRestore name: <registry-example-migration-rvwcm-98t49> namespace: openshift-migration podVolumeRestoreResticErrors: - kind: PodVolumeRestore name: <registry-example-migration-rvwcm-98t49> namespace: openshift-migration
View the PodVolumeRestore
CR:
$ oc describe <migration-example-rvwcm-98t49>
The output identifies the Restic Pod that logged the errors:
completionTimestamp: 2020-05-01T20:49:12Z errors: 1 resticErrors: 1 ... resticPod: <restic-nr2v5>
View the Restic Pod log:
$ oc logs -f restic-nr2v5
If your application was stopped during a failed migration, you must roll it back manually in order to prevent data corruption in the PV.
This procedure is not required if the application was not stopped during migration, because the original application is still running on the source cluster.
On the target cluster, switch to the migrated project:
$ oc project <project>
Get the deployed resources:
$ oc get all
Delete the deployed resources to ensure that the application is not running on the target cluster and accessing data on the PVC:
$ oc delete <resource_type>
To stop a DaemonSet without deleting it, update the nodeSelector
in the YAML file:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: hello-daemonset
spec:
selector:
matchLabels:
name: hello-daemonset
template:
metadata:
labels:
name: hello-daemonset
spec:
nodeSelector:
role: worker (1)
1 | Specify a nodeSelector value that does not exist on any node. |
Update each PV’s reclaim policy so that unnecessary data is removed. During migration, the reclaim policy for bound PVs is Retain
, to ensure that data is not lost when an application is removed from the source cluster. You can remove these PVs during rollback.
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv0001
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain (1)
...
status:
...
1 | Specify Recycle or Delete . |
On the source cluster, switch to your migrated project:
$ oc project <project_name>
Obtain the project’s deployed resources:
$ oc get all
Start one or more replicas of each deployed resource:
$ oc scale --replicas=1 <resource_type>/<resource_name>
Update the nodeSelector
of a DaemonSet to its original value, if you changed it during the procedure.
If you open a customer support case, you can run the must-gather
tool with the openshift-migration-must-gather-rhel8
image to collect information about your cluster and upload it to the Red Hat Customer Portal.
The openshift-migration-must-gather-rhel8
image collects logs and Custom Resource data that are not collected by the default must-gather
image.
Navigate to the directory where you want to store the must-gather
data.
Run the oc adm must-gather
command:
$ oc adm must-gather --image=registry.redhat.io/rhcam-1-2/openshift-migration-must-gather-rhel8
The must-gather
tool collects the cluster information and stores it in a must-gather.local.<uid>
directory.
Remove authentication keys and other sensitive information from the must-gather
data.
Create an archive file containing the contents of the must-gather.local.<uid>
directory:
$ tar cvaf must-gather.tar.gz must-gather.local.<uid>/
You can attach the compressed file to your customer support case on the Red Hat Customer Portal.
This release has the following known issues:
During migration, the Cluster Application Migration (CAM) tool preserves the following namespace annotations:
openshift.io/sa.scc.mcs
openshift.io/sa.scc.supplemental-groups
openshift.io/sa.scc.uid-range
These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)
If an AWS bucket is added to the CAM web console and then deleted, its status remains True
because the MigStorage CR is not updated. (BZ#1738564)
Most cluster-scoped resources are not yet handled by the CAM tool. If your applications require cluster-scoped resources, you may have to create them manually on the target cluster.
If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)
If a large migration fails because Restic times out, you can increase the restic_timeout
parameter value (default: 1h
) in the Migration controller CR.
If you select the data verification option for PVs that are migrated with the filesystem copy method, performance is significantly slower. Velero generates a checksum for each file and checks it when the file is restored.