You can view the Migration Toolkit for Containers (MTC) custom resources, download logs, and run the must-gather tool to collect data for troubleshooting a failed migration.
If the application was stopped during the failed migration, you must roll it back manually in order to prevent data corruption.
Manual rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.
The Migration Toolkit for Containers (MTC) creates the following custom resources (CRs):
MigCluster (configuration, MTC cluster): Cluster definition
MigStorage (configuration, MTC cluster): Storage definition
MigPlan (configuration, MTC cluster): Migration plan
The MigPlan CR describes the source and target clusters, replication repository, and namespaces being migrated. It is associated with 0, 1, or many MigMigration CRs.
BackupStorageLocation (configuration, MTC cluster): Location of Velero backup objects
VolumeSnapshotLocation (configuration, MTC cluster): Location of Velero volume snapshots
MigMigration (action, MTC cluster): Migration, created every time you stage or migrate data. Each MigMigration CR is associated with a MigPlan CR.
Backup (action, source cluster): When you run a migration plan, the MigMigration CR creates two Velero backup CRs on the source cluster:
Backup CR #1 for Kubernetes objects
Backup CR #2 for PV data
Restore (action, target cluster): When you run a migration plan, the MigMigration CR creates two Velero restore CRs on the target cluster:
Restore CR #1 (using Backup CR #2) for PV data
Restore CR #2 (using Backup CR #1) for Kubernetes objects
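For a quick overview of the configuration CRs, you can list them from the CLI. This is a minimal sketch that assumes the default openshift-migration namespace:

$ oc get migcluster,migstorage,migplan -n openshift-migration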
List the MigMigration CRs in the openshift-migration namespace:
$ oc get migmigration -n openshift-migration
NAME                                   AGE
88435fe0-c9f8-11e9-85e6-5d593ce65e10   6m42s
Inspect a specific MigMigration CR:

$ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
The output is similar to the following examples.
name:         88435fe0-c9f8-11e9-85e6-5d593ce65e10
namespace:    openshift-migration
labels:       <none>
annotations:  touch: 3b48b543-b53e-4e44-9d34-33563f0f8147
apiVersion:   migration.openshift.io/v1alpha1
kind:         MigMigration
metadata:
  creationTimestamp: 2019-08-29T01:01:29Z
  generation: 20
  resourceVersion: 88179
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/88435fe0-c9f8-11e9-85e6-5d593ce65e10
  uid: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
spec:
  migPlanRef:
    name: socks-shop-mig-plan
    namespace: openshift-migration
  quiescePods: true
  stage: false
status:
  conditions:
  - category: Advisory
    durable: True
    lastTransitionTime: 2019-08-29T01:03:40Z
    message: The migration has completed successfully.
    reason: Completed
    status: True
    type: Succeeded
  phase: Completed
  startTimestamp: 2019-08-29T01:01:29Z
events: <none>
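If you only need the current migration phase rather than the full description, a jsonpath query is a quick alternative. This is a sketch that assumes the same MigMigration CR name as in the example above:

$ oc get migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration -o jsonpath='{.status.phase}'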
Velero backup CR #2 example output that describes the PV data
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.105.179:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-44dd3bd5-c9f8-11e9-95ad-0205fe66cbb6
  creationTimestamp: "2019-08-29T01:03:15Z"
  generateName: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-
  generation: 1
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    migration-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    velero.io/storage-location: myrepo-vpzq9
  name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  namespace: openshift-migration
  resourceVersion: "87313"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/backups/88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  uid: c80dbbc0-c9f8-11e9-95ad-0205fe66cbb6
spec:
  excludedNamespaces:
  excludedResources:
  hooks:
    resources:
  includeClusterResources: null
  includedNamespaces:
  - sock-shop
  includedResources:
  - persistentvolumes
  - persistentvolumeclaims
  - namespaces
  - imagestreams
  - imagestreamtags
  - secrets
  - configmaps
  - pods
  labelSelector:
    matchLabels:
      migration-included-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
  storageLocation: myrepo-vpzq9
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - myrepo-wv6fx
status:
  completionTimestamp: "2019-08-29T01:02:36Z"
  errors: 0
  expiration: "2019-09-28T01:02:35Z"
  phase: Completed
  startTimestamp: "2019-08-29T01:02:35Z"
  validationErrors: null
  version: 1
  volumeSnapshotsAttempted: 0
  volumeSnapshotsCompleted: 0
  warnings: 0
Velero restore CR #2 example output that describes the Kubernetes resources
apiVersion: velero.io/v1
kind: Restore
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.90.187:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-36f54ca7-c925-11e9-825a-06fa9fb68c88
  creationTimestamp: "2019-08-28T00:09:49Z"
  generateName: e13a1b60-c927-11e9-9555-d129df7f3b96-
  generation: 3
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: e18252c9-c927-11e9-825a-06fa9fb68c88
    migration-final-restore: e18252c9-c927-11e9-825a-06fa9fb68c88
  name: e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  namespace: openshift-migration
  resourceVersion: "82329"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/restores/e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  uid: 26983ec0-c928-11e9-825a-06fa9fb68c88
spec:
  backupName: e13a1b60-c927-11e9-9555-d129df7f3b96-sz24f
  excludedNamespaces: null
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  includedNamespaces: null
  includedResources: null
  namespaceMapping: null
  restorePVs: true
status:
  errors: 0
  failureReason: ""
  phase: Completed
  validationErrors: null
  warnings: 15
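Because the Velero Backup and Restore CRs are labeled with the UID of the MigMigration CR that created them, you can list the CRs that belong to a single migration with a label selector. This is a sketch that assumes the migmigration label shown in the examples above; the UID is a placeholder:

$ oc get backups.velero.io,restores.velero.io -n openshift-migration -l migmigration=<migmigration_uid>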
You can use the migration log reader to display a single filtered view of all the migration logs.
Find the migration log reader pod:

$ oc -n openshift-migration get pods | grep log
Enter the following command to display a single migration log:
$ oc -n openshift-migration logs -f <mig-log-reader-pod> -c color (1)
You can download the MigrationController pod logs in the Migration Toolkit for Containers (MTC) web console to troubleshoot a failed migration.
In the MTC console, click Migration plans to view the list of migration plans.
Click the Options menu of a specific migration plan and select Logs.
Click Download Logs to download the logs of the Restic pods for all clusters.
You can download a single log by selecting the cluster, log source, and pod source, and then clicking Download Selected.
You can access a pod log from the CLI by using the oc logs command:
$ oc logs <pod-name> -f -n openshift-migration (1)
(1) Specify the pod name.
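If you do not know the pod name, you can list the pods in the openshift-migration namespace first, for example:

$ oc get pods -n openshift-migration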
This section describes common error messages you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.
If a CA certificate error message is displayed the first time you try to access the MTC console, the likely cause is the use of self-signed CA certificates in one of the clusters.
To resolve this issue, navigate to the oauth-authorization-server URL displayed in the error message and accept the certificate. To resolve this issue permanently, add the certificate to the trust store of your web browser.
If an Unauthorized message is displayed after you have accepted the certificate, navigate to the MTC console and refresh the web page.
If a connection has timed out message is displayed in the MTC console after you have accepted a self-signed certificate, the causes are likely to be the following:
Interrupted network access to the OAuth server
Interrupted network access to the OpenShift Container Platform console
Proxy configuration that blocks access to the oauth-authorization-server URL. See MTC console inaccessible because of OAuth timeout error for details.
To determine the cause of the timeout error:
Navigate to the MTC console and inspect the elements with the browser web inspector.
Check the MigrationUI pod log:
$ oc logs <MigrationUI_Pod> -n openshift-migration
If a migration fails because Restic times out, the following error is displayed in the Velero pod log.
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1
The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.
In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
Click Migration Toolkit for Containers Operator.
In the MigrationController tab, click migration-controller.
In the YAML tab, update the following parameter value:
spec:
  restic_timeout: 1h (1)
(1) Valid units are h (hours), m (minutes), and s (seconds), for example, 1h30m15s.
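If you prefer the CLI to the web console, you can apply the same change with oc patch. This is a sketch that assumes the default migration-controller CR name in the openshift-migration namespace and an example timeout of 2h:

$ oc patch migrationcontroller migration-controller -n openshift-migration --type merge -p '{"spec":{"restic_timeout":"2h"}}'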
If data verification fails when migrating a persistent volume with the file system data copy method, the following error is displayed in the MigMigration CR status:
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: 2020-04-16T20:35:16Z
    message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>` for details (1)
    status: "True"
    type: ResticVerifyErrors (2)
(1) The error message identifies the Restore CR name.
(2) ResticVerifyErrors is a general error warning type that includes verification errors.
A data verification error does not cause the migration process to fail.
You can check the Restore CR to identify the source of the data verification error.
Log in to the target cluster.
Get the Restore CR:

$ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration
The output identifies the persistent volume with data verification errors:
status:
  phase: Completed
  podVolumeRestoreErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
  podVolumeRestoreResticErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
Get the PodVolumeRestore CR:

$ oc describe podvolumerestore <migration-example-rvwcm-98t49> -n openshift-migration
The output identifies the Restic pod that logged the errors.
completionTimestamp: 2020-05-01T20:49:12Z
errors: 1
resticErrors: 1
...
resticPod: <restic-nr2v5>
Check the Restic pod log to locate the errors:
$ oc logs -f <restic-nr2v5>
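To narrow down the verification failure, you can filter the Restic pod log for error lines, for example:

$ oc logs <restic-nr2v5> | grep -i error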
If your application was stopped during a failed migration with the Migration Toolkit for Containers (MTC), you must roll it back manually in order to prevent data corruption in the persistent volume.
This procedure is not required if the application was not stopped during migration, because the original application is still running on the source cluster.
On the target cluster, switch to the migrated project:
$ oc project <project>
Get the deployed resources:
$ oc get all
Delete the deployed resources to ensure that the application is not running on the target cluster and accessing data on the persistent volume claim:
$ oc delete <resource_type>
To stop a daemon set without deleting it, update the nodeSelector in the YAML file:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hello-daemonset
spec:
  selector:
    matchLabels:
      name: hello-daemonset
  template:
    metadata:
      labels:
        name: hello-daemonset
    spec:
      nodeSelector:
        role: worker (1)
(1) Specify a nodeSelector value that does not match the label of any node so that no daemon set pods are scheduled.
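As an alternative to editing the YAML file, a hypothetical oc patch command can set a nodeSelector that no node satisfies; the role value here is a made-up example:

$ oc patch daemonset hello-daemonset -p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"none"}}}}}'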
Update the reclaim policy for each PV so that unnecessary data is removed. During migration, the reclaim policy for bound PVs is Retain, to ensure that data is not lost when an application is removed from the source cluster. You can remove these PVs during rollback.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain (1)
  ...
status:
  ...
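A hedged example of updating the reclaim policy from the CLI, assuming the pv0001 volume shown above and that you want the volume data removed when its claim is deleted:

$ oc patch pv pv0001 -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'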
On the source cluster, switch to your migrated project:
$ oc project <project_name>
Obtain the project’s deployed resources:
$ oc get all
Start one or more replicas of each deployed resource:
$ oc scale --replicas=1 <resource_type>/<resource_name>
Update the nodeSelector of the DaemonSet resource to its original value, if you changed it during the procedure.
You must run the must-gather tool if you open a customer support case on the Red Hat Customer Portal for the Migration Toolkit for Containers (MTC).
The openshift-migration-must-gather-rhel8 image for MTC collects migration-specific logs and data that are not collected by the default must-gather image.
Navigate to the directory where you want to store the must-gather data.
Run the oc adm must-gather command:

$ oc adm must-gather --image=openshift-migration-must-gather-rhel8:v1.3.2
Remove authentication keys and other sensitive information.
Create an archive file containing the contents of the must-gather data directory:
$ tar cvaf must-gather.tar.gz must-gather.local.<uid>/
Upload the compressed file as an attachment to your customer support case.
This release has the following known issues:
During migration, the Migration Toolkit for Containers (MTC) preserves the following namespace annotations:
openshift.io/sa.scc.mcs
openshift.io/sa.scc.supplemental-groups
openshift.io/sa.scc.uid-range
These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)
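To check for a potential collision, you can compare the namespace annotations on the source and target clusters before migrating. For example, to print all annotations on a namespace:

$ oc get namespace <namespace> -o jsonpath='{.metadata.annotations}'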
If an AWS bucket is added to the MTC web console and then deleted, its status remains True because the MigStorage CR is not updated. (BZ#1738564)
Most cluster-scoped resources are not yet handled by MTC. If your applications require cluster-scoped resources, you might have to create them manually on the target cluster.
If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)
If a large migration fails because Restic times out, you can increase the restic_timeout parameter value (default: 1h) in the MigrationController CR manifest.
If you select the data verification option for PVs that are migrated with the file system copy method, performance is significantly slower.
If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody. The migration fails and a permission error is displayed in the Restic pod log. You can resolve this issue by creating a supplemental group for Restic. (BZ#1873641)
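A possible mitigation sketch, assuming your MTC version supports a restic_supplemental_groups parameter in the MigrationController CR (verify this against the documentation for your version); the group ID is a placeholder that must match the supplemental group defined for the NFS export:

spec:
  restic_supplemental_groups:
  - <group_id>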
If Velero has an invalid BackupStorageLocation during start-up, it will crash-loop until the invalid BackupStorageLocation is removed. This scenario is triggered by incorrect credentials, a non-existent S3 bucket, and other configuration errors. (BZ#1881707)
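To identify the invalid backup storage location, you can inspect the BackupStorageLocation CRs and filter the Velero pod log for errors. This is a sketch that assumes the default velero deployment in the openshift-migration namespace:

$ oc get backupstoragelocations -n openshift-migration
$ oc logs deployment/velero -n openshift-migration | grep -i error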