You can view the Migration Toolkit for Containers (MTC) custom resources and download logs to troubleshoot a failed migration.

If the application was stopped during the failed migration, you must roll it back manually in order to prevent data corruption.

Manual rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.

Viewing migration Custom Resources

The Migration Toolkit for Containers (MTC) creates the following custom resources (CRs):

Figure: MTC migration architecture diagram

MigCluster (configuration, MTC cluster): Cluster definition

MigStorage (configuration, MTC cluster): Storage definition

MigPlan (configuration, MTC cluster): Migration plan

The MigPlan CR describes the source and target clusters, replication repository, and namespaces being migrated. It is associated with 0, 1, or many MigMigration CRs.

Deleting a MigPlan CR deletes the associated MigMigration CRs.

BackupStorageLocation (configuration, MTC cluster): Location of Velero backup objects

VolumeSnapshotLocation (configuration, MTC cluster): Location of Velero volume snapshots

MigMigration (action, MTC cluster): Migration, created every time you stage or migrate data. Each MigMigration CR is associated with a MigPlan CR.

Backup (action, source cluster): When you run a migration plan, the MigMigration CR creates two Velero backup CRs on each source cluster:

  • Backup CR #1 for Kubernetes objects

  • Backup CR #2 for PV data

Restore (action, target cluster): When you run a migration plan, the MigMigration CR creates two Velero restore CRs on the target cluster:

  • Restore CR #1 (using Backup CR #2) for PV data

  • Restore CR #2 (using Backup CR #1) for Kubernetes objects
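
For example, you can list the configuration and migration CRs that MTC has created by querying the openshift-migration namespace. This is a quick sketch; which resources are present depends on how many plans and migrations you have run:

$ oc get migcluster,migstorage,migplan,migmigration -n openshift-migration
$ oc get backups.velero.io,restores.velero.io -n openshift-migration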

Procedure
  1. List the MigMigration CRs in the openshift-migration namespace:

    $ oc get migmigration -n openshift-migration
    Example output
    NAME                                   AGE
    88435fe0-c9f8-11e9-85e6-5d593ce65e10   6m42s
  2. Inspect the MigMigration CR:

    $ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration

    The output is similar to the following examples.

MigMigration example output
name:         88435fe0-c9f8-11e9-85e6-5d593ce65e10
namespace:    openshift-migration
labels:       <none>
annotations:  touch: 3b48b543-b53e-4e44-9d34-33563f0f8147
apiVersion:  migration.openshift.io/v1alpha1
kind:         MigMigration
metadata:
  creationTimestamp:  2019-08-29T01:01:29Z
  generation:          20
  resourceVersion:    88179
  selfLink:           /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/88435fe0-c9f8-11e9-85e6-5d593ce65e10
  uid:                 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
spec:
  migPlanRef:
    name:        socks-shop-mig-plan
    namespace:   openshift-migration
  quiescePods:  true
  stage:         false
status:
  conditions:
    category:              Advisory
    durable:               True
    lastTransitionTime:  2019-08-29T01:03:40Z
    message:               The migration has completed successfully.
    reason:                Completed
    status:                True
    type:                  Succeeded
  phase:                   Completed
  startTimestamp:         2019-08-29T01:01:29Z
events:                    <none>
Velero backup CR #2 example output that describes the PV data
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.105.179:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-44dd3bd5-c9f8-11e9-95ad-0205fe66cbb6
  creationTimestamp: "2019-08-29T01:03:15Z"
  generateName: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-
  generation: 1
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    migration-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    velero.io/storage-location: myrepo-vpzq9
  name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  namespace: openshift-migration
  resourceVersion: "87313"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/backups/88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  uid: c80dbbc0-c9f8-11e9-95ad-0205fe66cbb6
spec:
  excludedNamespaces: []
  excludedResources: []
  hooks:
    resources: []
  includeClusterResources: null
  includedNamespaces:
  - sock-shop
  includedResources:
  - persistentvolumes
  - persistentvolumeclaims
  - namespaces
  - imagestreams
  - imagestreamtags
  - secrets
  - configmaps
  - pods
  labelSelector:
    matchLabels:
      migration-included-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
  storageLocation: myrepo-vpzq9
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - myrepo-wv6fx
status:
  completionTimestamp: "2019-08-29T01:02:36Z"
  errors: 0
  expiration: "2019-09-28T01:02:35Z"
  phase: Completed
  startTimestamp: "2019-08-29T01:02:35Z"
  validationErrors: null
  version: 1
  volumeSnapshotsAttempted: 0
  volumeSnapshotsCompleted: 0
  warnings: 0
Velero restore CR #2 example output that describes the Kubernetes resources
apiVersion: velero.io/v1
kind: Restore
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.90.187:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-36f54ca7-c925-11e9-825a-06fa9fb68c88
  creationTimestamp: "2019-08-28T00:09:49Z"
  generateName: e13a1b60-c927-11e9-9555-d129df7f3b96-
  generation: 3
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: e18252c9-c927-11e9-825a-06fa9fb68c88
    migration-final-restore: e18252c9-c927-11e9-825a-06fa9fb68c88
  name: e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  namespace: openshift-migration
  resourceVersion: "82329"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/restores/e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  uid: 26983ec0-c928-11e9-825a-06fa9fb68c88
spec:
  backupName: e13a1b60-c927-11e9-9555-d129df7f3b96-sz24f
  excludedNamespaces: null
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  includedNamespaces: null
  includedResources: null
  namespaceMapping: null
  restorePVs: true
status:
  errors: 0
  failureReason: ""
  phase: Completed
  validationErrors: null
  warnings: 15
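
If you only need the overall state rather than the full descriptions shown above, you can query the phase field directly. A minimal sketch using the example resource names from this section:

$ oc get migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration -o jsonpath='{.status.phase}'
$ oc get backup.velero.io 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7 -n openshift-migration -o jsonpath='{.status.phase}'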

Using the migration log reader

You can use the migration log reader to display a single filtered view of all the migration logs.

Procedure
  1. Get the mig-log-reader pod:

    $ oc -n openshift-migration get pods | grep log
  2. Enter the following command to display a single migration log:

    $ oc -n openshift-migration logs -f <mig-log-reader-pod> -c color (1)
    1 The -c plain option displays the log without colors.
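
    For example, to stream the same log without color codes, specify the plain container instead:

    $ oc -n openshift-migration logs -f <mig-log-reader-pod> -c plain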

Downloading migration logs

You can download the Velero, Restic, and MigrationController pod logs in the Migration Toolkit for Containers (MTC) web console to troubleshoot a failed migration.

Procedure
  1. In the MTC console, click Migration plans to view the list of migration plans.

  2. Click the Options menu of a specific migration plan and select Logs.

  3. Click Download Logs to download the logs of the MigrationController, Velero, and Restic pods for all clusters.

    You can download a single log by selecting the cluster, log source, and pod source, and then clicking Download Selected.

    You can access a pod log from the CLI by using the oc logs command:

    $ oc logs <pod-name> -f -n openshift-migration (1)
    1 Specify the pod name.
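
    If you are unsure of the pod names, list the pods in the openshift-migration namespace first. This is a sketch; the exact pod name suffixes vary per cluster:

    $ oc get pods -n openshift-migration | grep -E 'velero|restic|migration-controller'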

Updating deprecated APIs

In OpenShift Container Platform 4.5, some APIs that are used by OpenShift Container Platform 3.x are deprecated.

If your source cluster uses deprecated APIs, the following warning message is displayed when you create a migration plan in the Migration Toolkit for Containers (MTC):

Some namespaces contain GVKs incompatible with destination cluster

You can click See details to view the namespace and the incompatible APIs. This warning message does not block the migration.

During migration with the Migration Toolkit for Containers (MTC), the deprecated APIs are saved in Velero Backup CR #1 for Kubernetes objects. You can download the backup, extract the deprecated API YAML files, and update them with the oc convert command. Then you can create the updated resources on the target cluster.

Procedure
  1. Run the migration plan.

  2. View the MigPlan CR:

    $ oc describe migplan <migplan_name> -n openshift-migration (1)
    1 Specify the name of the migration plan.

    The output is similar to the following:

    metadata:
      ...
      uid: 79509e05-61d6-11e9-bc55-02ce4781844a (1)
    status:
      ...
      conditions:
      - category: Warn
        lastTransitionTime: 2020-04-30T17:16:23Z
        message: 'Some namespaces contain GVKs incompatible with destination cluster.
          See: `incompatibleNamespaces` for details'
        status: "True"
        type: GVKsIncompatible
      incompatibleNamespaces:
      - gvks: (2)
        - group: batch
          kind: cronjobs
          version: v2alpha1
        - group: batch
          kind: scheduledjobs
          version: v2alpha1
    1 Record the MigPlan UID.
    2 Record the deprecated APIs listed in the gvks section.
  3. Get the MigMigration name associated with the MigPlan UID:

    $ oc get migmigration -o json | jq -r '.items[] | select(.metadata.ownerReferences[].uid=="<migplan_uid>") | .metadata.name' (1)
    1 Specify the MigPlan UID.
  4. Get the MigMigration UID associated with the MigMigration name:

    $ oc get migmigration <migmigration_name> -o jsonpath='{.metadata.uid}' (1)
    1 Specify the MigMigration name.
  5. Get the Velero Backup name associated with the MigMigration UID:

    $ oc get backup.velero.io --selector migration-initial-backup="<migmigration_uid>" -o jsonpath='{.items[*].metadata.name}' (1)
    1 Specify the MigMigration UID.
  6. Download the contents of the Velero Backup to your local machine by running the command for your storage provider:

    • AWS S3:

      $ aws s3 cp s3://<bucket_name>/velero/backups/<backup_name> <backup_local_dir> --recursive (1)
      1 Specify the bucket, backup name, and your local backup directory name.
    • GCP:

      $ gsutil cp gs://<bucket_name>/velero/backups/<backup_name> <backup_local_dir> --recursive (1)
      1 Specify the bucket, backup name, and your local backup directory name.
    • Azure:

      $ azcopy copy 'https://velerobackups.blob.core.windows.net/velero/backups/<backup_name>' '<backup_local_dir>' --recursive (1)
      1 Specify the backup name and your local backup directory name.
  7. Extract the Velero Backup archive file:

    $ tar -xzvf <backup_local_dir>/<backup_name>.tar.gz -C <backup_local_dir>
  8. Run oc convert in offline mode on each deprecated API:

    $ oc convert -f <backup_local_dir>/resources/<gvk>.json > <gvk>.json
  9. Create the converted API on the target cluster:

    $ oc create -f <gvk>.json
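
The lookups in steps 2 through 5 can also be chained in a small shell sketch. The variable names below are illustrative, the MigPlan UID is read with a jsonpath query instead of oc describe, and jq must be installed locally; otherwise the commands are the same as in the procedure:

$ MIGPLAN_UID=$(oc get migplan <migplan_name> -n openshift-migration -o jsonpath='{.metadata.uid}')
$ MIGMIGRATION_NAME=$(oc get migmigration -n openshift-migration -o json | jq -r --arg uid "$MIGPLAN_UID" '.items[] | select(.metadata.ownerReferences[].uid==$uid) | .metadata.name')
$ MIGMIGRATION_UID=$(oc get migmigration "$MIGMIGRATION_NAME" -n openshift-migration -o jsonpath='{.metadata.uid}')
$ oc get backup.velero.io -n openshift-migration --selector migration-initial-backup="$MIGMIGRATION_UID" -o jsonpath='{.items[*].metadata.name}'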

Error messages and resolutions

This section describes common error messages you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.

CA certificate error displayed when accessing the MTC console for the first time

If a CA certificate error message is displayed the first time you try to access the MTC console, the likely cause is the use of self-signed CA certificates in one of the clusters.

To resolve this issue, navigate to the oauth-authorization-server URL displayed in the error message and accept the certificate. To resolve this issue permanently, add the certificate to the trust store of your web browser.

If an Unauthorized message is displayed after you have accepted the certificate, navigate to the MTC console and refresh the web page.

OAuth timeout error in the MTC console

If a connection has timed out message is displayed in the MTC console after you have accepted a self-signed certificate, you can determine the cause of the timeout by inspecting the MTC console elements and the MigrationUI pod log.

Procedure
  1. Navigate to the MTC console and inspect the elements with the browser web inspector.

  2. Check the MigrationUI pod log:

    $ oc logs <MigrationUI_Pod> -n openshift-migration

PodVolumeBackups timeout error in Velero pod log

If a migration fails because Restic times out, the following error is displayed in the Velero pod log.

Example output
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1

The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.

Procedure
  1. In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.

  2. Click Migration Toolkit for Containers Operator.

  3. In the MigrationController tab, click migration-controller.

  4. In the YAML tab, update the following parameter value:

    spec:
      restic_timeout: 1h (1)
    1 Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s.
  5. Click Save.
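
If you prefer to set the value from the command line instead of the web console, a merge patch against the same MigrationController CR should also work. This is a sketch that assumes the default CR name, migration-controller, shown in the procedure above:

$ oc patch migrationcontroller migration-controller -n openshift-migration --type merge -p '{"spec":{"restic_timeout":"3h"}}'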

ResticVerifyErrors in the MigMigration custom resource

If data verification fails when migrating a persistent volume with the file system data copy method, the following error is displayed in the MigMigration CR.

Example output
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: 2020-04-16T20:35:16Z
    message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>`
      for details (1)
    status: "True"
    type: ResticVerifyErrors (2)
1 The error message identifies the Restore CR name.
2 ResticVerifyErrors is a general error warning type that includes verification errors.

A data verification error does not cause the migration process to fail.

You can check the Restore CR to identify the source of the data verification error.

Procedure
  1. Log in to the target cluster.

  2. View the Restore CR:

    $ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration

    The output identifies the persistent volume with PodVolumeRestore errors.

    Example output
    status:
      phase: Completed
      podVolumeRestoreErrors:
      - kind: PodVolumeRestore
        name: <registry-example-migration-rvwcm-98t49>
        namespace: openshift-migration
      podVolumeRestoreResticErrors:
      - kind: PodVolumeRestore
        name: <registry-example-migration-rvwcm-98t49>
        namespace: openshift-migration
  3. View the PodVolumeRestore CR:

    $ oc describe podvolumerestore <migration-example-rvwcm-98t49> -n openshift-migration

    The output identifies the Restic pod that logged the errors.

    Example output
      completionTimestamp: 2020-05-01T20:49:12Z
      errors: 1
      resticErrors: 1
      ...
      resticPod: <restic-nr2v5>
  4. View the Restic pod log to locate the errors:

    $ oc logs -f <restic-nr2v5> -n openshift-migration
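
    Restic pod logs can be long. If you only want the error lines, filtering with grep is usually sufficient, for example:

    $ oc logs <restic-nr2v5> -n openshift-migration | grep -i error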

Manually rolling back a migration

If your application was stopped during a failed migration with the Migration Toolkit for Containers (MTC), you must roll it back manually in order to prevent data corruption in the persistent volume.

This procedure is not required if the application was not stopped during migration, because the original application is still running on the source cluster.

Procedure
  1. On the target cluster, switch to the migrated project:

    $ oc project <project>
  2. Get the deployed resources:

    $ oc get all
  3. Delete the deployed resources to ensure that the application is not running on the target cluster and accessing data on the persistent volume claim:

    $ oc delete <resource_type> <resource_name>
  4. To stop a daemon set without deleting it, update the nodeSelector in the YAML file:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: hello-daemonset
    spec:
      selector:
        matchLabels:
          name: hello-daemonset
      template:
        metadata:
          labels:
            name: hello-daemonset
        spec:
          nodeSelector:
            role: worker (1)
    1 Specify a nodeSelector value that does not exist on any node.
  5. Update the reclaim policy for each PV so that unnecessary data is removed. During migration, the reclaim policy for bound PVs is Retain, to ensure that data is not lost when an application is removed from the source cluster. You can remove these PVs during rollback.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv0001
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain (1)
      ...
    status:
      ...
    1 Specify Recycle or Delete.
  6. On the source cluster, switch to your migrated project:

    $ oc project <project_name>
  7. Obtain the project’s deployed resources:

    $ oc get all
  8. Start one or more replicas of each deployed resource:

    $ oc scale --replicas=1 <resource_type>/<resource_name>
  9. Update the nodeSelector of the DaemonSet resource to its original value, if you changed it during the procedure.
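
As an alternative to editing the PersistentVolume manifest in step 5, you can patch the reclaim policy from the CLI. A minimal sketch using the example PV name pv0001 from above; substitute Recycle for Delete if you prefer:

$ oc patch pv pv0001 -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'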

Using must-gather to collect data

You must run the must-gather tool if you open a customer support case on the Red Hat Customer Portal for the Migration Toolkit for Containers (MTC).

The openshift-migration-must-gather-rhel8 image for MTC collects migration-specific logs and data that are not collected by the default must-gather image.

Procedure
  1. Navigate to the directory where you want to store the must-gather data.

  2. Run the must-gather command:

    $ oc adm must-gather --image=openshift-migration-must-gather-rhel8:v1.3.2
  3. Remove authentication keys and other sensitive information.

  4. Create an archive file containing the contents of the must-gather data directory:

    $ tar cvaf must-gather.tar.gz must-gather.local.<uid>/
  5. Upload the compressed file as an attachment to your customer support case.
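
If you prefer not to change into the target directory first, the oc adm must-gather command also accepts a --dest-dir option. This sketch uses the same image as in step 2; the directory name is only an example:

$ oc adm must-gather --image=openshift-migration-must-gather-rhel8:v1.3.2 --dest-dir=./mtc-must-gather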

Known issues

This release has the following known issues:

  • During migration, the Migration Toolkit for Containers (MTC) preserves the following namespace annotations:

    • openshift.io/sa.scc.mcs

    • openshift.io/sa.scc.supplemental-groups

    • openshift.io/sa.scc.uid-range

      These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)

  • If an AWS bucket is added to the MTC web console and then deleted, its status remains True because the MigStorage CR is not updated. (BZ#1738564)

  • Most cluster-scoped resources are not yet handled by MTC. If your applications require cluster-scoped resources, you might have to create them manually on the target cluster.

  • If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)

  • If a large migration fails because Restic times out, you can increase the restic_timeout parameter value (default: 1h) in the MigrationController CR.

  • If you select the data verification option for PVs that are migrated with the file system copy method, performance is significantly slower.

  • If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody. The migration fails and a permission error is displayed in the Restic pod log. You can resolve this issue by creating a supplemental group for Restic. (BZ#1873641)

  • If Velero has an invalid BackupStorageLocation during start-up, it will crash-loop until the invalid BackupStorageLocation is removed. This scenario is triggered by incorrect credentials, a non-existent S3 bucket, and other configuration errors. (BZ#1881707)