This section describes resources for troubleshooting the Migration Toolkit for Containers (MTC).
This section describes logs and debugging tools that you can use for troubleshooting.
You can view migration plan resources to monitor a running migration or to troubleshoot a failed migration by using the MTC web console and the command line interface (CLI).
In the MTC web console, click Migration Plans.
Click the Migrations number next to a migration plan to view the Migrations page.
The Migrations page displays the migration types associated with the migration plan, for example, Stage, Migration, or Rollback.
Click the Type link to view the Migration details page.
Expand Migration resources to view the migration resources and their status.
To troubleshoot a failed migration, start with a high-level resource that has failed and then work down the resource tree towards the lower-level resources.
Click the Options menu next to a resource and select one of the following options:
Copy oc describe command copies the command to your clipboard.
Log in to the relevant cluster and then run the command.
The conditions and events of the resource are displayed in YAML format.
Copy oc logs command copies the command to your clipboard.
Log in to the relevant cluster and then run the command.
If the resource supports log filtering, a filtered log is displayed.
View JSON displays the resource data in JSON format in a web browser.
The data is the same as the output for the oc get <resource> command.
You can view an aggregated log for a migration plan. You use the MTC web console to copy a command to your clipboard and then run the command from the command line interface (CLI).
The command displays the filtered logs of the following pods:
Migration Controller
Velero
Restic
Rsync
Stunnel
Registry
In the MTC web console, click Migration Plans.
Click the Migrations number next to a migration plan to view the Migrations page.
The Migrations page displays the migration types associated with the migration plan, for example, Stage or Cutover for warm migration.
Click View logs.
Click the Copy icon to copy the oc logs command to your clipboard.
Log in to the relevant cluster and enter the command on the CLI.
The aggregated log for the migration plan is displayed.
You can use the migration log reader to display a single filtered view of all the migration logs.
Get the mig-log-reader pod:
$ oc -n openshift-migration get pods | grep log
Enter the following command to display a single migration log:
$ oc -n openshift-migration logs -f <mig-log-reader-pod> -c color
The -c plain option displays the log without colors.
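The two steps above can be combined with a small helper. The following is a minimal sketch, not part of MTC: the function name pick_log_pod is hypothetical, and it assumes that the log reader is the only pod with "log" in its name, as the grep in the first step implies.

```shell
# pick_log_pod: read pod names (as printed by "oc get pods -o name") on stdin
# and print the first one whose name contains "log", without the "pod/" prefix.
pick_log_pod() {
  grep log | head -n 1 | sed 's|^pod/||'
}

# Usage on a live cluster (not executed here):
#   pod=$(oc -n openshift-migration get pods -o name | pick_log_pod)
#   oc -n openshift-migration logs -f "$pod" -c color
```

If no pod name matches, the function prints nothing, so check that the result is not empty before you pass it to oc logs.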
You can collect logs, metrics, and information about MTC custom resources by using the must-gather tool.
The must-gather data must be attached to all customer cases.
You can collect data for a one-hour or a 24-hour period and view the data with the Prometheus console.
You must be logged in to the OpenShift Container Platform cluster as a user with the cluster-admin role.
You must have the OpenShift CLI installed.
Navigate to the directory where you want to store the must-gather data.
Run the oc adm must-gather command:
To gather data for the past hour:
$ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.4
The data is saved as /must-gather/must-gather.tar.gz. You can upload this file to a support case on the Red Hat Customer Portal.
To gather data for the past 24 hours:
$ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.4 -- /usr/bin/gather_metrics_dump
This operation can take a long time. The data is saved as /must-gather/metrics/prom_data.tar.gz. You can view this file with the Prometheus console.
Create a local Prometheus instance:
$ make prometheus-run
The command outputs the Prometheus URL:
Started Prometheus on http://localhost:9090
Launch a web browser and navigate to the URL to view the data by using the Prometheus web console.
After you have viewed the data, delete the Prometheus instance and data:
$ make prometheus-cleanup
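The two must-gather variants differ only in the trailing gather script, which a small wrapper can make explicit. This is a minimal sketch: the function name must_gather_cmd and its 1h and 24h arguments are hypothetical, while the image and script path are the ones shown in the procedure above. The function only prints the command, so you can review it before you run it.

```shell
# Image taken from the procedure above; adjust the tag to your MTC version.
MUST_GATHER_IMAGE=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.4

# must_gather_cmd: print the must-gather command for a one-hour ("1h")
# or 24-hour ("24h") collection period.
must_gather_cmd() {
  case "$1" in
    1h)  echo "oc adm must-gather --image=$MUST_GATHER_IMAGE" ;;
    24h) echo "oc adm must-gather --image=$MUST_GATHER_IMAGE -- /usr/bin/gather_metrics_dump" ;;
    *)   echo "usage: must_gather_cmd 1h|24h" >&2; return 1 ;;
  esac
}

# Usage (requires cluster-admin access, not executed here):
#   must_gather_cmd 24h | sh
```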
You can debug the Backup and Restore custom resources (CRs) and partial migration failures with the Velero command line interface (CLI). The Velero CLI runs in the velero pod.
Velero CLI commands use the following syntax:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> <command> <resource_id>
You can specify velero-<pod> -n openshift-migration in place of $(oc get pods -n openshift-migration -o name | grep velero).
The Velero help command lists all the Velero CLI commands:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero --help
The Velero describe command provides a summary of warnings and errors associated with a Velero resource:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> describe <resource_id>
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
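Because every Velero CLI call repeats the same oc exec prefix, a wrapper can reduce typing errors. The following is a minimal sketch under stated assumptions: the function name velero_cmd is hypothetical, and it assumes that the Velero pod runs in the openshift-migration namespace, as in the commands above. It prints the full command instead of running it.

```shell
# velero_cmd: print the "oc exec" command that runs a Velero CLI subcommand
# in the given pod. The first argument is the pod (for example, pod/velero-abc);
# the remaining arguments are passed to ./velero unchanged.
velero_cmd() {
  pod=$1
  shift
  echo "oc -n openshift-migration exec $pod -- ./velero $*"
}

# Usage on a live cluster (not executed here):
#   pod=$(oc get pods -n openshift-migration -o name | grep velero)
#   velero_cmd "$pod" backup describe <backup_id>
```

Printing the command rather than executing it keeps the helper safe to experiment with; copy the output into a terminal when it looks right.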
You can debug a partial migration failure warning message by using the Velero CLI to examine the Restore custom resource (CR) logs.
A partial failure occurs when Velero encounters an issue that does not cause a migration to fail. For example, if a custom resource definition (CRD) is missing or if there is a discrepancy between CRD versions on the source and target clusters, the migration completes but the CR is not created on the target cluster.
Velero logs the issue as a partial failure and then processes the rest of the objects in the Backup CR.
Check the status of a MigMigration CR:
$ oc get migmigration <migmigration> -o yaml
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-01-26T20:48:40Z"
    message: 'Final Restore openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed
  - category: Advisory
    durable: true
    lastTransitionTime: "2021-01-26T20:48:42Z"
    message: The migration has completed with warnings, please look at `Warn` conditions.
    reason: Completed
    status: "True"
    type: SucceededWithWarnings
Check the status of the Restore CR by using the Velero describe command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore describe <restore>
Phase: PartiallyFailed (run 'velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf' for more information)
Errors:
  Velero: <none>
  Cluster: <none>
  Namespaces:
    migration-example: error restoring example.com/migration-example/migration-example: the server could not find the requested resource
Check the Restore CR logs by using the Velero logs command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore logs <restore>
time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore migration-example: migration-example" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
time="2021-01-26T20:48:37Z" level=info msg="error restoring migration-example: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
The Restore CR log error message, the server could not find the requested resource, indicates the cause of the partially failed migration.
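In the example log above, the failure is recorded at level=info, so filtering on the log level would miss it. The following minimal sketch filters on the message text instead. The function name restore_error_messages is hypothetical, and the filter assumes the msg="..." layout shown in the example log.

```shell
# restore_error_messages: read "velero restore logs" output on stdin and print
# the text of every msg="..." field that begins with "error".
restore_error_messages() {
  sed -n 's/.*msg="\(error[^"]*\)".*/\1/p'
}

# Usage on a live cluster (not executed here):
#   oc exec $(oc get pods -n openshift-migration -o name | grep velero) \
#     -n openshift-migration -- ./velero restore logs <restore> | restore_error_messages
```

A message that contains an escaped double quote is cut off at that quote, so treat the output as a pointer into the full log rather than a replacement for it.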
You can check the following Migration Toolkit for Containers (MTC) custom resources (CRs) to troubleshoot a failed migration:
MigCluster (configuration, MTC cluster): Cluster definition
MigStorage (configuration, MTC cluster): Storage definition
MigPlan (configuration, MTC cluster): Migration plan
The MigPlan CR describes the source and target clusters, replication repository, and namespaces being migrated. It is associated with 0, 1, or many MigMigration CRs.
Deleting a |
BackupStorageLocation (configuration, MTC cluster): Location of Velero backup objects
VolumeSnapshotLocation (configuration, MTC cluster): Location of Velero volume snapshots
MigMigration (action, MTC cluster): Migration, created every time you stage or migrate data. Each MigMigration CR is associated with a MigPlan CR.
Backup (action, source cluster): When you run a migration plan, the MigMigration CR creates two Velero backup CRs on each source cluster:
Backup CR #1 for Kubernetes objects
Backup CR #2 for PV data
Restore (action, target cluster): When you run a migration plan, the MigMigration CR creates two Velero restore CRs on the target cluster:
Restore CR #1 (using Backup CR #2) for PV data
Restore CR #2 (using Backup CR #1) for Kubernetes objects
List the MigMigration CRs in the openshift-migration namespace:
$ oc get migmigration -n openshift-migration
NAME AGE
88435fe0-c9f8-11e9-85e6-5d593ce65e10 6m42s
Inspect the MigMigration CR:
$ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
The output is similar to the following examples.
MigMigration example output
name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10
namespace: openshift-migration
labels: <none>
annotations: touch: 3b48b543-b53e-4e44-9d34-33563f0f8147
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  creationTimestamp: 2019-08-29T01:01:29Z
  generation: 20
  resourceVersion: 88179
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/88435fe0-c9f8-11e9-85e6-5d593ce65e10
  uid: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
spec:
  migPlanRef:
    name: socks-shop-mig-plan
    namespace: openshift-migration
  quiescePods: true
  stage: false
status:
  conditions:
    category: Advisory
    durable: True
    lastTransitionTime: 2019-08-29T01:03:40Z
    message: The migration has completed successfully.
    reason: Completed
    status: True
    type: Succeeded
  phase: Completed
  startTimestamp: 2019-08-29T01:01:29Z
events: <none>
Velero backup CR #2 example output that describes the PV data
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.105.179:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-44dd3bd5-c9f8-11e9-95ad-0205fe66cbb6
  creationTimestamp: "2019-08-29T01:03:15Z"
  generateName: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-
  generation: 1
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    migration-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    velero.io/storage-location: myrepo-vpzq9
  name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  namespace: openshift-migration
  resourceVersion: "87313"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/backups/88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  uid: c80dbbc0-c9f8-11e9-95ad-0205fe66cbb6
spec:
  excludedNamespaces: []
  excludedResources: []
  hooks:
    resources: []
  includeClusterResources: null
  includedNamespaces:
  - sock-shop
  includedResources:
  - persistentvolumes
  - persistentvolumeclaims
  - namespaces
  - imagestreams
  - imagestreamtags
  - secrets
  - configmaps
  - pods
  labelSelector:
    matchLabels:
      migration-included-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
  storageLocation: myrepo-vpzq9
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - myrepo-wv6fx
status:
  completionTimestamp: "2019-08-29T01:02:36Z"
  errors: 0
  expiration: "2019-09-28T01:02:35Z"
  phase: Completed
  startTimestamp: "2019-08-29T01:02:35Z"
  validationErrors: null
  version: 1
  volumeSnapshotsAttempted: 0
  volumeSnapshotsCompleted: 0
  warnings: 0
Velero restore CR #2 example output that describes the Kubernetes resources
apiVersion: velero.io/v1
kind: Restore
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.90.187:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-36f54ca7-c925-11e9-825a-06fa9fb68c88
  creationTimestamp: "2019-08-28T00:09:49Z"
  generateName: e13a1b60-c927-11e9-9555-d129df7f3b96-
  generation: 3
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: e18252c9-c927-11e9-825a-06fa9fb68c88
    migration-final-restore: e18252c9-c927-11e9-825a-06fa9fb68c88
  name: e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  namespace: openshift-migration
  resourceVersion: "82329"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/restores/e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  uid: 26983ec0-c928-11e9-825a-06fa9fb68c88
spec:
  backupName: e13a1b60-c927-11e9-9555-d129df7f3b96-sz24f
  excludedNamespaces: null
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  includedNamespaces: null
  includedResources: null
  namespaceMapping: null
  restorePVs: true
status:
  errors: 0
  failureReason: ""
  phase: Completed
  validationErrors: null
  warnings: 15
This section describes common issues and concerns that can cause problems during migration.
If direct volume migration does not complete, the target cluster might not have the same node-selector annotations as the source cluster.
Migration Toolkit for Containers (MTC) migrates namespaces with all annotations in order to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If a target cluster namespace does not have the same annotations as the source cluster namespace, the Rsync transfer pods cannot be scheduled. The Rsync pods remain in a Pending state.
You can identify and fix this issue by performing the following procedure.
Check the status of the MigMigration CR:
$ oc describe migmigration <migmigration> -n openshift-migration
The output includes the following status message:
Some or all transfer pods are not running for more than 10 mins on destination cluster
On the source cluster, obtain the details of a migrated namespace:
$ oc get namespace <namespace> -o yaml
Replace <namespace> with the name of the migrated namespace.
On the target cluster, edit the migrated namespace:
$ oc edit namespace <namespace>
Add the missing openshift.io/node-selector annotations to the migrated namespace as in the following example:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
...
Run the migration plan again.
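To compare the source and target namespaces quickly, you can extract just the node-selector annotation from each. The following is a minimal sketch: the function name node_selector is hypothetical, and it assumes the annotation appears on a single line of the YAML output, as in the example above.

```shell
# node_selector: read namespace YAML (from "oc get namespace <name> -o yaml")
# on stdin and print the value of the openshift.io/node-selector annotation
# without surrounding double quotes. Prints nothing if the annotation is absent.
node_selector() {
  sed -n 's|^ *openshift\.io/node-selector: *||p' | tr -d '"'
}

# Usage (not executed here): run once while logged in to each cluster and
# compare the two values.
#   oc get namespace <namespace> -o yaml | node_selector
```

An empty result on the target cluster and a non-empty result on the source cluster indicates the missing annotation described in this procedure.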
This section describes common error messages you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.
If a CA certificate error message is displayed the first time you try to access the MTC console, the likely cause is the use of self-signed CA certificates in one of the clusters.
To resolve this issue, navigate to the oauth-authorization-server URL displayed in the error message and accept the certificate. To resolve this issue permanently, add the certificate to the trust store of your web browser.
If an Unauthorized message is displayed after you have accepted the certificate, navigate to the MTC console and refresh the web page.
If a connection has timed out message is displayed in the MTC console after you have accepted a self-signed certificate, the causes are likely to be the following:
Interrupted network access to the OAuth server
Interrupted network access to the OpenShift Container Platform console
Proxy configuration that blocks access to the oauth-authorization-server URL. See MTC console inaccessible because of OAuth timeout error for details.
You can determine the cause of the timeout.
Inspect the MTC console web page with a browser web inspector.
Check the Migration UI pod log for errors.
If you use a self-signed certificate to secure a cluster or a replication repository for the Migration Toolkit for Containers (MTC), certificate verification might fail with the following error message: Certificate signed by unknown authority.
You can create a custom CA certificate bundle file and upload it in the MTC web console when you add a cluster or a replication repository.
Download a CA certificate from a remote endpoint and save it as a CA bundle file:
$ echo -n | openssl s_client -connect <host_FQDN>:<port> \
  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > <ca_bundle.cert>
Replace <host_FQDN>:<port> with the host FQDN and port of the endpoint, for example, api.my-cluster.example.com:6443.
Replace <ca_bundle.cert> with the name of the CA bundle file.
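Before you upload the bundle, it can help to confirm that the download actually captured a certificate, because a failed connection leaves an empty file. The following minimal sketch counts the PEM certificate headers in a file. The function name count_certs is hypothetical.

```shell
# count_certs: print the number of PEM certificate headers in the given file.
# A result of 0 means that the bundle contains no certificate.
count_certs() {
  grep -c -e '-BEGIN CERTIFICATE-' "$1"
}

# Usage (not executed here):
#   count_certs <ca_bundle.cert>
```

The check only counts headers; it does not validate the certificate contents.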
If a Velero Backup custom resource contains a reference to a backup storage location (BSL) that does not exist, the Velero pod log might display error messages. To view the Velero pod log, run the following command:
$ oc logs <velero_pod> -n openshift-migration
You can ignore these error messages. A missing BSL cannot cause a migration to fail.
If a migration fails because Restic times out, the following error is displayed in the Velero pod log.
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1
The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.
In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
Click Migration Toolkit for Containers Operator.
In the MigrationController tab, click migration-controller.
In the YAML tab, update the following parameter value:
spec:
  restic_timeout: 1h
Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s.
Click Save.
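A typo in the timeout value is easy to make, so you might check the string before you save the manifest. The following is a minimal sketch that accepts only the h, m, and s units described above, in that order. The function name valid_restic_timeout is hypothetical, and the pattern is an assumption based on the documented example; the Operator itself might accept other forms.

```shell
# valid_restic_timeout: succeed if the argument is a non-empty duration made
# of optional hours, minutes, and seconds parts, for example 1h or 3h30m15s.
valid_restic_timeout() {
  [ -n "$1" ] && printf '%s\n' "$1" | grep -Eq '^([0-9]+h)?([0-9]+m)?([0-9]+s)?$'
}

# Usage:
#   valid_restic_timeout 3h30m15s && echo "looks valid"
```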
If data verification fails when migrating a persistent volume with the file system data copy method, the following error is displayed in the MigMigration CR.
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: 2020-04-16T20:35:16Z
    message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>` for details
    status: "True"
    type: ResticVerifyErrors
The error message identifies the Restore CR name.
ResticVerifyErrors is a general error warning type that includes verification errors.
A data verification error does not cause the migration process to fail.
You can check the Restore CR to identify the source of the data verification error.
Log in to the target cluster.
View the Restore CR:
$ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration
The output identifies the persistent volume with PodVolumeRestore errors.
status:
  phase: Completed
  podVolumeRestoreErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
  podVolumeRestoreResticErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
View the PodVolumeRestore CR:
$ oc describe podvolumerestore <registry-example-migration-rvwcm-98t49> -n openshift-migration
The output identifies the Restic pod that logged the errors.
completionTimestamp: 2020-05-01T20:49:12Z
errors: 1
resticErrors: 1
...
resticPod: <restic-nr2v5>
View the Restic pod log to locate the errors:
$ oc logs -f <restic-nr2v5> -n openshift-migration
If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody and does not have permission to perform the migration. The following error is displayed in the Restic pod log.
backup=openshift-migration/<backup_id> controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/pod_volume_backup_controller.go:280" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeBackupController).processBackup" logSource="pkg/controller/pod_volume_backup_controller.go:280" name=<backup_id> namespace=openshift-migration
You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the MigrationController CR manifest.
Create a supplemental group for Restic on the NFS storage.
Set the setgid bit on the NFS directories so that group ownership is inherited.
Add the restic_supplemental_groups parameter to the MigrationController CR manifest on the source and target clusters:
spec:
  restic_supplemental_groups: <group_id>
Replace <group_id> with the supplemental group ID.
Wait for the Restic pods to restart so that the changes are applied.
This release has the following known issues:
During migration, the Migration Toolkit for Containers (MTC) preserves the following namespace annotations:
openshift.io/sa.scc.mcs
openshift.io/sa.scc.supplemental-groups
openshift.io/sa.scc.uid-range
These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)
Most cluster-scoped resources are not yet handled by MTC. If your applications require cluster-scoped resources, you might have to create them manually on the target cluster.
If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)
If a large migration fails because Restic times out, you can increase the restic_timeout parameter value (default: 1h) in the MigrationController custom resource (CR) manifest.
If you select the data verification option for PVs that are migrated with the file system copy method, performance is significantly slower.
If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody. The migration fails and a permission error is displayed in the Restic pod log. (BZ#1873641)
You can resolve this issue by adding supplemental groups for Restic to the MigrationController CR manifest:
spec:
...
  restic_supplemental_groups:
  - 5555
  - 6666
If you perform direct volume migration with nodes that are in different availability zones, the migration might fail because the migrated pods cannot access the PVC. (BZ#1947487)
You can roll back a migration by using the MTC web console or the CLI.
You can roll back a migration by using the Migration Toolkit for Containers (MTC) web console.
If you roll back a failed direct volume migration, the following resources are preserved in the namespaces specified in the migration plan to help you debug the failed migration:
These resources must be deleted manually. If you later run the same migration plan successfully, the resources from the failed migration are deleted automatically.
If your application was stopped during a failed migration, you must roll back the migration to prevent data corruption in the persistent volume.
Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.
In the MTC web console, click Migration plans.
Click the Options menu beside a migration plan and select Rollback.
Click Rollback and wait for rollback to complete.
In the migration plan details, Rollback succeeded is displayed.
Verify that rollback was successful in the OpenShift Container Platform web console of the source cluster:
Click Home → Projects.
Click the migrated project to view its status.
In the Routes section, click Location to verify that the application is functioning, if applicable.
Click Workloads → Pods to verify that the pods are running in the migrated namespace.
Click Storage → Persistent volumes to verify that the migrated persistent volume is correctly provisioned.
You can roll back a migration by creating a MigMigration custom resource (CR) from the command line interface.
If you roll back a failed direct volume migration, the following resources are preserved in the namespaces specified in the
These resources must be deleted manually. If you later run the same migration plan successfully, the resources from the failed migration are deleted automatically.
If your application was stopped during a failed migration, you must roll back the migration to prevent data corruption in the persistent volume.
Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.
Create a MigMigration CR based on the following example:
$ cat << EOF | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <migmigration>
  namespace: openshift-migration
spec:
...
  rollback: true
...
  migPlanRef:
    name: <migplan>
    namespace: openshift-migration
EOF
Replace <migplan> with the name of the associated MigPlan CR.
In the MTC web console, verify that the migrated project resources have been removed from the target cluster.
Verify that the migrated project resources are present in the source cluster and that the application is running.
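If you roll back migrations regularly, you can generate the manifest from a function instead of editing it by hand. The following is a minimal sketch: the function name rollback_manifest is hypothetical, and it emits only the fields shown in the example above. The example elides other spec fields, so this sketch assumes that the fields shown are sufficient for your environment.

```shell
# rollback_manifest: print a MigMigration manifest that requests a rollback.
# $1 is the name of the new MigMigration CR; $2 is the name of the MigPlan CR.
rollback_manifest() {
  cat <<EOF
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: $1
  namespace: openshift-migration
spec:
  rollback: true
  migPlanRef:
    name: $2
    namespace: openshift-migration
EOF
}

# Usage on a live cluster (not executed here):
#   rollback_manifest <migmigration> <migplan> | oc apply -f -
```

Review the printed manifest before you pipe it to oc apply, especially if your migration plan relies on spec fields that this sketch omits.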