Overview

In OpenShift Container Platform, you can back up (saving state to separate storage) and restore (recreating state from separate storage) at the cluster level. There is also some preliminary support for per-project backup. The full state of a cluster installation includes:

  • etcd data on each master

  • API objects

  • registry storage

  • volume storage

This topic does not cover how to back up and restore persistent storage, as those topics are left to the underlying storage provider. However, an example of how to perform a generic backup of application data is provided.

This topic provides only a generic way of backing up applications and the OpenShift Container Platform cluster; it cannot take custom requirements into account. Therefore, you should create your own full backup and restore procedure, and take the necessary precautions to prevent data loss.

Prerequisites

  1. Because the restore procedure involves a complete reinstallation, save all the files used in the initial installation. This may include the installation configuration files and the Ansible playbooks and inventory files used by the installer.

  2. Back up the procedures for any post-installation steps. Some installations involve steps that are not included in the installer, such as changes to services outside the control of OpenShift Container Platform or the installation of extra services like monitoring agents. Additional configuration that is not yet supported by the advanced installer might also be affected, for example when using multiple authentication providers.

  3. Install packages that provide various utility commands:

    # yum install etcd
  4. If using a container-based installation, pull the etcd image instead:

    # docker pull rhel7/etcd

Note the location of the etcd data directory (or $ETCD_DATA_DIR in the following sections), which depends on how etcd is deployed.

Deployment Type                   Data Directory

all-in-one cluster                /var/lib/openshift/openshift.local.etcd
external etcd (not on master)     /var/lib/etcd
embedded etcd (on master)         /var/lib/origin/etcd
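
For convenience in the commands that follow, you can set $ETCD_DATA_DIR to match your deployment. For example, for an external etcd host:

# export ETCD_DATA_DIR=/var/lib/etcd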

Cluster Backup

  1. Save all the certificates and keys, on each master:

    # cd /etc/origin/master
    # tar cf /tmp/certs-and-keys-$(hostname).tar \
        master.proxy-client.crt \
        master.proxy-client.key \
        proxyca.crt \
        proxyca.key \
        master.server.crt \
        master.server.key \
        ca.crt \
        ca.key \
        master.etcd-client.crt \
        master.etcd-client.key \
        master.etcd-ca.crt
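
    Copy the archive to storage outside the cluster so that it survives a failure of the master. For example, using a hypothetical backup host:

    # scp /tmp/certs-and-keys-$(hostname).tar backup.example.com:/srv/backup/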
  2. If etcd is running on more than one host, stop it on each host:

    # sudo systemctl stop etcd

    Although this step is not strictly necessary, doing so ensures that the etcd data is fully synchronized.

  3. Create an etcd backup:

    # etcdctl backup \
        --data-dir $ETCD_DATA_DIR \
        --backup-dir $ETCD_DATA_DIR.bak

    If etcd is running on more than one host, the various instances regularly synchronize their data, so creating a backup for one of them is sufficient.

    For a container-based installation, you must use docker exec to run etcdctl inside the container.
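
    For example, assuming the etcd container is named etcd_container and that the data directory is mounted at the same path inside the container (both assumptions depend on your installation):

    # docker exec etcd_container etcdctl backup \
        --data-dir $ETCD_DATA_DIR \
        --backup-dir $ETCD_DATA_DIR.bak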

Cluster Restore

This restore operation only works for single-member etcd clusters. For multiple-member etcd clusters, see Restoring etcd.

To restore the cluster:

  1. Reinstall OpenShift Container Platform.

    This should be done in the same way that OpenShift Container Platform was previously installed.

  2. Run all necessary post-installation steps.

  3. Restore the certificates and keys, on each master:

    # cd /etc/origin/master
    # tar xvf /tmp/certs-and-keys-$(hostname).tar
  4. Restore from the etcd backup:

    # mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig
    # cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR
    # chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR
    # chown -R etcd:etcd $ETCD_DATA_DIR
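
    To confirm that the restored directory has the expected ownership and SELinux context, you can compare it with the original copy, for example:

    # ls -ldZ $ETCD_DATA_DIR $ETCD_DATA_DIR.orig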

Adding New etcd Hosts

In cases where etcd hosts have failed but you still have at least one host running, you can use the surviving host to add new etcd hosts to the cluster without downtime.

Suggested Cluster Size

Having a cluster with an odd number of etcd hosts provides fault tolerance. An odd number of etcd hosts does not change the number needed for a majority, but increases the tolerance for failure. For example, a cluster of seven hosts has a majority of four, leaving a failure tolerance of three; the cluster continues to operate as long as at least four hosts are available.

For an in-production cluster, no more than seven etcd hosts are recommended.
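
For reference, majority (quorum) and failure tolerance scale with cluster size as follows:

Cluster Size    Majority    Failure Tolerance

1               1           0
3               2           1
5               3           2
7               4           3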

The following presumes you have a backup of the /etc/etcd configuration for the etcd hosts.
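
If you do not have such a backup yet, it can be created with a command similar to the following (the destination path is an example):

# tar -czf /backup/etcd-config-$(hostname).tar.gz /etc/etcd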

  1. Add the desired number of hosts to the cluster. The rest of this procedure presumes you have added just one host, but if adding multiple, perform all steps on each host.

  2. Upgrade etcd on the surviving node:

    # yum install etcd iptables-services

    Ensure version etcd-2.3.7-4.el7.x86_64 or greater is installed, and that the same version is installed on each host.

  3. On the new host, add the appropriate iptables rules:

    # systemctl enable iptables.service --now
    # iptables -N OS_FIREWALL_ALLOW
    # iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
    # iptables -A OS_FIREWALL_ALLOW -p tcp -m state \
      --state NEW -m tcp --dport 2379 -j ACCEPT
    # iptables -A OS_FIREWALL_ALLOW -p tcp -m state \
      --state NEW -m tcp --dport 2380 -j ACCEPT
    # iptables-save | tee /etc/sysconfig/iptables
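
    To verify that the rules are in place, you can list the chain, for example:

    # iptables -nL OS_FIREWALL_ALLOW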
  4. Generate the required certificates for the new host. On a surviving etcd host:

    1. Create a copy of the /etc/etcd/ca/ directory.

    2. Set the variables and working directory for the certificates, creating the PREFIX directory if it does not already exist:

      # cd /etc/etcd
      # export NEW_ETCD="<NEW_HOST_NAME>"
      
      # export CN=$NEW_ETCD
      # export SAN="IP:<NEW_HOST_IP>"
      # export PREFIX="./generated_certs/etcd-$CN/"
      # mkdir -p ${PREFIX}
    3. Create the server.csr and server.crt certificates:

      # openssl req -new -keyout ${PREFIX}server.key \
        -config ca/openssl.cnf \
        -out ${PREFIX}server.csr \
        -reqexts etcd_v3_req -batch -nodes \
        -subj /CN=$CN
      
      # openssl ca -name etcd_ca -config ca/openssl.cnf \
        -out ${PREFIX}server.crt \
        -in ${PREFIX}server.csr \
        -extensions etcd_v3_ca_server -batch
    4. Create the peer.csr and peer.crt certificates:

      # openssl req -new -keyout ${PREFIX}peer.key \
        -config ca/openssl.cnf \
        -out ${PREFIX}peer.csr \
        -reqexts etcd_v3_req -batch -nodes \
        -subj /CN=$CN
      
      # openssl ca -name etcd_ca -config ca/openssl.cnf \
        -out ${PREFIX}peer.crt \
        -in ${PREFIX}peer.csr \
        -extensions etcd_v3_ca_peer -batch
    5. Copy the ca.crt and etcd.conf files, and archive the contents of the directory:

      # cp ca.crt ${PREFIX}
      # cp etcd.conf ${PREFIX}
      # tar -czvf ${PREFIX}${CN}.tgz -C ${PREFIX} .
    6. Transfer the files to the new etcd hosts:

      # scp ${PREFIX}${CN}.tgz  $CN:/etc/etcd/
  5. While still on the surviving etcd host, add the new host to the cluster, then create a backup of the etcd data and transfer it to the new host:

    1. Add the new host to the cluster:

      # export ETCD_CA_HOST="<SURVIVING_ETCD_HOSTNAME>"
      # export NEW_ETCD="<NEW_ETCD_HOSTNAME>"
      # export NEW_ETCD_IP="<NEW_HOST_IP>"
      
      # etcdctl -C https://${ETCD_CA_HOST}:2379 \
        --ca-file=/etc/etcd/ca.crt     \
        --cert-file=/etc/etcd/peer.crt     \
        --key-file=/etc/etcd/peer.key member add ${NEW_ETCD} https://${NEW_ETCD_IP}:2380
      
      The member add command outputs configuration values for the new member, similar to the following:
      
      ETCD_NAME="<NEW_ETCD_HOSTNAME>"
      ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<SURVIVING_ETCD_HOSTNAME>=https://<SURVIVING_HOST_IP>:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"
    2. Create a backup of the surviving etcd host, and transfer the contents to the new host:

      Skip this step if the etcd version is lower than etcd-2.3.7-4 or if the etcd database is smaller than 700 MB.

      If the etcd backup is larger than 700 MB, prune resources or clear time-to-live (TTL) data, such as events (see the example below for checking the database size). If the backup is still larger than 700 MB, stop the other hosts before performing this step.

      # export NODE_ID="<NEW_NODE_ID>"
      # etcdctl backup --keep-cluster-id --node-id ${NODE_ID} \
        --data-dir /var/lib/etcd --backup-dir /var/lib/etcd/$NEW_ETCD-backup
      # tar -zcvf $NEW_ETCD-backup.tar.gz -C /var/lib/etcd/$NEW_ETCD-backup/ .
      # scp $NEW_ETCD-backup.tar.gz $NEW_ETCD:/var/lib/etcd/
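
      To check the current size of the etcd database before deciding whether this step is needed, you can use a command similar to the following (the path assumes the default data directory):

      # du -sh /var/lib/etcd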
  6. On the new host, extract the backup data and set the permissions:

    # tar -xf /etc/etcd/<NEW_ETCD_HOSTNAME>.tgz -C /etc/etcd/ --overwrite
    # chown etcd:etcd /etc/etcd/*
    
    # rm -rf /var/lib/etcd/member
    # tar -xf /var/lib/etcd/<NEW_ETCD_HOSTNAME>-backup.tar.gz -C /var/lib/etcd/
    # chown -R etcd:etcd /var/lib/etcd/
  7. In the etcd.conf file on the new etcd host:

    1. Replace the following with the values generated in the previous step:

      • ETCD_NAME

      • ETCD_INITIAL_CLUSTER

      • ETCD_INITIAL_CLUSTER_STATE

        In the following variables, replace the IP address with the new host's IP address (<NEW_HOST_IP>):

      • ETCD_LISTEN_PEER_URLS

      • ETCD_LISTEN_CLIENT_URLS

      • ETCD_INITIAL_ADVERTISE_PEER_URLS

      • ETCD_ADVERTISE_CLIENT_URLS

        If you are replacing failed hosts, remove them from the etcd configuration. An example of the resulting configuration is shown below.
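
        An abbreviated sketch of the relevant etcd.conf entries on the new host might then look similar to the following (host names and IP addresses are the placeholders used above; other settings in the file are not shown):

        ETCD_NAME="<NEW_ETCD_HOSTNAME>"
        ETCD_LISTEN_PEER_URLS="https://<NEW_HOST_IP>:2380"
        ETCD_LISTEN_CLIENT_URLS="https://<NEW_HOST_IP>:2379"
        ETCD_INITIAL_ADVERTISE_PEER_URLS="https://<NEW_HOST_IP>:2380"
        ETCD_ADVERTISE_CLIENT_URLS="https://<NEW_HOST_IP>:2379"
        ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<SURVIVING_ETCD_HOSTNAME>=https://<SURVIVING_HOST_IP>:2380"
        ETCD_INITIAL_CLUSTER_STATE="existing"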

  8. Start etcd on the new host:

    # systemctl enable etcd --now
  9. To verify that the new host has been added successfully:

    # etcdctl -C https://${ETCD_CA_HOST}:2379 --ca-file=/etc/etcd/ca.crt \
      --cert-file=/etc/etcd/peer.crt \
      --key-file=/etc/etcd/peer.key cluster-health
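
    In a healthy cluster, each member is reported as healthy and the output ends with a summary line, similar to the following (member IDs and URLs will differ in your cluster):

    member <MEMBER_ID> is healthy: got healthy result from https://<HOST_IP>:2379
    ...
    cluster is healthy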

Project Backup

A future release of OpenShift Container Platform will feature specific support for per-project backup and restore.

For now, to back up API objects at the project level, use oc export for each object to be saved. For example, to save the deployment configuration frontend in YAML format:

$ oc export dc frontend -o yaml > dc-frontend.yaml

To back up all of the project (with the exception of cluster objects like namespaces and projects):

$ oc export all -o yaml > project.yaml

Role Bindings

Sometimes custom policy role bindings are used in a project. For example, a project administrator can give another user a certain role in the project and grant that user project access.

These role bindings can be exported:

$ oc get rolebindings -o yaml --export=true > rolebindings.yaml

Service Accounts

If custom service accounts are created in a project, these need to be exported:

$ oc get serviceaccount -o yaml --export=true > serviceaccount.yaml

Secrets

Custom secrets like source control management secrets (SSH Public Keys, Username/Password) should be exported if they are used:

$ oc get secret -o yaml --export=true > secret.yaml

Persistent Volume Claims

If an application within a project uses a persistent volume through a persistent volume claim (PVC), back up the PVCs:

$ oc get pvc -o yaml --export=true > pvc.yaml

Project Restore

To restore a project, recreate the project and then recreate all of the objects that were exported during the backup:

$ oc new-project myproject
$ oc create -f project.yaml
$ oc create -f secret.yaml
$ oc create -f serviceaccount.yaml
$ oc create -f pvc.yaml
$ oc create -f rolebindings.yaml

Some resources can fail to be created (for example, pods and default service accounts).

Application Data Backup

In many cases, application data can be backed up using the oc rsync command, assuming rsync is installed within the container image. The Red Hat rhel7 base image does contain rsync. Therefore, all images that are based on rhel7 contain it as well.

This is a generic backup of application data and does not take into account application-specific backup procedures, for example special export/import procedures for database systems.

Other means of backup may exist depending on the type of the persistent volume (for example, Cinder, NFS, Gluster, or others).

The paths to back up are also application specific. You can determine what path to back up by looking at the mountPath for volumes in the deploymentconfig.

Example of Backing up a Jenkins Deployment’s Application Data
  1. Get the application data mountPath from the deploymentconfig:

    $ oc export dc/jenkins|grep mountPath
            - mountPath: /var/lib/jenkins
  2. Get the name of the pod that is currently running:

    $ oc get po --selector=deploymentconfig=jenkins
    NAME              READY     STATUS    RESTARTS   AGE
    jenkins-1-37nux   1/1       Running   0          18h
  3. Use the oc rsync command to copy application data:

    $ oc rsync jenkins-1-37nux:/var/lib/jenkins /tmp/

This type of application data backup can only be performed while an application pod is currently running.

Application Data Restore

The process for restoring application data is similar to the application backup procedure using the oc rsync tool. The same restrictions apply and the process of restoring application data requires a persistent volume.

Example of Restoring a Jenkins Deployment’s Application Data
  1. Verify the backup:

    $ ls -la /tmp/jenkins-backup/
    total 8
    drwxrwxr-x.  3 user     user   20 Sep  6 11:14 .
    drwxrwxrwt. 17 root     root 4096 Sep  6 11:16 ..
    drwxrwsrwx. 12 user     user 4096 Sep  6 11:14 jenkins
  2. Use the oc rsync tool to copy the data into the running pod:

    $ oc rsync /tmp/jenkins-backup/jenkins jenkins-1-37nux:/var/lib

    Depending on the application, you may be required to restart the application.

  3. Restart the application with new data (optional):

    $ oc delete po jenkins-1-37nux

    Alternatively, you can scale down the deployment to 0, and then up again:

    $ oc scale --replicas=0 dc/jenkins
    $ oc scale --replicas=1 dc/jenkins