Overview

Following an OpenShift Enterprise upgrade, it may be desirable in extreme cases to downgrade your cluster to a previous version. The following sections outline the required steps for each system in a cluster to perform such a downgrade for the OpenShift Enterprise 3.2 to 3.1 downgrade path.

These steps are currently only supported for RPM-based installations of OpenShift Enterprise and assumes downtime of the entire cluster.

For the OpenShift Enterprise 3.1 to 3.0 downgrade path, see the OpenShift Enterprise 3.1 documentation, which has modified steps.

Verifying Backups

The Ansible playbook used during the upgrade process should have created a backup of the master-config.yaml file and the etcd data directory. Ensure these exist on your masters and etcd members:

/etc/origin/master/master-config.yaml.<timestamp>
/var/lib/origin/etcd-backup-<timestamp>

Also, back up the node-config.yaml file on each node (including masters, which have the node component on them) with a timestamp:

/etc/origin/node/node-config.yaml.<timestamp>

If you are using an external etcd cluster (versus the single embedded etcd), the backup is likely created on all etcd members, though only one is required for the recovery process.

The RPM downgrade process in a later step should create .rpmsave backups of the following files, but it may be a good idea to keep a separate copy regardless:

/etc/sysconfig/atomic-openshift-master
/etc/etcd/etcd.conf (1)
1 Only required if using external etcd.

Shutting Down the Cluster

On all masters, nodes, and etcd members (if using an external etcd cluster), ensure the relevant services are stopped.

On the master in a single master cluster:

# systemctl stop atomic-openshift-master

On each master in a multi-master cluster:

# systemctl stop atomic-openshift-master-api
# systemctl stop atomic-openshift-master-controllers

On all master and node hosts:

# systemctl stop atomic-openshift-node

On any external etcd hosts:

# systemctl stop etcd

Removing RPMs

On all masters, nodes, and etcd members (if using an external etcd cluster), remove the following packages:

# yum remove atomic-openshift \
    atomic-openshift-clients \
    atomic-openshift-node \
    atomic-openshift-master \
    openvswitch \
    atomic-openshift-sdn-ovs \
    tuned-profiles-atomic-openshift-node

If you are using external etcd, also remove the etcd package:

# yum remove etcd

For embedded etcd, you can leave the etcd package installed, as the package is only required so that the etcdctl command is available to issue operations in later steps.

Downgrading Docker

OpenShift Enterprise 3.2 requires Docker 1.9.1 and also supports Docker 1.10.3, however OpenShift Enterprise 3.1 requires Docker 1.8.2.

Downgrade to Docker 1.8.2 on each host using the following steps:

  1. Remove all local containers and images on the host. Any pods backed by a replication controller will be recreated.

    The following commands are destructive and should be used with caution.

    Delete all containers:

    # docker rm $(docker ps -a -q)

    Delete all images:

    # docker rmi $(docker images -q)
  2. Use yum swap (instead of yum downgrade) to install Docker 1.8.2:

    # yum swap docker-* docker-*1.8.2
    # sed -i 's/--storage-opt dm.use_deferred_deletion=true//' /etc/sysconfig/docker-storage
    # systemctl restart docker
  3. You should now have Docker 1.8.2 installed and running on the host. Verify with the following:

    # docker version
    Client:
     Version:      1.8.2-el7
     API version:  1.20
     Package Version: docker-1.8.2-10.el7.x86_64
    [...]
    
    # systemctl status docker
    ‚óŹ docker.service - Docker Application Container Engine
       Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
       Active: active (running) since Mon 2016-06-27 15:44:20 EDT; 33min ago
    [...]

Reinstalling RPMs

Disable the OpenShift Enterprise 3.2 repositories, and re-enable the 3.1 repositories:

# subscription-manager repos \
    --disable=rhel-7-server-ose-3.2-rpms \
    --enable=rhel-7-server-ose-3.1-rpms

On each master, install the following packages:

# yum install atomic-openshift \
    atomic-openshift-clients \
    atomic-openshift-node \
    atomic-openshift-master \
    openvswitch \
    atomic-openshift-sdn-ovs \
    tuned-profiles-atomic-openshift-node

On each node, install the following packages:

# yum install atomic-openshift \
    atomic-openshift-node \
    openvswitch \
    atomic-openshift-sdn-ovs \
    tuned-profiles-atomic-openshift-node

If using an external etcd cluster, install the following package on each etcd member:

# yum install etcd

Restoring etcd

Whether using embedded or external etcd, you must first restore the etcd backup by creating a new, single node etcd cluster. If using external etcd with multiple members, you must then also add any additional etcd members to the cluster one by one.

However, the details of the restoration process differ between embedded and external etcd. See the following section that matches your etcd configuration and follow the relevant steps before continuing to Bringing OpenShift Enterprise Services Back Online.

Follow the Cluster Restore procedure to restore single-member etcd clusters.

Embedded etcd

Restore your etcd backup and configuration:

  1. Run the following on the master with the embedded etcd:

    # ETCD_DIR=/var/lib/origin/openshift.local.etcd
    # mv $ETCD_DIR /var/lib/etcd.orig
    # cp -Rp /var/lib/origin/etcd-backup-<timestamp>/ $ETCD_DIR
    # chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
    # chown -R etcd:etcd $ETCD_DIR

    The $ETCD_DIR location differs between external and embedded etcd.

  2. Create the new, single node etcd cluster:

    # etcd -data-dir=/var/lib/origin/openshift.local.etcd \
        -force-new-cluster

    Verify etcd has started successfully by checking the output from the above command, which should look similar to the following near the end:

    [...]
    2016-06-24 12:14:45.644073 I | etcdserver: starting server... [version: 2.2.5, cluster version: 2.2]
    [...]
    2016-06-24 12:14:46.834394 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379 http://localhost:4001]} to cluster 5580663a6e0002
  3. Shut down the process by running the following from a separate terminal:

    # pkill etcd
  4. Continue to Bringing OpenShift Enterprise Services Back Online.

External etcd

Choose a system to be the initial etcd member, and restore its etcd backup and configuration:

  1. Run the following on the etcd host:

    # ETCD_DIR=/var/lib/etcd/
    # mv $ETCD_DIR /var/lib/etcd.orig
    # cp -Rp /var/lib/origin/etcd-backup-<timestamp>/ $ETCD_DIR
    # chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
    # chown -R etcd:etcd $ETCD_DIR

    The $ETCD_DIR location differs between external and embedded etcd.

  2. Restore your /etc/etcd/etcd.conf file from backup or .rpmsave.

  3. Create the new single node cluster using etcd’s --force-new-cluster option. You can do this with a long complex command using the values from /etc/etcd/etcd.conf, or you can temporarily modify the systemd unit file and start the service normally.

    To do so, edit the /usr/lib/systemd/system/etcd.service and add --force-new-cluster:

    # sed -i '/ExecStart/s/"$/  --force-new-cluster"/' /usr/lib/systemd/system/etcd.service
    # cat /usr/lib/systemd/system/etcd.service  | grep ExecStart
    
    ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --force-new-cluster"

    Then restart the etcd service:

    # systemctl daemon-reload
    # systemctl start etcd
  4. Verify the etcd service started correctly, then re-edit the /usr/lib/systemd/system/etcd.service file and remove the --force-new-cluster option:

    # sed -i '/ExecStart/s/ --force-new-cluster//' /usr/lib/systemd/system/etcd.service
    # cat /usr/lib/systemd/system/etcd.service  | grep ExecStart
    
    ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
  5. Restart the etcd service, then verify the etcd cluster is running correctly and displays OpenShift Enterprise’s configuration:

    # systemctl daemon-reload
    # systemctl restart etcd
    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
        ls /
  6. If you have additional etcd members to add to your cluster, continue to Adding Additional etcd Members. Otherwise, if you only want a single node external etcd, continue to Bringing OpenShift Enterprise Services Back Online.

Adding Additional etcd Members

To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member:

  1. Get the member ID for the first member using the member list command:

    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
        member list
  2. Update the value of peerURLs using the etcdctl member update command by passing the member ID obtained from the previous step:

    # etcdctl --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --ca-file=/etc/etcd/ca.crt \
        --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
        member update 511b7fb6cc0001 https://172.18.1.18:2380

    Alternatively, you can use curl:

    # curl --cacert /etc/etcd/ca.crt \
        --cert /etc/etcd/peer.crt \
        --key /etc/etcd/peer.key \
        https://172.18.1.18:2379/v2/members/511b7fb6cc0001 \
        -XPUT -H "Content-Type: application/json" \
        -d '{"peerURLs":["https://172.18.1.18:2380"]}'
  3. Re-run the member list command and ensure the peer URLs no longer include localhost.

  4. Now add each additional member to the cluster, one at a time.

    Each member must be fully added and brought online one at a time. When adding each additional member to the cluster, the peerURLs list must be correct for that point in time, so it will grow by one for each member added. The etcdctl member add command will output the values that need to be set in the etcd.conf file as you add each member, as described in the following instructions.

    1. For each member, add it to the cluster using the values that can be found in that system’s etcd.conf file:

      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
          member add 10.3.9.222 https://172.16.4.27:2380
      
      Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster
      
      ETCD_NAME="10.3.9.222"
      ETCD_INITIAL_CLUSTER="10.3.9.221=https://172.16.4.18:2380,10.3.9.222=https://172.16.4.27:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"
    2. Using the environment variables provided in the output of the above etcdctl member add command, edit the /etc/etcd/etcd.conf file on the member system itself and ensure these settings match.

    3. Now start etcd on the new member:

      # rm -rf /var/lib/etcd/member
      # systemctl enable etcd
      # systemctl start etcd
    4. Ensure the service starts correctly and the etcd cluster is now healthy:

      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
          member list
      
      51251b34b80001: name=10.3.9.221 peerURLs=https://172.16.4.18:2380 clientURLs=https://172.16.4.18:2379
      d266df286a41a8a4: name=10.3.9.222 peerURLs=https://172.16.4.27:2380 clientURLs=https://172.16.4.27:2379
      
      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
          cluster-health
      
      cluster is healthy
      member 51251b34b80001 is healthy
      member d266df286a41a8a4 is healthy
    5. Now repeat this process for the next member to add to the cluster.

  5. After all additional etcd members have been added, continue to Bringing OpenShift Enterprise Services Back Online.

Bringing OpenShift Enterprise Services Back Online

On each OpenShift Enterprise master, restore your master and node configuration from backup and enable and restart all relevant services.

On the master in a single master cluster:

# cp /etc/sysconfig/atomic-openshift-master.rpmsave /etc/sysconfig/atomic-openshift-master
# cp /etc/origin/master/master-config.yaml.<timestamp> /etc/origin/master/master-config.yaml
# cp /etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-master
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-master
# systemctl start atomic-openshift-node

On each master in a multi-master cluster:

# cp /etc/sysconfig/atomic-openshift-master-api.rpmsave /etc/sysconfig/atomic-openshift-master-api
# cp /etc/sysconfig/atomic-openshift-master-controllers.rpmsave /etc/sysconfig/atomic-openshift-master-controllers
# cp /etc/origin/master/master-config.yaml.<timestamp> /etc/origin/master/master-config.yaml
# cp /etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-master-api
# systemctl enable atomic-openshift-master-controllers
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-master-api
# systemctl start atomic-openshift-master-controllers
# systemctl start atomic-openshift-node

On each OpenShift Enterprise node, restore your node-config.yaml file from backup and enable and restart the atomic-openshift-node service:

# cp /etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-node

Your OpenShift Enterprise cluster should now be back online.

Verifying the Downgrade

To verify the downgrade, first check that all nodes are marked as Ready:

# oc get nodes
NAME                 LABELS                                                                STATUS
master.example.com   kubernetes.io/hostname=master.example.com,region=infra,zone=default   Ready,SchedulingDisabled
node1.example.com    kubernetes.io/hostname=node1.example.com,region=primary,zone=east     Ready
node2.example.com    kubernetes.io/hostname=node2.example.com,region=primary,zone=east     Ready

Then, verify that you are running the expected versions of the docker-registry and router images, if deployed:

# oc get -n default dc/docker-registry -o json | grep \"image\"
    "image": "openshift3/ose-docker-registry:v3.1.1.6",
# oc get -n default dc/router -o json | grep \"image\"
    "image": "openshift3/ose-haproxy-router:v3.1.1.6",

You can use the diagnostics tool on the master to look for common issues and provide suggestions. In OpenShift Enterprise 3.1, the oc adm diagnostics tool is available as openshift ex diagnostics:

# openshift ex diagnostics
...
[Note] Summary of diagnostics execution:
[Note] Completed with no errors or warnings seen.