Downgrading | Installation and Configuration

Overview
Verifying Backups
Shutting Down the Cluster
Removing RPMs
Reinstalling RPMs
Restoring etcd
- Embedded etcd
- External etcd
Bringing OpenShift Services Back Online

Overview

Following an OpenShift Enterprise upgrade, it may be desirable in extreme cases to downgrade your cluster to a previous version. The following sections outline the required steps for each system in a cluster to perform such a downgrade, currently supported for the OpenShift Enterprise 3.1 to 3.0 downgrade path.

Verifying Backups

The Ansible playbook used during the upgrade process should have created a backup of the master-config.yaml file and the etcd data directory. Ensure these exist on your masters and etcd members:

/etc/openshift/master/master-config.yaml.<timestamp>
/var/lib/openshift/etcd-backup-<timestamp>

If you use a separate etcd cluster instead of a single embedded etcd instance, the backup is likely created on all etcd members, though only one is required for the recovery process. You can run a separate etcd instance that is co-located with your master nodes.

The RPM downgrade process in a later step should create .rpmsave backups of the following files, but it may be a good idea to keep a separate copy regardless:

/etc/sysconfig/openshift-master
/etc/etcd/etcd.conf (1)

1	Only required if using a separate etcd cluster.

Shutting Down the Cluster

On all masters, nodes, and etcd members, if you use a separate etcd cluster that runs on different nodes, ensure the relevant services are stopped:

# systemctl stop atomic-openshift-master
# systemctl stop atomic-openshift-node
# systemctl stop etcd (1)

1	Only required if using external etcd.

Removing RPMs

On all masters, nodes, and etcd members (if using an external etcd cluster), remove the following packages:

# yum remove atomic-openshift \
    atomic-openshift-clients \
    atomic-openshift-node \
    atomic-openshift-master \
    openvswitch \
    atomic-openshift-sdn-ovs \
    tuned-profiles-atomic-openshift-node

If you are using external etcd, also remove the etcd package:

# yum remove etcd

For embedded etcd, you can leave the etcd package installed, as the package is only required so that the etcdctl command is available to issue operations in later steps.

Reinstalling RPMs

Disable the OpenShift Enteprise 3.1 repositories, and re-enable the 3.0 repositories:

# subscription-manager repos \
    --disable=rhel-7-server-ose-3.1-rpms \
    --enable=rhel-7-server-ose-3.0-rpms

On each master, install the following packages:

# yum install openshift \
    openshift-master \
    openshift-node \
    openshift-sdn-ovs

On each node, install the following packages:

# yum install openshift \
    openshift-node \
    openshift-sdn-ovs

If using a separate etcd cluster, install the following package on each etcd member:

# yum install etcd

Restoring etcd

Whether using embedded or external etcd, you must first restore the etcd backup by creating a new, single node etcd cluster. If using external etcd with multiple members, you must then also add any additional etcd members to the cluster one by one.

However, the details of the restoration process differ between embedded and external etcd. See the following section that matches your etcd configuration and follow the relevant steps before continuing to Bringing OpenShift Services Back Online.

Embedded etcd

Restore your etcd backup and configuration:

Run the following on the master with the embedded etcd:

# ETCD_DIR=/var/lib/openshift/openshift.local.etcd
# mv $ETCD_DIR /var/lib/etcd.orig
# cp -Rp /var/lib/openshift/etcd-backup-<timestamp>/ $ETCD_DIR
# chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
# chown -R etcd:etcd $ETCD_DIR

The $ETCD_DIR location differs between external and embedded etcd.

Create the new, single node etcd cluster:

# etcd -data-dir=/var/lib/openshift/openshift.local.etcd \
    -force-new-cluster

Verify etcd has started successfully by checking the output from the above command, which should look similar to the following at the end:

[...]
2016/01/8 13:24:21 etcdserver: starting server... [version: 2.1.1, cluster version: 2.1.0]
2016/01/8 13:24:22 raft: 5168c093630001 is starting a new election at term 13
2016/01/8 13:24:22 raft: 5168c093630001 became candidate at term 14
2016/01/8 13:24:22 raft: 5168c093630001 received vote from 5168c093630001 at term 14
2016/01/8 13:24:22 raft: 5168c093630001 became leader at term 14
2016/01/8 13:24:22 raft: raft.node: 5168c093630001 elected leader 5168c093630001 at term 14
2016/01/8 13:24:22 etcdserver: published {Name:default ClientURLs:[http://localhost:2379 http://localhost:4001]} to cluster 5168c093630002

Shut down the process by running the following from a separate terminal:
# pkill etcd
Continue to Bringing OpenShift Services Back Online.

External etcd

Choose a system to be the initial etcd member, and restore its etcd backup and configuration:

Run the following on the etcd host:

# ETCD_DIR=/var/lib/etcd/
# mv $ETCD_DIR /var/lib/etcd.orig
# cp -Rp /var/lib/openshift/etcd-backup-<timestamp>/ $ETCD_DIR
# chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
# chown -R etcd:etcd $ETCD_DIR

The $ETCD_DIR location differs between external and embedded etcd.

Restore your /etc/etcd/etcd.conf file from backup or .rpmsave.
Create the new single node cluster using etcd’s --force-new-cluster option. You can do this with a long complex command using the values from the /etc/etcd/etcd.conf, or you can temporarily modify the systemd file and start the service normally.

To do so, edit the /usr/lib/systemd/system/etcd.service and add --force-new-cluster:
# sed -i '/ExecStart/s/"$/ --force-new-cluster"/' /usr/lib/systemd/system/etcd.service # cat /usr/lib/systemd/system/etcd.service | grep ExecStart ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --force-new-cluster"
Then restart the etcd service:
# systemctl daemon-reload # systemctl start etcd

Verify the etcd service started correctly, then re-edit the /usr/lib/systemd/system/etcd.service file and remove the --force-new-cluster option:

# sed -i '/ExecStart/s/ --force-new-cluster//' /usr/lib/systemd/system/etcd.service
# cat /usr/lib/systemd/system/etcd.service  | grep ExecStart

ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"

Restart the etcd service, then verify the etcd cluster is running correctly and displays OpenShift’s configuration:

# systemctl daemon-reload
# systemctl restart etcd
# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
    ls /

If you have additional etcd members to add to your cluster, continue to Adding Additional etcd Members. Otherwise, if you only want a single node external etcd, continue to Bringing OpenShift Services Back Online.

Adding Additional etcd Members

To add additional etcd members to the cluster, you must first adjust the default localhost peerURLs for the first member:

Get the member ID for the first member using the member list command:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member list

Update the peerURLs. In etcd 2.2 and beyond, this can be done with the etcdctl member update command. However, OpenShift Enterprise 3.1 uses etcd 2.1, so you must use curl:

# curl --cacert /etc/etcd/ca.crt \
    --cert /etc/etcd/peer.crt \
    --key /etc/etcd/peer.key \
    https://172.18.1.18:2379/v2/members/511b7fb6cc0001 \
    -XPUT -H "Content-Type: application/json" \
    -d '{"peerURLs":["https://172.18.1.18:2380"]}'

Re-run the member list command and ensure the peerURLs no longer points to localhost.

Now add each additional member to the cluster, one at a time.

Each member must be fully added and brought online one at a time. When adding each additional member to the cluster, the peerURLs list must be correct for that point in time, so it will grow by one for each member added. The etcdctl member add command will output the values that need to be set in the etcd.conf file as you add each member, as described in the following instructions.

For each member, add it to the cluster using the values that can be found in that system’s etcd.conf file:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
    member add 10.3.9.222 https://172.16.4.27:2380

Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster

ETCD_NAME="10.3.9.222"
ETCD_INITIAL_CLUSTER="10.3.9.221=https://172.16.4.18:2380,10.3.9.222=https://172.16.4.27:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Using the environment variables provided in the output of the above etcdctl member add command, edit the /etc/etcd/etcd.conf file on the member system itself and ensure these settings match.

Now start etcd on the new member:

# rm -rf /var/lib/etcd/member
# systemctl enable etcd
# systemctl start etcd

Ensure the service starts correctly and the etcd cluster is now healthy:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
    member list

51251b34b80001: name=10.3.9.221 peerURLs=https://172.16.4.18:2380 clientURLs=https://172.16.4.18:2379
d266df286a41a8a4: name=10.3.9.222 peerURLs=https://172.16.4.27:2380 clientURLs=https://172.16.4.27:2379

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.16.4.18:2379,https://172.16.4.27:2379" \
    cluster-health

cluster is healthy
member 51251b34b80001 is healthy
member d266df286a41a8a4 is healthy

Now repeat this process for the next member to add to the cluster.

After all additional etcd members have been added, continue to Bringing OpenShift Services Back Online.

Bringing OpenShift Services Back Online

On each OpenShift master, restore your openshift-master configuration from backup and restart relevant services:

# cp /etc/sysconfig/openshift-master.rpmsave /etc/sysconfig/openshift-master
# cp /etc/openshift/master/master-config.yaml.2015-11-20\@08\:36\:51~ /etc/openshift/master/master-config.yaml
# systemctl enable openshift-master
# systemctl enable openshift-node
# systemctl start openshift-master
# systemctl start openshift-node

On each OpenShift node, enable and restart the openshift-node service:

# systemctl enable openshift-node
# systemctl start openshift-node

Your OpenShift cluster should now be back online.

Downgrading OpenShift