You can update, or upgrade, an OpenShift Container Platform cluster. If your cluster contains Red Hat Enterprise Linux (RHEL) machines, you must perform additional steps to update those machines.

Prerequisites

About the OpenShift Container Platform update service

The OpenShift Container Platform update service is the hosted service that provides over-the-air updates to both OpenShift Container Platform and Red Hat Enterprise Linux CoreOS (RHCOS). It provides a graph, or diagram that contains vertices and the edges that connect them, of component Operators. The edges in the graph show which versions you can safely update to, and the vertices are update payloads that specify the intended state of the managed cluster components.

The Cluster Version Operator (CVO) in your cluster checks with the OpenShift Container Platform update service to see the valid updates and update paths based on current component versions and information in the graph. When you request an update, the OpenShift Container Platform CVO uses the release image for that update to upgrade your cluster. The release artifacts are hosted in Quay as container images.
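
The graph model can be illustrated with a short Python sketch. It is not the update service's implementation or API: the UPDATE_GRAPH data and the direct_updates and update_path helpers are hypothetical names invented for this illustration. Vertices are release versions, and an edge means that a direct update between two versions is considered safe.

```python
from collections import deque

# Hypothetical update graph (invented sample data, not real release data).
# Each key is a release (vertex); each value lists the releases that it can
# update to directly (edges).
UPDATE_GRAPH = {
    "4.3.0": ["4.3.1", "4.3.3"],
    "4.3.1": ["4.3.3", "4.3.4"],
    "4.3.3": ["4.3.4"],
    "4.3.4": [],
}


def direct_updates(graph, current):
    """Return the versions reachable from `current` over a single edge."""
    return list(graph.get(current, []))


def update_path(graph, current, target):
    """Return a shortest chain of updates from `current` to `target`.

    Uses a breadth-first search over the edges. Returns None when the
    graph contains no path, which includes any attempt to move backward.
    """
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In this model, a version that has no path from your current version is simply never offered, which mirrors the guidance not to force an update that the service does not display.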

To allow the OpenShift Container Platform update service to provide only compatible updates, a release verification pipeline exists to drive automation. Each release artifact is verified for compatibility with supported cloud platforms and system architectures as well as other component packages. After the pipeline confirms the suitability of a release, the OpenShift Container Platform update service notifies you that it is available.

Because the update service displays all valid updates, you must not force an update to a version that the update service does not display.

During continuous update mode, two controllers run. One continuously updates the payload manifests, applies them to the cluster, and outputs the status of the controlled rollout of the Operators, whether they are available, upgrading, or failed. The second controller polls the OpenShift Container Platform update service to determine if updates are available.

Reverting your cluster to a previous version, or a rollback, is not supported. Only upgrading to a newer version is supported.

During the upgrade process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. It cordons the number of nodes that is specified by the maxUnavailable field on the machine configuration pool and marks them as unavailable. By default, this value is set to 1. It then applies the new configuration and reboots the machine. If you use Red Hat Enterprise Linux (RHEL) machines as workers, the MCO does not update the kubelet on these machines because you must update the OpenShift API on them first. Because the specification for the new version is applied to the old kubelet, the RHEL machine cannot return to the Ready state. You cannot complete the update until the machines are available. However, the maximum number of nodes that are unavailable is set to ensure that normal cluster operations are likely to continue with that number of machines out of service.
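
As a rough mental model of the cordoning behavior, the following Python sketch groups nodes into batches that never exceed a maxUnavailable value. It is a simplification and not how the MCO is implemented: the cordon_batches function is a hypothetical helper, and it assumes that maxUnavailable is a plain integer.

```python
def cordon_batches(nodes, max_unavailable=1):
    """Yield successive groups of nodes to take out of service together.

    Each group holds at most `max_unavailable` nodes, so the number of
    nodes that are unavailable at one time never exceeds that value.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    for start in range(0, len(nodes), max_unavailable):
        yield nodes[start:start + max_unavailable]
```

With the default value of 1, each node is handled on its own, so a RHEL worker that stays unavailable holds up the nodes behind it in this model.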

Understanding OpenShift Container Platform upgrade channels

In OpenShift Container Platform 4.1, Red Hat introduced the concept of upgrade channels for recommending the appropriate upgrade versions to your cluster. Upgrade channels separate upgrade strategies and also are used to control the cadence of updates. Channels are tied to a minor version of OpenShift Container Platform. For instance, OpenShift Container Platform 4.3 channels will never include an upgrade to a 4.4 release. This ensures administrators make an explicit decision to upgrade to the next minor version of OpenShift Container Platform. Channels only control updates and have no impact on the version of the cluster you install; the openshift-install binary for a given patch level of OpenShift Container Platform always installs that patch level.
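
The relationship between a channel name and a minor version can be sketched in Python. The parse_channel and within_channel_minor helpers are hypothetical and exist only for this illustration. They assume channel names of the form <stream>-<major>.<minor> and release versions of the form <major>.<minor>.<patch>.

```python
def parse_channel(channel):
    """Split a channel name such as 'stable-4.3' into (stream, major, minor)."""
    stream, _, version = channel.rpartition("-")
    major, minor = version.split(".")
    return stream, int(major), int(minor)


def within_channel_minor(channel, release):
    """Return True if `release` is not newer than the channel's minor version.

    For example, a 4.4.0 release is outside every 4.3 channel.
    """
    _, ch_major, ch_minor = parse_channel(channel)
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) <= (ch_major, ch_minor)
```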

OpenShift Container Platform 4.3, which includes the upgrade from the previous 4.2 release, has three upgrade channels to choose from:

  • candidate-4.3

  • fast-4.3

  • stable-4.3

The upgrade channels contain two types of updates:

  1. General Availability Software (or GA) - These versions of OpenShift Container Platform are fully supported and are considered production quality. You can upgrade to the general availability release from either the fast or the stable channel.

  2. Release Candidate Software (or RC) - These versions of OpenShift Container Platform are representative of the eventual general availability release and are available only in the candidate-4.3 channel. The release candidate will contain all the features of the product. You are allowed to upgrade from a release candidate to another release candidate and to upgrade from a previous minor version of OpenShift Container Platform to the current release candidate. Release candidate builds are not supported by Red Hat and you will not be able to upgrade from a release candidate to the general availability release of OpenShift Container Platform. Candidates should be used to test feature acceptance and assist in qualifying the next version of OpenShift Container Platform in your infrastructure.

    Release candidates differ from the nightly builds found on try.openshift.com. You cannot upgrade nightly builds to nightly builds. Nightly builds are available for early access to features but are not upgradable or supported.

For GA versions, the fast and stable channels present a choice between receiving updates as soon as they are available or allowing Red Hat to control the rollout of those updates.

fast-4.3

The fast channel is updated with new 4.3 patch versions as soon as Red Hat declares they are generally available. Use this channel if you wish to receive updates as soon as they are available or for your pre-production environments when participating in the connected customer program. This channel will contain all z-stream (4.3.z) updates but will not suggest upgrades to the next minor release (4.4.z) when the next minor release is available.

stable-4.3

The stable channel contains updates on a time delay. Rather than being immediately available as they are in the fast channel, updates are gradually rolled out to customers based on data from Red Hat SRE teams, support services, and pre-production and production environments that participate in the connected customer program. For patch and CVE fixes, this delay can range from several hours to a day and allows an extra period of assessment of how the software performs. If issues are detected during rollout, upgrades to that version may be blocked in both the fast and stable channels, and a new version may be introduced that will be the new preferred upgrade target.

Customers can improve this process by configuring pre-production systems on the fast channel, configuring production systems on the stable channel, and participating in Red Hat’s connected customer program. Participation allows Red Hat to observe the impact of updates on your specific hardware and software configurations. Future releases may improve or alter the pace at which updates move from the fast to the stable channels.

If issues are discovered with an upgrade between patch levels, Red Hat may withdraw that suggested upgrade for affected versions. A newer patch would become available in the appropriate channels and be suggested for upgrade.

Upgrade version paths

OpenShift Container Platform maintains an upgrade recommendation service that understands the version of OpenShift Container Platform you have installed as well as the path to take within the channel you choose to get you to the next release. For example, you might see the following versions in the fast-4.3 channel:

  • 4.3.0

  • 4.3.1

  • 4.3.3

  • 4.3.4

The service only recommends upgrades that have been tested and have no known issues. If you are on 4.3.1 and OpenShift Container Platform allows you to select 4.3.4, then it is safe for you to go from 4.3.1 to 4.3.4. Likewise, the absence of 4.3.2 might be due to a CVE that was fixed in 4.3.3, in which case Red Hat no longer suggests upgrading to a known vulnerable version. If an issue is found that results in a new version being retracted from the recommendations, Red Hat will release a new version that is capable of upgrading from all necessary versions, including the retracted version.

Disconnected clusters

Customers who have chosen not to be connected to Red Hat and are curating their own OpenShift Container Platform container image content manually should consult the Red Hat errata associated with product releases and note any comments that impact upgrades. During an upgrade, the user interface might caution you about switching between these versions. It is up to the customer to ensure that they have selected the appropriate version before bypassing those cautions.

Switching between channels

You can switch between the fast and stable channels at any time. Channels only offer suggested upgrades, and will never suggest a dangerous upgrade. If you switch to the candidate channel after installing from a GA version, you will see a warning that the current version is not recognized, and you can safely switch back to a GA channel.

Updating a cluster by using the web console

If updates are available, you can update your cluster from the web console.

You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.

Prerequisites
  • Have access to the web console as a user with admin privileges.

Procedure
  1. From the web console, click Administration > Cluster Settings and review the contents of the Overview tab.

  2. For production clusters, ensure that the CHANNEL is set to the correct channel for your current minor version, such as stable-4.3.

    For production clusters, you must subscribe to a stable-* or fast-* channel.

    • If the UPDATE STATUS is not Updates Available, you cannot upgrade your cluster.

    • The DESIRED VERSION indicates the cluster version that your cluster is running or is updating to.

  3. Click Updates Available, select the highest available version and click Update. The UPDATE STATUS changes to Updating, and you can review the progress of the Operator upgrades on the Cluster Operators tab.

  4. After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.

    • If updates are available, continue to perform updates in the current channel until you can no longer update.

    • If no updates are available, change the CHANNEL to the stable-* or fast-* channel for the next minor version, and update to the version that you want in that channel.

    You might need to perform several intermediate updates until you reach the version that you want.

    When you update a cluster that contains Red Hat Enterprise Linux (RHEL) worker machines, those workers temporarily become unavailable during the update process. You must run the upgrade playbook against each RHEL machine as it enters the NotReady state for the cluster to finish updating.
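
When you pick the highest available version from a list, compare the numeric components rather than the raw strings, because a plain string comparison would rank 4.3.9 above 4.3.10. The following Python sketch shows one way to do this. The version_key and highest_version helpers are hypothetical, and the sketch assumes purely numeric versions of the form <major>.<minor>.<patch> without pre-release suffixes.

```python
def version_key(version):
    """Convert a version such as '4.3.10' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def highest_version(available):
    """Return the numerically highest version from a non-empty list."""
    return max(available, key=version_key)
```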

(Optional) Adding hooks to perform Ansible tasks on RHEL machines

You can use hooks to run Ansible tasks on the RHEL compute machines during the OpenShift Container Platform update.

About Ansible hooks for upgrades

When you update OpenShift Container Platform, you can run custom tasks on your Red Hat Enterprise Linux (RHEL) nodes during specific operations by using hooks. Hooks allow you to provide files that define tasks to run before or after specific update tasks. You can use hooks to validate or modify custom infrastructure when you update the RHEL compute nodes in your OpenShift Container Platform cluster.

Because the operation fails when a hook fails, you must design hooks that are idempotent, meaning that they can run multiple times and produce the same results.

Hooks have the following important limitations:

  • Hooks do not have a defined or versioned interface. They can use internal openshift-ansible variables, but it is possible that the variables will be modified or removed in future OpenShift Container Platform releases.

  • Hooks do not have error handling, so an error in a hook halts the update process. If you get an error, you must address the problem and then start the upgrade again.

Configuring the Ansible inventory file to use hooks

You define the hooks to use when you update the Red Hat Enterprise Linux (RHEL) compute, or worker, machines in the hosts inventory file under the all:vars section.

Prerequisites
  • You have access to the machine that you used to add the RHEL compute machines to your cluster. You must have access to the hosts Ansible inventory file that defines your RHEL machines.

Procedure
  1. After you design the hook, create a YAML file that defines the Ansible tasks for it. This file must be a set of tasks and cannot be a playbook, as shown in the following example:

    ---
    # Trivial example forcing an operator to acknowledge the start of an upgrade
    # file=/home/user/openshift-ansible/hooks/pre_node.yml
    
    - name: note the start of a compute machine update
      debug:
          msg: "Compute machine upgrade of {{ inventory_hostname }} is about to start"
    
    - name: require the user agree to start an upgrade
      pause:
          prompt: "Press Enter to start the compute machine update"
  2. Modify the hosts Ansible inventory file to specify the hook files. The hook files are specified as parameter values in the [all:vars] section, as shown:

    Example hook definitions in an inventory file
    [all:vars]
    openshift_node_pre_upgrade_hook=/home/user/openshift-ansible/hooks/pre_node.yml
    openshift_node_post_upgrade_hook=/home/user/openshift-ansible/hooks/post_node.yml

    To avoid ambiguity in the paths to the hooks, use absolute paths instead of relative paths in their definitions.
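
Before you run the playbook, you might want to confirm that every hook path in the inventory file is absolute. The following Python sketch is one way to check this. The hook_paths and relative_hooks helpers are hypothetical and are not part of openshift-ansible. The sketch uses a deliberately small parser that reads only key=value lines in the [all:vars] section and assumes that hook variable names start with openshift_node_ and end with _hook.

```python
import posixpath


def hook_paths(inventory_text):
    """Return {variable: path} for the hook variables under [all:vars]."""
    hooks = {}
    section = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which inventory section the following lines belong to.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "all:vars" and "=" in line:
            key, value = line.split("=", 1)
            key = key.strip()
            if key.startswith("openshift_node_") and key.endswith("_hook"):
                hooks[key] = value.strip().strip("\"'")
    return hooks


def relative_hooks(inventory_text):
    """Return the sorted names of hook variables whose path is not absolute."""
    return sorted(
        name
        for name, path in hook_paths(inventory_text).items()
        if not posixpath.isabs(path)
    )
```

An empty list from relative_hooks means that every hook path that the sketch found is absolute. The sketch does not check that the files exist.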

Available hooks for RHEL compute machines

You can use the following hooks when you update the Red Hat Enterprise Linux (RHEL) compute machines in your OpenShift Container Platform cluster.

Hook name Description

openshift_node_pre_cordon_hook

  • Runs before each node is cordoned.

  • This hook runs against each node in serial.

  • If a task must run against a different host, the task must use delegate_to or local_action.

openshift_node_pre_upgrade_hook

  • Runs after each node is cordoned but before it is updated.

  • This hook runs against each node in serial.

  • If a task must run against a different host, the task must use delegate_to or local_action.

openshift_node_pre_uncordon_hook

  • Runs after each node is updated but before it is uncordoned.

  • This hook runs against each node in serial.

  • If a task must run against a different host, the task must use delegate_to or local_action.

openshift_node_post_upgrade_hook

  • Runs after each node is uncordoned. It is the last node update action.

  • This hook runs against each node in serial.

  • If a task must run against a different host, the task must use delegate_to or local_action.

Updating RHEL compute machines in your cluster

After you update your cluster, you must update the Red Hat Enterprise Linux (RHEL) compute machines in your cluster.

Prerequisites
  • You updated your cluster.

    Because the RHEL machines require assets that are generated by the cluster to complete the update process, you must update the cluster before you update the RHEL compute machines in it.

  • You have access to the machine that you used to add the RHEL compute machines to your cluster. You must have access to the hosts Ansible inventory file that defines your RHEL machines and the upgrade playbook.

Procedure
  1. Stop and disable firewalld on the host:

    # systemctl disable --now firewalld.service

    You must not enable firewalld later. If you do, you cannot access OpenShift Container Platform logs on the worker.

  2. Enable the repositories that are required for OpenShift Container Platform 4.3:

    1. On the machine where you run the Ansible playbooks, update the required repositories:

      # subscription-manager repos --disable=rhel-7-server-ansible-2.7-rpms  \
                                   --disable=rhel-7-server-ose-4.2-rpms \
                                   --enable=rhel-7-server-ansible-2.8-rpms \
                                   --enable=rhel-7-server-ose-4.3-rpms
    2. On the machine where you run the Ansible playbooks, update the required packages, including openshift-ansible:

      # yum update openshift-ansible openshift-clients
    3. On each RHEL compute node, update the required repositories:

      # subscription-manager repos --disable=rhel-7-server-ose-4.2-rpms \
                                   --enable=rhel-7-server-ose-4.3-rpms
  3. Update a RHEL worker machine:

    1. Review the current node status to determine which RHEL worker to update:

      # oc get node
      NAME                        STATUS                        ROLES    AGE    VERSION
      mycluster-control-plane-0   Ready                         master   145m   v1.16.2
      mycluster-control-plane-1   Ready                         master   145m   v1.16.2
      mycluster-control-plane-2   Ready                         master   145m   v1.16.2
      mycluster-rhel7-0           NotReady,SchedulingDisabled   worker   98m    v1.14.6+97c81d00e
      mycluster-rhel7-1           Ready                         worker   98m    v1.14.6+97c81d00e
      mycluster-rhel7-2           Ready                         worker   98m    v1.14.6+97c81d00e
      mycluster-rhel7-3           Ready                         worker   98m    v1.14.6+97c81d00e

      Note which machine has the NotReady,SchedulingDisabled status.

    2. Review your Ansible inventory file at /<path>/inventory/hosts and update its contents so that only the machine with the NotReady,SchedulingDisabled status is listed in the [workers] section, as shown in the following example:

      [all:vars]
      ansible_user=root
      #ansible_become=True
      
      openshift_kubeconfig_path="~/.kube/config"
      
      [workers]
      mycluster-rhel7-0.example.com
    3. Change to the openshift-ansible directory and run the upgrade playbook:

      $ cd /usr/share/ansible/openshift-ansible
      $ ansible-playbook -i /<path>/inventory/hosts playbooks/upgrade.yml (1)
      1 For <path>, specify the path to the Ansible inventory file that you created.
  4. Follow the process in the previous step to update each RHEL worker machine in your cluster.

  5. After you update all of the workers, confirm that all of your cluster nodes have updated to the new version:

    # oc get node
    NAME                        STATUS                        ROLES    AGE    VERSION
    mycluster-control-plane-0   Ready                         master   145m   v1.16.2
    mycluster-control-plane-1   Ready                         master   145m   v1.16.2
    mycluster-control-plane-2   Ready                         master   145m   v1.16.2
    mycluster-rhel7-0           Ready                         worker   98m    v1.16.2
    mycluster-rhel7-1           Ready                         worker   98m    v1.16.2
    mycluster-rhel7-2           Ready                         worker   98m    v1.16.2
    mycluster-rhel7-3           Ready                         worker   98m    v1.16.2
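
If you prefer to check the node list programmatically, the following Python sketch parses text in the column layout shown above and reports nodes that are not Ready or not at the expected version. It does not call oc itself. The parse_nodes and nodes_needing_attention helpers are hypothetical, and the sketch assumes that the first line is the header, that columns are separated by whitespace, and that the version is the last column.

```python
def parse_nodes(output):
    """Parse node listing text into a list of dictionaries."""
    lines = [line for line in output.splitlines() if line.strip()]
    nodes = []
    for line in lines[1:]:  # the first non-empty line is the header row
        fields = line.split()
        nodes.append(
            {"name": fields[0], "status": fields[1], "version": fields[-1]}
        )
    return nodes


def nodes_needing_attention(output, target_version):
    """Return names of nodes that are not Ready or not at target_version."""
    return [
        node["name"]
        for node in parse_nodes(output)
        if node["status"] != "Ready" or node["version"] != target_version
    ]
```

An empty result means that every listed node reports Ready at the target version. During the update, the same helper can point out the worker with the NotReady,SchedulingDisabled status that needs the upgrade playbook next.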