
You can update, or upgrade, an OpenShift Container Platform cluster between minor versions.

Use the web console or oc adm upgrade channel <channel> to change the update channel. You can follow the steps in Updating a cluster within a minor version by using the CLI to complete the update after you change to a 4.9 channel.

Prerequisites

  • Have access to the cluster as a user with admin privileges. See Using RBAC to define and apply permissions.

  • Have a recent etcd backup in case your upgrade fails and you must restore your cluster to a previous state.

    OpenShift Container Platform 4.9 requires an upgrade from etcd version 3.4 to 3.5. If the etcd Operator halts the upgrade, an alert is triggered. To clear this alert, ensure that you have a current etcd backup and restart the upgrade using the --force flag.

    $ oc adm upgrade --force
  • Ensure all Operators previously installed through Operator Lifecycle Manager (OLM) are updated to their latest version in their latest channel. Updating the Operators ensures they have a valid upgrade path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster upgrade. See Upgrading installed Operators for more information.

  • Ensure that all machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.

  • If your cluster uses manually maintained credentials, ensure that the Cloud Credential Operator (CCO) is in an upgradeable state. For more information, see Upgrading clusters with manually maintained credentials for AWS, Azure, or GCP.

  • If your cluster uses manually maintained credentials with the AWS Secure Token Service (STS), obtain a copy of the ccoctl utility from the release image being upgraded to and use it to process any updated credentials. For more information, see Upgrading an OpenShift Container Platform cluster configured for manual mode with STS.

  • Review the list of APIs that were removed in Kubernetes 1.22, migrate any affected components to use the new API version, and provide the administrator acknowledgment. For more information, see Preparing to update to OpenShift Container Platform 4.9.
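
The machine config pool prerequisite above can be checked from the CLI before you start. The following is a minimal sketch rather than an official procedure: the function name list_paused_mcps is hypothetical, and the sketch assumes that the oc client is already logged in to the cluster.

```shell
# list_paused_mcps: print the name of every machine config pool whose
# spec.paused field is true. The function name is illustrative only.
list_paused_mcps() {
  # Emit "<name><TAB><paused>" per pool; <paused> is empty when the field is unset.
  oc get machineconfigpool \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.paused}{"\n"}{end}' |
    awk -F '\t' '$2 == "true" { print $1 }'
}
```

Running list_paused_mcps prints nothing when no pool is paused. Any pool name that it prints is skipped during the update unless you unpause it.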

Using the unsupportedConfigOverrides section to modify the configuration of an Operator is unsupported and might block cluster upgrades. You must remove this setting before you can upgrade your cluster.

If you are running cluster monitoring with an attached PVC for Prometheus, you might experience OOM kills during cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during cluster upgrade and for several hours after the upgrade is complete. To avoid the OOM kill issue, provide worker nodes with double the memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which is 2 cores with 8 GB of RAM, increase memory to 16 GB. For more information, see BZ#1925061.

About the OpenShift Update Service

The OpenShift Update Service (OSUS) provides over-the-air updates to OpenShift Container Platform, including Red Hat Enterprise Linux CoreOS (RHCOS). It provides a graph, or diagram, that contains the vertices of component Operators and the edges that connect them. The edges in the graph show which versions you can safely update to. The vertices are update payloads that specify the intended state of the managed cluster components.

The Cluster Version Operator (CVO) in your cluster checks with the OpenShift Update Service to see the valid updates and update paths based on current component versions and information in the graph. When you request an update, the CVO uses the release image for that update to upgrade your cluster. The release artifacts are hosted in Quay as container images.

To allow the OpenShift Update Service to provide only compatible updates, a release verification pipeline drives automation. Each release artifact is verified for compatibility with supported cloud platforms and system architectures, as well as other component packages. After the pipeline confirms the suitability of a release, the OpenShift Update Service notifies you that it is available.

The OpenShift Update Service displays all recommended updates for your current cluster. If an upgrade path is not recommended by the OpenShift Update Service, it might be because of a known issue with the update or the target release.
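
To look at the graph that the Cluster Version Operator consults, you can query the update service directly. The sketch below assumes the public upstream at https://api.openshift.com/api/upgrades_info/v1/graph and assumes that the service accepts channel and arch query parameters. The function names graph_url and fetch_graph are hypothetical helpers, not part of any product tooling.

```shell
# graph_url: build the graph query URL for a channel and an architecture.
# Arguments: <channel> [arch, default amd64] [base URL, default public upstream]
graph_url() {
  channel="$1"
  arch="${2:-amd64}"
  base="${3:-https://api.openshift.com/api/upgrades_info/v1/graph}"
  printf '%s?channel=%s&arch=%s\n' "$base" "$channel" "$arch"
}

# fetch_graph: download the graph as JSON. Requires curl and network access
# to the update service; takes the same arguments as graph_url.
fetch_graph() {
  curl --silent --fail --header 'Accept: application/json' "$(graph_url "$@")"
}
```

For example, fetch_graph stable-4.9 requests the graph for the stable-4.9 channel. The response is expected to describe releases as nodes and permitted updates as edges between them.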

Two controllers run during continuous update mode. The first controller continuously updates the payload manifests, applies the manifests to the cluster, and outputs the controlled rollout status of the Operators to indicate whether they are available, upgrading, or failed. The second controller polls the OpenShift Update Service to determine if updates are available.

Only upgrading to a newer version is supported. Reverting or rolling back your cluster to a previous version is not supported. If your upgrade fails, contact Red Hat support.

During the upgrade process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. The MCO cordons the number of nodes specified by the maxUnavailable field on the machine configuration pool and marks them as unavailable. By default, this value is set to 1. The MCO then applies the new configuration and reboots the machine.
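
To see how many nodes the MCO may take out of service at once for a given pool, you can read the maxUnavailable field. This is a small sketch that assumes a logged-in oc client. The function name mcp_max_unavailable is hypothetical, and the fallback to 1 mirrors the default described above.

```shell
# mcp_max_unavailable: print the maxUnavailable setting of a machine config
# pool, falling back to 1 when the field is not set on the pool.
mcp_max_unavailable() {
  value="$(oc get machineconfigpool "$1" -o jsonpath='{.spec.maxUnavailable}')"
  printf '%s\n' "${value:-1}"
}
```

For example, mcp_max_unavailable worker reports the setting for the worker pool.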

If you use Red Hat Enterprise Linux (RHEL) machines as workers, the MCO does not update the kubelet because you must update the OpenShift API on the machines first.

With the specification for the new version applied to the old kubelet, the RHEL machine cannot return to the Ready state. You cannot complete the update until the machines are available. However, the maximum number of unavailable nodes is set to ensure that normal cluster operations can continue with that number of machines out of service.

The OpenShift Update Service is composed of an Operator and one or more application instances.

OpenShift Container Platform upgrade channels and releases

In OpenShift Container Platform 4.1, Red Hat introduced the concept of channels for recommending the appropriate release versions for cluster upgrades. By controlling the pace of upgrades, these upgrade channels allow you to choose an upgrade strategy. Upgrade channels are tied to a minor version of OpenShift Container Platform. For instance, OpenShift Container Platform 4.9 upgrade channels recommend upgrades to 4.9 and upgrades within 4.9. They also recommend upgrades within 4.8 and from 4.8 to 4.9, to allow clusters on 4.8 to eventually upgrade to 4.9. They do not recommend upgrades to 4.10 or later releases. This strategy ensures that administrators explicitly decide to upgrade to the next minor version of OpenShift Container Platform.

Upgrade channels control only release selection and do not impact the version of the cluster that you install; the openshift-install binary file for a specific version of OpenShift Container Platform always installs that version.

OpenShift Container Platform 4.9 offers the following upgrade channels:

  • candidate-4.9

  • fast-4.9

  • stable-4.9

  • eus-4.y (only when running an even-numbered 4.y cluster release, like 4.10)

If you do not want the Cluster Version Operator to fetch available updates from the upgrade recommendation service, you can use the oc adm upgrade channel command in the OpenShift CLI to configure an empty channel. This configuration can be helpful if, for example, a cluster has restricted network access and there is no local, reachable upgrade recommendation service.

candidate-4.9 channel

The candidate-4.9 channel contains candidate builds for a z-stream (4.9.z) and previous minor version releases. Release candidates contain all the features of the product but are not supported. Use release candidate versions to test feature acceptance and assist in qualifying the next version of OpenShift Container Platform. A release candidate is any build that is available in the candidate channel, including ones that do not contain a pre-release version such as -rc in their names. After a version is available in the candidate channel, it goes through more quality checks. If it meets the quality standard, it is promoted to the fast-4.9 or stable-4.9 channels. Because of this strategy, if a specific release is available in both the candidate-4.9 channel and in the fast-4.9 or stable-4.9 channels, it is a Red Hat-supported version. The candidate-4.9 channel can include release versions from which there are no recommended updates in any channel.

You can use the candidate-4.9 channel to upgrade from a previous minor version of OpenShift Container Platform.

Release candidates differ from the nightly builds. Nightly builds are available for early access to features, but updating to or from nightly builds is neither recommended nor supported. Nightly builds are not available in any upgrade channel. You can reference the OpenShift Container Platform release statuses for more build information.

fast-4.9 channel

The fast-4.9 channel is updated with new and previous minor versions of 4.9 as soon as Red Hat declares the given version as a general availability release. As such, these releases are fully supported, are production quality, and have performed well while available as a release candidate in the candidate-4.9 channel from where they were promoted. Some time after a release appears in the fast-4.9 channel, it is added to the stable-4.9 channel. Releases never appear in the stable-4.9 channel before they appear in the fast-4.9 channel.

You can use the fast-4.9 channel to upgrade from a previous minor version of OpenShift Container Platform.

stable-4.9 channel

While the fast-4.9 channel contains releases as soon as their errata are published, releases are added to the stable-4.9 channel after a delay. During this delay, data about the stability of the release is collected from Red Hat SRE teams, Red Hat support services, and pre-production and production environments that participate in the connected customer program.

You can use the stable-4.9 channel to upgrade from a previous minor version of OpenShift Container Platform.

eus-4.y channel

In addition to the stable channel, all even-numbered minor versions of OpenShift Container Platform offer Extended Update Support (EUS). These EUS versions extend the Full and Maintenance support phases for customers with Standard and Premium Subscriptions to 18 months.

Although there is no difference between stable-4.y and eus-4.y channels until OpenShift Container Platform 4.y transitions to the EUS phase, you can switch to the eus-4.y channel as soon as it becomes available.

When upgrades to the next EUS channel are offered, you can switch to the next EUS channel and upgrade until you have reached the next EUS version.

This upgrade process does not apply for the eus-4.6 channel.

Upgrade version paths

OpenShift Container Platform maintains an upgrade recommendation service that understands the version of OpenShift Container Platform you have installed as well as the path to take within the channel you choose to get you to the next release.

You can imagine seeing the following in the fast-4.9 channel:

  • 4.9.0

  • 4.9.1

  • 4.9.3

  • 4.9.4

The service recommends only upgrades that have been tested and have no serious issues. It will not suggest updating to a version of OpenShift Container Platform that contains known vulnerabilities. For example, if your cluster is on 4.9.1 and OpenShift Container Platform suggests 4.9.4, then it is safe for you to update from 4.9.1 to 4.9.4. Do not rely on consecutive patch numbers. In this example, 4.9.2 is not and never was available in the channel.
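
Because patch numbers are not necessarily consecutive, it can help to confirm that a target version is currently recommended for your cluster before you request it. The following sketch reads the recommended updates from the ClusterVersion resource. The function name is_recommended_update is hypothetical, and the sketch assumes a logged-in oc client.

```shell
# is_recommended_update: succeed (exit status 0) when the given version is
# listed in the cluster's recommended updates (status.availableUpdates).
is_recommended_update() {
  oc get clusterversion version \
    -o jsonpath='{range .status.availableUpdates[*]}{.version}{"\n"}{end}' |
    grep -Fqx -- "$1"
}
```

For example, is_recommended_update 4.9.4 && echo "4.9.4 is recommended" prints the message only when 4.9.4 is among the versions that the service currently recommends from your installed version.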

Update stability depends on your channel. The presence of an update recommendation in the candidate-4.9 channel does not imply that the update is supported. It means that no serious issues have been found with the update yet, but there might not be significant traffic through the update to suggest stability. The presence of an update recommendation in the fast-4.9 or stable-4.9 channels at any point is a declaration that the update is supported. While releases will never be removed from a channel, update recommendations that exhibit serious issues will be removed from all channels. Updates initiated after the update recommendation has been removed are still supported.

Red Hat will eventually provide supported update paths from any supported release in the fast-4.9 or stable-4.9 channels to the latest release in 4.9.z, although there can be delays while safe paths away from troubled releases are constructed and verified.

Fast and stable channel use and strategies

The fast-4.9 and stable-4.9 channels present a choice between receiving general availability releases as soon as they are available or allowing Red Hat to control the rollout of those updates. If issues are detected during rollout or at a later time, upgrades to that version might be blocked in both the fast-4.9 and stable-4.9 channels, and a new version might be introduced that becomes the new preferred upgrade target.

Customers can improve this process by configuring pre-production systems on the fast-4.9 channel, configuring production systems on the stable-4.9 channel, and participating in the Red Hat connected customer program. Red Hat uses this program to observe the impact of updates on your specific hardware and software configurations. Future releases might improve or alter the pace at which updates move from the fast-4.9 to the stable-4.9 channel.

Restricted network clusters

If you manage the container images for your OpenShift Container Platform clusters yourself, you must consult the Red Hat errata that is associated with product releases and note any comments that impact upgrades. During upgrade, the user interface might warn you about switching between these versions, so you must ensure that you selected an appropriate version before you bypass those warnings.

Switching between channels

A channel can be switched from the web console or through the oc adm upgrade channel command:

$ oc adm upgrade channel <channel>

The web console will display an alert if you switch to a channel that does not include the current release. The web console does not recommend any updates while on a channel without the current release. You can return to the original channel at any point, however.

Changing your channel might impact the supportability of your cluster. The following conditions might apply:

  • Your cluster is still supported if you change from the stable-4.9 channel to the fast-4.9 channel.

  • You can switch to the candidate-4.9 channel at any time, but some releases for this channel might be unsupported.

  • You can switch from the candidate-4.9 channel to the fast-4.9 channel if your current release is a general availability release.

  • You can always switch from the fast-4.9 channel to the stable-4.9 channel. There is a possible delay of up to a day for the release to be promoted to stable-4.9 if the current release was recently promoted.
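
A channel switch can be wrapped in a small guard that rejects unexpected channel names and avoids a redundant change. This is a sketch under stated assumptions: the function names is_49_channel and switch_channel are hypothetical, only the three 4.9 channels named in this document are accepted, and a logged-in oc client is assumed.

```shell
# is_49_channel: succeed when the argument is candidate-4.9, fast-4.9,
# or stable-4.9.
is_49_channel() {
  case "$1" in
    candidate-4.9|fast-4.9|stable-4.9) return 0 ;;
    *) return 1 ;;
  esac
}

# switch_channel: change the channel only when it differs from the current one.
switch_channel() {
  is_49_channel "$1" || { echo "unexpected channel: $1" >&2; return 2; }
  current="$(oc get clusterversion version -o jsonpath='{.spec.channel}')"
  if [ "$current" = "$1" ]; then
    echo "already on $1"
  else
    oc adm upgrade channel "$1"
  fi
}
```

For example, switch_channel fast-4.9 runs oc adm upgrade channel fast-4.9 unless the cluster is already on that channel. The guard does not evaluate supportability, so the conditions listed above still apply.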

Performing a canary rollout update

In some specific use cases, you might want a more controlled update process where you do not want specific nodes updated concurrently with the rest of the cluster. These use cases include, but are not limited to:

  • You have mission-critical applications that you do not want unavailable during the update. You can slowly test the applications on your nodes in small batches after the update.

  • You have a small maintenance window that does not allow the time for all nodes to be updated, or you have multiple maintenance windows.

The rolling update process is not a typical update workflow. With larger clusters, it can be a time-consuming process that requires you to execute multiple commands. This complexity can result in errors that can affect the entire cluster. It is recommended that you carefully consider whether your organization wants to use a rolling update and carefully plan the implementation of the process before you start.

The rolling update process described in this topic involves:

  • Creating one or more custom machine config pools (MCPs).

  • Labeling each node that you do not want to update immediately to move those nodes to the custom MCPs.

  • Pausing those custom MCPs, which prevents updates to those nodes.

  • Performing the cluster update.

  • Unpausing one custom MCP, which triggers the update on those nodes.

  • Testing the applications on those nodes to make sure the applications work as expected on those newly-updated nodes.

  • Optionally removing the custom labels from the remaining nodes in small batches and testing the applications on those nodes.
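
The labeling, pausing, and unpausing steps above can be sketched as two small helpers. Creating the custom MCP itself is omitted. The function names move_node_to_pool and set_mcp_paused are hypothetical, and the sketch assumes that the custom MCP selects nodes by a node-role.kubernetes.io/<pool-name> label, which depends on how you define the pool.

```shell
# move_node_to_pool: label a node so that a custom MCP can select it.
# Arguments: <node-name> <pool-name>
move_node_to_pool() {
  oc label node "$1" "node-role.kubernetes.io/$2="
}

# set_mcp_paused: pause (true) or unpause (false) a machine config pool.
# Arguments: <pool-name> <true|false>
set_mcp_paused() {
  case "$2" in
    true|false) ;;
    *) echo "second argument must be true or false" >&2; return 2 ;;
  esac
  oc patch "machineconfigpool/$1" --type merge \
    --patch "{\"spec\":{\"paused\":$2}}"
}
```

For example, with a pool named workerpool-canary (a placeholder name), set_mcp_paused workerpool-canary true pauses the pool before the cluster update, and set_mcp_paused workerpool-canary false later triggers the update on its nodes.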

Pausing an MCP prevents the Machine Config Operator from applying any configuration changes on the associated nodes. Pausing an MCP also prevents any automatically-rotated certificates from being pushed to the associated nodes, including the automatic CA rotation of the kube-apiserver-to-kubelet-signer CA certificate. If the MCP is paused when the kube-apiserver-to-kubelet-signer CA certificate expires and the MCO attempts to automatically renew the certificate, the new certificate is created but not applied across the nodes in the respective machine config pool. This causes failure in multiple oc commands, including but not limited to oc debug, oc logs, oc exec, and oc attach. Pausing an MCP should be done with careful consideration about the kube-apiserver-to-kubelet-signer CA certificate expiration and for short periods of time only.

If you want to use the canary rollout update process, see Performing a canary rollout update.

Pausing a MachineHealthCheck resource

During the upgrade process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck resources before updating the cluster.

Prerequisites
  • Install the OpenShift CLI (oc).

Procedure
  1. To list all the available MachineHealthCheck resources that you want to pause, run the following command:

    $ oc get machinehealthcheck -n openshift-machine-api
  2. To pause the machine health checks, add the cluster.x-k8s.io/paused="" annotation to the MachineHealthCheck resource. Run the following command:

    $ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused=""

    The annotated MachineHealthCheck resource resembles the following YAML file:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineHealthCheck
    metadata:
      name: example
      namespace: openshift-machine-api
      annotations:
        cluster.x-k8s.io/paused: ""
    spec:
      selector:
        matchLabels:
          role: worker
      unhealthyConditions:
      - type:    "Ready"
        status:  "Unknown"
        timeout: "300s"
      - type:    "Ready"
        status:  "False"
        timeout: "300s"
      maxUnhealthy: "40%"
    status:
      currentHealthy: 5
      expectedMachines: 5

    Resume the machine health checks after updating the cluster. To resume the check, remove the pause annotation from the MachineHealthCheck resource by running the following command:

    $ oc -n openshift-machine-api annotate mhc <mhc-name> cluster.x-k8s.io/paused-
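
When a cluster has several MachineHealthCheck resources, the pause and resume commands above can be applied to all of them in a loop. This is a minimal sketch: the function name set_all_mhc_paused is hypothetical, and a logged-in oc client is assumed.

```shell
# set_all_mhc_paused: add or remove the pause annotation on every
# MachineHealthCheck in the openshift-machine-api namespace.
# Argument: pause | resume
set_all_mhc_paused() {
  case "$1" in
    pause)  change='cluster.x-k8s.io/paused=' ;;   # sets an empty value
    resume) change='cluster.x-k8s.io/paused-' ;;   # trailing "-" removes the annotation
    *) echo "usage: set_all_mhc_paused pause|resume" >&2; return 2 ;;
  esac
  # "-o name" prints one <type>/<name> reference per line.
  oc get machinehealthcheck -n openshift-machine-api -o name |
    while read -r mhc; do
      oc -n openshift-machine-api annotate "$mhc" "$change" </dev/null
    done
}
```

Run set_all_mhc_paused pause before the update and set_all_mhc_paused resume after the update completes.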

About updating single node OpenShift Container Platform

You can update, or upgrade, a single-node OpenShift Container Platform cluster by using either the console or CLI.

However, note the following limitations:

  • The prerequisite to pause the MachineHealthCheck resources is not required because there is no other node to perform the health check.

  • Restoring a single-node OpenShift Container Platform cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your upgrade fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.

  • Updating a single-node OpenShift Container Platform cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:

    • If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.

    • If the update contains machine configuration changes that do not require a reboot, the downtime is less, and the impact on the cluster management and user workloads is lessened. In this case, the node draining step is skipped with single-node OpenShift Container Platform because there is no other node in the cluster to reschedule the workloads to.

    • If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.

There are conditions, such as bugs in an updated package, that can cause the single node to not restart after a reboot. In this case, the update does not roll back automatically.

Updating a cluster by using the web console

If updates are available, you can update your cluster from the web console.

You can find information about available OpenShift Container Platform advisories and updates in the errata section of the Customer Portal.

Prerequisites
  • Have access to the web console as a user with admin privileges.

  • Pause all MachineHealthCheck resources.

Procedure
  1. From the web console, click Administration → Cluster Settings and review the contents of the Details tab.

  2. For production clusters, ensure that the Channel is set to the correct channel for your current minor version, such as stable-4.9.

    For production clusters, you must subscribe to a stable-* or fast-* channel.

    • If the Update status is not Updates available, you cannot upgrade your cluster.

    • Select channel indicates the cluster version that your cluster is running or is updating to.

  3. Select the highest available version and click Save.

    The Input channel Update status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.

    If you are upgrading your cluster to the next minor version, like version 4.y to 4.(y+1), it is recommended to confirm your nodes are upgraded before deploying workloads that rely on a new feature. Any pools with worker nodes that are not yet updated are displayed on the Cluster Settings page.

  4. After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.

    • If updates are available, continue to perform updates in the current channel until you can no longer update.

    • If no updates are available, change the Channel to the stable-* or fast-* channel for the next minor version, and update to the version that you want in that channel.

    You might need to perform several intermediate updates until you reach the version that you want.

Changing the update server by using the web console

Changing the update server is optional. If you have an OpenShift Update Service (OSUS) installed and configured locally, you must set the URL for the server as the upstream to use the local server during updates.

Procedure
  1. Navigate to Administration → Cluster Settings and click version.

  2. Click the YAML tab and then edit the upstream parameter value:

    Example output
      ...
      spec:
        clusterID: db93436d-7b05-42cc-b856-43e11ad2d31a
        upstream: '<update-server-url>' (1)
      ...
    1 The <update-server-url> variable specifies the URL for the update server.

    The default upstream is https://api.openshift.com/api/upgrades_info/v1/graph.

  3. Click Save.
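
The same upstream setting can also be changed from the CLI by patching the ClusterVersion resource. The following is a sketch, not a replacement for the procedure above: the function name set_update_server is hypothetical, a logged-in oc client is assumed, and the URL is assumed to contain no double quotes or backslashes because it is inserted into the JSON patch as-is.

```shell
# set_update_server: set spec.upstream on the ClusterVersion resource.
# Argument: <update-server-url>
set_update_server() {
  oc patch clusterversion version --type merge \
    --patch "{\"spec\":{\"upstream\":\"$1\"}}"
}
```

For example, set_update_server '<update-server-url>' points the cluster at your local OpenShift Update Service, where <update-server-url> is the same placeholder used in the YAML above.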