
Red Hat OpenShift Container Platform provides developers and IT organizations with a hybrid cloud application platform for deploying both new and existing applications on secure, scalable resources with minimal configuration and management overhead. OpenShift Container Platform supports a wide selection of programming languages and frameworks, such as Java, JavaScript, Python, Ruby, and PHP.

Built on Red Hat Enterprise Linux (RHEL) and Kubernetes, OpenShift Container Platform provides a more secure and scalable multi-tenant operating system for today’s enterprise-class applications, while delivering integrated application runtimes and libraries. OpenShift Container Platform enables organizations to meet security, privacy, compliance, and governance requirements.

About this release

OpenShift Container Platform (RHSA-2021:3759) is now available. This release uses Kubernetes 1.22 with CRI-O runtime. New features, changes, and known issues that pertain to OpenShift Container Platform 4.9 are included in this topic.

OpenShift Container Platform 4.9 clusters are available at https://cloud.redhat.com/openshift. The Red Hat OpenShift Cluster Manager application for OpenShift Container Platform allows you to deploy OpenShift clusters to either on-premises or cloud environments.

OpenShift Container Platform 4.9 is supported on Red Hat Enterprise Linux (RHEL) 7.9 and 8.4, as well as on Red Hat Enterprise Linux CoreOS (RHCOS) 4.9.

You must use RHCOS machines for the control plane, and you can use either RHCOS or Red Hat Enterprise Linux (RHEL) 7.9 or 8.4 for compute machines.

OpenShift Container Platform layered and dependent component support and compatibility

The scope of support for layered and dependent components of OpenShift Container Platform changes independently of the OpenShift Container Platform version. To determine the current support status and compatibility for an add-on, refer to its release notes. For more information, see the Red Hat OpenShift Container Platform Life Cycle Policy.

New features and enhancements

This release adds improvements related to the following components and concepts.

Red Hat Enterprise Linux CoreOS (RHCOS)

Installation Ignition config is removed upon boot

Nodes installed with the coreos-installer program previously retained the installation Ignition config in the /boot/ignition/config.ign file. Starting with the OpenShift Container Platform 4.9 installation image, that file is removed when the node is provisioned. This change does not affect clusters that were installed on previous OpenShift Container Platform versions because they still use an older bootimage.

Installation and upgrade

Installing a cluster on Microsoft Azure Stack Hub using user-provisioned infrastructure

OpenShift Container Platform 4.9 introduces support for installing a cluster on Azure Stack Hub using user-provisioned infrastructure.

You can incorporate example Azure Resource Manager (ARM) templates provided by Red Hat to assist in the deployment process, or create your own. You are also free to create the required resources through other methods; the ARM templates are just an example.

Pausing machine health checks before updating the cluster

During the upgrade process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, OpenShift Container Platform 4.9 introduces the cluster.x-k8s.io/paused="" annotation to let you pause the MachineHealthCheck resources before updating the cluster.
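For example, a minimal sketch of pausing and later resuming a MachineHealthCheck resource, assuming a resource named <mhc_name> in the openshift-machine-api namespace:

$ oc -n openshift-machine-api annotate mhc <mhc_name> cluster.x-k8s.io/paused=""

After the update completes, remove the annotation to resume remediation:

$ oc -n openshift-machine-api annotate mhc <mhc_name> cluster.x-k8s.io/paused-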

For more information, see Pausing a MachineHealthCheck resource.

Increased size of Azure subnets within the machine CIDR

The OpenShift Container Platform installation program for Microsoft Azure now creates subnets as large as possible within the machine CIDR. This lets the cluster use a machine CIDR that is appropriately sized to accommodate the number of nodes in the cluster.

Support for AWS regions in China

OpenShift Container Platform 4.9 introduces support for AWS regions in China. You can now install and update OpenShift Container Platform clusters in the cn-north-1 (Beijing) and cn-northwest-1 (Ningxia) regions.

For more information, see Installing a cluster on AWS China.

Expanding the cluster with Virtual Media on the baremetal network

In OpenShift Container Platform 4.9, you can expand an installer-provisioned cluster deployed using the provisioning network by using Virtual Media on the baremetal network. You can use this feature when the ProvisioningNetwork configuration setting is set to Managed. To use this feature, you must set the virtualMediaViaExternalNetwork configuration setting to true in the provisioning custom resource (CR). You must also edit the machine set to use the API VIP address. See Preparing to deploy with Virtual Media on the baremetal network for details.
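For example, a minimal sketch of enabling this setting, assuming the default provisioning custom resource is named provisioning-configuration:

$ oc patch provisioning provisioning-configuration \
  --type merge -p '{"spec":{"virtualMediaViaExternalNetwork":true}}'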

Required administrator acknowledgment when upgrading from OpenShift Container Platform 4.8 to 4.9

OpenShift Container Platform 4.9 uses Kubernetes 1.22, which removed a significant number of deprecated v1beta1 APIs.

OpenShift Container Platform 4.8.14 introduced a requirement that an administrator must provide a manual acknowledgment before the cluster can be upgraded from OpenShift Container Platform 4.8 to 4.9. This is to help prevent issues after upgrading to OpenShift Container Platform 4.9, where APIs that have been removed are still in use by workloads, tools, or other components running on or interacting with the cluster. Administrators must evaluate their cluster for any APIs in use that will be removed and migrate the affected components to use the appropriate new API version. After this is done, the administrator can provide the administrator acknowledgment.

All OpenShift Container Platform 4.8 clusters require this administrator acknowledgment before they can be upgraded to OpenShift Container Platform 4.9.
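For example, after evaluating the cluster for removed APIs and migrating any affected components, an administrator can provide the acknowledgment by patching the admin-acks config map in the openshift-config namespace:

$ oc -n openshift-config patch cm admin-acks \
  --patch '{"data":{"ack-4.8-kube-1.22-api-removals-in-4.9":"true"}}' --type=merge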

Support for installation on RHOSP deployments that use PCI passthrough

OpenShift Container Platform 4.9 introduces support for installation on Red Hat OpenStack Platform (RHOSP) deployments that rely on PCI passthrough.

Upgrading etcd version 3.4 to 3.5

OpenShift Container Platform 4.9 supports etcd 3.5. Before you upgrade the cluster, verify that a valid etcd backup exists. An etcd backup ensures that the cluster can be restored if an upgrade failure occurs. In OpenShift Container Platform 4.9, etcd upgrades are automatic. Depending on the cluster’s transition state to version 4.9, an etcd backup might be available. However, verifying that a backup exists before the cluster upgrade starts is recommended.

Installing a cluster on IBM Cloud using installer-provisioned infrastructure

OpenShift Container Platform 4.9 introduces support for installing a cluster on IBM Cloud® using installer-provisioned infrastructure. The procedure is nearly identical to installer-provisioned infrastructure on bare metal with these differences:

  • Installer-provisioned installation of OpenShift Container Platform 4.9 on IBM Cloud requires the provisioning network, IPMI, and PXE boot. Red Hat does not support deployment with Redfish and virtual media on IBM Cloud.

  • You must create and configure public and private VLANs on the IBM Cloud.

  • IBM Cloud nodes must be available before you start the installation process, so you must create them first.

  • You must prepare the provisioner node.

  • You must install and configure a DHCP server on the public baremetal network.

  • You must configure the install-config.yaml file so that each node points to the BMC using IPMI, and sets the IPMI privilege level to OPERATOR.
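A minimal sketch of one such host entry in the install-config.yaml file; the host name, BMC address, and credentials are placeholders:

platform:
  baremetal:
    hosts:
    - name: openshift-worker-0
      role: worker
      bmc:
        # IPMI address of the node's BMC, with the privilege level set to OPERATOR
        address: ipmi://<bmc_ip>?privilegelevel=OPERATOR
        username: <user>
        password: <password>
      bootMACAddress: <nic_mac_address>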

Improved support for Fujitsu hardware on installer-provisioned clusters

OpenShift Container Platform 4.9 adds BIOS configuration support for worker nodes when deploying installer-provisioned clusters on Fujitsu hardware and using the Fujitsu integrated Remote Management Controller (iRMC). See Configuring BIOS for worker node for details.

Web console

Accessing node logs from the Node page

With this update, administrators can now access node logs from the Node page. To review the node logs, you can switch between individual log files and journal log units by clicking the Logs tab.

Break down cluster utilization by node type

You can now filter by node type in the Cluster utilization card on the cluster dashboard. Additional node types appear in the list as they are created.

User preferences

This update adds a User Preferences page for customizing settings, such as default project, perspective, and topology view.

Hide default projects from project list

With this update, you can hide default projects from the Projects dropdown in the web console masthead. You can still toggle to show default projects before you search and filter.

Adding user preferences in the web console

With this update, you can now add user preferences in the web console. Users can select their default perspective, project, topology, and other preferences.

Developer perspective

  • You can now import a devfile, a Dockerfile, or a builder image through your Git repository to further customize your deployment. You can also edit the file import type and select a different strategy for importing the file.

  • You can now add tasks in a pipeline using Add task and Quick Search using the updated user interface of the Pipeline builder in the developer console. This enhanced experience allows users to add tasks from the Tekton Hub.

  • To edit your build configurations, use the Edit BuildConfig option in the Builds view of the Developer perspective. You can edit the build configurations in either a Form view or a YAML view.

  • You can use the context menu in the topology Graph view to add services or create a connection with operator-backed services to the projects.

  • You can use the +Add actions in the context menu of the topology Graph view to add services or remove a service in the application group.

  • Initial support for pipeline as code is now available in the Pipelines Repository list view, enabled by the OpenShift Pipelines Operator.

  • Usability enhancements have been made to the Application Monitoring section in the Observe page of the topology.

IBM Z and LinuxONE

With this release, IBM Z and LinuxONE are now compatible with OpenShift Container Platform 4.9. The installation can be performed with z/VM or RHEL KVM. For installation instructions, see the installation documentation for IBM Z and LinuxONE.

Notable enhancements

The following new features are supported on IBM Z and LinuxONE with OpenShift Container Platform 4.9:

  • Helm

  • Support for multiple network interfaces

  • Service Binding Operator

Supported features

The following features are also supported on IBM Z and LinuxONE:

  • Currently, the following Operators are supported:

    • Cluster Logging Operator

    • NFD Operator

    • OpenShift Elasticsearch Operator

    • Local Storage Operator

    • Service Binding Operator

  • Encrypting data stored in etcd

  • Multipathing

  • Persistent storage using iSCSI

  • Persistent storage using local volumes (Local Storage Operator)

  • Persistent storage using hostPath

  • Persistent storage using Fibre Channel

  • Persistent storage using Raw Block

  • OVN-Kubernetes

  • Three-node cluster support

  • z/VM Emulated FBA devices on SCSI disks

  • 4K FCP block device

The following features are available only for OpenShift Container Platform on IBM Z and LinuxONE in 4.9:

  • HyperPAV enabled on IBM Z and LinuxONE for the virtual machines for FICON attached ECKD storage

Restrictions

Note the following restrictions for OpenShift Container Platform on IBM Z and LinuxONE:

  • The following OpenShift Container Platform Technology Preview features are unsupported:

    • Precision Time Protocol (PTP) hardware

  • The following OpenShift Container Platform features are unsupported:

    • Automatic repair of damaged machines with machine health checking

    • CodeReady Containers (CRC)

    • Controlling overcommit and managing container density on nodes

    • CSI volume cloning

    • CSI volume snapshots

    • FIPS cryptography

    • Multus CNI plug-in

    • NVMe

    • OpenShift Metering

    • OpenShift Virtualization

    • Tang mode disk encryption during OpenShift Container Platform deployment

  • Worker nodes must run Red Hat Enterprise Linux CoreOS (RHCOS)

  • Persistent shared storage must be provisioned by using either NFS or other supported storage protocols

  • Persistent non-shared storage must be provisioned using local storage, like iSCSI, FC, or using LSO with DASD, FCP, or EDEV/FBA

IBM Power Systems

With this release, IBM Power Systems are now compatible with OpenShift Container Platform 4.9. For installation instructions, see the installation documentation for IBM Power Systems.

Notable enhancements

The following new features are supported on IBM Power Systems with OpenShift Container Platform 4.9:

  • Helm

  • Support for Power10

  • Support for multiple network interfaces

  • Service Binding Operator

Supported features

The following features are also supported on IBM Power Systems:

  • Currently, the following Operators are supported:

    • Cluster Logging Operator

    • NFD Operator

    • OpenShift Elasticsearch Operator

    • Local Storage Operator

    • SR-IOV Network Operator

    • Service Binding Operator

  • Multipathing

  • Persistent storage using iSCSI

  • Persistent storage using local volumes (Local Storage Operator)

  • Persistent storage using hostPath

  • Persistent storage using Fibre Channel

  • Persistent storage using Raw Block

  • OVN-Kubernetes

  • 4K Disk Support

  • NVMe

  • Encrypting data stored in etcd

  • Three-node cluster support

  • Multus SR-IOV

Restrictions

Note the following restrictions for OpenShift Container Platform on IBM Power Systems:

  • The following OpenShift Container Platform Technology Preview features are unsupported:

    • Precision Time Protocol (PTP) hardware

  • The following OpenShift Container Platform features are unsupported:

    • Automatic repair of damaged machines with machine health checking

    • CodeReady Containers (CRC)

    • Controlling overcommit and managing container density on nodes

    • FIPS cryptography

    • OpenShift Metering

    • OpenShift Virtualization

    • Tang mode disk encryption during OpenShift Container Platform deployment

  • Worker nodes must run Red Hat Enterprise Linux CoreOS (RHCOS)

  • Persistent storage must be of the Filesystem type that uses local volumes, Network File System (NFS), or Container Storage Interface (CSI)

Security and compliance

Configuring the audit log policy with custom rules

You now have more fine-grained control over the audit logging level for OpenShift Container Platform. You can use custom rules to specify a different audit policy profile (Default, WriteRequestBodies, AllRequestBodies, or None) for different groups.
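For example, a sketch of an APIServer cluster configuration that applies a stricter profile to one group; the group name and profile choices are illustrative:

apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  audit:
    customRules:
    # Log request and response bodies for requests from this group
    - group: system:authenticated:oauth
      profile: WriteRequestBodies
    # All other requests use the default profile
    profile: Default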

Disabling audit logging

You can now disable audit logging for OpenShift Container Platform by using the None audit policy profile.

It is not recommended to disable audit logging unless you are fully aware of the risks of not logging data that can be beneficial when troubleshooting issues. If you disable audit logging and a support situation arises, you might need to enable audit logging and reproduce the issue in order to troubleshoot properly.

For more information, see Disabling audit logging.

Customizing the OAuth server URL

You can now customize the URL for the internal OAuth server. For more information, see Customizing the internal OAuth server URL.
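A minimal sketch, assuming the URL is customized through the componentRoutes stanza of the cluster Ingress configuration and that <custom_hostname> is a placeholder:

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  componentRoutes:
  - name: oauth-openshift
    namespace: openshift-authentication
    hostname: <custom_hostname>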

Network-Bound Disk Encryption (NBDE)

OpenShift Container Platform 4.9 provides new procedures for ongoing maintenance of NBDE-configured systems. NBDE allows you to encrypt root volumes of hard drives on physical and virtual machines without having to manually enter a password when restarting machines. For more information, see About disk encryption technology.

etcd

Automatic rotation of etcd certificates

In OpenShift Container Platform 4.9, etcd certificates are automatically rotated and are managed by the system.

Additional TLS security profile setting on the API server

The Kubernetes API server TLS security profile setting is now also honored by etcd.

Networking

Enhancements to linuxptp services

OpenShift Container Platform 4.9 introduces the following updates to PTP:

  • New ptp4lConf field

  • New option to configure linuxptp services as a boundary clock

Monitoring PTP fast events with the PTP fast event notification framework

Fast event notifications for PTP events are now available for bare-metal clusters. The PTP Operator generates event notifications for every configured PTP-capable network interface. Events are made available through a REST API for applications running on the same node. Fast event notifications are transported by an Advanced Message Queuing Protocol (AMQP) message bus provided by the AMQ Interconnect Operator.

OVN-Kubernetes cluster network provider egress IP feature balances across nodes

The egress IP feature of OVN-Kubernetes now balances network traffic approximately equally across nodes for a given namespace, if that namespace is assigned multiple egress IP addresses. Each IP address must reside on a different node. For more information, refer to Configuring egress IPs for a project for OVN-Kubernetes.
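For example, a sketch of an EgressIP object that assigns two egress IP addresses (placeholder values) to namespaces that match a label:

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressips-prod
spec:
  egressIPs:
  # Each address must reside on a different node
  - 192.168.127.10
  - 192.168.127.11
  namespaceSelector:
    matchLabels:
      env: prod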

SR-IOV containerized Data Plane Development Kit (DPDK) is GA

The containerized Data Plane Development Kit (DPDK) is now GA in OpenShift Container Platform 4.9. For more information, see Using virtual functions (VFs) with DPDK and RDMA modes.

SR-IOV support for using vhost-net with Fast Datapath DPDK applications

SR-IOV now supports vhost-net for use with Fast Datapath DPDK applications on Intel and Mellanox NICs. You can enable this feature by configuring the SriovNetworkNodePolicy resource. For more information, see SR-IOV network node configuration object.
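A hedged sketch, assuming the needVhostNet field of the SriovNetworkNodePolicy resource enables this feature; the resource name, selectors, and VF count are illustrative placeholders:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-vhost
  namespace: openshift-sriov-network-operator
spec:
  resourceName: vhostresource
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  nicSelector:
    vendor: "8086"
  deviceType: netdevice
  # Mounts /dev/vhost-net into pods that request this resource
  needVhostNet: true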

SR-IOV support for single node clusters

Single node clusters support SR-IOV hardware and the SR-IOV Network Operator. Be aware that configuring an SR-IOV network device causes the single node to reboot and that you must configure the disableDrain field for the Operator. For more information, see Configuring the SR-IOV Network Operator.
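For example, a minimal sketch of setting the disableDrain field, assuming the default SriovOperatorConfig object name and namespace:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  # Skip node draining, which cannot succeed on a single node cluster
  disableDrain: true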

Supported hardware for SR-IOV

OpenShift Container Platform 4.9 adds support for additional Broadcom and Intel hardware.

  • Broadcom BCM57414 and BCM57508

  • Intel E810-CQDA2, E810-XXVDA2, and E810-XXVDA4

For more information, see the supported devices.

MetalLB load balancer

This release introduces the MetalLB Operator. After installing and configuring the MetalLB Operator, you can deploy MetalLB to provide a native load balancer implementation for services on bare-metal clusters. Other on-premise infrastructures that are like bare metal can also benefit.

The Operator introduces a custom resource, AddressPool. You configure address pools with ranges of IP addresses that MetalLB can assign to services. When you add a service of type LoadBalancer, MetalLB assigns an IP address from a pool.
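For example, a sketch of an AddressPool in layer 2 mode; the pool name and address range are placeholders, and the exact API version depends on the installed MetalLB Operator version:

apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: doc-example
  namespace: metallb-system
spec:
  protocol: layer2
  addresses:
  # Range of addresses that MetalLB can assign to LoadBalancer services
  - 192.168.10.1-192.168.10.30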

For this release, Red Hat only supports using MetalLB in layer 2 mode.

For more information, see About MetalLB and the MetalLB Operator.

CNI VRF plug-in is generally available

The CNI VRF plug-in was previously introduced as a Technology Preview feature in OpenShift Container Platform 4.7 and is now generally available in OpenShift Container Platform 4.9.

For more information, see Assigning a secondary network to a VRF.

Ingress controller timeout configuration parameters

This release introduces six timeout configurations for the Ingress Controller tuningOptions parameter:

  • clientTimeout specifies how long a connection is held open while waiting for a client response.

  • serverFinTimeout specifies how long a connection is held open while waiting for the server response to the client that is closing the connection.

  • serverTimeout specifies how long a connection is held open while waiting for a server response.

  • clientFinTimeout specifies how long a connection is held open while waiting for the client response to the server closing the connection.

  • tlsInspectDelay specifies how long the router can hold data to find a matching route.

  • tunnelTimeout specifies how long a tunnel connection, including WebSocket connections, remains open while the tunnel is idle.
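A sketch of how these timeouts might be set on the default Ingress Controller; the duration values are illustrative only:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    clientTimeout: 30s
    clientFinTimeout: 1s
    serverTimeout: 30s
    serverFinTimeout: 1s
    tlsInspectDelay: 5s
    tunnelTimeout: 1h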

For more information, see Ingress controller configuration parameters.

Mutual TLS Authentication

You can now configure the Ingress Controller to enable mutual TLS (mTLS) authentication by setting spec.clientTLS. The clientTLS field specifies configuration for the Ingress Controller to verify client certificates.
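A minimal sketch, assuming a config map named router-ca-certs-default that holds the client CA bundle:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  clientTLS:
    # Reject connections that do not present a valid client certificate
    clientCertificatePolicy: Required
    clientCA:
      name: router-ca-certs-default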

For more information, see Configuring Mutual TLS Authentication.

Customizing HAProxy error code response pages

Cluster administrators can specify a custom HTTP error code response page for either 503, 404, or both error pages.

The provisioningNetworkInterface configuration setting is optional

In OpenShift Container Platform 4.9, the provisioningNetworkInterface configuration setting for installer-provisioned clusters is optional. The provisioningNetworkInterface configuration setting identifies the NIC name used for the provisioning network. Alternatively, you can specify the bootMACAddress configuration setting in the install-config.yaml file, which enables Ironic to identify the IP address for the NIC connected to the provisioning network and bind to it. You can also omit the provisioningInterface configuration setting in the provisioning custom resource so that the provisioning custom resource uses the bootMACAddress configuration setting instead.

DNS Operator managementState

In OpenShift Container Platform 4.9, you can now change the DNS Operator managementState. The managementState of the DNS Operator is set to Managed by default, which means that the DNS Operator is actively managing its resources. You can change it to Unmanaged, which means the DNS Operator is not managing its resources.

The following are use cases for changing the DNS Operator managementState:

  • You are a developer and want to test a configuration change to see if it fixes an issue in CoreDNS. You can stop the DNS Operator from overwriting the change by setting the managementState to Unmanaged.

  • You are a cluster administrator and have reported an issue with CoreDNS, but need to apply a workaround until the issue is fixed. You can set the managementState field of the DNS Operator to Unmanaged to apply the workaround.
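For example, to stop the DNS Operator from managing its resources:

$ oc patch dns.operator.openshift.io default --type merge \
  --patch '{"spec":{"managementState":"Unmanaged"}}'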

For more information, see Changing the DNS Operator managementState.

Load balancer configuration as a cloud provider option for clusters on RHOSP

For clusters that run on RHOSP, you can now configure Octavia for load balancing as a cloud provider option.

For more information, see Setting cloud provider options.

Support added for TLS 1.3 and the Modern profile

This release adds Ingress Controller support for TLS 1.3 and the Modern profile in HAProxy.

For more information, see Ingress Controller TLS security profiles.

Global admission plug-in for HTTP Strict Transport Security requirements

Cluster administrators can configure HTTP Strict Transport Security (HSTS) verification on a per-domain basis with the addition of an admission plug-in for the router, called route.openshift.io/RequiredRouteAnnotations. If a cluster administrator configures this plug-in to enforce HSTS, then any newly created route must be configured with a compliant HSTS Policy, which is verified against the global setting on the cluster Ingress configuration, called ingresses.config.openshift.io/cluster.

For more information, see HTTP Strict Transport Security.

Ingress empty requests policy

In OpenShift Container Platform 4.9 you can now configure the Ingress Controller to log or ignore empty requests by setting the logEmptyRequests and HTTPEmptyRequestsPolicy fields.
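For example, a sketch of configuring the default Ingress Controller to ignore and not log empty requests:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  # Close connections that send no request without responding
  httpEmptyRequestsPolicy: Ignore
  logging:
    access:
      destination:
        type: Container
      logEmptyRequests: Ignore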

For more information, see Ingress controller configuration parameters.

Create network policies in the web console

Logging in to the web console with the cluster-admin role now enables you to create new network policies in any namespace in the cluster from a form in the console. Previously, this could only be done directly in YAML.

Storage

Persistent storage using AWS EBS CSI driver operator is generally available

OpenShift Container Platform is capable of provisioning persistent volumes (PVs) using the Container Storage Interface (CSI) driver for AWS Elastic Block Store (EBS). This feature was previously introduced as a Technology Preview feature in OpenShift Container Platform 4.5 and is now generally available and enabled by default in OpenShift Container Platform 4.9.

For more information, see AWS EBS CSI Driver Operator.

Persistent storage using the Azure Stack Hub CSI Driver Operator (general availability)

OpenShift Container Platform is capable of provisioning PVs using the CSI driver for Azure Stack Hub Storage. Azure Stack Hub, which is part of the Azure Stack portfolio, allows you to run apps in an on-premises environment and deliver Azure services in your datacenter. The Azure Stack Hub CSI Driver Operator that manages this driver is new for 4.9 and generally available.

For more information, see Azure Stack Hub CSI Driver Operator.

Persistent storage using the AWS EFS CSI Driver Operator (Technology Preview)

OpenShift Container Platform is capable of provisioning PVs using the CSI driver for AWS Elastic File Service (EFS). The AWS EFS CSI Driver Operator that manages this driver is in Technology Preview.

For more information, see AWS EFS CSI Driver Operator.

Automatic CSI migration supports GCE (Technology Preview)

Starting with OpenShift Container Platform 4.8, automatic migration for in-tree volume plug-ins to their equivalent CSI drivers became available as a Technology Preview feature. This feature now supports automatic migration from Google Compute Engine Persistent Disk (GCE PD) in-tree plug-in to the Google Cloud Platform (GCP) Persistent Disk CSI driver.

For more information, see CSI Automatic Migration.

Automatic CSI migration supports Azure Disk (Technology Preview)

Starting with OpenShift Container Platform 4.8, automatic migration for in-tree volume plug-ins to their equivalent CSI drivers became available as a Technology Preview feature. This feature now supports automatic migration from the Azure Disk in-tree plug-in to the Azure Disk CSI driver.

For more information, see CSI Automatic Migration.

VMWare vSphere CSI Driver Operator creates storage policy automatically (Technology Preview)

The vSphere CSI Driver Operator storage class now uses vSphere's storage policy. OpenShift Container Platform automatically creates a storage policy that targets the datastore configured in the cloud configuration.

For more information, see VMWare vSphere CSI Driver Operator.

New metrics provided for Local Storage Operator

OpenShift Container Platform 4.9 provides the following new metrics for the Local Storage Operator:

  • lso_discovery_disk_count: total number of discovered devices on each node

  • lso_lvset_provisioned_PV_count: total number of PVs created by LocalVolumeSet objects

  • lso_lvset_unmatched_disk_count: total number of disks that the Local Storage Operator did not select for provisioning because of mismatching criteria

  • lso_lvset_orphaned_symlink_count: number of devices with PVs that no longer match LocalVolumeSet object criteria

  • lso_lv_orphaned_symlink_count: number of devices with PVs that no longer match LocalVolume object criteria

  • lso_lv_provisioned_PV_count: total number of provisioned PVs for LocalVolume

For more information, see Persistent storage using local volumes.

oVirt CSI driver resizing feature is now available

OpenShift Container Platform 4.9 adds resizing capability to the oVirt CSI Driver, which allows users to increase the size of their existing persistent volume claims (PVCs). Prior to this feature, users had to create new PVCs with the increased size, and move all of the content from the old persistent volume (PV) to the new PV, which could result in data loss. Now, users can edit the existing PVC and the oVirt CSI Driver will resize the underlying oVirt disk.

Registry

Image Registry uses Azure Blob Storage on Azure Stack Hub installations

In OpenShift Container Platform 4.9, the integrated Image Registry uses Azure Blob Storage for clusters installed on Microsoft Azure Stack Hub using user-provisioned infrastructure.

Operator lifecycle

The following new features and enhancements relate to running Operators with Operator Lifecycle Manager (OLM).

Operator Lifecycle Manager upgraded to Kubernetes 1.22

Starting in OpenShift Container Platform 4.9, Operator Lifecycle Manager (OLM) supports Kubernetes 1.22. As a result, a significant number of v1beta1 APIs have been removed and updated to v1. Operators that depend on the removed v1beta1 APIs will not run on OpenShift Container Platform 4.9. Cluster administrators should upgrade their installed Operators to the latest channel before upgrading a cluster to OpenShift Container Platform 4.9.

Kubernetes 1.22 introduces several notable changes to v1 of the CustomResourceDefinition API.

File-based catalogs

File-based catalogs are the latest iteration of the catalog format in Operator Lifecycle Manager (OLM). The format is a plain text-based (JSON or YAML) and declarative config evolution of the earlier, and now deprecated, SQLite database format, and it is fully backwards compatible. The goal of this format is to enable Operator catalog editing, composability, and extensibility.

For more information about the file-based catalog specification, see Operator Framework packaging format.

For instructions about creating file-based catalogs by using the opm CLI, see Managing custom catalogs.

Operator Lifecycle Manager support for Single Node OpenShift

Operator Lifecycle Manager (OLM) is now available on Single Node OpenShift (SNO) clusters, enabling self-service Operator installations.

Enhanced error reporting for cluster administrators

Administrators should not need to understand the interactions between various low-level APIs, or have access to the Operator Lifecycle Manager (OLM) pod logs, to successfully debug such issues. Therefore, OpenShift Container Platform 4.9 introduces the following enhancements in OLM to provide administrators with more comprehensible error reporting and messages:

Updating Operator group status conditions

Previously, if a namespace contained multiple Operator groups or could not find a service account, the status of the Operator group would not report an error. With this enhancement, these scenarios now update the status condition of the Operator group to report an error.

Indicating the reason for install plan failures

Before this release, if an install plan failed, the subscription condition would not state why the failure occurred. Now, if an install plan fails, the subscription status condition indicates the reason for the failure.

Indicating resolution conflicts on subscription statuses

Because dependency resolution treats all components in a namespace as a single unit, if a resolution failure occurs, all subscriptions on the namespace now indicate the error.

Image template for custom catalog sources

To avoid cluster upgrades potentially leaving Operator installations in an unsupported state or without a continued update path, you can enable automatically changing your Operator catalog’s index image version as part of cluster upgrades.

Set the olm.catalogImageTemplate annotation to your catalog image name and use one or more of the Kubernetes cluster version variables when constructing the template for the image tag.
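A hedged sketch of a CatalogSource that uses the template annotation; the catalog name and image are placeholders:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-catalog
  namespace: openshift-marketplace
  annotations:
    # Resolves to v1.22 on a cluster running Kubernetes 1.22
    olm.catalogImageTemplate: "quay.io/example-org/example-catalog:v{kube_major_version}.{kube_minor_version}"
spec:
  sourceType: grpc
  image: quay.io/example-org/example-catalog:v1.22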

For more information, see Image template for custom catalog sources.

Operator development

The following new features and enhancements relate to developing Operators with the Operator SDK.

High-availability or single node cluster detection and support

An OpenShift Container Platform cluster can be configured in high-availability (HA) mode, which uses multiple nodes, or in non-HA mode, which uses a single node. A single node cluster, also known as Single Node OpenShift (SNO), is likely to have more conservative resource constraints. Therefore, it is important that Operators installed on a single node cluster can adjust accordingly and still run well.

By accessing the cluster high-availability mode API provided in OpenShift Container Platform, Operator authors can use the Operator SDK to enable their Operator to detect a cluster’s infrastructure topology, either HA or non-HA mode. Custom Operator logic can be developed that uses the detected cluster topology to automatically switch the resource requirements, both for the Operator and for any Operands or workloads it manages, to a profile that best fits the topology.

Operator support for network proxies

Operator authors can now develop Operators that support network proxies. Operators with proxy support inspect the Operator deployment for environment variables and pass the variables on to the required Operands. Cluster administrators configure proxy support for the environment variables that are handled by Operator Lifecycle Manager (OLM). For more information, see the Operator SDK tutorials for developing Operators using Go, Ansible, and Helm.

Validating bundle manifests for APIs removed from Kubernetes 1.22

You can now check bundle manifests for APIs removed from Kubernetes 1.22 by using the Operator Framework suite of tests with the bundle validate subcommand.

For example:

$ operator-sdk bundle validate <bundle_dir_or_image> \
  --select-optional suite=operatorframework \
  --optional-values=k8s-version=1.22

If your bundle manifest includes APIs removed from Kubernetes 1.22, the command displays a warning message. The warning message indicates which APIs you need to migrate and links to the Kubernetes API migration guide.

Builds

With this update, as a developer using OpenShift Container Platform for builds, you can use the following new capabilities:

  • You can mount build volumes to give running builds access to information that you do not want to persist in the output container image. Build volumes can provide sensitive information, such as repository credentials, which the build environment or configuration only needs at build-time. Build volumes are different from build inputs, whose data can persist in the output container image.

  • You can configure image changes to trigger builds based on information recorded in the BuildConfig status. This way, you can use ImageChange triggers with builds in a GitOps workflow.

Images

Wildcard domains as registry sources

This release introduces support for using wildcard domains as registry sources in your image registry settings. With a wildcard domain, such as *.example.com, you can set your cluster to push and pull images from multiple subdomains without having to manually enter each one. For more information, see Image controller configuration parameters.
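For example, a sketch of the cluster image configuration that allows a wildcard domain; note that defining any allowedRegistries blocks all registries not explicitly listed, so required registries must be added as well:

apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  registrySources:
    allowedRegistries:
    - "*.example.com"
    # Include the registries the cluster itself needs, for example:
    - quay.io
    - registry.redhat.io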

Machine API

Red Hat Enterprise Linux (RHEL) 8 now supported for compute machines

Starting in OpenShift Container Platform 4.9, you can now use Red Hat Enterprise Linux (RHEL) 8.4 for compute machines. Previously, RHEL 8 was not supported for compute machines.

You cannot upgrade RHEL 7 compute machines to RHEL 8. You must deploy new RHEL 8 hosts, and the old RHEL 7 hosts should be removed.

Nodes

Scheduler profiles GA

Scheduling pods using a scheduler profile is now generally available. This is a replacement for configuring a scheduler policy. The following scheduler profiles are available:

  • LowNodeUtilization: This profile attempts to spread pods evenly across nodes to get low resource usage per node.

  • HighNodeUtilization: This profile attempts to place as many pods as possible onto as few nodes as possible, to minimize node count with high usage per node.

  • NoScoring: This is a low-latency profile that strives for the quickest scheduling cycle by disabling all score plug-ins. This might sacrifice better scheduling decisions for faster ones.

For more information, see Scheduling pods using a scheduler profile.

New descheduler profiles and customization

The following descheduler profiles are now available:

  • SoftTopologyAndDuplicates: This profile is the same as TopologyAndDuplicates, except that pods with soft topology constraints, such as whenUnsatisfiable: ScheduleAnyway, are also considered for eviction.

  • EvictPodsWithLocalStorage: This profile allows pods with local storage to be eligible for eviction.

  • EvictPodsWithPVC: This profile allows pods with persistent volume claims to be eligible for eviction.

You can also customize the pod lifetime value for the LifecycleAndUtilization profile.
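For example, a sketch of customizing the pod lifetime on the KubeDescheduler custom resource; the 48h value is illustrative:

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  profiles:
  - LifecycleAndUtilization
  profileCustomizations:
    # Pods older than this value become eligible for eviction
    podLifetime: 48h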

For more information, see Evicting pods using the descheduler.

Multiple logins to the same registry

When configuring the docker/config.json file to allow pods to pull images from private registries, you can now list specific repositories in the same registry, each with credentials specific to that registry path. Previously, you could list only one repository from a given registry. You can also now define a registry with a specific namespace.
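For example, a sketch of a docker/config.json file with separate credentials for two paths in the same registry; the registry paths and auth tokens are placeholders:

{
  "auths": {
    "quay.io/team-one-repo": {
      "auth": "<base64_credentials_for_team_one>"
    },
    "quay.io/team-two-repo": {
      "auth": "<base64_credentials_for_team_two>"
    }
  }
}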

Enhanced monitoring of node resources

Node-related metrics and alerts have been enhanced to give you an earlier indication of when the stability of a node is compromised.

Deploy node health checks with the Node Health Check Operator (Technology Preview)

You can use the Node Health Check Operator to deploy the NodeHealthCheck controller. The controller identifies unhealthy nodes and uses the Poison Pill Operator to remediate the unhealthy nodes.

Red Hat OpenShift Logging

In OpenShift Container Platform 4.7, Cluster Logging became Red Hat OpenShift Logging. For more information, see Release notes for Red Hat OpenShift Logging.

Monitoring

The monitoring stack for this release includes the following new and modified features.

Monitoring stack components and dependencies

Updates to versions of monitoring stack components and dependencies include the following:

  • Prometheus to 2.29.2

  • The Prometheus Operator to 0.49.0

  • The Prometheus Adapter to 0.9.0

  • Alertmanager to 0.22.2

  • Thanos to 0.22.0

Alerting rules

  • New

    • HighlyAvailableWorkloadIncorrectlySpread informs you about a potential problem when two instances of a highly available monitoring component are running on the same node and have persistent volumes attached.

    • NodeFileDescriptorLimit triggers an alert when a node kernel is running out of available file descriptors. A warning level alert fires at greater than 70% usage, and a critical level alert fires at greater than 90% usage.

    • PrometheusLabelLimitHit detects when a target exceeds the defined label limits.

    • PrometheusTargetSyncFailure detects when Prometheus fails to synchronize targets.

    • All critical alerting rules contain links to runbooks.

  • Enhanced

    • AlertmanagerReceiversNotConfigured and KubePodCrashLooping now contain fewer false positives.

    • KubeCPUOvercommit and KubeMemoryOvercommit are now more robust in non-homogeneous environments.

    • The for duration setting of the NodeFilesystemAlmostOutOfSpace alerting rule has changed from one hour to 30 minutes so that the system more quickly detects when disk space runs low.

    • KubeDeploymentReplicasMismatch now fires as expected. In previous versions, this alert did not fire.

    • The following alerts now contain a namespace label:

      • AlertmanagerReceiversNotConfigured

      • KubeClientErrors

      • KubeCPUOvercommit

      • KubeletDown

      • KubeMemoryOvercommit

      • MultipleContainersOOMKilled

      • ThanosQueryGrpcClientErrorRate

      • ThanosQueryGrpcServerErrorRate

      • ThanosQueryHighDNSFailures

      • ThanosQueryHttpRequestQueryErrorRateHigh

      • ThanosQueryHttpRequestQueryRangeErrorRateHigh

      • ThanosSidecarPrometheusDown

      • Watchdog

Red Hat does not guarantee backward compatibility for metrics, recording rules, or alerting rules.

Alertmanager

  • You can add and configure additional external Alertmanagers for both platform and user-defined project monitoring stacks.

  • You can disable the local Alertmanager instance.

Prometheus

  • You can enable and configure remote write storage for both platform monitoring and user-defined projects in Prometheus. This feature enables you to send ingested metrics to long-term storage.

  • To reduce the overall memory consumption of Prometheus, the following cAdvisor metrics with both an empty pod and namespace label have been dropped:

    • container_fs_.*

    • container_spec_.*

    • container_blkio_device_usage_total

    • container_file_descriptors

    • container_sockets

    • container_threads_max

    • container_threads

    • container_start_time_seconds

    • container_last_seen

  • When persistent storage is not configured for platform monitoring, upgrades and cluster disruptions can lead to data loss. A warning message has been added to the Degraded condition when the system detects that persistent storage is not configured for platform monitoring.

  • You can exclude individual user-defined projects from the openshift-user-workload-monitoring project by adding the openshift.io/user-monitoring: "false" label to them.

  • You can configure an enforcedTargetLimit parameter for the openshift-user-workload-monitoring project to set an overall limit on the number of targets scraped.
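For example, a sketch of excluding a project from user workload monitoring by applying the label described above; the project name is a placeholder:

$ oc label namespace my-project 'openshift.io/user-monitoring=false'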

Grafana

Because running the default Grafana dashboard can take resources from user workloads, you can disable the Grafana dashboard deployment.

Metering

This release removes the OpenShift Container Platform Metering Operator.

Scalability and performance

Special Resource Operator (Technology Preview)

You can now use the Special Resource Operator (SRO) to help manage the deployment of kernel modules and drivers on an existing OpenShift Container Platform cluster. This is currently a Technology Preview feature.

For more information, see About the Special Resource Operator.

Memory Manager feature (Technology Preview)

The Memory Manager feature is now enabled by default for all pods running on the node that is configured with one of the following Topology Manager policies:

  • single-numa-node

  • restricted

For more information, see Topology Manager policies.

Additional tools for latency testing

OpenShift Container Platform 4.9 introduces two additional tools to measure system latency:

  • hwlatdetect measures the baseline that the bare hardware can achieve

  • cyclictest schedules a repeated timer after hwlatdetect passes validation and measures the difference between the desired and the actual trigger times

For more information, see Running the latency tests.

Cluster maximums

Updated guidance around cluster maximums for OpenShift Container Platform 4.9 is now available.

No large-scale performance testing of OVN-Kubernetes was executed for this release.

Use the OpenShift Container Platform Limit Calculator to estimate cluster limits for your environment.

Zero touch provisioning (Technology Preview)

OpenShift Container Platform 4.9 supports zero touch provisioning (ZTP), which allows you to provision new edge sites with declarative configurations of bare metal equipment at remote sites. ZTP uses the GitOps deployment set of practices for infrastructure deployment. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, to provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager (OCM) for multisite deployment. For more information, see Provisioning edge sites at scale.

Insights Operator

Importing RHEL Simple Content Access certificates (Technology Preview)

In OpenShift Container Platform 4.9, the Insights Operator can import RHEL Simple Content Access (SCA) certificates from Red Hat OpenShift Cluster Manager.

Insights Operator data collection enhancements

In OpenShift Container Platform 4.9, the Insights Operator collects the following additional information:

  • All of the MachineConfig resource definitions from a cluster.

  • The names of the PodSecurityPolicies installed in a cluster.

  • If installed, the ClusterLogging resource definition.

  • If the SamplesImagestreamImportFailing alert is firing, then the ImageStream definitions and the last 100 lines of container logs from the openshift-cluster-samples-operator namespace.

With this additional information, Red Hat can provide improved remediation steps in Insights Advisor.

Authentication and authorization

Support for Microsoft Azure Stack Hub with Cloud Credential Operator in manual mode

With this release, installations on Microsoft Azure Stack Hub can be performed by configuring the Cloud Credential Operator (CCO) in manual mode.

For more information, see Using manual mode.

OpenShift sandboxed containers support on OpenShift Container Platform (Technology Preview)

To review OpenShift sandboxed containers new features, bug fixes, known issues, and asynchronous errata updates, see OpenShift sandboxed containers 1.1 release notes.

Notable technical changes

OpenShift Container Platform 4.9 introduces the following notable technical changes.

Automatic defragmentation for etcd data

In OpenShift Container Platform 4.9, etcd data is automatically defragmented by the etcd Operator.

Octavia OVN NodePort changes

Previously, on Red Hat OpenStack Platform (RHOSP) deployments, opening traffic on NodePorts was constrained to the CIDR of the node’s subnet. In order to support LoadBalancer services using the Octavia Open Virtual Network (OVN) provider, the security group rules that allow NodePort traffic to master and worker nodes are now changed to open 0.0.0.0/0.

OpenStack Platform LoadBalancer configuration changes

The Red Hat OpenStack Platform (RHOSP) cloud provider LoadBalancer configuration now defaults to use-octavia=True. An exception to this rule is a deployment with Kuryr, in which case use-octavia is set to false, because Kuryr handles LoadBalancer services on its own.

Ingress Controller upgraded to HAProxy 2.2.15

The OpenShift Container Platform Ingress Controller is upgraded to HAProxy version 2.2.15.

CoreDNS update to version 1.8.4

In OpenShift Container Platform 4.9, CoreDNS uses version 1.8.4, which includes bug fixes.

Implementation of cloud controller managers for cloud providers

The Kubernetes controller manager that manages cloud provider deployments does not include support for Azure Stack Hub as a provider. Because using cloud controller managers is the preferred method for interacting with underlying cloud platforms, there is no plan to add this support. As a result, the Azure Stack Hub implementation in OpenShift Container Platform uses cloud controller managers.

In addition, this release supports using cloud controller managers for Amazon Web Services (AWS), Microsoft Azure, and Red Hat OpenStack Platform (RHOSP) as a Technology Preview. Any new cloud platform support that is added to OpenShift Container Platform will also use cloud controller managers.

To learn more about the cloud controller manager, see the Kubernetes documentation on this component.

To manage the cloud controller manager and cloud node manager deployments and lifecycles, this release introduces the Cluster Cloud Controller Manager Operator.

For more information, see the Cluster Cloud Controller Manager Operator entry in the Red Hat Operators reference.

Performing a canary rollout update

With OpenShift Container Platform 4.9, a new process to perform a canary rollout update has been introduced. For a detailed overview of this process, see Performing a canary rollout update.

Support for large Operator bundles

Operator Lifecycle Manager (OLM) now compresses Operator bundles with large amounts of metadata, such as large custom resource definition (CRD) manifests, to stay below the 1 MB limit set by etcd.

Reduced resource usage for Operator Lifecycle Manager

Operator Lifecycle Manager (OLM) catalog pods are now more efficient and use less RAM.

Default update channel for Operators from "Extras" advisories

Operators that ship with OpenShift Container Platform "Extras" advisories, such as RHBA-2021:3760, are published in Red Hat-provided catalogs and run on Operator Lifecycle Manager (OLM). Starting with OpenShift Container Platform 4.9, these Operators are now included in a stable update channel in addition to the version-specific 4.9 channel.

For OpenShift Container Platform 4.9 and future releases, stable will be the default channel for these Operators. Cluster administrators should use the stable channel so that changing update channels for these Operators in OLM is no longer necessary with future cluster upgrades.

For more information about OLM-based Operators, see Red Hat-provided Operator catalogs and Understanding OperatorHub. For more information about update channels in OLM, see Upgrading installed Operators.

Operator SDK v1.10.1

OpenShift Container Platform 4.9 supports Operator SDK v1.10.1. See Installing the Operator SDK CLI to install or update to this latest version.

Operator SDK v1.10.1 supports Kubernetes 1.21.

If you have any Operator projects that were previously created or maintained with Operator SDK v1.8.0, see Upgrading projects for newer Operator SDK versions to ensure your projects are upgraded to maintain compatibility with Operator SDK v1.10.1.

Deprecated and removed features

Some features available in previous releases have been deprecated or removed.

Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments. For the most recent list of major functionality deprecated and removed within OpenShift Container Platform 4.9, refer to the table below. Additional details for more fine-grained functionality that has been deprecated and removed are listed after the table.

In the table, features are marked with the following statuses:

  • GA: General Availability

  • TP: Technology Preview

  • DEP: Deprecated

  • REM: Removed

Table 1. Deprecated and removed features tracker
Feature | OCP 4.7 | OCP 4.8 | OCP 4.9
Package manifest format (Operator Framework) | DEP | REM | REM
SQLite database format for Operator catalogs | GA | GA | DEP
oc adm catalog build | DEP | REM | REM
--filter-by-os flag for oc adm catalog mirror | DEP | REM | REM
v1beta1 CRDs | DEP | DEP | REM
Docker Registry v1 API | DEP | DEP | REM
Metering Operator | DEP | DEP | REM
Scheduler policy | DEP | DEP | DEP
ImageChangesInProgress condition for Cluster Samples Operator | DEP | DEP | DEP
MigrationInProgress condition for Cluster Samples Operator | DEP | DEP | DEP
Use of v1 without a group in apiVersion for OpenShift Container Platform resources | DEP | DEP | REM
Use of dhclient in RHCOS | DEP | DEP | REM
Cluster Loader | GA | DEP | DEP
Bring your own RHEL 7 compute machines | DEP | DEP | DEP
lastTriggeredImageID field in the BuildConfig spec for Builds | GA | DEP | REM
Jenkins Operator | TP | DEP | DEP
HPA custom metrics adapter based on Prometheus | TP | REM | REM
vSphere 6.7 Update 2 or earlier and virtual hardware version 13 | GA | GA | DEP
The instance_type_id installation configuration parameter for Red Hat Virtualization (RHV) | DEP | DEP | DEP

Deprecated features

SQLite database format for Operator catalogs

The SQLite database format used by Operator Lifecycle Manager (OLM) for catalogs and index images has been deprecated, including the related opm CLI commands. Cluster administrators and catalog maintainers are encouraged to familiarize themselves with the new file-based catalog format introduced in OpenShift Container Platform 4.9 and begin migrating catalog workflows.

The default Red Hat-provided Operator catalogs for OpenShift Container Platform 4.6 and later are currently still shipped in the SQLite database format.

vSphere 6.7 Update 2 and earlier cluster installation and virtual hardware version 13 are now deprecated

Installing a cluster on VMware vSphere version 6.7 Update 2 or earlier and virtual hardware version 13 is now deprecated. Support for these versions will end in a future version of OpenShift Container Platform.

Hardware version 15 is now the default for vSphere virtual machines in OpenShift Container Platform. Hardware version 15 will be the only supported version in a future version of OpenShift Container Platform.

The instance_type_id installation configuration parameter for Red Hat Virtualization (RHV)

The instance_type_id installation configuration parameter is deprecated and will be removed in a future release.

Removed features

Metering

This release removes the OpenShift Container Platform Metering Operator feature.

Beta APIs removed from Kubernetes 1.22

Kubernetes 1.22 removed the following deprecated v1beta1 APIs. Migrate manifests and API clients to use the v1 API version. For more information about migrating removed APIs, see the Kubernetes documentation.

Table 2. v1beta1 APIs removed from Kubernetes 1.22
Resource | API | Notable changes
APIService | apiregistration.k8s.io/v1beta1 | No
CertificateSigningRequest | certificates.k8s.io/v1beta1 | Yes
ClusterRole | rbac.authorization.k8s.io/v1beta1 | No
ClusterRoleBinding | rbac.authorization.k8s.io/v1beta1 | No
CSIDriver | storage.k8s.io/v1beta1 | No
CSINode | storage.k8s.io/v1beta1 | No
CustomResourceDefinition | apiextensions.k8s.io/v1beta1 | Yes
Ingress | extensions/v1beta1 | Yes
Ingress | networking.k8s.io/v1beta1 | Yes
IngressClass | networking.k8s.io/v1beta1 | No
Lease | coordination.k8s.io/v1beta1 | No
LocalSubjectAccessReview | authorization.k8s.io/v1beta1 | Yes
MutatingWebhookConfiguration | admissionregistration.k8s.io/v1beta1 | Yes
PriorityClass | scheduling.k8s.io/v1beta1 | No
Role | rbac.authorization.k8s.io/v1beta1 | No
RoleBinding | rbac.authorization.k8s.io/v1beta1 | No
SelfSubjectAccessReview | authorization.k8s.io/v1beta1 | Yes
StorageClass | storage.k8s.io/v1beta1 | No
SubjectAccessReview | authorization.k8s.io/v1beta1 | Yes
TokenReview | authentication.k8s.io/v1beta1 | No
ValidatingWebhookConfiguration | admissionregistration.k8s.io/v1beta1 | Yes
VolumeAttachment | storage.k8s.io/v1beta1 | No

Descheduler v1beta1 API removed

The deprecated v1beta1 API for the descheduler has been removed in OpenShift Container Platform 4.9. Migrate any resources using the descheduler v1beta1 API version to v1.

Use of dhclient in RHCOS removed

The deprecated dhclient binary has been removed from RHCOS. Starting with OpenShift Container Platform 4.6, RHCOS switched to using NetworkManager in the initramfs to configure networking during early boot. Use the NetworkManager internal DHCP client for networking configuration instead. See BZ#1908462 for more information.

Cease updating the lastTriggeredImageID field and ignore it

The current release stops updating the buildConfig.spec.triggers[i].imageChange.lastTriggeredImageID field when the ImageStreamTag referenced by buildConfig.spec.triggers[i].imageChange points to a new image. Instead, this release updates the buildConfig.status.imageChangeTriggers[i].lastTriggeredImageID field.

Additionally, the Build Image Change Trigger controller ignores the buildConfig.spec.triggers[i].imageChange.lastTriggeredImageID field.

Now, the Build Image Change Trigger controller starts a build based on the buildConfig.status.imageChangeTriggers[i].lastTriggeredImageID field and how it compares to the image ID now referenced by the ImageStreamTag referenced in the buildConfig.spec.triggers[i].imageChange.

Therefore, update scripts and jobs that inspect buildConfig.spec.triggers[i].imageChange.lastTriggeredImageID accordingly. (BUILD-190)

Use of v1 without a group for apiVersion for OpenShift Container Platform resources

Support for using v1 without a group for apiVersion for OpenShift Container Platform resources has been removed. Every resource that includes *.openshift.io must match the apiVersion value found in the API index.

Bug fixes

API server and authentication

  • Previously, encryption conditions could remain indefinitely and be reported as a degraded condition for some Operators. Stale encryption conditions are now cleared properly and no longer improperly reported. (BZ#1974520)

  • Previously, the CA for API server client certificates was rotated early in the lifetime of a cluster, which prevented the Authentication Operator from creating a certificate signing request (CSR) because a previous CSR with the same name still existed. The Kubernetes API server was unable to authenticate itself to the OAuth API server when sending TokenReview requests, which caused authentication to fail. Generated names are now used when creating CSRs by the Authentication Operator, so an early rotation of the CA for API server client certificates no longer causes authentication failures. (BZ#1978193)

Bare Metal Hardware Provisioning

  • Previously, metal3 pods could not download a Red Hat Enterprise Linux CoreOS (RHCOS) image due to the sequencing of creating initContainers. This issue is fixed by reordering the creation of the initContainers so that the metal-static-ip-set initContainer is created before the metal3-machine-os-downloader initContainer. The RHCOS image now downloads as expected. (BZ#1973724)

  • Previously, when using installer-provisioned installation on bare metal with a host configured to use idrac-virtualmedia, the bios_interface for that host was set to idrac-wsman by default. As a result, the BIOS settings were unavailable and an exception occurred. This issue is fixed by using idrac-redfish as the default bios_interface when idrac-virtualmedia is in use. (BZ#1928816)

  • Previously, in UEFI mode, the ironic-python-agent created a UEFI bootloader entry after downloading the RHCOS image. When using an RHCOS image based on RHEL 8.4, the image could fail to boot using this entry and output a BIOS error screen. This is fixed by the ironic-python-agent configuring the boot entry based on a CSV file located in the image, instead of using a fixed boot entry. The image boots properly without error. (BZ#1966129)

  • Previously, if provisioningHostIP had been set in install-config, it was assigned to the metal3 pod, even in cases where the provisioning network had been disabled. This has been fixed. (BZ#1972753)

  • Previously, the assisted installer could not provision Supermicro X11/X12-based systems because of a mismatch in the sushy resource library: the Inserted and WriteProtected attributes were not allowed in the VirtualMedia.InsertMedia request body, so virtual media could not be attached. This issue is fixed by modifying the sushy resource library to stop sending these optional attributes when they are not strictly required, allowing the installation to progress past this point. (BZ#1986238)

  • Previously, some error types in the provisioned state caused the host to be deprovisioned. This occurred after a restart of the metal3 pod if the image provisioned to a bare metal host became unavailable. In this case, the host would enter the deprovisioning state. This issue is fixed by modifying the action of the error in the provisioned state so that if the image becomes unavailable, the error will be reported but deprovisioning will not be initiated. (BZ#1972374)

Builds

  • Previously, the fix for bug BZ#1884270 incorrectly pruned SSH protocol URLs in an attempt to provide SCP-styled URL capabilities. This error caused the oc new-build command not to pick an automatic source clone secret: the build could not use the build.openshift.io/source-secret-match-uri-1 annotation to map SSH keys with the associated secrets, and therefore could not perform git cloning. This update reverts the changes from BZ#1884270 so that builds can use the annotation and perform git cloning.

  • Before this update, various allowed and blocked registry configuration options of the cluster image configuration could block the Cluster Samples Operator from creating image streams. When that happened, the Operator marked itself as degraded, which impacted the general OpenShift Container Platform install and upgrade status.

    The Cluster Samples Operator can bootstrap itself as removed in a variety of circumstances. With this update, these circumstances include when the image controller configuration parameters prevent the creation of image streams by using the default image registry or by using the image registry specified by the samplesRegistry setting. The Operator status also indicates when the cluster image configuration prevents the creation of the sample image streams.

Cloud Compute

  • Previously, when a root volume was created for a new server and the server itself failed to be created, the automatic deletion of the volume was not triggered because no server deletion was associated with the volume. In some situations, this led to the creation of many extra volumes and caused errors if the volume quota was reached. With this release, newly created root volumes are deleted when the server creation call fails. (BZ#1943378)

  • Previously, when using the default value for instanceType, the Machine API created m4.large instances on AWS, which differed from the m5.large instance type used for machines created by the OpenShift Container Platform installer. With this release, the Machine API creates m5.large instances for new machines on AWS when the default value is specified. (BZ#1953063)

  • Previously, the machine set definitions of compute nodes could not specify whether a port should be trunked. This was a problem for technologies that require the user to configure trunked and non-trunked ports for the same machines. This release adds a new field, spec.Port.Trunk = bool, which gives the user more flexibility to determine which ports result in trunks. If no value is specified, spec.Port.Trunk inherits the value of spec.Trunk and the name of the trunk created matches the name of the port used. (BZ#1964540)
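
    A hedged sketch of how a per-port trunk setting might appear in a compute machine set's RHOSP provider spec; the field placement and UUID are illustrative, not a definitive schema:

      providerSpec:
        value:
          ports:
          - networkID: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee   # illustrative network UUID
            trunk: true   # overrides the machine-level spec.trunk for this port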

  • Previously, the Machine API Operator constantly attached new targets even if they were already attached. The excessive calls to the AWS API resulted in a high number of errors. With this release, the Operator checks whether a load balancer attachment is required before attempting the attachment process. This change reduces the frequency of failed API requests. (BZ#1965080)

  • Previously, the automatic pinning policy for a VM used the property values disabled, existing, or adjust. With this release, the values better describe each policy, and existing was removed because it is blocked on oVirt. The new property values are none and resize_and_pin, which align with the oVirt user interface. (BZ#1972747)

  • Previously, the cluster autoscaler was unable to access the csidrivers.storage.k8s.io or csistoragecapacities.storage.k8s.io resources, which resulted in permissions errors. This fix updates the role assigned to the cluster autoscaler so that it includes permissions for these resources. (BZ#1973567)

  • Previously, it was possible to delete a machine whose node had already been deleted, which caused the machine to remain in the deleting phase indefinitely. This fix allows you to delete machines in this state properly. (BZ#1977369)

  • Previously, when using a boot-from-volume image, creating a new instance could leak volumes if the machine controller was rebooted, and the previously created volume was never cleaned up. This fix ensures that the volume created previously is either pruned or reused. (BZ#1983612)

  • Previously, the Red Hat Virtualization (RHV) provider ignored NICs with br-ex names for machines. Since a network type of OVNKubernetes creates a NIC with a br-ex name, this resulted in the machine never getting an IP address on OVN-Kubernetes. With this fix, it is now possible to install OpenShift Container Platform on RHV with network set to OVNKubernetes. (BZ#1984481)

  • Previously, when deployed on Red Hat OpenStack Platform (RHOSP) with a combination of proxy and custom CA certificate, a cluster would not become fully operational. This fix passes the proxy settings to the HTTP transport used when connecting with a custom CA certificate, ensuring that all cluster components work as expected. (BZ#1986540)

Cluster Version Operator

  • Previously, the Cluster Version Operator (CVO) did not respect the noProxy property in the proxy configuration resource. As a result, the CVO was denied access to update recommendations or release signatures when only unproxied connections completed. Now, when the proxy resource requests direct, unproxied access, the CVO reaches the upstream update service and signature stores directly. (BZ#1978749)
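
    For reference, direct access is requested through the noProxy list of the cluster Proxy resource; a minimal sketch with illustrative hosts:

      apiVersion: config.openshift.io/v1
      kind: Proxy
      metadata:
        name: cluster
      spec:
        httpProxy: http://proxy.example.com:3128    # illustrative proxy
        httpsProxy: http://proxy.example.com:3128
        noProxy: updateservice.example.com          # reached directly, bypassing the proxy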

  • Previously, the Cluster Version Operator (CVO) loaded its proxy configuration from proxy resource specification properties instead of from status properties that were verified by the Network Operator. As a result, any incorrectly configured values would prevent the CVO from reaching the upstream update service or signature stores. Now, the CVO loads its proxy configuration only from the verified status properties. (BZ#1978774)

  • Previously, the Cluster Version Operator (CVO) did not remove volume mounts that were added outside of the manifest. As a result, pod creation could fail during a volume failure. Now, the CVO removes all volume mounts that do not appear in the manifest. (BZ#2004568)

Console Storage Plug-in

  • Previously, when working with Ceph storage, the Console Storage Plug-in unnecessarily included a redundant use of a namespace parameter. This bug had no customer-visible impact; however, the plug-in has been updated to avoid the redundant use of the namespace. (BZ#1982682)

Image Registry

  • Previously, the Operator checked spec.nodeSelector instead of spec.tolerations to determine whether the registry should use custom tolerations, so the custom tolerations from spec.tolerations were applied only when spec.nodeSelector was set. This fix checks the spec.tolerations field for the presence of custom tolerations. Now, the Operator uses custom tolerations whenever spec.tolerations is set. (BZ#1973318)

  • Previously, when spec.managementState in configs.imageregistry was set to Removed, the image pruner pod generated warnings that CronJob is deprecated in v1.21 and later and that batch/v1 should be used. This fix replaces batch/v1beta1 with batch/v1 in the OpenShift Container Platform oc client. Now, warnings about the deprecated CronJob no longer appear in image pruner pods. (BZ#1976112)
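
    The warning corresponds to the following API move; a minimal sketch with illustrative names:

      apiVersion: batch/v1   # previously batch/v1beta1, deprecated in Kubernetes 1.21
      kind: CronJob
      metadata:
        name: example-pruner                       # illustrative name
      spec:
        schedule: "0 0 * * *"
        jobTemplate:
          spec:
            template:
              spec:
                containers:
                - name: pruner
                  image: example.com/pruner:latest # illustrative image
                restartPolicy: OnFailure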

Installer

  • Previously, the network interface on Azure control plane nodes was missing a hyphen in the interface name, making the naming inconsistent with other platforms and causing issues. The missing hyphen has been added. Now all control plane nodes are named the same, regardless of the platform. (BZ#1882490)

  • You can now configure the autoPinningPolicy and hugepages fields in the install-config.yaml file for oVirt. The autoPinningPolicy field allows you to automatically set the non-uniform memory access (NUMA) pinning settings and CPU topology changes for the cluster. The hugepages field allows you to set the Hugepages of the hypervisor. (BZ#1925203)
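
    A hedged sketch of how these settings might appear in install-config.yaml; the placement under a machine-pool platform section and the values shown are illustrative, so consult the oVirt install-config schema for the exact layout:

      controlPlane:
        name: master
        platform:
          ovirt:
            autoPinningPolicy: resize_and_pin   # or none
            hugepages: 2048                     # hypervisor hugepage size; illustrative value
        replicas: 3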

  • Previously, the installation program did not output any errors when the Ed25519 SSH key type was used with FIPS enabled, even though it could not be used. Now the installation program validates SSH key types, outputting an error when an SSH key type is not supported with FIPS enabled; only RSA and ECDSA SSH key types are allowed when FIPS is enabled. (BZ#1962414)
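
    For example, an installation host can generate a supported key of either type before running the installation program; a sketch:

      $ ssh-keygen -t ecdsa -b 521 -f ~/.ssh/id_ecdsa_ocp   # ECDSA, allowed with FIPS
      $ ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_ocp      # RSA, allowed with FIPS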

  • In certain conditions, Red Hat OpenStack Platform (RHOSP) network trunks do not contain a tag to indicate that the trunk belongs to the cluster. Consequently, cluster deletion missed the trunk ports and retried in a loop until it timed out. Deleting the cluster now deletes trunks for which the tagged port is a parent. (BZ#1971518)

  • Previously, when uninstalling a cluster on Red Hat OpenStack Platform (RHOSP), the installer used an inefficient algorithm to delete resources. The inefficient algorithm caused the uninstall process to require more time than was necessary. The installer is updated with a more efficient algorithm that should uninstall the cluster more quickly. (BZ#1974598)

  • Previously, if the AWS_SHARED_CREDENTIALS_FILE environment variable was set to an empty file, the installer prompted for credentials and then created an aws/credentials file, ignoring the value of the environment variable and possibly overwriting existing credentials. With this fix, the installer is updated to store credentials in the specified file. If the specified file has invalid credentials, the installer produces an error instead of overwriting the file and risking information loss. (BZ#1974640)

  • Previously, users encountered a vague error message when they deleted a cluster on Azure that shared resources with another cluster, making it difficult to understand why the deletion failed. This update adds an error message that explains why the failure occurs. (BZ#1976016)

  • Previously, because of a typo, Kuryr deployments were being checked against the wrong requirements, meaning that installations with Kuryr could succeed even if they did not meet the minimum requirements for Kuryr. This fix eliminates the error, allowing the installer to check the right requirements. (BZ#1978213)

  • Before this update, the ingress checks for keepalived did not include fall and rise directives, which meant that a single failed check could cause an ingress virtual IP failover. This bug fix introduces fall and rise directives to prevent such failovers. (BZ#1982766)

Kubernetes API server

  • Previously, when a deployment and image stream were created at the same time, a race condition could occur which caused the deployment controller to create replica sets in an infinite loop. The responsibilities of the API server’s image policy plug-in were lowered and concurrent creation of a deployment and image stream no longer causes infinite replica sets. (BZ#1925180), (BZ#1976775)

  • Previously, there was a race between the installer pod and the cert-syncer container, which were writing to the same path. This could leave some certificates empty and prevent the server from running. Kubernetes API server certificates are now written in an atomic way to prevent races between multiple processes. (BZ#1971624)

Networking

  • When using the OVN-Kubernetes cluster network provider, the logical flow cache was configured without any memory limit. As a result, in some situations high memory pressure could cause a node to become unusable. With this update, the logical flow cache is configured with a 1 GB memory limit by default. (BZ#1961757)

  • When using the OVN-Kubernetes cluster network provider, any network policies created in an OpenShift Container Platform 4.5 cluster that was subsequently upgraded might allow or drop unexpected traffic. In later versions of OpenShift Container Platform, OVN-Kubernetes uses a different convention for managing IP address sets, and any network policies created in OpenShift Container Platform 4.5 did not use this convention. Now, during upgrades, all network policies are migrated to the new convention. (BZ#1962387)

  • For the OVN-Kubernetes cluster network provider, when using must-gather to retrieve Open vSwitch (OVS) logs, the INFO log level was missing from the gathered logging data. Now all log levels are included in OVS logging data. (BZ#1970129)

  • Previously, performance testing revealed that the service controller metrics had a significant increase in cardinality due to a label requirement. As a result, memory usage was elevated on Open Virtual Network (OVN) Prometheus pods. With this update, the label requirement is removed. Service controller cardinality metrics and memory usage are now reduced. (BZ#1974967)

  • Previously, ovnkube-trace required iproute to be installed in the source or destination pod because it needed to detect the interface's link index, which caused ovnkube-trace to fail on pods without iproute installed. Now, the link index is read from /sys/class/net/<interface>/iflink instead. As a result, ovnkube-trace no longer requires iproute to be installed in source and destination pods. (BZ#1978137)
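
    For example, the link index of an interface can be read directly from sysfs (the interface name is illustrative):

      $ cat /sys/class/net/eth0/iflink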

  • Previously, the Cluster Network Operator (CNO) deployed a service monitor for the network-check-source service without the correct annotations and role-based access control (RBAC) needed for discovery by Prometheus. As a result, the service and its metrics never populated in Prometheus. Now, the correct annotations and RBAC are added to the namespace of the network-check-source service, and Prometheus scrapes its metrics. (BZ#1986061)

  • Previously, when using IPv6 DHCP, node interface addresses might be leased with a /128 prefix. Consequently, OVN-Kubernetes used the same prefix to infer the node's network and routed any other address traffic, including traffic to other cluster nodes, through the gateway. With this update, OVN-Kubernetes inspects the node's routing table, checks for the wider routing entry for the node's interface address, and uses that prefix to infer the node's network. As a result, traffic to other cluster nodes is no longer routed through the gateway. (BZ#1980135)

  • Previously, when a cluster used the OVN-Kubernetes Container Network Interface provider, attempting to add an egress router with an IPv6 address failed. With the fix, support for IPv6 is added to the egress router CNI plug-in, and adding egress routers succeeds. (BZ#1989688)

Node

  • Previously, in containers, CRI-O did not create a symbolic link from the /etc/mtab file to /proc/mounts. As a consequence, the user could not view the list of mounted devices in the container's /etc/mtab file. CRI-O now adds the symbolic link. As a result, users can view the container's mounted devices. (BZ#1868221)

  • Previously, if pods were deleted quickly after creation, the kubelet might not clean up the pods properly. This resulted in pods being stuck in a terminating state, and could impact the availability of upgrades. This fix improves the pod lifecycle logic to avoid this problem. (BZ#1952224)

  • Previously, the SystemMemoryExceedsReserved alert fired when the system memory usage exceeded 90% of the reserved memory. As a result, clusters could fire an excessive number of alerts. The threshold for this alert was changed to fire at 95% of reserved memory. (BZ#1980844)

  • Previously, a bug in CRI-O caused CRI-O to leak a child PID of a process it created. As a result, if under load, systemd could create a significant number of zombie processes. This could lead to node failure if the node ran out of PIDs. CRI-O was fixed to prevent the leakage. As a result, these zombie processes are no longer being created. (BZ#2003197)

OpenShift CLI (oc)

  • Previously, the oc command-line tool crashed while mirroring the registry with a slice bounds out of range runtime panic because of an unchecked index operation on a slice when the --max-components argument was used. With this fix, a check has been added to ensure that the components check does not request an out-of-range index value, so the oc tool no longer panics when using the --max-components argument. (BZ#1786835)

  • Previously, the oc describe quota command showed inconsistent units in Used memory for the ClusterResourceQuota value, which was unpredictable and difficult to read. With this fix, the Used memory now always uses the same unit as the Hard memory so that the oc describe quota command shows predictable values. (BZ#1955292)

  • Previously, the oc logs command did not work with pipeline builds because of a missing client setup. The client setup has been fixed in the oc logs command so that it now works with pipeline builds. (BZ#1973643)

Operator Lifecycle Manager (OLM)

  • Previously, the Operator Lifecycle Manager (OLM) upgradeable condition message was unclear when installed Operators set olm.maxOpenShiftVersion to a minor OpenShift Container Platform version less than or equal to the current version. The message has been fixed to specify that only minor and major version upgrades are blocked when olm.maxOpenShiftVersion is set to a version other than the current OpenShift Container Platform version. (BZ#1992677)
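
    Operators declare this limit through the olm.properties annotation on their cluster service version; a minimal sketch with illustrative name and version values:

      apiVersion: operators.coreos.com/v1alpha1
      kind: ClusterServiceVersion
      metadata:
        name: example-operator.v1.0.0   # illustrative name
        annotations:
          olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.9"}]'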

  • Previously, the opm command failed to deprecate bundles when they were present in the index. Consequently, bundles truncated as part of another deprecation in the same call were reported as missing. This update adds a check for bundles before any deprecation takes place to differentiate between a bundle that is not present and one that has been truncated. As a result, deprecated bundles along the same upgrade path are no longer reported as missing. (BZ#1950534)

  • Previously, a transient error could occur when Operator Lifecycle Manager (OLM) attempted to update a custom resource definition (CRD) object in the cluster, which caused OLM to permanently fail the install plan containing the CRD. This bug fix updates OLM to retry CRD updates on resource-modified conflict errors. As a result, OLM is now more resilient to this class of transient errors, and install plans no longer permanently fail on conflict errors that OLM is able to retry and resolve. (BZ#1923111)

  • The opm index|registry add commands attempted to verify the existence of replaced Operator bundles in an index, regardless of whether they had already been truncated from the index, so the commands would consistently fail after a bundle was deprecated for a given package. This bug fix updates the opm CLI to handle this edge case and no longer verify the existence of truncated bundles. As a result, the commands no longer fail after a bundle is deprecated for a given package. (BZ#1952101)

  • Operator Lifecycle Manager (OLM) can now allow priority classes to be projected into registry pods using labels in catalog source resources. The default catalog sources are important components in namespaces managed by the cluster, which mandate priority classes. With this enhancement, all default catalog sources in the openshift-marketplace namespace have a system-cluster-critical priority class. (BZ#1954869)

  • The Marketplace Operator used the leader-for-life implementation, in which a config map holding the lease owner's identity has owner references placed by the controller's pod. If the node that the pod was scheduled on became unavailable and the pod could not be terminated, the config map could not be properly garbage collected, so a new leader could not be elected. Minor version cluster upgrades were blocked because the newer Marketplace Operator version could not gain leader election, and manual cleanup of the config map holding the leader election lease was required to release the lock and complete the upgrade of the Marketplace component. This bug fix switches to the leader-for-lease leader election implementation. As a result, leader election no longer gets stuck in this scenario. (BZ#1958888)

  • Previously, a new Failed phase was introduced for install plans. Failure to detect a valid Operator group (OG) or service account (SA) resource for the namespace the install plan was being created in would transition the install plan to the failed state. That is, failure to detect these resources when the install plan was reconciled the first time was considered a permanent failure. This was a regression from the following previous behavior of install plans:

    • Failure to detect OG or SA resources would requeue the install plan for reconciliation.

    • Creation of the required resources before the retry limit of the informer queue was reached would transition the install plan from the Installing phase to the Complete phase, unless the bundle unpacking step failed.

    This regression introduced strange behavior for users who had infrastructure built that applied a set of manifests simultaneously to install an Operator that included a subscription, which creates install plans, along with the required OG and SA resources. In those cases, whenever there was a delay in the reconciliation of the OG and SA, the install plan would be transitioned to a state of permanent failure.

    This bug fix removes the logic that transitioned the install plan to the Failed phase. Instead, the install plan is now requeued for any reconciliation error. As a result, when no OG is detected, the following condition is set:

    conditions:
    - lastTransitionTime: "2021-06-23T18:16:00Z"
      lastUpdateTime: "2021-06-23T18:16:16Z"
      message: attenuated service account query failed - no operator group found that
        is managing this namespace
      reason: InstallCheckFailed
      status: "False"
      type: Installed

    When a valid OG is created, the following condition is set:

    conditions:
    - lastTransitionTime: "2021-06-23T18:33:37Z"
      lastUpdateTime: "2021-06-23T18:33:37Z"
      status: "True"
  • When updating a catalog source, a Get call is immediately followed by a Delete call on a number of resources related to the catalog source. In some instances