This documentation outlines Red Hat, Amazon Web Services (AWS), and customer responsibilities for the Red Hat OpenShift Service on AWS (ROSA) managed service.
While Red Hat and Amazon Web Services (AWS) manage the Red Hat OpenShift Service on AWS services, the customer shares certain responsibilities. The Red Hat OpenShift Service on AWS services are accessed remotely, hosted on public cloud resources, created in customer-owned AWS accounts, and have underlying platform and data security that is owned by Red Hat.
Resource | Incident and operations management | Change management | Access and identity authorization | Security and regulation compliance | Disaster recovery |
---|---|---|---|---|---|
Customer data | Customer | Customer | Customer | Customer | Customer |
Customer applications | Customer | Customer | Customer | Customer | Customer |
Developer services | Customer | Customer | Customer | Customer | Customer |
Platform monitoring | Red Hat | Red Hat | Red Hat | Red Hat | Red Hat |
Logging | Red Hat | Red Hat and Customer | Red Hat and Customer | Red Hat and Customer | Red Hat |
Application networking | Red Hat and Customer | Red Hat and Customer | Red Hat and Customer | Red Hat | Red Hat |
Cluster networking | Red Hat [1] | Red Hat and Customer [2] | Red Hat and Customer | Red Hat [1] | Red Hat [1] |
Virtual networking management | Red Hat and Customer | Red Hat and Customer | Red Hat and Customer | Red Hat and Customer | Red Hat and Customer |
Virtual compute management (control plane, infrastructure and worker nodes) | Red Hat | Red Hat | Red Hat | Red Hat | Red Hat |
Cluster version | Red Hat | Red Hat and Customer | Red Hat | Red Hat | Red Hat |
Capacity management | Red Hat | Red Hat and Customer | Red Hat | Red Hat | Red Hat |
Virtual storage management | Red Hat | Red Hat | Red Hat | Red Hat | Red Hat |
AWS software (public AWS services) | AWS | AWS | AWS | AWS | AWS |
Hardware/AWS global infrastructure | AWS | AWS | AWS | AWS | AWS |
[1] If the customer chooses to use their own CNI plugin, the responsibility shifts to the customer.
[2] The customer must configure their firewall to grant access to the required OpenShift and AWS domains and ports before the cluster is provisioned. For more information, see "AWS firewall prerequisites".
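As a rough illustration, the responsibility matrix can be treated as a lookup table keyed by resource and responsibility area. The sketch below encodes only a few rows; the names `AREAS`, `MATRIX`, `responsible_parties`, and `customer_action_areas` are hypothetical and are not part of any ROSA tooling.

```python
# Illustrative only: an excerpt of the responsibility matrix as a lookup table.
# AREAS, MATRIX, and the helper functions are hypothetical names, not a ROSA API.
AREAS = (
    "incident_and_operations_management",
    "change_management",
    "access_and_identity_authorization",
    "security_and_regulation_compliance",
    "disaster_recovery",
)

# Each resource maps to one cell per area, in the same order as AREAS.
MATRIX = {
    "Customer data": ("Customer",) * 5,
    "Platform monitoring": ("Red Hat",) * 5,
    "Logging": (
        "Red Hat",
        "Red Hat and Customer",
        "Red Hat and Customer",
        "Red Hat and Customer",
        "Red Hat",
    ),
    "Cluster version": (
        "Red Hat",
        "Red Hat and Customer",
        "Red Hat",
        "Red Hat",
        "Red Hat",
    ),
    "Hardware/AWS global infrastructure": ("AWS",) * 5,
}


def responsible_parties(resource, area):
    """Return the set of parties responsible for a resource in a given area."""
    cell = MATRIX[resource][AREAS.index(area)]
    return set(cell.split(" and "))


def customer_action_areas(resource):
    """List the areas where the customer holds at least part of the responsibility."""
    return [a for a in AREAS if "Customer" in responsible_parties(resource, a)]
```

For example, `customer_action_areas("Cluster version")` returns only the change management area, which matches the single shared cell in that row.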
Red Hat, AWS, and the customer all share responsibility for the monitoring, maintenance, and overall health of a Red Hat OpenShift Service on AWS (ROSA) cluster. The tables below delineate these responsibilities for each of the listed resources.
Cluster notifications are messages about the status, health, or performance of your cluster.
Cluster notifications are the primary way that Red Hat Site Reliability Engineering (SRE) communicates with you about the health of your managed cluster. SRE may also use cluster notifications to prompt you to perform an action in order to resolve or prevent an issue with your cluster.
Cluster owners and administrators must regularly review and act on cluster notifications to ensure clusters remain healthy and supported.
You can view cluster notifications in the Red Hat Hybrid Cloud Console, in the Cluster history tab for your cluster. By default, only the cluster owner receives cluster notifications as emails. If other users need to receive cluster notification emails, add each user as a notification contact for your cluster.
Cluster notifications are designed to keep you informed about the health of your cluster and high impact events that affect it.
Most cluster notifications are generated and sent automatically to ensure that you are immediately informed of problems or important changes to the state of your cluster.
In certain situations, Red Hat Site Reliability Engineering (SRE) creates and sends cluster notifications to provide additional context and guidance for a complex issue.
Cluster notifications are not sent for low-impact events, low-risk security updates, routine operations and maintenance, or minor, transient issues that are quickly resolved by SRE.
Red Hat services automatically send notifications when:
Remote health monitoring or environment verification checks detect an issue in your cluster, for example, when a worker node has low disk space.
Significant cluster life cycle events occur, for example, when scheduled maintenance or upgrades begin, or cluster operations are impacted by an event, but do not require customer intervention.
Significant cluster management changes occur, for example, when cluster ownership or administrative control is transferred from one user to another.
Your cluster subscription is changed or updated, for example, when Red Hat makes updates to subscription terms or features available to your cluster.
SRE creates and sends notifications when:
An incident results in a degradation or outage that impacts your cluster’s availability or performance, for example, your cloud provider has a regional outage. SRE sends subsequent notifications to inform you of incident resolution progress, and when the incident is resolved.
A security vulnerability, security breach, or unusual activity is detected on your cluster.
Red Hat detects that changes you have made are creating or may result in cluster instability.
Red Hat detects that your workloads are causing performance degradation or instability in your cluster.
Red Hat is responsible for overseeing the service components required for default platform networking. AWS is responsible for protecting the hardware infrastructure that runs all of the services offered in the AWS Cloud. The customer is responsible for incident and operations management of customer application data and any custom networking the customer has configured for the cluster network or virtual network.
Resource | Service responsibilities | Customer responsibilities |
---|---|---|
Application networking | Red Hat | |
Cluster networking | Red Hat | |
Virtual networking management | Red Hat | |
Virtual storage management | Red Hat | |
Platform monitoring | Red Hat | |
Incident management | Red Hat | |
Infrastructure and data resiliency | Red Hat | |
Cluster capacity | Red Hat | |
AWS software (public AWS services) | AWS | |
Hardware/AWS global infrastructure | AWS | |
Platform audit logs are securely forwarded to a centralized security information and event management (SIEM) system, where they may trigger configured alerts to the SRE team and are also subject to manual review. Audit logs are retained in the SIEM system for one year. Audit logs for a given cluster are not deleted at the time the cluster is deleted.
An incident is an event that results in a degradation or outage of one or more Red Hat services. An incident can be raised by a customer or a Customer Experience and Engagement (CEE) member through a support case, directly by the centralized monitoring and alerting system, or directly by a member of the SRE team.
Depending on the impact on the service and customer, the incident is categorized in terms of severity.
When managing a new incident, Red Hat uses the following general workflow:
An SRE first responder is alerted to a new incident and begins an initial investigation.
After the initial investigation, the incident is assigned an incident lead, who coordinates the recovery efforts.
An incident lead manages all communication and coordination around recovery, including any relevant notifications and support case updates.
The incident is recovered.
The incident is documented and a root cause analysis (RCA) is performed within 5 business days of the incident.
An RCA draft document will be shared with the customer within 7 business days of the incident.
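The two RCA deadlines in the workflow above can be worked out with simple business-day arithmetic. The following is a minimal sketch, assuming that business days are Monday through Friday, that public holidays are ignored, and that counting starts on the day after the incident; none of these assumptions are stated in the policy itself, and the function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date that falls `days` business days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def rca_deadlines(incident_date):
    """Compute the latest dates for the RCA and for sharing the RCA draft."""
    return {
        # Assumption: holidays are not excluded from the count.
        "rca_performed_by": add_business_days(incident_date, 5),
        "rca_draft_shared_by": add_business_days(incident_date, 7),
    }
```

Under these assumptions, an incident on Monday, 1 January 2024 gives an RCA deadline of 8 January 2024 and a draft-sharing deadline of 10 January 2024.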
Red Hat also assists with customer incidents raised through support cases. Red Hat can assist with activities including but not limited to:
Forensic gathering, including isolating virtual compute
Guiding compute image collection
Providing collected audit logs
The impact of a cluster upgrade on capacity is evaluated as part of the upgrade testing process to ensure that capacity is not negatively impacted by new additions to the cluster. During a cluster upgrade, additional worker nodes are added to make sure that total cluster capacity is maintained during the upgrade process.
Capacity evaluations by the Red Hat SRE staff also happen in response to alerts from the cluster, after usage thresholds are exceeded for a certain period of time. Such alerts can also result in a notification to the customer.
This section describes the policies about how cluster and configuration changes, patches, and releases are managed.
Red Hat is responsible for enabling changes to the cluster infrastructure and services that the customer will control, as well as maintaining versions for the control plane nodes, infrastructure nodes and services, and worker nodes. AWS is responsible for protecting the hardware infrastructure that runs all of the services offered in the AWS Cloud. The customer is responsible for initiating infrastructure change requests and installing and maintaining optional services and networking configurations on the cluster, as well as all changes to customer data and customer applications.
You can initiate changes using self-service capabilities such as cluster deployment, worker node scaling, or cluster deletion.
Change history is captured in the Cluster History section in the OpenShift Cluster Manager Overview tab, and is available for you to view. The change history includes, but is not limited to, logs from the following changes:
Adding or removing identity providers
Adding or removing users to or from the dedicated-admins group
Scaling the cluster compute nodes
Scaling the cluster load balancer
Scaling the cluster persistent storage
Upgrading the cluster
You can implement a maintenance exclusion by avoiding changes in OpenShift Cluster Manager for the following components:
Deleting a cluster
Adding, modifying, or removing identity providers
Adding, modifying, or removing a user from an elevated group
Installing or removing add-ons
Modifying cluster networking configurations
Adding, modifying, or removing machine pools
Enabling or disabling user workload monitoring
Initiating an upgrade
To enforce the maintenance exclusion, ensure machine pool autoscaling or automatic upgrade policies have been disabled. After the maintenance exclusion has been lifted, proceed with enabling machine pool autoscaling or automatic upgrade policies as desired.
Red Hat site reliability engineering (SRE) manages the infrastructure, code, and configuration of Red Hat OpenShift Service on AWS using a GitOps workflow and fully automated CI/CD pipelines. This process ensures that Red Hat can safely introduce service improvements on a continuous basis without negatively impacting customers.
Every proposed change undergoes a series of automated verifications immediately upon check-in. Changes are then deployed to a staging environment where they undergo automated integration testing. Finally, changes are deployed to the production environment. Each step is fully automated.
An authorized SRE reviewer must approve advancement to each step. The reviewer cannot be the same individual who proposed the change. All changes and approvals are fully auditable as part of the GitOps workflow.
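The approval rule described here (an authorized reviewer for each step, who is never the proposer) can be expressed as a small gate. This is a sketch under stated assumptions, not Red Hat's actual pipeline: the `Change` class, the `STAGES` names, and the in-order requirement are all hypothetical modeling choices.

```python
from dataclasses import dataclass, field

# Hypothetical stage names modeled on the three steps described in the text.
STAGES = ("verification", "staging", "production")


@dataclass
class Change:
    proposer: str
    approvals: dict = field(default_factory=dict)  # stage -> approving reviewer

    def approve(self, stage, reviewer, authorized_reviewers):
        """Record an approval for `stage`, enforcing the review rules."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if reviewer not in authorized_reviewers:
            raise PermissionError(f"{reviewer} is not an authorized reviewer")
        if reviewer == self.proposer:
            raise PermissionError("the proposer cannot approve their own change")
        # Assumption: a stage can only be approved once all earlier stages are.
        earlier = STAGES[: STAGES.index(stage)]
        if any(s not in self.approvals for s in earlier):
            raise RuntimeError(f"earlier stages must be approved before {stage}")
        self.approvals[stage] = reviewer
```

Keeping the approvals in a plain mapping makes the history easy to audit, which mirrors the auditability requirement in the text.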
Some changes are released to production incrementally, using feature flags to control availability of new features to specified clusters or customers.
OpenShift Container Platform software and the underlying immutable Red Hat CoreOS (RHCOS) operating system image are patched for bugs and vulnerabilities in regular z-stream upgrades. Read more about RHCOS architecture in the OpenShift Container Platform documentation.
Red Hat does not automatically upgrade your clusters. You can schedule to upgrade the clusters at regular intervals (recurring upgrade) or just once (individual upgrade) using the OpenShift Cluster Manager web console. Red Hat might forcefully upgrade a cluster to a new z-stream version only if the cluster is affected by a critical impact CVE.
Because the required permissions can change between y-stream releases, the policies might have to be updated before an upgrade can be performed. Therefore, you cannot schedule a recurring upgrade on ROSA clusters with STS.
You can review the history of all cluster upgrade events in the OpenShift Cluster Manager web console. For more information about releases, see the Life Cycle policy.
Resource | Service responsibilities | Customer responsibilities |
---|---|---|
Logging | Red Hat | |
Application networking | Red Hat | |
Cluster networking | Red Hat | |
Virtual networking management | Red Hat | |
Virtual compute management | Red Hat | |
Cluster version | Red Hat | |
Capacity management | Red Hat | |
Virtual storage management | Red Hat | |
AWS software (public AWS services) |
AWS Compute: Provide the Amazon EC2 service, used for ROSA control plane, infrastructure, and worker nodes. Storage: Provide Amazon EBS, used by ROSA to provision local node storage and persistent volume storage for the cluster. Storage: Provide Amazon S3, used for the ROSA service’s built-in image registry. Networking: Provide the following AWS Cloud services, used by ROSA to satisfy virtual networking infrastructure needs:
Networking: Provide the following AWS services, which customers can optionally integrate with ROSA:
|
|
Hardware/AWS global infrastructure | AWS | |
The following table outlines the responsibilities with regard to security and regulation compliance:
Resource | Service responsibilities | Customer responsibilities |
---|---|---|
Logging | Red Hat | |
Virtual networking management | Red Hat | |
Virtual storage management | Red Hat | |
Virtual compute management | Red Hat | |
AWS software (public AWS services) | AWS. Compute: Secure Amazon EC2, used for ROSA control plane, infrastructure, and worker nodes. For more information, see Infrastructure security in Amazon EC2 in the Amazon EC2 User Guide. Storage: Secure Amazon Elastic Block Store (EBS), used for ROSA control plane, infrastructure, and worker node volumes, as well as Kubernetes persistent volumes. For more information, see Data protection in Amazon EC2 in the Amazon EC2 User Guide. Storage: Provide AWS KMS, which ROSA uses to encrypt control plane, infrastructure, and worker node volumes and persistent volumes. For more information, see Amazon EBS encryption in the Amazon EC2 User Guide. Storage: Secure Amazon S3, used for the ROSA service’s built-in container image registry. For more information, see Amazon S3 security in the S3 User Guide. Networking: Provide security capabilities and services to increase privacy and control network access on AWS global infrastructure, including network firewalls built into Amazon VPC, private or dedicated network connections, and automatic encryption of all traffic on the AWS global and regional networks between AWS secured facilities. For more information, see the AWS Shared Responsibility Model and Infrastructure security in the Introduction to AWS Security whitepaper. | |
Hardware/AWS global infrastructure | AWS | |
For more information about customer or shared responsibilities, see the ROSA Security document.
Disaster recovery includes data and configuration backup, replicating data and configuration to the disaster recovery environment, and failover on disaster events.
Red Hat OpenShift Service on AWS (ROSA) provides disaster recovery for failures that occur at the pod, worker node, infrastructure node, control plane node, and availability zone levels.
All disaster recovery requires that the customer use best practices for deploying highly available applications, storage, and cluster architecture, such as single-zone deployment or multi-zone deployment, to account for the level of desired availability.
One single-zone cluster will not provide disaster avoidance or recovery in the event of an availability zone or region outage. Multiple single-zone clusters with customer-maintained failover can account for outages at the zone or at the regional level.
One multi-zone cluster will not provide disaster avoidance or recovery in the event of a full region outage. Multiple multi-zone clusters with customer-maintained failover can account for outages at the regional level.
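The coverage rules above can be summarized as a small decision function. This is a sketch only: `survivable_outages` is a hypothetical name, and it assumes that multiple clusters are placed in different zones or regions as appropriate and that a single multi-zone cluster tolerates the loss of one availability zone.

```python
def survivable_outages(multi_zone, cluster_count, customer_failover):
    """Return the outage scopes ("zone", "region") a deployment can account for."""
    covered = set()
    if multi_zone:
        # Assumption: one multi-zone cluster tolerates a single-zone outage.
        covered.add("zone")
    if cluster_count > 1 and customer_failover:
        # Multiple clusters with customer-maintained failover can account for
        # zone-level and regional outages, assuming suitable cluster placement.
        covered.update({"zone", "region"})
    return covered
```

A single single-zone cluster yields an empty set, reflecting that it provides no disaster avoidance for zone or region outages.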
Resource | Service responsibilities | Customer responsibilities |
---|---|---|
Virtual networking management | Red Hat | |
Virtual storage management | Red Hat | |
Virtual compute management | Red Hat | |
AWS software (public AWS services) | AWS. Compute: Provide Amazon EC2 features that support data resiliency such as Amazon EBS snapshots and Amazon EC2 Auto Scaling. For more information, see Resilience in Amazon EC2 in the EC2 User Guide. Storage: Provide the ability for the ROSA service and customers to back up the Amazon EBS volume on the cluster through Amazon EBS volume snapshots. Storage: For information about Amazon S3 features that support data resiliency, see Resilience in Amazon S3. Networking: For information about Amazon VPC features that support data resiliency, see Resilience in Amazon Virtual Private Cloud in the Amazon VPC User Guide. | |
Hardware/AWS global infrastructure | AWS | |
The customer is responsible for the applications, workloads, and data that they deploy to Red Hat OpenShift Service on AWS. However, Red Hat and AWS provide various tools to help the customer manage data and applications on the platform.
Resource | Red Hat and AWS | Customer responsibilities |
---|---|---|
Customer data | Red Hat, AWS | |
Customer applications | Red Hat, AWS | |
For more information about Red Hat site reliability engineering (SRE) team access, see Identity and access management.