After installing OpenShift Container Platform, you can further expand and customize your cluster to your requirements.
If you incorrectly sized the worker nodes during deployment, adjust them by creating one or more new machine sets, scaling them up, and then scaling the original machine set down before removing it.
MachineSet
objects describe OpenShift Container Platform nodes with respect to the cloud or machine provider.
The MachineConfigPool
object allows MachineConfigController
components to define and provide the status of machines in the context of upgrades.
The MachineConfigPool
object allows users to configure how upgrades are rolled out to the OpenShift Container Platform nodes in the machine config pool.
The NodeSelector
object can be replaced with a reference to the MachineSet
object.
If you must add or remove an instance of a machine in a machine set, you can manually scale the machine set.
This guidance is relevant to fully automated, installer-provisioned infrastructure installations. Customized, user-provisioned infrastructure installations do not have machine sets.
Install an OpenShift Container Platform cluster and the oc
command line.
Log in to oc
as a user with cluster-admin
permission.
View the machine sets that are in the cluster:
$ oc get machinesets -n openshift-machine-api
The machine sets are listed in the form of <clusterid>-worker-<aws-region-az>.
Scale the machine set:
$ oc scale --replicas=2 machineset <machineset> -n openshift-machine-api
Or:
$ oc edit machineset <machineset> -n openshift-machine-api
You can scale the machine set up or down. It takes several minutes for the new machines to be available.
Random, Newest, and Oldest are the three supported deletion options. The default is Random, meaning that random machines are chosen and deleted when scaling machine sets down. The deletion policy can be set according to the use case by modifying the particular machine set:
spec:
deletePolicy: <delete_policy>
replicas: <desired_replica_count>
Specific machines can also be prioritized for deletion by adding the annotation machine.openshift.io/cluster-api-delete-machine
to the machine of interest, regardless of the deletion policy.
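As a rough illustration of how the deletion policy and the annotation interact, the following Python sketch ranks machines for removal. It is a simplified model, not the machine API controller's implementation: the `Machine` class, the `deletion_order` function, the `seed` parameter, and the rule that the mere presence of the annotation is enough to prioritize a machine are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime

DELETE_ANNOTATION = "machine.openshift.io/cluster-api-delete-machine"


@dataclass
class Machine:
    """Hypothetical stand-in for a Machine object; only the fields the model needs."""
    name: str
    created: datetime
    annotations: dict = field(default_factory=dict)


def deletion_order(machines, policy="Random", seed=None):
    """Return machines in the order this simplified model would delete them."""
    if policy == "Oldest":
        ranked = sorted(machines, key=lambda m: m.created)
    elif policy == "Newest":
        ranked = sorted(machines, key=lambda m: m.created, reverse=True)
    elif policy == "Random":
        ranked = list(machines)
        random.Random(seed).shuffle(ranked)
    else:
        raise ValueError(f"unsupported delete policy: {policy}")
    # sorted() is stable, so annotated machines move to the front while the
    # policy order is preserved inside each group.
    return sorted(ranked, key=lambda m: DELETE_ANNOTATION not in m.annotations)
```

Scaling a machine set down by `n` replicas would then correspond to taking the first `n` entries of the returned list.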
By default, the OpenShift Container Platform router pods are deployed on workers. Because the router is required to access some cluster resources, including the web console, do not scale the worker machine set to |
Custom machine sets can be used for use cases that require services to run on specific nodes and to be ignored by the controller when the worker machine sets scale down. This prevents service disruption.
You can create a MachineSet
object to host only infrastructure components. You apply specific Kubernetes labels to these machines and then update the infrastructure components to run on only those machines. These infrastructure nodes are not counted toward the total number of subscriptions that are required to run the environment.
The following infrastructure workloads do not incur OpenShift Container Platform worker subscriptions:
Kubernetes and OpenShift Container Platform control plane services that run on masters
The default router
The integrated container image registry
The cluster metrics collection, or monitoring service, including components for monitoring user-defined projects
Cluster aggregated logging
Service brokers
Red Hat Quay
Red Hat OpenShift Container Storage
Red Hat Advanced Cluster Manager
Any node that runs any other container, pod, or component is a worker node that your subscription must cover.
You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.
With cluster-wide node selectors, when you create a pod in that cluster, OpenShift Container Platform adds the default node selectors to the pod and schedules the pod on nodes with matching labels.
You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a machine set, or a machine config. Adding the label to the machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.
You can add additional key/value pairs to a pod, but you cannot add a different value for a default key.
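The merge behavior described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the scheduler's code: `parse_selector`, `effective_selector`, and `node_matches` are hypothetical helpers, and raising an error is only one way to represent a pod that sets a different value for a default key.

```python
def parse_selector(text):
    """Parse 'k1=v1,k2=v2' into a dict; an empty string yields {}."""
    selector = {}
    for pair in filter(None, (p.strip() for p in text.split(","))):
        key, _, value = pair.partition("=")
        selector[key] = value
    return selector


def effective_selector(default, pod_selector):
    """Combine the cluster-wide default selector with a pod's own node selector."""
    merged = parse_selector(default)
    for key, value in pod_selector.items():
        # A pod may add new keys, but not change the value of a default key.
        if key in merged and merged[key] != value:
            raise ValueError(
                f"{key}={value} conflicts with default {key}={merged[key]}"
            )
        merged[key] = value
    return merged


def node_matches(selector, node_labels):
    """A node is eligible only if it carries every key/value pair in the selector."""
    return all(node_labels.get(k) == v for k, v in selector.items())
```

For instance, with the default `type=user-node,region=east`, a pod that adds `disk: ssd` is only eligible for nodes that carry all three labels.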
To add a default cluster-wide node selector:
Edit the Scheduler Operator CR to add the default cluster-wide node selectors:
$ oc edit scheduler cluster
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
name: cluster
...
spec:
defaultNodeSelector: type=user-node,region=east (1)
mastersSchedulable: false
policy:
name: ""
1 | Add a node selector with the appropriate <key>=<value> pairs. |
After making this change, wait for the pods in the openshift-kube-apiserver
project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.
Add labels to a node by using a machine set or editing the node directly:
Use a machine set to add labels to nodes managed by the machine set when a node is created:
Run the following command to add labels to a MachineSet
object:
$ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>":"<value>","<key>":"<value>"}}]' -n openshift-machine-api (1)
1 | Add a <key>/<value> pair for each label. |
For example:
$ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
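Hand-writing the JSON patch is error-prone because of the nested quoting. One option, sketched here in Python, is to build the patch with `json.dumps` and quote it for the shell with `shlex.quote`. The helper names `label_patch` and `patch_command` are hypothetical; the `oc patch` arguments mirror the example above.

```python
import json
import shlex


def label_patch(labels):
    """Return a JSON Patch document that sets node labels on a machine set template."""
    operations = [{
        "op": "add",
        "path": "/spec/template/spec/metadata/labels",
        "value": dict(labels),
    }]
    return json.dumps(operations)


def patch_command(machineset, labels, namespace="openshift-machine-api"):
    """Build a shell-safe `oc patch` command line for the given machine set."""
    argv = [
        "oc", "patch", "MachineSet", machineset,
        "--type=json", "-p", label_patch(labels),
        "-n", namespace,
    ]
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(patch_command("<name>", {"type": "user-node", "region": "east"}))
```

Because a JSON Patch `add` operation on an existing member replaces its value, labels already defined at that path are overwritten rather than merged.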
Verify that the labels are added to the MachineSet
object by using the oc edit
command:
For example:
$ oc edit MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
...
spec:
...
template:
metadata:
...
spec:
metadata:
labels:
region: east
type: user-node
Redeploy the nodes associated with that machine set by scaling down to 0
and scaling up the nodes:
For example:
$ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
$ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
When the nodes are ready and available, verify that the label is added to the nodes by using the oc get
command:
$ oc get nodes -l <key>=<value>
For example:
$ oc get nodes -l type=user-node
NAME STATUS ROLES AGE VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp Ready worker 61s v1.18.3+002a51f
Add labels directly to a node:
Add labels to the Node object for the node:
$ oc label nodes <name> <key>=<value>
For example, to label a node:
$ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east
Verify that the labels are added to the node using the oc get
command:
$ oc get nodes -l <key>=<value>,<key>=<value>
For example:
$ oc get nodes -l type=user-node,region=east
NAME STATUS ROLES AGE VERSION
ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 Ready worker 17m v1.18.3+002a51f
Some of the infrastructure resources are deployed in your cluster by default. You can move them to the infrastructure machine sets that you created.
You can deploy the router pod to a different machine set. By default, the pod is deployed to a worker node.
Configure additional machine sets in your OpenShift Container Platform cluster.
View the IngressController
custom resource for the router Operator:
$ oc get ingresscontroller default -n openshift-ingress-operator -o yaml
The command output resembles the following text:
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
creationTimestamp: 2019-04-18T12:35:39Z
finalizers:
- ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
generation: 1
name: default
namespace: openshift-ingress-operator
resourceVersion: "11341"
selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/default
uid: 79509e05-61d6-11e9-bc55-02ce4781844a
spec: {}
status:
availableReplicas: 2
conditions:
- lastTransitionTime: 2019-04-18T12:36:15Z
status: "True"
type: Available
domain: apps.<cluster>.example.com
endpointPublishingStrategy:
type: LoadBalancerService
selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
Edit the ingresscontroller
resource and change the nodeSelector
to use the infra
label:
$ oc edit ingresscontroller default -n openshift-ingress-operator
Add the nodeSelector
stanza that references the infra
label to the spec
section, as shown:
spec:
nodePlacement:
nodeSelector:
matchLabels:
node-role.kubernetes.io/infra: ""
Confirm that the router pod is running on the infra
node.
View the list of router pods and note the node name of the running pod:
$ oc get pod -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-86798b4b5d-bdlvd 1/1 Running 0 28s 10.130.2.4 ip-10-0-217-226.ec2.internal <none> <none>
router-default-955d875f4-255g8 0/1 Terminating 0 19h 10.129.2.4 ip-10-0-148-172.ec2.internal <none> <none>
In this example, the running pod is on the ip-10-0-217-226.ec2.internal
node.
View the node status of the running pod:
$ oc get node <node_name> (1)
1 | Specify the <node_name> that you obtained from the pod list. |
NAME STATUS ROLES AGE VERSION
ip-10-0-217-226.ec2.internal Ready infra,worker 17h v1.18.3
Because the role list includes infra
, the pod is running on the correct node.
You can configure the registry Operator to deploy its pods to different nodes.
Configure additional machine sets in your OpenShift Container Platform cluster.
View the config/instance
object:
$ oc get configs.imageregistry.operator.openshift.io cluster -o yaml
apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
creationTimestamp: 2019-02-05T13:52:05Z
finalizers:
- imageregistry.operator.openshift.io/finalizer
generation: 1
name: cluster
resourceVersion: "56174"
selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/cluster
uid: 36fd3724-294d-11e9-a524-12ffeee2931b
spec:
httpSecret: d9a012ccd117b1e6616ceccb2c3bb66a5fed1b5e481623
logging: 2
managementState: Managed
proxy: {}
replicas: 1
requests:
read: {}
write: {}
storage:
s3:
bucket: image-registry-us-east-1-c92e88cad85b48ec8b312344dff03c82-392c
region: us-east-1
status:
...
Edit the config/instance
object:
$ oc edit configs.imageregistry.operator.openshift.io cluster
Add the following lines of text to the spec section of the object:
nodeSelector:
node-role.kubernetes.io/infra: ""
Verify the registry pod has been moved to the infrastructure node.
Run the following command to identify the node where the registry pod is located:
$ oc get pods -o wide -n openshift-image-registry
Confirm the node has the label you specified:
$ oc describe node <node_name>
Review the command output and confirm that node-role.kubernetes.io/infra
is in the LABELS
list.
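If you prefer to check the label programmatically, you could request the node as JSON (for example with `oc get node <node_name> -o json`) and inspect `metadata.labels`. The following Python sketch does that for a hand-written sample document; the `has_label` helper and the sample data are illustrative assumptions, not output captured from a cluster.

```python
import json

INFRA_LABEL = "node-role.kubernetes.io/infra"


def has_label(node_json, key):
    """Return True if the Node object, given as JSON text, carries the label key."""
    node = json.loads(node_json)
    labels = node.get("metadata", {}).get("labels") or {}
    return key in labels


# Hand-written sample that only imitates the shape of a Node object.
SAMPLE_NODE = json.dumps({
    "kind": "Node",
    "metadata": {
        "name": "example-node",
        "labels": {
            "node-role.kubernetes.io/infra": "",
            "node-role.kubernetes.io/worker": "",
        },
    },
})

if __name__ == "__main__":
    print(has_label(SAMPLE_NODE, INFRA_LABEL))
```

The check looks only at the key because the role label in this procedure has an empty value.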
In a production deployment, deploy at least three machine sets to hold infrastructure components. Both the logging aggregation solution and the service mesh deploy Elasticsearch, and Elasticsearch requires three instances that are installed on different nodes. For high availability, deploy these nodes to different availability zones. Because you need a different machine set for each availability zone, create at least three machine sets.
In addition to the ones created by the installation program, you can create your own machine sets to dynamically manage the machine compute resources for specific workloads of your choice.
Deploy an OpenShift Container Platform cluster.
Install the OpenShift CLI (oc
).
Log in to oc
as a user with cluster-admin
permission.
Create a new YAML file that contains the machine set custom resource (CR) sample, as shown, and is named <file_name>.yaml
.
Ensure that you set the <clusterID>
and <role>
parameter values.
If you are not sure about which value to set for a specific field, you can check an existing machine set from your cluster.
$ oc get machinesets -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
agl030519-vplxk-worker-us-east-1a 1 1 1 1 55m
agl030519-vplxk-worker-us-east-1b 1 1 1 1 55m
agl030519-vplxk-worker-us-east-1c 1 1 1 1 55m
agl030519-vplxk-worker-us-east-1d 0 0 55m
agl030519-vplxk-worker-us-east-1e 0 0 55m
agl030519-vplxk-worker-us-east-1f 0 0 55m
Check values of a specific machine set:
$ oc get machineset <machineset_name> -n \
openshift-machine-api -o yaml
...
template:
metadata:
labels:
machine.openshift.io/cluster-api-cluster: agl030519-vplxk (1)
machine.openshift.io/cluster-api-machine-role: worker (2)
machine.openshift.io/cluster-api-machine-type: worker
machine.openshift.io/cluster-api-machineset: agl030519-vplxk-worker-us-east-1a
1 | The cluster ID. |
2 | A default node label. |
Create the new MachineSet
CR:
$ oc create -f <file_name>.yaml
View the list of machine sets:
$ oc get machineset -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
agl030519-vplxk-infra-us-east-1a 1 1 1 1 11m
agl030519-vplxk-worker-us-east-1a 1 1 1 1 55m
agl030519-vplxk-worker-us-east-1b 1 1 1 1 55m
agl030519-vplxk-worker-us-east-1c 1 1 1 1 55m
agl030519-vplxk-worker-us-east-1d 0 0 55m
agl030519-vplxk-worker-us-east-1e 0 0 55m
agl030519-vplxk-worker-us-east-1f 0 0 55m
When the new machine set is available, the DESIRED
and CURRENT
values match. If the machine set is not available, wait a few minutes and run the command again.
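The comparison of DESIRED and CURRENT can also be scripted. The following Python sketch parses the text output of `oc get machineset` and lists the machine sets whose two values differ. It assumes that NAME, DESIRED, and CURRENT are the first three whitespace-separated columns, as in the listing above; `parse_machinesets` and `unsettled` are hypothetical helper names.

```python
def parse_machinesets(table):
    """Parse `oc get machineset` text output into {name: (desired, current)}."""
    rows = {}
    lines = [line for line in table.strip().splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # READY and AVAILABLE can be blank, so only the first three columns are read.
        rows[fields[0]] = (int(fields[1]), int(fields[2]))
    return rows


def unsettled(table):
    """Return the names of machine sets whose DESIRED and CURRENT values differ."""
    parsed = parse_machinesets(table)
    return sorted(name for name, (desired, current) in parsed.items() if desired != current)
```

An empty result from `unsettled` means that every listed machine set reports matching DESIRED and CURRENT values.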
After the new machine set is available, check the status of the machine and the node that it references:
$ oc describe machine <name> -n openshift-machine-api
For example:
$ oc describe machine agl030519-vplxk-infra-us-east-1a -n openshift-machine-api
status:
addresses:
- address: 10.0.133.18
type: InternalIP
- address: ""
type: ExternalDNS
- address: ip-10-0-133-18.ec2.internal
type: InternalDNS
lastUpdated: "2019-05-03T10:38:17Z"
nodeRef:
kind: Node
name: ip-10-0-133-18.ec2.internal
uid: 71fb8d75-6d8f-11e9-9ff3-0e3f103c7cd8
providerStatus:
apiVersion: awsproviderconfig.openshift.io/v1beta1
conditions:
- lastProbeTime: "2019-05-03T10:34:31Z"
lastTransitionTime: "2019-05-03T10:34:31Z"
message: machine successfully created
reason: MachineCreationSucceeded
status: "True"
type: MachineCreation
instanceId: i-09ca0701454124294
instanceState: running
kind: AWSMachineProviderStatus
View the new node and confirm that the new node has the label that you specified:
$ oc get node <node_name> --show-labels
Review the command output and confirm that node-role.kubernetes.io/<your_label>
is in the LABELS
list.
Any change to a machine set is not applied to existing machines owned by the machine set. For example, labels edited or added to an existing machine set are not propagated to existing machines and nodes associated with the machine set.
Use the sample machine set for your cloud.
This sample YAML defines a machine set that runs in the us-east-1a
Amazon Web Services (AWS) zone and creates nodes that are labeled with node-role.kubernetes.io/<role>: ""
In this sample, <infrastructureID>
is the infrastructure ID label that is based on the cluster ID that you set when you provisioned the cluster, and <role>
is the node label to add.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
labels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
name: <infrastructureID>-<role>-<zone> (2)
namespace: openshift-machine-api
spec:
replicas: 1
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machineset: <infrastructureID>-<role>-<zone> (2)
template:
metadata:
labels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machine-role: <role> (3)
machine.openshift.io/cluster-api-machine-type: <role> (3)
machine.openshift.io/cluster-api-machineset: <infrastructureID>-<role>-<zone> (2)
spec:
metadata:
labels:
node-role.kubernetes.io/<role>: "" (3)
providerSpec:
value:
ami:
id: ami-046fe691f52a953f9 (4)
apiVersion: awsproviderconfig.openshift.io/v1beta1
blockDevices:
- ebs:
iops: 0
volumeSize: 120
volumeType: gp2
credentialsSecret:
name: aws-cloud-credentials
deviceIndex: 0
iamInstanceProfile:
id: <infrastructureID>-worker-profile (1)
instanceType: m4.large
kind: AWSMachineProviderConfig
placement:
availabilityZone: us-east-1a
region: us-east-1
securityGroups:
- filters:
- name: tag:Name
values:
- <infrastructureID>-worker-sg (1)
subnet:
filters:
- name: tag:Name
values:
- <infrastructureID>-private-us-east-1a (1)
tags:
- name: kubernetes.io/cluster/<infrastructureID> (1)
value: owned
userDataSecret:
name: worker-user-data
1 | Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI installed, you can obtain the infrastructure ID by running the following command:
|
2 | Specify the infrastructure ID, node label, and zone. |
3 | Specify the node label to add. |
4 | Specify a valid Red Hat Enterprise Linux CoreOS (RHCOS) AMI for your AWS zone for your OpenShift Container Platform nodes. |
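The sample repeats the infrastructure ID, role, and zone in several places, and the selector must agree with the template labels. As a minimal sketch of how those values fit together, the following Python code derives the repeated fields from three inputs. The function names are hypothetical, and the provider section is deliberately left empty because its fields are cloud-specific.

```python
CLUSTER_LABEL = "machine.openshift.io/cluster-api-cluster"
MACHINESET_LABEL = "machine.openshift.io/cluster-api-machineset"


def machineset_skeleton(infrastructure_id, role, zone, replicas=1):
    """Build the provider-independent part of a MachineSet as a dict."""
    name = f"{infrastructure_id}-{role}-{zone}"
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineSet",
        "metadata": {
            "labels": {CLUSTER_LABEL: infrastructure_id},
            "name": name,
            "namespace": "openshift-machine-api",
        },
        "spec": {
            "replicas": replicas,
            "selector": {
                "matchLabels": {CLUSTER_LABEL: infrastructure_id, MACHINESET_LABEL: name},
            },
            "template": {
                "metadata": {"labels": {
                    CLUSTER_LABEL: infrastructure_id,
                    "machine.openshift.io/cluster-api-machine-role": role,
                    "machine.openshift.io/cluster-api-machine-type": role,
                    MACHINESET_LABEL: name,
                }},
                "spec": {
                    "metadata": {"labels": {f"node-role.kubernetes.io/{role}": ""}},
                    # Placeholder: fill in the provider fields for your cloud.
                    "providerSpec": {"value": {}},
                },
            },
        },
    }


def selector_matches_template(machine_set):
    """Check that every selector label also appears on the machine template."""
    selector = machine_set["spec"]["selector"]["matchLabels"]
    labels = machine_set["spec"]["template"]["metadata"]["labels"]
    return all(labels.get(key) == value for key, value in selector.items())
```

Running `selector_matches_template` on a hand-edited manifest loaded as a dict is a quick way to catch a name that was changed in one place but not the others.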
Machine sets running on AWS support non-guaranteed Spot Instances. You can save on costs by using Spot Instances at a lower price compared to On-Demand Instances on AWS. Configure Spot Instances by adding spotMarketOptions
to the machine set YAML file.
This sample YAML defines a machine set that runs in the 1
Microsoft Azure zone in the centralus
region and creates nodes that are labeled with node-role.kubernetes.io/<role>: ""
In this sample, <infrastructureID>
is the infrastructure ID label that is based on the cluster ID that you set when you provisioned the cluster, and <role>
is the node label to add.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
labels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machine-role: <role> (2)
machine.openshift.io/cluster-api-machine-type: <role> (2)
name: <infrastructureID>-<role>-<region> (3)
namespace: openshift-machine-api
spec:
replicas: 1
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machineset: <infrastructureID>-<role>-<region> (3)
template:
metadata:
creationTimestamp: null
labels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machine-role: <role> (2)
machine.openshift.io/cluster-api-machine-type: <role> (2)
machine.openshift.io/cluster-api-machineset: <infrastructureID>-<role>-<region> (3)
spec:
metadata:
creationTimestamp: null
labels:
node-role.kubernetes.io/<role>: "" (2)
providerSpec:
value:
apiVersion: azureproviderconfig.openshift.io/v1beta1
credentialsSecret:
name: azure-cloud-credentials
namespace: openshift-machine-api
image:
offer: ""
publisher: ""
resourceID: /resourceGroups/<infrastructureID>-rg/providers/Microsoft.Compute/images/<infrastructureID>
sku: ""
version: ""
internalLoadBalancer: ""
kind: AzureMachineProviderSpec
location: centralus
managedIdentity: <infrastructureID>-identity (1)
metadata:
creationTimestamp: null
natRule: null
networkResourceGroup: ""
osDisk:
diskSizeGB: 128
managedDisk:
storageAccountType: Premium_LRS
osType: Linux
publicIP: false
publicLoadBalancer: ""
resourceGroup: <infrastructureID>-rg (1)
sshPrivateKey: ""
sshPublicKey: ""
subnet: <infrastructureID>-<role>-subnet (1) (2)
userDataSecret:
name: <role>-user-data (2)
vmSize: Standard_D2s_v3
vnet: <infrastructureID>-vnet (1)
zone: "1" (4)
1 | Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI installed, you can obtain the infrastructure ID by running the following command:
|
2 | Specify the node label to add. |
3 | Specify the infrastructure ID, node label, and region. |
4 | Specify the zone within your region to place Machines on. Be sure that your region supports the zone that you specify. |
This sample YAML defines a machine set that runs in Google Cloud Platform (GCP) and creates nodes that are labeled with node-role.kubernetes.io/<role>: ""
In this sample, <infrastructureID>
is the infrastructure ID label that is based on the cluster ID that you set when you provisioned the cluster, and <role>
is the node label to add.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
labels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
name: <infrastructureID>-w-a (1)
namespace: openshift-machine-api
spec:
replicas: 1
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machineset: <infrastructureID>-w-a (1)
template:
metadata:
creationTimestamp: null
labels:
machine.openshift.io/cluster-api-cluster: <infrastructureID> (1)
machine.openshift.io/cluster-api-machine-role: <role> (3)
machine.openshift.io/cluster-api-machine-type: <role> (3)
machine.openshift.io/cluster-api-machineset: <infrastructureID>-w-a (1)
spec:
metadata:
labels:
node-role.kubernetes.io/<role>: "" (3)
providerSpec:
value:
apiVersion: gcpprovider.openshift.io/v1beta1
canIPForward: false
credentialsSecret:
name: gcp-cloud-credentials
deletionProtection: false
disks:
- autoDelete: true
boot: true
image: <infrastructureID>-rhcos-image (1)
labels: null
sizeGb: 128
type: pd-ssd
kind: GCPMachineProviderSpec
machineType: n1-standard-4
metadata:
creationTimestamp: null
networkInterfaces:
- network: <infrastructureID>-network (1)
subnetwork: <infrastructureID>-<role>-subnet (2)
projectID: <project_name> (4)
region: us-central1
serviceAccounts:
- email: <infrastructureID>-w@<project_name>.iam.gserviceaccount.com (1) (4)
scopes:
- https://www.googleapis.com/auth/cloud-platform
tags:
- <infrastructureID>-<role> (2)
userDataSecret:
name: worker-user-data
zone: us-central1-a
1 | Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI installed, you can obtain the infrastructure ID by running the following command:
|
2 | Specify the infrastructure ID and node label. |
3 | Specify the node label to add. |
4 | Specify the name of the GCP project that you use for your cluster. |
This sample YAML defines a machine set that runs on VMware vSphere and creates nodes that are labeled with node-role.kubernetes.io/<role>: ""
.
In this sample, <infrastructure_id>
is the infrastructure ID label that is based on the cluster ID that you set when you provisioned the cluster, and <role>
is the node label to add.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
creationTimestamp: null
labels:
machine.openshift.io/cluster-api-cluster: <infrastructure_id> (1)
name: <infrastructure_id>-<role> (2)
namespace: openshift-machine-api
spec:
replicas: 1
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: <infrastructure_id> (1)
machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role> (2)
template:
metadata:
creationTimestamp: null
labels:
machine.openshift.io/cluster-api-cluster: <infrastructure_id> (1)
machine.openshift.io/cluster-api-machine-role: <role> (3)
machine.openshift.io/cluster-api-machine-type: <role> (3)
machine.openshift.io/cluster-api-machineset: <infrastructure_id>-<role> (2)
spec:
metadata:
creationTimestamp: null
labels:
node-role.kubernetes.io/<role>: "" (3)
providerSpec:
value:
apiVersion: vsphereprovider.openshift.io/v1beta1
credentialsSecret:
name: vsphere-cloud-credentials
diskGiB: 120
kind: VSphereMachineProviderSpec
memoryMiB: 8192
metadata:
creationTimestamp: null
network:
devices:
- networkName: "<vm_network_name>" (4)
numCPUs: 4
numCoresPerSocket: 1
snapshot: ""
template: <vm_template_name> (5)
userDataSecret:
name: worker-user-data
workspace:
datacenter: <vcenter_datacenter_name> (6)
datastore: <vcenter_datastore_name> (7)
folder: <vcenter_vm_folder_path> (8)
resourcepool: <vsphere_resource_pool> (9)
server: <vcenter_server_ip> (10)
1 | Specify the infrastructure ID that is based on the cluster ID that you set when you provisioned the cluster. If you have the OpenShift CLI (oc ) installed, you can obtain the infrastructure ID by running the following command:
|
2 | Specify the infrastructure ID and node label. |
3 | Specify the node label to add. |
4 | Specify the vSphere VM network to deploy the machine set to. |
5 | Specify the vSphere VM template to use, such as user-5ddjd-rhcos . |
6 | Specify the vCenter Datacenter to deploy the machine set on. |
7 | Specify the vCenter Datastore to deploy the machine set on. |
8 | Specify the path to the vSphere VM folder in vCenter, such as /dc1/vm/user-inst-5ddjd . |
9 | Specify the vSphere resource pool for your VMs. |
10 | Specify the vCenter server IP or fully qualified domain name. |
See Creating infrastructure machine sets for installer-provisioned infrastructure environments or for any cluster where the master nodes are managed by the machine API.
Cluster requirements dictate that infrastructure nodes, also called infra nodes, be provisioned. The installer provisions only master and worker nodes. Worker nodes can be designated as infrastructure nodes or application nodes, also called app nodes, through labeling.
Add a label to the worker node that you want to act as an application node:
$ oc label node <node-name> node-role.kubernetes.io/app=""
Add a label to the worker nodes that you want to act as infrastructure nodes:
$ oc label node <node-name> node-role.kubernetes.io/infra=""
Check whether the applicable nodes now have the infra and app roles:
$ oc get nodes
Create a default node selector so that pods without a node selector are assigned to a subset of nodes, such as the worker nodes. For example, the following defaultNodeSelector deploys pods on the nodes that are labeled as app nodes by default:
defaultNodeSelector: node-role.kubernetes.io/app=
Move infrastructure resources to the newly labeled infra
nodes.
If you need infrastructure machines to have dedicated configurations, then you must create an infra pool.
Create a machine config pool that contains both the worker role and your custom role as machine config selector:
$ cat infra.mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: infra
spec:
machineConfigSelector:
matchExpressions:
- {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
nodeSelector:
matchLabels:
node-role.kubernetes.io/infra: ""
After you have the YAML file, you can create the machine config pool:
$ oc create -f infra.mcp.yaml
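To see why the pool picks up both worker and infra machine configs, it can help to evaluate the `machineConfigSelector` by hand. The following Python sketch implements only the `In` operator of Kubernetes-style `matchExpressions`; the `selects` helper and the sample label sets are assumptions for illustration, not objects read from a cluster.

```python
ROLE_KEY = "machineconfiguration.openshift.io/role"

# The machineConfigSelector from infra.mcp.yaml, written as Python data.
INFRA_POOL_EXPRESSIONS = [
    {"key": ROLE_KEY, "operator": "In", "values": ["worker", "infra"]},
]


def selects(expressions, labels):
    """Return True if the labels satisfy every expression (only `In` is supported)."""
    for expression in expressions:
        if expression["operator"] != "In":
            raise NotImplementedError(f"operator {expression['operator']} is not modeled")
        # `In` requires the label to exist and its value to be one of the listed values.
        if labels.get(expression["key"]) not in expression["values"]:
            return False
    return True


# Illustrative machine configs, reduced to a name and a role label.
SAMPLE_CONFIGS = {
    "00-master": {ROLE_KEY: "master"},
    "00-worker": {ROLE_KEY: "worker"},
    "51-infra": {ROLE_KEY: "infra"},
}

if __name__ == "__main__":
    for name, labels in SAMPLE_CONFIGS.items():
        print(name, selects(INFRA_POOL_EXPRESSIONS, labels))
```

Under this model, the infra pool inherits every worker machine config and adds any config labeled with the infra role.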
Check the machine configs to ensure that the infrastructure configuration rendered successfully:
$ oc get machineconfig
NAME GENERATEDBYCONTROLLER IGNITIONVERSION CREATED
00-master 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
00-worker 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
01-master-container-runtime 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
01-master-kubelet 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
01-worker-container-runtime 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
01-worker-kubelet 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
99-master-1ae2a1e0-a115-11e9-8f14-005056899d54-registries 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
99-master-ssh 2.2.0 31d
99-worker-1ae64748-a115-11e9-8f14-005056899d54-registries 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 31d
99-worker-ssh 2.2.0 31d
rendered-infra-4e48906dca84ee702959c71a53ee80e7 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 23m
rendered-master-072d4b2da7f88162636902b074e9e28e 5b6fb8349a29735e48446d435962dec4547d3090 2.2.0 31d
rendered-master-3e88ec72aed3886dec061df60d16d1af 02c07496ba0417b3e12b78fb32baf6293d314f79 2.2.0 31d
rendered-master-419bee7de96134963a15fdf9dd473b25 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 17d
rendered-master-53f5c91c7661708adce18739cc0f40fb 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 13d
rendered-master-a6a357ec18e5bce7f5ac426fc7c5ffcd 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 7d3h
rendered-master-dc7f874ec77fc4b969674204332da037 5b6fb8349a29735e48446d435962dec4547d3090 2.2.0 31d
rendered-worker-1a75960c52ad18ff5dfa6674eb7e533d 5b6fb8349a29735e48446d435962dec4547d3090 2.2.0 31d
rendered-worker-2640531be11ba43c61d72e82dc634ce6 5b6fb8349a29735e48446d435962dec4547d3090 2.2.0 31d
rendered-worker-4e48906dca84ee702959c71a53ee80e7 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 7d3h
rendered-worker-4f110718fe88e5f349987854a1147755 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 17d
rendered-worker-afc758e194d6188677eb837842d3b379 02c07496ba0417b3e12b78fb32baf6293d314f79 2.2.0 31d
rendered-worker-daa08cc1e8f5fcdeba24de60cd955cc3 365c1cfd14de5b0e3b85e0fc815b0060f36ab955 2.2.0 13d
Optional: To deploy changes to a custom pool, create a machine config that uses the custom pool name as the label, such as infra
. Note that this is not required and only shown for instructional purposes. In this manner, you can apply any custom configurations specific to only your infra nodes.
$ cat infra.mc.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: infra
name: 51-infra
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:,infra
filesystem: root
mode: 0644
path: /etc/infratest
Apply the machine config, then verify that the infra-labeled nodes are updated with the configuration file:
$ oc create -f infra.mc.yaml
The cluster autoscaler adjusts the size of an OpenShift Container Platform cluster to meet its current deployment needs. It uses declarative, Kubernetes-style arguments to provide infrastructure management that does not rely on objects of a specific cloud provider. The cluster autoscaler has a cluster scope, and is not associated with a particular namespace.
The cluster autoscaler increases the size of the cluster when there are pods that failed to schedule on any of the current nodes due to insufficient resources or when another node is necessary to meet deployment needs. The cluster autoscaler does not increase the cluster resources beyond the limits that you specify.
Ensure that the |
The cluster autoscaler decreases the size of the cluster when some nodes are consistently not needed for a significant period, such as when a node has low resource use and all of its important pods can fit on other nodes.
If the following types of pods are present on a node, the cluster autoscaler will not remove the node:
Pods with restrictive pod disruption budgets (PDBs).
Kube-system pods that do not run on the node by default.
Kube-system pods that do not have a PDB or have a PDB that is too restrictive.
Pods that are not backed by a controller object such as a deployment, replica set, or stateful set.
Pods with local storage.
Pods that cannot be moved elsewhere because of a lack of resources, incompatible node selectors or affinity, matching anti-affinity, and so on.
Unless they also have a "cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
annotation, pods that have a "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
annotation.
If you configure the cluster autoscaler, additional usage restrictions apply:
Do not modify the nodes that are in autoscaled node groups directly. All nodes within the same node group have the same capacity and labels and run the same system pods.
Specify requests for your pods.
If you have to prevent pods from being deleted too quickly, configure appropriate PDBs.
Confirm that your cloud provider quota is large enough to support the maximum node pools that you configure.
Do not run additional node group autoscalers, especially the ones offered by your cloud provider.
The horizontal pod autoscaler (HPA) and the cluster autoscaler modify cluster resources in different ways. The HPA changes the deployment’s or replica set’s number of replicas based on the current CPU load. If the load increases, the HPA creates new replicas, regardless of the amount of resources available to the cluster. If there are not enough resources, the cluster autoscaler adds resources so that the HPA-created pods can run. If the load decreases, the HPA stops some replicas. If this action causes some nodes to be underutilized or completely empty, the cluster autoscaler deletes the unnecessary nodes.
The cluster autoscaler takes pod priorities into account. The Pod Priority and Preemption feature enables scheduling pods based on priorities if the cluster does not have enough resources, but the cluster autoscaler ensures that the cluster has resources to run all pods. To honor the intention of both features, the cluster autoscaler includes a priority cutoff function. You can use this cutoff to schedule "best-effort" pods, which do not cause the cluster autoscaler to increase resources but instead run only when spare resources are available.
Pods with priority lower than the cutoff value do not cause the cluster to scale up or prevent the cluster from scaling down. No new nodes are added to run the pods, and nodes running these pods might be deleted to free resources.
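The priority cutoff described above can be pictured with a minimal Python sketch. The function name and the sample pods are hypothetical, and treating a priority exactly equal to the cutoff as counting toward scaling is an assumption; check the behavior of your cluster version before relying on that boundary.

```python
def affects_autoscaling(pod_priority: int, cutoff: int) -> bool:
    """Return True if a pod at this priority counts toward scaling decisions.

    Pods below the cutoff neither trigger a scale-up nor block a scale-down.
    Counting a priority equal to the cutoff is an assumption of this sketch.
    """
    return pod_priority >= cutoff


# Hypothetical pods mapped to the value of their PriorityClass.
pods = {"batch-report": -100, "web-frontend": 0, "payments": 1000}
cutoff = -10  # plays the role of podPriorityThreshold

# Pods the autoscaler would treat as "best-effort".
best_effort = sorted(
    name for name, priority in pods.items()
    if not affects_autoscaling(priority, cutoff)
)
```

With these sample values only batch-report falls below the cutoff, so it runs only when spare capacity exists.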
This ClusterAutoscaler resource definition shows the parameters and sample values for the cluster autoscaler.
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10 (1)
  resourceLimits:
    maxNodesTotal: 24 (2)
    cores:
      min: 8 (3)
      max: 128 (4)
    memory:
      min: 4 (5)
      max: 256 (6)
    gpus:
      - type: nvidia.com/gpu (7)
        min: 0 (8)
        max: 16 (9)
      - type: amd.com/gpu (7)
        min: 0 (8)
        max: 4 (9)
  scaleDown: (10)
    enabled: true (11)
    delayAfterAdd: 10m (12)
    delayAfterDelete: 5m (13)
    delayAfterFailure: 30s (14)
    unneededTime: 60s (15)
1 | Specify the priority that a pod must exceed to cause the cluster autoscaler to deploy additional nodes. Enter a 32-bit integer value. The podPriorityThreshold value is compared to the value of the PriorityClass that you assign to each pod. |
2 | Specify the maximum number of nodes to deploy. This value is the total number of machines that are deployed in your cluster, not just the ones that the autoscaler controls. Ensure that this value is large enough to account for all of your control plane and compute machines and the total number of replicas that you specify in your MachineAutoscaler resources. |
3 | Specify the minimum number of cores to deploy. |
4 | Specify the maximum number of cores to deploy. |
5 | Specify the minimum amount of memory, in GiB, per node. |
6 | Specify the maximum amount of memory, in GiB, per node. |
7 | Optionally, specify the type of GPU node to deploy. Only nvidia.com/gpu and amd.com/gpu are valid types. |
8 | Specify the minimum number of GPUs to deploy. |
9 | Specify the maximum number of GPUs to deploy. |
10 | In this section, you can specify the period to wait for each action by using any valid ParseDuration interval, including ns , us , ms , s , m , and h . |
11 | Specify whether the cluster autoscaler can remove unnecessary nodes. |
12 | Optionally, specify the period to wait before deleting a node after a node has recently been added. If you do not specify a value, the default value of 10m is used. |
13 | Specify the period to wait before deleting a node after a node has recently been deleted. If you do not specify a value, the default value of 10s is used. |
14 | Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of 3m is used. |
15 | Specify the period before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of 10m is used. |
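To make the interval syntax used by the scaleDown fields concrete, the following Python sketch converts values such as 10m or 30s to seconds. It is a simplified stand-in for Go's ParseDuration, not a reimplementation: it handles only the unit suffixes listed above, and the function name is hypothetical.

```python
import re

# Seconds per unit for the suffixes ns, us, ms, s, m, and h.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def duration_seconds(text: str) -> float:
    """Convert an interval such as '10m' or '1h30m' to seconds."""
    pos, total = 0, 0.0
    for match in _TOKEN.finditer(text):
        # Every token must start where the previous one ended.
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    # Reject empty input, bare numbers, and trailing garbage.
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total
```

For example, duration_seconds("10m") gives 600.0, which makes it easy to compare a configured delayAfterAdd against unneededTime when tuning scale-down behavior.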
To deploy the cluster autoscaler, you create an instance of the ClusterAutoscaler resource.
Create a YAML file for the ClusterAutoscaler resource that contains the customized resource definition.
Create the resource in the cluster:
$ oc create -f <filename>.yaml (1)
1 | <filename> is the name of the resource file that you customized. |
The machine autoscaler adjusts the number of machines in the machine sets that you deploy in an OpenShift Container Platform cluster. You can scale both the default worker machine set and any other machine sets that you create. The machine autoscaler creates more machines when the cluster runs out of resources to support more deployments. Any changes to the values in MachineAutoscaler resources, such as the minimum or maximum number of instances, are immediately applied to the machine set they target.
You must deploy a machine autoscaler for the cluster autoscaler to scale your machines. The cluster autoscaler uses the annotations on machine sets that the machine autoscaler sets to determine the resources that it can scale. If you define a cluster autoscaler without also defining machine autoscalers, the cluster autoscaler will never scale your cluster.
This MachineAutoscaler resource definition shows the parameters and sample values for the machine autoscaler.
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-us-east-1a" (1)
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1 (2)
  maxReplicas: 12 (3)
  scaleTargetRef: (4)
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet (5)
    name: worker-us-east-1a (6)
1 | Specify the machine autoscaler name. To make it easier to identify which machine set this machine autoscaler scales, specify or include the name of the machine set to scale. The machine set name takes the following form: <clusterid>-<machineset>-<aws-region-az> |
2 | Specify the minimum number of machines of the specified type that must remain in the specified zone after the cluster autoscaler initiates cluster scaling. If running in AWS, GCP, or Azure, this value can be set to 0 . For other providers, do not set this value to 0 . |
3 | Specify the maximum number of machines of the specified type that the cluster autoscaler can deploy in the specified AWS zone after it initiates cluster scaling. Ensure that the maxNodesTotal value in the ClusterAutoscaler resource definition is large enough to allow the machine autoscaler to deploy this number of machines. |
4 | In this section, provide values that describe the existing machine set to scale. |
5 | The kind parameter value is always MachineSet . |
6 | The name value must match the name of an existing machine set, as shown in the metadata.name parameter value. |
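The relationship between maxReplicas and maxNodesTotal can be checked with simple arithmetic. The Python sketch below is illustrative only; the function names and the sample cluster layout are hypothetical, and it assumes that every machine counts toward maxNodesTotal, as the ClusterAutoscaler callouts state.

```python
from typing import Iterable


def required_max_nodes(control_plane: int,
                       fixed_compute: int,
                       autoscaler_max_replicas: Iterable[int]) -> int:
    """Smallest maxNodesTotal that lets every machine autoscaler reach its maximum.

    control_plane: number of control plane machines.
    fixed_compute: compute machines in machine sets that are not autoscaled.
    autoscaler_max_replicas: the maxReplicas value of each MachineAutoscaler.
    """
    return control_plane + fixed_compute + sum(autoscaler_max_replicas)


def max_nodes_total_is_sufficient(max_nodes_total: int,
                                  control_plane: int,
                                  fixed_compute: int,
                                  autoscaler_max_replicas: Iterable[int]) -> bool:
    """Return True if maxNodesTotal does not cap the machine autoscalers."""
    needed = required_max_nodes(control_plane, fixed_compute, autoscaler_max_replicas)
    return max_nodes_total >= needed
```

For instance, with three control plane machines and two machine autoscalers that each set maxReplicas to 12, a maxNodesTotal of 24 is too small, because 27 machines would be needed for both machine sets to scale fully.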
To deploy the machine autoscaler, you create an instance of the MachineAutoscaler resource.
Create a YAML file for the MachineAutoscaler resource that contains the customized resource definition.
Create the resource in the cluster:
$ oc create -f <filename>.yaml (1)
1 | <filename> is the name of the resource file that you customized. |
You can turn Technology Preview features on and off for all nodes in the cluster by editing the FeatureGates Custom Resource, named cluster, in the openshift-config project.
The following Technology Preview features are enabled by feature gates:
RotateKubeletServerCertificate
SupportPodPidsLimit
Turning on Technology Preview features using the |
To turn on the Technology Preview features for the entire cluster:
Create the FeatureGates instance:
Switch to the Administration → Custom Resource Definitions page.
On the Custom Resource Definitions page, click FeatureGate.
On the Custom Resource Definitions page, click the Actions Menu and select View Instances.
On the Feature Gates page, click Create Feature Gates.
Replace the code with the following sample:
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec: {}
Click Create.
To turn on the Technology Preview features, change the spec parameter to:
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade (1)
1 | Add featureSet: TechPreviewNoUpgrade to enable the Technology Preview features that are affected by FeatureGates. |
Back up etcd, enable or disable etcd encryption, or defragment etcd data.
By default, etcd data is not encrypted in OpenShift Container Platform. You can enable etcd encryption for your cluster to provide an additional layer of data security. For example, it can help protect against the loss of sensitive data if an etcd backup is exposed to the incorrect parties.
When you enable etcd encryption, the following OpenShift API server and Kubernetes API server resources are encrypted:
Secrets
Config maps
Routes
OAuth access tokens
OAuth authorize tokens
When you enable etcd encryption, encryption keys are created. These keys are rotated on a weekly basis. You must have these keys in order to restore from an etcd backup.
You can enable etcd encryption to encrypt sensitive resources in your cluster.
It is not recommended to take a backup of etcd until the initial encryption process is complete. If the encryption process has not completed, the backup might be only partially encrypted.
Access to the cluster as a user with the cluster-admin
role.
Modify the APIServer
object:
$ oc edit apiserver
Set the encryption field type to aescbc:
spec:
  encryption:
    type: aescbc (1)
1 | The aescbc type means that AES-CBC with PKCS#7 padding and a 32 byte key is used to perform the encryption. |
Save the file to apply the changes.
The encryption process starts. It can take 20 minutes or longer for this process to complete, depending on the size of your cluster.
Verify that etcd encryption was successful.
Review the Encrypted
status condition for the OpenShift API server to verify that its resources were successfully encrypted:
$ oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows EncryptionCompleted
upon successful encryption:
EncryptionCompleted
All resources encrypted: routes.route.openshift.io, oauthaccesstokens.oauth.openshift.io, oauthauthorizetokens.oauth.openshift.io
If the output shows EncryptionInProgress, this means that encryption is still in progress. Wait a few minutes and try again.
Review the Encrypted
status condition for the Kubernetes API server to verify that its resources were successfully encrypted:
$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows EncryptionCompleted
upon successful encryption:
EncryptionCompleted
All resources encrypted: secrets, configmaps
If the output shows EncryptionInProgress, this means that encryption is still in progress. Wait a few minutes and try again.
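If you prefer to script the verification, the Encrypted condition can also be read from JSON output. The Python sketch below assumes that you captured the output of oc get openshiftapiserver -o json (or the kubeapiserver equivalent) as a string; the function name is hypothetical, and the sample reason values are the ones shown in this procedure.

```python
import json
from typing import Optional, Tuple


def encrypted_condition(resource_list_json: str) -> Optional[Tuple[str, str]]:
    """Return (reason, message) of the Encrypted condition, or None if absent.

    Expects a JSON list object with an "items" array, and inspects the first
    item only, mirroring the .items[0] selector in the jsonpath query.
    """
    doc = json.loads(resource_list_json)
    for item in doc.get("items", [])[:1]:
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Encrypted":
                return cond.get("reason", ""), cond.get("message", "")
    return None


def encryption_finished(resource_list_json: str) -> bool:
    """Return True only when the reason is EncryptionCompleted."""
    result = encrypted_condition(resource_list_json)
    return result is not None and result[0] == "EncryptionCompleted"
```

A wrapper script could call encryption_finished in a loop with a delay of a few minutes between attempts, which matches the manual advice to wait and try again.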
You can disable encryption of etcd data in your cluster.
Access to the cluster as a user with the cluster-admin
role.
Modify the APIServer
object:
$ oc edit apiserver
Set the encryption field type to identity:
spec:
  encryption:
    type: identity (1)
1 | The identity type is the default value and means that no encryption is performed. |
Save the file to apply the changes.
The decryption process starts. It can take 20 minutes or longer for this process to complete, depending on the size of your cluster.
Verify that etcd decryption was successful.
Review the Encrypted
status condition for the OpenShift API server to verify that its resources were successfully decrypted:
$ oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows DecryptionCompleted
upon successful decryption:
DecryptionCompleted
Encryption mode set to identity and everything is decrypted
If the output shows DecryptionInProgress, this means that decryption is still in progress. Wait a few minutes and try again.
Review the Encrypted
status condition for the Kubernetes API server to verify that its resources were successfully decrypted:
$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
The output shows DecryptionCompleted
upon successful decryption:
DecryptionCompleted
Encryption mode set to identity and everything is decrypted
If the output shows DecryptionInProgress, this means that decryption is still in progress. Wait a few minutes and try again.
Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.
Only save a backup from a single master host. Do not take a backup from each master host in the cluster.
You have access to the cluster as a user with the cluster-admin
role.
You have checked whether the cluster-wide proxy is enabled.
You can check whether the proxy is enabled by reviewing the output of |
Start a debug session for a master node:
$ oc debug node/<node_name>
Change your root directory to the host:
sh-4.2# chroot /host
If the cluster-wide proxy is enabled, be sure that you have exported the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables.
Run the cluster-backup.sh script and pass in the location to save the backup to.
sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
1bf371f1b5a483927cd01bb593b0e12cff406eb8d7d0acf4ab079c36a0abd3f7
etcdctl version: 3.3.18
API version: 3.3
found latest kube-apiserver-pod: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7
found latest kube-controller-manager-pod: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-8
found latest kube-scheduler-pod: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
found latest etcd-pod: /etc/kubernetes/static-pod-resources/etcd-pod-2
Snapshot saved at /home/core/assets/backup/snapshot_2020-03-18_220218.db
snapshot db and kube resources are successfully saved to /home/core/assets/backup
In this example, two files are created in the /home/core/assets/backup/ directory on the master host:
snapshot_<datetimestamp>.db: This file is the etcd snapshot.
static_kuberesources_<datetimestamp>.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.
If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required in order to restore from the etcd snapshot. Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted.
Manual defragmentation must be performed periodically to reclaim disk space after etcd history compaction and other events cause disk fragmentation.
History compaction is performed automatically every five minutes and leaves gaps in the back-end database. This fragmented space is available for use by etcd, but is not available to the host file system. You must defragment etcd to make this space available to the host file system.
Because etcd writes data to disk, its performance strongly depends on disk performance. Consider defragmenting etcd every month, twice a month, or as needed for your cluster. You can also monitor the etcd_db_total_size_in_bytes metric to determine whether defragmentation is necessary.
Defragmenting etcd is a blocking action. The etcd member will not respond until defragmentation is complete. For this reason, wait at least one minute between defragmentation actions on each of the pods to allow the cluster to recover.
Follow this procedure to defragment etcd data on each etcd member.
You have access to the cluster as a user with the cluster-admin
role.
Determine which etcd member is the leader, because the leader should be defragmented last.
Get the list of etcd pods:
$ oc get pods -n openshift-etcd -o wide | grep etcd
etcd-ip-10-0-159-225.example.redhat.com 3/3 Running 0 175m 10.0.159.225 ip-10-0-159-225.example.redhat.com <none> <none>
etcd-ip-10-0-191-37.example.redhat.com 3/3 Running 0 173m 10.0.191.37 ip-10-0-191-37.example.redhat.com <none> <none>
etcd-ip-10-0-199-170.example.redhat.com 3/3 Running 0 176m 10.0.199.170 ip-10-0-199-170.example.redhat.com <none> <none>
Choose a pod and run the following command to determine which etcd member is the leader:
$ oc rsh -n openshift-etcd etcd-ip-10-0-159-225.example.redhat.com etcdctl endpoint status --cluster -w table
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-ip-10-0-159-225.example.redhat.com -n openshift-etcd' to see all of the containers in this pod.
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.191.37:2379 | 251cd44483d811c3 | 3.4.9 | 104 MB | false | false | 7 | 91624 | 91624 | |
| https://10.0.159.225:2379 | 264c7c58ecbdabee | 3.4.9 | 104 MB | false | false | 7 | 91624 | 91624 | |
| https://10.0.199.170:2379 | 9ac311f93915cc79 | 3.4.9 | 104 MB | true | false | 7 | 91624 | 91624 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Based on the IS LEADER column of this output, the https://10.0.199.170:2379 endpoint is the leader. Matching this endpoint with the output of the previous step, the pod name of the leader is etcd-ip-10-0-199-170.example.redhat.com.
Defragment an etcd member.
Connect to the running etcd container, passing in the name of a pod that is not the leader:
$ oc rsh -n openshift-etcd etcd-ip-10-0-159-225.example.redhat.com
Unset the ETCDCTL_ENDPOINTS
environment variable:
sh-4.4# unset ETCDCTL_ENDPOINTS
Defragment the etcd member:
sh-4.4# etcdctl --command-timeout=30s --endpoints=https://localhost:2379 defrag
Finished defragmenting etcd member[https://localhost:2379]
If a timeout error occurs, increase the value for --command-timeout
until the command succeeds.
Verify that the database size was reduced:
sh-4.4# etcdctl endpoint status -w table --cluster
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.191.37:2379 | 251cd44483d811c3 | 3.4.9 | 104 MB | false | false | 7 | 91624 | 91624 | |
| https://10.0.159.225:2379 | 264c7c58ecbdabee | 3.4.9 | 41 MB | false | false | 7 | 91624 | 91624 | | (1)
| https://10.0.199.170:2379 | 9ac311f93915cc79 | 3.4.9 | 104 MB | true | false | 7 | 91624 | 91624 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
This example shows that the database size for this etcd member is now 41 MB as opposed to the starting size of 104 MB.
Repeat these steps to connect to each of the other etcd members and defragment them. Always defragment the leader last.
Wait at least one minute between defragmentation actions to allow the etcd pod to recover. Until the etcd pod recovers, the etcd member will not respond.
If any NOSPACE
alarms were triggered due to the space quota being exceeded, clear them.
Check if there are any NOSPACE
alarms:
sh-4.4# etcdctl alarm list
memberID:12345678912345678912 alarm:NOSPACE
Clear the alarms:
sh-4.4# etcdctl alarm disarm
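The ordering rule in this procedure (defragment every non-leader member first and the leader last) can be expressed as a short Python sketch. The function name is hypothetical, and the input is assumed to be a list of (endpoint, is_leader) pairs that you transcribed from the ENDPOINT and IS LEADER columns of the etcdctl endpoint status table.

```python
from typing import List, Sequence, Tuple


def defrag_order(members: Sequence[Tuple[str, bool]]) -> List[str]:
    """Return endpoints in the order they should be defragmented.

    Non-leader members keep their original order and the leader goes last.
    Raises ValueError unless exactly one member is marked as leader.
    """
    followers = [endpoint for endpoint, is_leader in members if not is_leader]
    leaders = [endpoint for endpoint, is_leader in members if is_leader]
    if len(leaders) != 1:
        raise ValueError(f"expected exactly one leader, found {len(leaders)}")
    return followers + leaders


# Values taken from the example table in this procedure.
example = [
    ("https://10.0.191.37:2379", False),
    ("https://10.0.159.225:2379", False),
    ("https://10.0.199.170:2379", True),
]
order = defrag_order(example)
```

The sketch only decides the order. The wait of at least one minute between members still has to be observed when you run the defrag command on each pod.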
You can use a saved etcd backup to restore your cluster to a previous state. You use the etcd backup to restore a single control plane host. Then the etcd cluster Operator handles scaling to the remaining master hosts.
When you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an OpenShift Container Platform 4.5.2 cluster must use an etcd backup that was taken from 4.5.2.
Access to the cluster as a user with the cluster-admin
role.
SSH access to master hosts.
A backup directory containing both the etcd snapshot and the resources for the static pods, which were from the same backup. The file names in the directory must be in the following formats: snapshot_<datetimestamp>.db and static_kuberesources_<datetimestamp>.tar.gz.
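Before you start a restore, it can help to confirm that the backup directory really contains a matching pair of files. The Python sketch below checks a list of file names against the two formats named in the prerequisite. The function name is hypothetical, and treating two files with the same timestamp as belonging to the same backup is an assumption of this sketch.

```python
import re
from typing import Iterable, List

_SNAPSHOT = re.compile(r"^snapshot_(?P<stamp>.+)\.db$")
_RESOURCES = re.compile(r"^static_kuberesources_(?P<stamp>.+)\.tar\.gz$")


def complete_backups(filenames: Iterable[str]) -> List[str]:
    """Return the timestamps for which both backup files are present."""
    names = list(filenames)
    snapshots = {m.group("stamp") for m in map(_SNAPSHOT.match, names) if m}
    resources = {m.group("stamp") for m in map(_RESOURCES.match, names) if m}
    # A usable backup needs the etcd snapshot and the static pod resources.
    return sorted(snapshots & resources)
```

You could pass it the result of os.listdir on the backup directory. An empty return value means that no complete pair was found and the restore should not be attempted with that directory.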
Select a control plane host to use as the recovery host. This is the host that you will run the restore operation on.
Establish SSH connectivity to each of the control plane nodes, including the recovery host.
The Kubernetes API server becomes inaccessible after the restore process starts, so you cannot access the control plane nodes. For this reason, it is recommended to establish SSH connectivity to each control plane host in a separate terminal.
If you do not complete this step, you will not be able to access the master hosts to complete the restore procedure, and you will be unable to recover your cluster from this state.
Copy the etcd backup directory to the recovery control plane host.
This procedure assumes that you copied the backup
directory containing the etcd snapshot and the resources for the static pods to the /home/core/
directory of your recovery control plane host.
Stop the static pods on all other control plane nodes.
It is not required to manually stop the pods on the recovery host. The recovery script will stop the pods on the recovery host.
Access a control plane host that is not the recovery host.
Move the existing etcd pod file out of the kubelet manifest directory:
[core@ip-10-0-154-194 ~]$ sudo mv /etc/kubernetes/manifests/etcd-pod.yaml /tmp
Verify that the etcd pods are stopped.
[core@ip-10-0-154-194 ~]$ sudo crictl ps | grep etcd | grep -v operator
The output of this command should be empty. If it is not empty, wait a few minutes and check again.
Move the existing Kubernetes API server pod file out of the kubelet manifest directory:
[core@ip-10-0-154-194 ~]$ sudo mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp
Verify that the Kubernetes API server pods are stopped.
[core@ip-10-0-154-194 ~]$ sudo crictl ps | grep kube-apiserver | grep -v operator
The output of this command should be empty. If it is not empty, wait a few minutes and check again.
Move the etcd data directory to a different location:
[core@ip-10-0-154-194 ~]$ sudo mv /var/lib/etcd/ /tmp
Repeat this step on each of the other master hosts that is not the recovery host.
Access the recovery control plane host.
If the cluster-wide proxy is enabled, be sure that you have exported the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables.
You can check whether the proxy is enabled by reviewing the output of |
Run the restore script on the recovery control plane host and pass in the path to the etcd backup directory:
[core@ip-10-0-143-125 ~]$ sudo -E /usr/local/bin/cluster-restore.sh /home/core/backup
...stopping kube-scheduler-pod.yaml
...stopping kube-controller-manager-pod.yaml
...stopping etcd-pod.yaml
...stopping kube-apiserver-pod.yaml
Waiting for container etcd to stop
.complete
Waiting for container etcdctl to stop
.............................complete
Waiting for container etcd-metrics to stop
complete
Waiting for container kube-controller-manager to stop
complete
Waiting for container kube-apiserver to stop
..........................................................................................complete
Waiting for container kube-scheduler to stop
complete
Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod
starting kube-apiserver-pod.yaml
static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml
starting kube-controller-manager-pod.yaml
static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.yaml
starting kube-scheduler-pod.yaml
static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
Restart the kubelet service on all master hosts.
From the recovery host, run the following command:
[core@ip-10-0-143-125 ~]$ sudo systemctl restart kubelet.service
Repeat this step on all other master hosts.
Verify that the single member control plane has started successfully.
From the recovery host, verify that the etcd container is running.
[core@ip-10-0-143-125 ~]$ sudo crictl ps | grep etcd | grep -v operator
3ad41b7908e32 36f86e2eeaaffe662df0d21041eb22b8198e0e58abeeae8c743c3e6e977e8009 About a minute ago Running etcd 0 7c05f8af362f0
From the recovery host, verify that the etcd pod is running.
[core@ip-10-0-143-125 ~]$ oc get pods -n openshift-etcd | grep etcd
If you attempt to run
|
NAME READY STATUS RESTARTS AGE
etcd-ip-10-0-143-125.ec2.internal 1/1 Running 1 2m47s
If the status is Pending, or the output lists more than one running etcd pod, wait a few minutes and check again.
Force etcd redeployment.
In a terminal that has access to the cluster as a cluster-admin
user, run the following command:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge (1)
1 | The forceRedeploymentReason value must be unique, which is why a timestamp is appended. |
When the etcd cluster Operator performs a redeployment, the existing nodes are started with new pods similar to the initial bootstrap scale up.
Verify all nodes are updated to the latest revision.
In a terminal that has access to the cluster as a cluster-admin
user, run the following command:
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing
status condition for etcd to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision
upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7 (1)
1 | In this example, the latest revision number is 7 . |
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
After etcd is redeployed, force new rollouts for the control plane. The Kubernetes API server will reinstall itself on the other nodes because the kubelet is connected to API servers using an internal load balancer.
In a terminal that has access to the cluster as a cluster-admin
user, run the following commands.
Update the kubeapiserver:
$ oc patch kubeapiserver cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Verify all nodes are updated to the latest revision.
$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing
status condition to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision
upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7 (1)
1 | In this example, the latest revision number is 7 . |
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
Update the kubecontrollermanager:
$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Verify all nodes are updated to the latest revision.
$ oc get kubecontrollermanager -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing
status condition to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision
upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7 (1)
1 | In this example, the latest revision number is 7 . |
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
Update the kubescheduler:
$ oc patch kubescheduler cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Verify all nodes are updated to the latest revision.
$ oc get kubescheduler -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
Review the NodeInstallerProgressing
status condition to verify that all nodes are at the latest revision. The output shows AllNodesAtLatestRevision
upon successful update:
AllNodesAtLatestRevision
3 nodes are at revision 7 (1)
1 | In this example, the latest revision number is 7 . |
If the output includes multiple revision numbers, such as 2 nodes are at revision 6; 1 nodes are at revision 7, this means that the update is still in progress. Wait a few minutes and try again.
Verify that all master hosts have started and joined the cluster.
In a terminal that has access to the cluster as a cluster-admin
user, run the following command:
$ oc get pods -n openshift-etcd | grep etcd
etcd-ip-10-0-143-125.ec2.internal 2/2 Running 0 9h
etcd-ip-10-0-154-194.ec2.internal 2/2 Running 0 9h
etcd-ip-10-0-173-171.ec2.internal 2/2 Running 0 9h
Note that it might take several minutes after completing this procedure for all services to be restored. For example, authentication by using oc login might not work until the OAuth server pods are restarted.
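The NodeInstallerProgressing checks in this procedure all follow the same pattern, so the comparison can be scripted. The Python sketch below parses the message text shown in the examples (for instance, 2 nodes are at revision 6; 1 nodes are at revision 7). The function names are hypothetical, and the message format is assumed to match those examples exactly.

```python
import re
from typing import Dict

_REVISION = re.compile(r"(\d+) nodes? (?:are|is) at revision (\d+)")


def revisions(message: str) -> Dict[int, int]:
    """Map each revision number to the count of nodes reported at it."""
    return {int(rev): int(count) for count, rev in _REVISION.findall(message)}


def rollout_complete(reason: str, message: str) -> bool:
    """Return True when every node reports the same, latest revision."""
    return reason == "AllNodesAtLatestRevision" and len(revisions(message)) == 1
```

Feeding it the two lines printed by the oc get command for etcd, kubeapiserver, kubecontrollermanager, or kubescheduler tells you whether to wait and check again or to move on to the next operator.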
Understand and configure pod disruption budgets.
A pod disruption budget is part of the Kubernetes API and can be managed with oc commands like other object types. Pod disruption budgets allow the specification of safety constraints on pods during operations, such as draining a node for maintenance.
PodDisruptionBudget is an API object that specifies the minimum number or percentage of replicas that must be up at a time. Setting these in projects can be helpful during node maintenance (such as scaling a cluster down or a cluster upgrade) and is only honored on voluntary evictions (not on node failures).
A PodDisruptionBudget
object’s configuration consists of the following key
parts:
A label selector, which is a label query over a set of pods.
An availability level, which specifies the minimum number of pods that must be available simultaneously, either:
minAvailable is the number of pods that must always be available, even during a disruption.
maxUnavailable is the number of pods that can be unavailable during a disruption.
A |
You can check for pod disruption budgets across all projects with the following:
$ oc get poddisruptionbudget --all-namespaces
NAMESPACE         NAME          MIN-AVAILABLE   SELECTOR
another-project   another-pdb   4               bar=foo
test-project      my-pdb        2               foo=bar
The PodDisruptionBudget is considered healthy when there are at least minAvailable pods running in the system. Every pod above that limit can be evicted.
Depending on your pod priority and preemption settings, lower-priority pods might be removed despite their pod disruption budget requirements.
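The rule that every pod above the minAvailable limit can be evicted can be worked through numerically. The Python sketch below computes how many pods a budget currently allows to be disrupted. The function names are hypothetical, and rounding percentages up is an assumption based on upstream Kubernetes behavior, so verify it against your cluster version.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def _scaled(value: IntOrPercent, total: int) -> int:
    """Resolve an integer or a percentage string such as '25%' against total."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    # Ceiling division with integers only; rounding up is an assumption.
    return -(-percent * total // 100)


def allowed_disruptions(healthy: int,
                        expected: int,
                        min_available: Optional[IntOrPercent] = None,
                        max_unavailable: Optional[IntOrPercent] = None) -> int:
    """Return how many healthy pods can be evicted without breaking the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = _scaled(min_available, expected)
    else:
        desired_healthy = max(expected - _scaled(max_unavailable, expected), 0)
    return max(healthy - desired_healthy, 0)
```

For example, with five healthy pods out of five expected and minAvailable set to 2, three pods can be evicted. With ten pods and maxUnavailable set to 25%, the sketch also allows three, because 2.5 is rounded up.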
You can use a PodDisruptionBudget object to specify the minimum number or percentage of replicas that must be up at a time.
To configure a pod disruption budget:
Create a YAML file with an object definition similar to the following:
apiVersion: policy/v1beta1 (1)
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  minAvailable: 2 (2)
  selector: (3)
    matchLabels:
      foo: bar
1 | PodDisruptionBudget is part of the policy/v1beta1 API group. |
2 | The minimum number of pods that must be available simultaneously. This can be either an integer or a string specifying a percentage, for example, 20% . |
3 | A label query over a set of resources. The results of matchLabels and matchExpressions are logically conjoined. |
Or:
apiVersion: policy/v1beta1 (1)
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  maxUnavailable: 25% (2)
  selector: (3)
    matchLabels:
      foo: bar
1 | PodDisruptionBudget is part of the policy/v1beta1 API group. |
2 | The maximum number of pods that can be unavailable simultaneously. This can be either an integer or a string specifying a percentage, for example, 20% . |
3 | A label query over a set of resources. The results of matchLabels and matchExpressions are logically conjoined. |
Run the following command to add the object to the project:
$ oc create -f </path/to/file> -n <project_name>