
Deploying on Red Hat OpenStack Platform (RHOSP) with rootVolume and etcd on local disk is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

As a day 2 operation, you can resolve and prevent performance issues of your Red Hat OpenStack Platform (RHOSP) installation by moving etcd from a root volume (provided by OpenStack Cinder) to a dedicated ephemeral local disk.

Deploying etcd on a local disk

If you have an existing RHOSP cloud, you can move the etcd database from a root volume in that cloud to a dedicated ephemeral local disk.

This procedure is for testing etcd on a local disk only and should not be used on production clusters. In certain cases, complete loss of the control plane can occur. For more information, see "Overview of backup and restore operation" under "Backup and restore".

Prerequisites
  • You have an OpenStack cloud with a working Cinder service.

  • Your OpenStack cloud has at least 75 GB of available storage to accommodate three 25 GB root volumes for the OpenShift control plane.

  • The OpenStack cloud is deployed with Nova ephemeral storage that uses a local storage backend and not rbd.

Procedure
  1. Create a Nova flavor for the control plane with at least 10 GB of ephemeral disk by running the following command, replacing the values for --ram, --disk, and <flavor_name> based on your environment:

    $ openstack flavor create --ram 16384 --disk 0 --ephemeral 10 --vcpus 4 <flavor_name>
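
    You can optionally confirm that the new flavor has the expected ephemeral disk before you deploy. A minimal check; the OS-FLV-EXT-DATA:ephemeral field is how the default CLI output reports ephemeral disk size, but confirm it against your client version:

    $ openstack flavor show <flavor_name> -f value -c "OS-FLV-EXT-DATA:ephemeral"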
  2. Deploy a cluster with root volumes for the control plane; for example:

    Example YAML file
    # ...
    controlPlane:
      name: master
      platform:
        openstack:
          type: ${CONTROL_PLANE_FLAVOR}
          rootVolume:
            size: 25
            types:
            - ${CINDER_TYPE}
      replicas: 3
    # ...
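
    The ${CONTROL_PLANE_FLAVOR} and ${CINDER_TYPE} placeholders are shell-style variables. One way to render them, assuming you keep the template in a file named install-config.yaml.template (a hypothetical name) and have the GNU gettext envsubst utility installed:

    $ export CONTROL_PLANE_FLAVOR=<flavor_name> CINDER_TYPE=<cinder_type>
    $ envsubst < install-config.yaml.template > install-config.yaml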
  3. Deploy the cluster you created by running the following command:

    $ openshift-install create cluster --dir <installation_directory> (1)
    1 For <installation_directory>, specify the location of the installation directory that contains the customized ./install-config.yaml file that you previously created.
  4. Verify that the cluster you deployed is healthy before proceeding to the next step by running the following command:

    $ oc wait clusteroperators --all --for=condition=Progressing=false (1)
    1 Ensures that the cluster operators are finished progressing and that the cluster is not deploying or updating.
  5. Edit the ControlPlaneMachineSet (CPMS) to add the additional block ephemeral device that is used by etcd by running the following command:

    $ oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json -p ' (1)
    [
        {
          "op": "add",
          "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices", (2)
          "value": [
            {
              "name": "etcd",
              "sizeGiB": 10,
              "storage": {
                "type": "Local" (3)
              }
            }
          ]
        }
      ]
    '
    1 Applies the JSON patch to the ControlPlaneMachineSet custom resource (CR).
    2 Specifies the path where the additionalBlockDevices are added.
    3 Adds an etcd device with at least 10 GB of local storage to each control plane machine. You can specify a value greater than 10 GB as long as the device fits within the ephemeral disk of the Nova flavor. For example, if the Nova flavor has a 15 GB ephemeral disk, you can create a 12 GB etcd device.
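
    To confirm that the patch landed before the rollout starts, you can read the field back; a minimal check:

    $ oc get controlplanemachineset/cluster -n openshift-machine-api -o jsonpath='{.spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.additionalBlockDevices}'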
  6. Verify that the control plane machines are healthy by using the following steps:

    1. Wait for the control plane machine set update to finish by running the following command:

      $ oc wait --timeout=90m --for=condition=Progressing=false controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
    2. Verify that the control plane machine set reports 3 updated replicas by running the following command:

      $ oc wait --timeout=90m --for=jsonpath='{.status.updatedReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
    3. Verify that the control plane machine set reports 3 replicas by running the following command:

      $ oc wait --timeout=90m --for=jsonpath='{.status.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
    4. Verify that the ClusterOperators are not progressing in the cluster by running the following command:

      $ oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
    5. Verify that each of the 3 control plane machines has the additional block device you previously created by running the following script:

    $ cp_machines=$(oc get machines -n openshift-machine-api --selector='machine.openshift.io/cluster-api-machine-role=master' --no-headers -o custom-columns=NAME:.metadata.name) (1)

    if [[ $(echo "${cp_machines}" | wc -l) -ne 3 ]]; then
      exit 1
    fi (2)

    for machine in ${cp_machines}; do
      if ! oc get machine -n openshift-machine-api "${machine}" -o jsonpath='{.spec.providerSpec.value.additionalBlockDevices}' | grep -q 'etcd'; then
        exit 1
      fi (3)
    done
    1 Retrieves the control plane machines running in the cluster.
    2 Exits with an error if the cluster does not have exactly three control plane machines.
    3 Exits with an error if a control plane machine does not have an additionalBlockDevices entry named etcd.
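
    If you prefer to inspect the machines by eye instead of scripting the check, a minimal listing of each machine name and its additional block device names:

    $ for machine in $(oc get machines -n openshift-machine-api --selector='machine.openshift.io/cluster-api-machine-role=master' -o name); do
        oc get -n openshift-machine-api "${machine}" -o jsonpath='{.metadata.name}{"\t"}{.spec.providerSpec.value.additionalBlockDevices[*].name}{"\n"}'
      done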
  7. Create a file named 98-var-lib-etcd.yaml by using the following YAML file:

    This procedure is for testing etcd on a local disk and should not be used on a production cluster. In certain cases, complete loss of the control plane can occur. For more information, see "Overview of backup and restore operation" under "Backup and restore".

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 98-var-lib-etcd
    spec:
      config:
        ignition:
          version: 3.4.0
        systemd:
          units:
          - contents: |
              [Unit]
              Description=Mount local-etcd to /var/lib/etcd
    
              [Mount]
              What=/dev/disk/by-label/local-etcd (1)
              Where=/var/lib/etcd
              Type=xfs
              Options=defaults,prjquota
    
              [Install]
              WantedBy=local-fs.target
            enabled: true
            name: var-lib-etcd.mount
          - contents: |
              [Unit]
              Description=Create local-etcd filesystem
              DefaultDependencies=no
              After=local-fs-pre.target
              ConditionPathIsSymbolicLink=!/dev/disk/by-label/local-etcd (2)
    
              [Service]
              Type=oneshot
              RemainAfterExit=yes
              ExecStart=/bin/bash -c "[ -L /dev/disk/by-label/ephemeral0 ] || ( >&2 echo Ephemeral disk does not exist; /usr/bin/false )"
              ExecStart=/usr/sbin/mkfs.xfs -f -L local-etcd /dev/disk/by-label/ephemeral0 (3)
    
              [Install]
              RequiredBy=dev-disk-by\x2dlabel-local\x2detcd.device
            enabled: true
            name: create-local-etcd.service
          - contents: |
              [Unit]
              Description=Migrate existing data to local etcd
              After=var-lib-etcd.mount
              Before=crio.service (4)
    
              Requisite=var-lib-etcd.mount
              ConditionPathExists=!/var/lib/etcd/member
              ConditionPathIsDirectory=/sysroot/ostree/deploy/rhcos/var/lib/etcd/member (5)
    
              [Service]
              Type=oneshot
              RemainAfterExit=yes
    
              ExecStart=/bin/bash -c "if [ -d /var/lib/etcd/member.migrate ]; then rm -rf /var/lib/etcd/member.migrate; fi" (6)
    
              ExecStart=/usr/bin/cp -aZ /sysroot/ostree/deploy/rhcos/var/lib/etcd/member/ /var/lib/etcd/member.migrate
              ExecStart=/usr/bin/mv /var/lib/etcd/member.migrate /var/lib/etcd/member (7)
    
              [Install]
              RequiredBy=var-lib-etcd.mount
            enabled: true
            name: migrate-to-local-etcd.service
          - contents: |
              [Unit]
              Description=Relabel /var/lib/etcd
    
              After=migrate-to-local-etcd.service
              Before=crio.service
    
              [Service]
              Type=oneshot
              RemainAfterExit=yes
    
              ExecCondition=/bin/bash -c "[ -n \"$(restorecon -nv /var/lib/etcd)\" ]" (8)
    
              ExecStart=/usr/sbin/restorecon -R /var/lib/etcd
    
              [Install]
              RequiredBy=var-lib-etcd.mount
            enabled: true
            name: relabel-var-lib-etcd.service
    1 The etcd database must be mounted by the device, not a label, to ensure that systemd generates the device dependency used in this config to trigger filesystem creation.
    2 Do not run if the file system /dev/disk/by-label/local-etcd already exists.
    3 Fails with an alert message if /dev/disk/by-label/ephemeral0 doesn’t exist.
    4 Migrates existing data to local etcd database. This config does so after /var/lib/etcd is mounted, but before CRI-O starts so etcd is not running yet.
    5 Requires that etcd is mounted and does not contain a member directory, but the ostree does.
    6 Cleans up any previous migration state.
    7 Copies and moves in separate steps to ensure atomic creation of a complete member directory.
    8 Performs a quick check of the mount point directory before performing a full recursive relabel. If restorecon finds nothing to relabel in the file path /var/lib/etcd, the recursive relabel is not performed.
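
    Before you apply the file in the next step, you can validate it client-side. This only confirms that the manifest parses; it does not guarantee that the Machine Config Operator will accept it:

    $ oc create -f 98-var-lib-etcd.yaml --dry-run=client -o yaml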
  8. Create the new MachineConfig object by running the following command:

    $ oc create -f 98-var-lib-etcd.yaml

    Moving the etcd database onto the local disk of each control plane machine takes time.
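
    To follow the rollout as it progresses, you can watch the master machine config pool; for example:

    $ oc get machineconfigpool master -w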

  9. Verify that the etcd database has been transferred to the local disk of each control plane machine by running the following commands:

    1. Verify that the master machine config pool has finished updating by running the following command:

      $ oc wait --timeout=45m --for=condition=Updating=false machineconfigpool/master
    2. Verify that the control plane nodes are ready by running the following command:

      $ oc wait node --selector='node-role.kubernetes.io/master' --for condition=Ready --timeout=30s
    3. Verify that the cluster Operators have finished progressing by running the following command:

      $ oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
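
    As a final spot check, you can confirm on each control plane node that /var/lib/etcd is mounted from the local-etcd filesystem. A minimal sketch that uses oc debug and the findmnt utility, which is available on RHCOS nodes:

    $ for node in $(oc get nodes --selector='node-role.kubernetes.io/master' -o name); do
        oc debug "${node}" -- chroot /host findmnt /var/lib/etcd
      done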