Hive metastore is responsible for storing all the metadata about the database tables created in Presto and Hive. By default, the metastore stores this information in a local embedded Derby database in a PersistentVolume attached to the pod.

Generally, the default configuration of the Hive metastore works for small clusters, but users may wish to improve performance or move storage requirements out of cluster by using a dedicated SQL database for storing the Hive metastore data.

Configuring PersistentVolumes

By default, Hive requires one PersistentVolume to operate.

hive-metastore-db-data is the main PersistentVolumeClaim (PVC) required by default. This PVC is used by the Hive metastore to store metadata about tables, such as table name, columns, and location. Hive metastore is used by Presto and the Hive server to look up table metadata when processing queries. You remove this requirement by using MySQL or PostgreSQL for the Hive metastore database.

To install, Hive metastore requires that dynamic volume provisioning be enabled via a StorageClass, a persistent volume of the correct size must be manually pre-created, or that you use a pre-existing MySQL or PostgreSQL database.

Configuring the storage class for Hive metastore

To configure and specify a StorageClass for the hive-metastore-db-data PVC, specify the StorageClass in your MeteringConfig. An example StorageClass section is included in metastore-storage.yaml file below.

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  hive:
    spec:
      metastore:
        storage:
          # Default is null, which means using the default storage class if it exists.
          # If you wish to use a different storage class, specify it here
          # class: "null" (1)
          size: "5Gi"
1 Uncomment this line and replace null with the name of the StorageClass to use. Leaving the value null will cause metering to use the default StorageClass for the cluster.

Configuring the volume sizes for the Hive Metastore

Use the metastore-storage.yaml file below as a template.

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  hive:
    spec:
      metastore:
        storage:
          # Default is null, which means using the default storage class if it exists.
          # If you wish to use a different storage class, specify it here
          # class: "null"
          size: "5Gi" (1)
1 Replace the value for size with your desired capacity. The example file shows "5Gi".

Using MySQL or PostgreSQL for the Hive metastore

The default installation of metering configures Hive to use an embedded Java database called Derby. This is unsuited for larger environments and can be replaced with either a MySQL or PostgreSQL database. Use the following example configuration files if your deployment requires a MySQL or PostgreSQL database for Hive.

There are three configuration options you can use to control the database that is used by Hive metastore: url, driver, and secretName.

Create your MySQL or Postgres instance with a user name and password. Then create a secret by using the OpenShift CLI (oc) or a YAML file. The secretName you create for this secret must map to the spec.hive.spec.config.db.secretName field in the MeteringConfig object resource.

Procedure
  1. Create a secret using the OpenShift CLI (oc) or by using a YAML file:

    • Create a secret by using the following command:

      $ oc --namespace openshift-metering create secret generic <YOUR_SECRETNAME> --from-literal=username=<YOUR_DATABASE_USERNAME> --from-literal=password=<YOUR_DATABASE_PASSWORD>
    • Create a secret by using a YAML file. For example:

      apiVersion: v1
      kind: Secret
      metadata:
        name: <YOUR_SECRETNAME> (1)
      data:
        username: <BASE64_ENCODED_DATABASE_USERNAME> (2)
        password: <BASE64_ENCODED_DATABASE_PASSWORD> (3)
      1 The name of the secret.
      2 Base64 encoded database user name.
      3 Base64 encoded database password.
  2. Create a configuration file to use a MySQL or PostgreSQL database for Hive:

    • To use a MySQL database for Hive, use the example configuration file below. Metering supports configuring the internal Hive metastore to use the MySQL server versions 5.6, 5.7, and 8.0.

      spec:
        hive:
          spec:
            metastore:
              storage:
                create: false
            config:
              db:
                url: "jdbc:mysql://mysql.example.com:3306/hive_metastore" (1)
                driver: "com.mysql.cj.jdbc.Driver"
                secretName: "REPLACEME" (2)

      When configuring Metering to work with older MySQL server versions, such as 5.6 or 5.7, you might need to add the enabledTLSProtocols JDBC URL parameter when configuring the internal Hive metastore.

      1 To use the TLS v1.2 cipher suite, set url to "jdbc:mysql://<hostname>:<port>/<schema>?enabledTLSProtocols=TLSv1.2".
      2 The name of the secret containing the base64-encrypted user name and password database credentials.

      You can pass additional JDBC parameters using the spec.hive.config.url. For more details, see the MySQL Connector/J 8.0 documentation.

    • To use a PostgreSQL database for Hive, use the example configuration file below:

      spec:
        hive:
          spec:
            metastore:
              storage:
                create: false
            config:
              db:
                url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore"
                driver: "org.postgresql.Driver"
                username: "REPLACEME"
                password: "REPLACEME"

      You can pass additional JDBC parameters using the spec.hive.config.url. For more details, see the PostgreSQL JDBC driver documentation.