Configuring Connectors to Minio and S3 Object Stores (Optional)

You can use PXF to access S3-compatible object stores. This topic describes how to configure the PXF connectors to these external data sources.

If you do not plan to use these PXF object store connectors, then you do not need to perform this procedure.

About Object Store Configuration

To access data in an object store, you must provide a server location and client credentials. When you configure a PXF object store connector, you add at least one named PXF server configuration for the connector as described in Configuring PXF Servers.

PXF provides a template configuration file for each object store connector. These template files are located in the $PXF_CONF/templates/ directory.

Minio Server Configuration

The template configuration file for Minio is $PXF_CONF/templates/minio-site.xml. When you configure a Minio server, you must provide the following server configuration properties and replace the template values with your credentials:

Property | Description | Value
fs.s3a.endpoint | The Minio S3 endpoint to which to connect. | Your endpoint.
fs.s3a.access.key | The Minio account access key ID. | Your access key.
fs.s3a.secret.key | The secret key associated with the Minio access key ID. | Your secret key.
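
As an illustration, a filled-in Minio server configuration file might look like the following sketch. The endpoint URL and both key values are placeholders invented for this example, not values taken from the template; substitute the details of your own Minio deployment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Hypothetical endpoint: use the scheme, host, and port of your Minio server -->
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://minio.example.com:9000</value>
    </property>
    <!-- Placeholder credentials: replace with your Minio access key ID and secret key -->
    <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_MINIO_ACCESS_KEY</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_MINIO_SECRET_KEY</value>
    </property>
</configuration>
```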

S3 Server Configuration

The template configuration file for S3 is $PXF_CONF/templates/s3-site.xml. When you configure an S3 server, you must provide the following server configuration properties and replace the template values with your credentials:

Property | Description | Value
fs.s3a.access.key | The AWS account access key ID. | Your access key.
fs.s3a.secret.key | The secret key associated with the AWS access key ID. | Your secret key.

If required, fine-tune PXF S3 connectivity by specifying properties identified in the S3A section of the Hadoop-AWS module documentation in your s3-site.xml server configuration file.
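
For example, the following sketch adds two S3A tuning properties to an s3-site.xml file. The property names come from the Hadoop-AWS module; the values shown are illustrative assumptions only, not recommendations. Confirm the names, defaults, and units against the Hadoop-AWS documentation for the Hadoop version bundled with your PXF installation before you use them.

```xml
<!-- Maximum number of simultaneous connections to S3 (illustrative value) -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>50</value>
</property>
<!-- Number of times to retry a request after a transient error (illustrative value) -->
<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>10</value>
</property>
```

Add properties such as these inside the existing configuration element, alongside the access key and secret key properties.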

You can override the credentials for an S3 server configuration by directly specifying the S3 access ID and secret key via custom options in the CREATE EXTERNAL TABLE command LOCATION clause. Refer to Overriding the S3 Server Configuration with DDL for additional information.

Configuring S3 Server-Side Encryption

PXF supports Amazon Web Services (AWS) S3 Server-Side Encryption (SSE) for S3 files that you access with readable and writable Greenplum Database external tables that specify the pxf protocol and an s3:* profile. AWS S3 server-side encryption protects your data at rest; S3 encrypts your object data as it writes the data to disk, and transparently decrypts the data for you when you access it.

PXF supports the following AWS SSE encryption key management schemes:

  • SSE with S3-Managed Keys (SSE-S3) - Amazon manages the data and master encryption keys.
  • SSE with Key Management Service Managed Keys (SSE-KMS) - Amazon manages the data key, and you manage the encryption key in AWS KMS.
  • SSE with Customer-Provided Keys (SSE-C) - You set and manage the encryption key.

Your S3 access key and secret key govern your access to all S3 bucket objects, whether the data is encrypted or not.

When you read an encrypted file through a readable external table that specifies the pxf protocol and an s3:* profile, S3 decrypts the data transparently. No additional configuration is required.

To encrypt data that you write to S3 via this type of external table, you have two options:

  • Configure the default SSE encryption key management scheme on a per-S3-bucket basis via the AWS console or command line tools (recommended).
  • Configure SSE encryption options in your PXF S3 server s3-site.xml configuration file.

Configuring SSE via an S3 Bucket Policy (Recommended)

You can create one or more S3 bucket policies that identify the objects that you want to encrypt, the encryption key management scheme, and the write actions permitted on those objects. Refer to Protecting Data Using Server-Side Encryption in the AWS S3 documentation for more information about the SSE encryption key management schemes. How Do I Enable Default Encryption for an S3 Bucket? describes how to set default encryption bucket policies.

Specifying SSE Options in a PXF S3 Server Configuration

You must include certain properties in s3-site.xml to configure server-side encryption in a PXF S3 server configuration. The properties and values that you add to the file are dependent upon the SSE encryption key management scheme.

SSE-S3

To enable SSE-S3 on any file that you write to any S3 bucket, set the following encryption algorithm property and value in the s3-site.xml file:

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>

To enable SSE-S3 for a specific S3 bucket, use the property name variant that includes the bucket name. For example:

<property>
  <name>fs.s3a.bucket.YOUR_BUCKET1_NAME.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>

Replace YOUR_BUCKET1_NAME with the name of the S3 bucket.

SSE-KMS

To enable SSE-KMS on any file that you write to any S3 bucket, set both the encryption algorithm and encryption key ID. To set these properties in the s3-site.xml file:

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>
<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <value>YOUR_AWS_SSE_KMS_KEY_ARN</value>
</property>

Replace YOUR_AWS_SSE_KMS_KEY_ARN with the Amazon Resource Name (ARN) of your key. If you do not specify an encryption key, S3 uses the default key defined in AWS KMS. Example KMS key: arn:aws:kms:us-west-2:123456789012:key/1a23b456-7890-12cc-d345-6ef7890g12f3.

Note: Be sure to create the bucket and the key in the same AWS Region.

To enable SSE-KMS for a specific S3 bucket, use property name variants that include the bucket name. For example:

<property>
  <name>fs.s3a.bucket.YOUR_BUCKET2_NAME.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>
<property>
  <name>fs.s3a.bucket.YOUR_BUCKET2_NAME.server-side-encryption.key</name>
  <value>YOUR_AWS_SSE_KMS_KEY_ARN</value>
</property>

Replace YOUR_BUCKET2_NAME with the name of the S3 bucket.
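
The general and bucket-specific property forms can appear in the same s3-site.xml file. The following sketch assumes the usual S3A behavior in which a bucket-specific property takes precedence over the general property of the same name: it sets SSE-S3 as the default for every bucket, and SSE-KMS for one bucket only. YOUR_BUCKET2_NAME and YOUR_AWS_SSE_KMS_KEY_ARN are placeholders, as in the examples above.

```xml
<!-- Default for all buckets: SSE-S3 -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
<!-- Override for one bucket: SSE-KMS with a specific key -->
<property>
  <name>fs.s3a.bucket.YOUR_BUCKET2_NAME.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>
<property>
  <name>fs.s3a.bucket.YOUR_BUCKET2_NAME.server-side-encryption.key</name>
  <value>YOUR_AWS_SSE_KMS_KEY_ARN</value>
</property>
```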

SSE-C

To enable SSE-C on any file that you write to any S3 bucket, set both the encryption algorithm and the base64-encoded encryption key. All clients must share the same key.

To set these properties in the s3-site.xml file:

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-C</value>
</property>
<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <value>YOUR_BASE64-ENCODED_ENCRYPTION_KEY</value>
</property>

To enable SSE-C for a specific S3 bucket, use the property name variants that include the bucket name as described in the SSE-KMS example.
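
Following the same naming pattern, a bucket-specific SSE-C configuration might look like the sketch below. YOUR_BUCKET3_NAME is a hypothetical placeholder introduced for this example; replace it with the name of the S3 bucket, and replace the key placeholder with your own base64-encoded key.

```xml
<!-- SSE-C for a single bucket; the bucket name is part of each property name -->
<property>
  <name>fs.s3a.bucket.YOUR_BUCKET3_NAME.server-side-encryption-algorithm</name>
  <value>SSE-C</value>
</property>
<property>
  <name>fs.s3a.bucket.YOUR_BUCKET3_NAME.server-side-encryption.key</name>
  <value>YOUR_BASE64-ENCODED_ENCRYPTION_KEY</value>
</property>
```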

Example Server Configuration Procedure

Ensure that you have initialized PXF before you configure an object store connector server.

In this procedure, you name and add a PXF server configuration in the $PXF_CONF/servers directory on the Greenplum Database master host for the S3 Cloud Storage connector. You then use the pxf cluster sync command to synchronize the server configuration to the Greenplum Database cluster.

  1. Log in to your Greenplum Database master node:

    $ ssh gpadmin@<gpmaster>
    
  2. Choose a name for the server. You will provide the name to end users who need to reference files in the object store.

  3. Create the $PXF_CONF/servers/<server_name> directory. For example, use the following command to create a server configuration for an S3 server named s3_user1cfg:

    gpadmin@gpmaster$ mkdir $PXF_CONF/servers/s3_user1cfg
    
  4. Copy the PXF template file for S3 to the server configuration directory. For example:

    gpadmin@gpmaster$ cp $PXF_CONF/templates/s3-site.xml $PXF_CONF/servers/s3_user1cfg/
    
  5. Open the copied server configuration file, $PXF_CONF/servers/s3_user1cfg/s3-site.xml in this example, in the editor of your choice, and provide appropriate property values for your environment. For example:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <property>
            <name>fs.s3a.access.key</name>
            <value>access_key_for_user1</value>
        </property>
        <property>
            <name>fs.s3a.secret.key</name>
            <value>secret_key_for_user1</value>
        </property>
        <property>
            <name>fs.s3a.fast.upload</name>
            <value>true</value>
        </property>
    </configuration>
    
  6. Save your changes and exit the editor.

  7. Use the pxf cluster sync command to copy the new server configuration to the Greenplum Database cluster. For example:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync