Configuring PXF Servers

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 5.x documentation.

This topic provides an overview of PXF server configuration. To configure a server, refer to the topic specific to the connector that you want to configure.

You read from or write data to an external data store via a PXF connector. To access an external data store, you must provide the server location. You may also be required to provide client access credentials and other external data store-specific properties. PXF simplifies configuring access to external data stores by:

  • Supporting file-based connector and user configuration
  • Providing connector-specific template configuration files

A PXF Server definition is simply a named configuration that provides access to a specific external data store. A PXF server name is the name of a directory residing in $PXF_CONF/servers/. The information that you provide in a server configuration is connector-specific. For example, a PXF JDBC Connector server definition may include settings for the JDBC driver class name, URL, username, and password. You can also configure connection-specific and session-specific properties in a JDBC server definition.

PXF provides a server template file for each connector; this template identifies the typical set of properties that you must configure to use the connector.

You will configure a server definition for each external data store that Greenplum Database users need to access. For example, if you require access to two Hadoop clusters, you will create a PXF Hadoop server configuration for each cluster. If you require access to an Oracle and a MySQL database, you will create one or more PXF JDBC server configurations for each database.

A server configuration may include default settings for user access credentials and other properties for the external data store. You can allow Greenplum Database users to access the external data store using the default settings, or you can configure access and other properties on a per-user basis. This allows you to configure different Greenplum Database users with different external data store access credentials in a single PXF server definition.

About Server Template Files

The configuration information for a PXF server resides in one or more <connector>-site.xml files in $PXF_CONF/servers/<server_name>/.

PXF provides a template configuration file for each connector. These server template configuration files are located in the $PXF_CONF/templates/ directory after you initialize PXF:

gpadmin@gpmaster$ ls $PXF_CONF/templates
adl-site.xml   hbase-site.xml  jdbc-site.xml    s3-site.xml
core-site.xml  hdfs-site.xml   mapred-site.xml  wasbs-site.xml
gs-site.xml    hive-site.xml   minio-site.xml   yarn-site.xml

For example, the contents of the s3-site.xml template file follow:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_AWS_ACCESS_KEY_ID</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
    </property>
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
</configuration>
You specify credentials to PXF in clear text in configuration files.

Note: The template files for the Hadoop connectors are not intended to be modified and used for configuration, as they only provide an example of the information needed. Instead of modifying the Hadoop templates, you will copy several Hadoop *-site.xml files from the Hadoop cluster to your PXF Hadoop server configuration.

About the Default Server

PXF defines a special server named default. When you initialize PXF, it automatically creates a $PXF_CONF/servers/default/ directory. This directory, initially empty, identifies the default PXF server configuration. You can configure and assign the default PXF server to any external data source. For example, you can assign the PXF default server to a Hadoop cluster, or to a MySQL database that your users frequently access.

PXF automatically uses the default server configuration if you omit the SERVER=<server_name> setting in the CREATE EXTERNAL TABLE command LOCATION clause.

Note: You must configure a Hadoop server as the PXF default server when your Hadoop cluster utilizes Kerberos authentication.

Configuring a Server

When you configure a PXF connector to an external data store, you add a named PXF server configuration for the connector. Among the tasks that you perform, you may:

  1. Determine if you are configuring the default PXF server, or choose a new name for the server configuration.
  2. Create the directory $PXF_CONF/servers/<server_name>.
  3. Copy template or other configuration files to the new server directory.
  4. Fill in appropriate default values for the properties in the template file.
  5. Add any additional configuration properties and values required for your environment.
  6. Configure one or more users for the server configuration as described in About Configuring a PXF User.
  7. Synchronize the server and user configuration to the Greenplum Database cluster.

Note: You must re-sync the PXF configuration to the Greenplum Database cluster after you add or update PXF server configuration.

After you configure a PXF server, you publish the server name to Greenplum Database users who need access to the data store. A user only needs to provide the server name when they create an external table that accesses the external data store. PXF obtains the external data source location and access credentials from server and user configuration files residing in the server configuration directory identified by the server name.

To configure a PXF server, refer to the connector configuration topic:

Configuring a PXF User

You can configure access to an external data store on a per-server, per-Greenplum-user basis.

You configure external data store user access credentials and properties for a specific Greenplum Database user by providing a <greenplum_user_name>-user.xml user configuration file in the PXF server configuration directory, $PXF_CONF/servers/<server_name>/. For example, you specify the properties for the Greenplum Database user named bill in the file $PXF_CONF/servers/<server_name>/bill-user.xml. You can configure zero, one, or more users in a PXF server configuration.

The properties that you specify in a user configuration file are connector-specific. You can specify any configuration property supported by the PXF connector server in a <greenplum_user_name>-user.xml configuration file.

For example, suppose you have configured access to a PostgreSQL database in the PXF JDBC server configuration named pgsrv1. To allow the Greenplum Database user named bill to access this database as the PostgreSQL user named pguser1, password changeme, you create the user configuration file $PXF_CONF/servers/pgsrv1/bill-user.xml with the following properties:

<configuration>
    <property>
        <name>jdbc.user</name>
        <value>pguser1</value>
    </property>
    <property>
        <name>jdbc.password</name>
        <value>changeme</value>
    </property>
</configuration>

If you want to configure a specific search path and a larger read fetch size for bill, you would also add the following properties to the bill-user.xml user configuration file:

    <property>
        <name>jdbc.session.property.search_path</name>
        <value>bill_schema</value>
    </property>
    <property>
        <name>jdbc.statement.fetchSize</name>
        <value>2000</value>
    </property>

Procedure

For each PXF user that you want to configure, you will:

  1. Identify the name of the Greenplum Database user.
  2. Identify the PXF server definition for which you want to configure user access.
  3. Identify the name and value of each property that you want to configure for the user.
  4. Create/edit the file $PXF_CONF/servers/<server_name>/<greenplum_user_name>-user.xml, and add the outer configuration block:

    <configuration>
    </configuration>
    
  5. Add each property/value pair that you identified in Step 3 within the configuration block in the <greenplum_user_name>-user.xml file.

  6. If you are adding the PXF user configuration to previously configured PXF server definition, synchronize the user configuration to the Greenplum Database cluster.

About Configuration Property Precedence

A PXF server configuration may include default settings for user access credentials and other properties for accessing an external data store. Some PXF connectors, such as the S3 and JDBC connectors, allow you to directly specify certain server properties via custom options in the CREATE EXTERNAL TABLE command LOCATION clause. A <greenplum_user_name>-user.xml file specifies property settings for an external data store that are specific to a Greenplum Database user.

For a given Greenplum Database user, PXF uses the following precedence rules (highest to lowest) to obtain configuration property settings for the user:

  1. A property that you configure in <server_name>/<greenplum_user_name>-user.xml overrides any setting of the property elsewhere.
  2. A property that is specified via custom options in the CREATE EXTERNAL TABLE command LOCATION clause overrides any setting of the property in a PXF server configuration.
  3. Properties that you configure in the <server_name> PXF server definition identify the default property values.

These precedence rules allow you create a single external table that can be accessed by multiple Greenplum Database users, each with their own unique external data store user credentials.

Using a Server Configuration

To access an external data store, the Greenplum Database user specifies the server name in the CREATE EXTERNAL TABLE command LOCATION clause SERVER=<server_name> option. The <server_name> that the user provides identifies the server configuration directory from which PXF obtains the configuration and credentials to access the external data store.

For example, the following command accesses an S3 object store using the server configuration defined in the $PXF_CONF/servers/s3srvcfg/s3-site.xml file:

CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int)
  LOCATION ('pxf://BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=E',');

PXF automatically uses the default server configuration when no SERVER=<server_name> setting is provided.

For example, if the default server configuration identifies a Hadoop cluster, the following example command references the HDFS file located at /path/to/file.txt:

CREATE EXTERNAL TABLE pxf_ext_hdfs(location text, miles int)
  LOCATION ('pxf://path/to/file.txt?PROFILE=hdfs:text')
FORMAT 'TEXT' (delimiter=E',');
A Greenplum Database user who queries or writes to an external table accesses the external data store with the credentials configured for the <server_name> user. If no user-specific credentials are configured for <server_name>, the Greenplum user accesses the external data store with the default credentials configured for <server_name>.