Deploying a Connector
The Greenplum Database administrator deploys a connector to a Greenplum Database cluster by first registering the connector and its dependencies, and then synchronizing the PXF configuration files across the Greenplum Database cluster. After deploying the connector, the administrator must restart the PXF agent.
Identifying Connector Run-Time Dependencies
You must identify your connector’s run-time dependencies on other JAR files.
You must also identify any dependencies that your connector has on third-party commands or other components on the system. These commands and components must be installed on all Greenplum Database segment hosts. Any programs on which your connector depends must be executable by the gpadmin operating system user.
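The executability requirement above can be spot-checked with a few lines of shell. The following is a minimal sketch, not part of PXF: the function name check_commands and the command names in the usage comment are hypothetical. Run it as gpadmin on each segment host so that the lookup uses that user's PATH and permissions.

```shell
# check_commands CMD...
# Prints one status line per command and returns non-zero if the shell
# cannot resolve any of them for the current user (command -v also
# reports shell builtins, not only files found on the PATH).
# The ( ) body runs in a subshell, so the variables stay local.
check_commands() (
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok      $cmd"
        else
            echo "MISSING $cmd"
            missing=1
        fi
    done
    return "$missing"
)

# Intended use (hypothetical command names):
#   check_commands curl my-helper-tool
```

A non-zero return status indicates that at least one command must still be installed or made executable on that host.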
Registering Connector Run-Time Dependencies
The PXF agent determines certain runtime dependencies from files found in the $GPHOME/pxf/conf, $PXF_CONF/conf, and $PXF_CONF/lib directories. Only the Greenplum Database administrative user can register PXF dependencies.
The $GPHOME/pxf/conf/pxf-private.classpath configuration file identifies the runtime dependencies for the PXF agent itself. This file also identifies the runtime dependencies for the PXF built-in HDFS, Hive, and HBase connectors.
The Greenplum Database administrator registers a third-party connector and its runtime dependencies by copying the files to the $PXF_CONF/lib directory, synchronizing the PXF configuration across the cluster, and then restarting the PXF agent on each segment host.
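The copy step can be wrapped in a small guard so that a mistyped file name does not leave the lib directory half populated. This is a sketch under assumptions: stage_jars is a hypothetical helper, not a PXF command, and it performs only the copy. Synchronizing and restarting remain separate steps.

```shell
# stage_jars LIB_DIR JAR...
# Copies each JAR into LIB_DIR, but only after checking that the
# directory and every input file exist.
# The ( ) body runs in a subshell, so the variables stay local.
stage_jars() (
    if [ "$#" -lt 2 ]; then
        echo "usage: stage_jars LIB_DIR JAR..." >&2
        return 1
    fi
    lib_dir=$1
    shift
    if [ ! -d "$lib_dir" ]; then
        echo "not a directory: $lib_dir" >&2
        return 1
    fi
    for jar in "$@"; do
        if [ ! -f "$jar" ]; then
            echo "missing JAR: $jar" >&2
            return 1
        fi
    done
    cp -- "$@" "$lib_dir"/
)

# Intended use on the master host as gpadmin (not executed here):
#   stage_jars "$PXF_CONF/lib" my-connector.jar connector-dependency.jar &&
#       "$GPHOME/pxf/bin/pxf" cluster sync
```

Chaining the sync with && means the configuration is synchronized only when every file was copied.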
Deploying to the Greenplum Database Cluster
When a Greenplum Database administrator deploys a new connector, they:
- Copy the connector JAR file and the JAR files of any dependencies to the $PXF_CONF/lib directory on the Greenplum Database master host
- Synchronize the PXF configuration to each Greenplum Database segment host
- Restart the PXF agent on each segment host
Only the Greenplum Database administrative user can restart the PXF agent.
For example, if PXF_CONF=/usr/local/greenplum-pxf, these commands deploy the connector across the cluster:
gpadmin@gpmaster$ cp my-connector.jar $PXF_CONF/lib/
gpadmin@gpmaster$ cp connector-dependency.jar $PXF_CONF/lib/
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
Restart PXF on each Greenplum Database segment host as described in Restarting PXF.
The administrator must also install any third-party commands or other components used by the connector on all Greenplum Database segment hosts and ensure that these programs are executable by the gpadmin operating system user.
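Both per-host tasks in this section, restarting PXF and confirming that third-party commands are present, amount to running one command on every segment host. The sketch below loops over a host list with ssh. It rests on several assumptions: run_on_hosts and the file seghosts.txt are hypothetical, passwordless ssh is set up for gpadmin, and the remote non-interactive shell has GPHOME set. The restart command itself is the one described in Restarting PXF, shown here as pxf restart on that assumption.

```shell
# run_on_hosts HOSTFILE COMMAND
# Runs COMMAND over ssh on every host listed in HOSTFILE (one name per
# line; blank lines and lines starting with '#' are skipped).
# Returns non-zero if the command fails on any host.
# The ( ) body runs in a subshell, so the variables stay local.
run_on_hosts() (
    hostfile=$1
    remote_cmd=$2
    failed=0
    while IFS= read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # -n keeps ssh from consuming the rest of the host list on stdin.
        if ! ssh -n "$host" "$remote_cmd"; then
            echo "FAILED on $host" >&2
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
)

# Intended use as gpadmin (not executed here). The single quotes make
# $GPHOME expand on the remote host rather than locally:
#   run_on_hosts seghosts.txt '$GPHOME/pxf/bin/pxf restart'
#   run_on_hosts seghosts.txt 'command -v my-helper-tool'
```

The loop continues past a failing host so that one unreachable machine does not hide the status of the others.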
Verifying Connector Deployment
To verify that you deployed a connector successfully, create a Greenplum Database external table and invoke SELECT and/or INSERT commands on the table to test read and write operations on the external data source.
The Greenplum Database end user accesses an external data source by invoking a CREATE EXTERNAL TABLE command specifying the pxf protocol. (Refer to the CREATE EXTERNAL TABLE reference page for more about this Greenplum Database command.)
The LOCATION clause of a CREATE EXTERNAL TABLE command specifying the pxf protocol is a URI that specifies the external data source and the location, path to, or name of the data. The query portion of the pxf protocol URI, introduced by a question mark (?), must include a profile name or the fully-qualified plug-in class names of the connector.
Creating an External Table using PXF in the PXF end user documentation details how to create an external table when specifying a profile name. The syntax for a CREATE EXTERNAL TABLE command that specifies the Java plug-in class names in the LOCATION clause follows:
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION('pxf://<path-to-data>?[FRAGMENTER=<fragmenter_class>&]ACCESSOR=<accessor_class>&RESOLVER=<resolver_class>[&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
The LOCATION clause identifies the Java plug-in classes that PXF will use to split (FRAGMENTER), read and/or write (ACCESSOR), and deserialize/serialize (RESOLVER) the external data.
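Because the URI is long and easy to mistype, it can help to assemble it from its parts. The following sketch builds the string from the syntax shown above. The helper name pxf_location is hypothetical, and it does not handle custom options.

```shell
# pxf_location PATH ACCESSOR RESOLVER [FRAGMENTER]
# Prints a pxf:// URI that names the plug-in classes. The FRAGMENTER
# parameter is emitted only when a fourth argument is given, matching
# the optional [FRAGMENTER=...&] part of the syntax.
# The ( ) body runs in a subshell, so the variables stay local.
pxf_location() (
    if [ "$#" -lt 3 ]; then
        echo "usage: pxf_location PATH ACCESSOR RESOLVER [FRAGMENTER]" >&2
        return 1
    fi
    query="ACCESSOR=$2&RESOLVER=$3"
    if [ -n "${4:-}" ]; then
        query="FRAGMENTER=$4&$query"
    fi
    printf 'pxf://%s?%s\n' "$1" "$query"
)

# Intended use (the class names here are placeholders):
#   pxf_location path/to/data com.example.MyAccessor com.example.MyResolver
```

The printed string is what goes between the single quotes of the LOCATION clause.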
Example: Deploying the Demo Connector
In this exercise you deploy the Demo connector, its dependencies, and configuration files and verify the deploy operation.
About the Demo Connector
Recall that your copy of the Demo connector resides in the org.greenplum.pxf.example.demo package.
Plug-in Classes
The Demo connector read operation uses the following plug-in classes:
- Fragmenter - org.greenplum.pxf.example.demo.DemoFragmenter
- Read Accessor - org.greenplum.pxf.example.demo.DemoAccessor
- Read Resolver - org.greenplum.pxf.example.demo.DemoTextResolver
The Demo connector write operation uses the following plug-in classes:
- Write Accessor - org.greenplum.pxf.example.demo.DemoFileWritableAccessor
- Write Resolver - org.greenplum.pxf.example.demo.DemoTextResolver
Notice that the Demo connector read and write operations share the same Resolver class but use different Accessor classes.
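Before deploying, you may want to confirm that the JAR you built actually contains these classes. The sketch below lists the JAR and looks for a class entry. It assumes that the JDK jar tool is on the PATH, and the helper names class_to_entry and jar_has_class are hypothetical.

```shell
# class_to_entry CLASS
# Converts a fully-qualified class name to its path inside a JAR,
# for example a.b.C becomes a/b/C.class.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# jar_has_class JAR CLASS
# Succeeds if the JAR listing contains an entry for CLASS.
# grep -F matches a fixed string, -x requires a whole-line match.
jar_has_class() {
    jar tf "$1" | grep -Fxq -- "$(class_to_entry "$2")"
}

# Intended use (not executed here):
#   jar_has_class my-demo-connector.jar \
#       org.greenplum.pxf.example.demo.DemoFragmenter || echo "class missing" >&2
```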
Run-Time Dependencies
The Demo connector has a run-time dependency on the Apache Commons Logging JAR, commons-logging.jar. This JAR file is installed with PXF and should already be specified in the pxf-private.classpath file.
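A quick text search can confirm that the classpath file mentions the library. This is a minimal sketch: classpath_mentions is a hypothetical helper, and it only checks that the text appears somewhere in the file, not that the JAR it names exists on disk.

```shell
# classpath_mentions FILE TEXT
# Succeeds if any line of FILE contains TEXT, matched as a fixed
# string rather than as a regular expression.
classpath_mentions() {
    grep -Fq -- "$2" "$1"
}

# Intended use as gpadmin (not executed here):
#   classpath_mentions "$GPHOME/pxf/conf/pxf-private.classpath" commons-logging ||
#       echo "commons-logging is not listed" >&2
```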
Prerequisites
Before attempting this exercise, ensure that you have:
- Built the Demo connector as described in Example: Building the Demo Connector JAR File.
- Administrative access to a running Greenplum Database cluster.
- Initialized, configured, and started the PXF agent on each Greenplum Database segment host as described in Configuring PXF.
- Enabled the PXF extension in the database, and optionally granted specific Greenplum Database roles access to the pxf protocol; Enabling/Disabling PXF and Granting Access to PXF describe these procedures.
Procedure
Perform the following procedure to deploy the Demo connector and to verify that you deployed the connector successfully:
1. Log in to the Greenplum Database master node as an administrative user:

    $ ssh gpadmin@<gpmaster>

2. Copy the Demo connector JAR file that you previously built to the $PXF_CONF/lib directory on the Greenplum Database master host. Replace PXFDEV_BASE with the absolute path to your PXF development work area:

    gpadmin@gpmaster$ scp user@devsystem:/PXFDEV_BASE/demo_example/build/libs/my-demo-connector.jar $PXF_CONF/lib/my-demo-connector.jar

3. Copy the JAR file to all Greenplum Database segment hosts by synchronizing the PXF configuration. For example:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync

4. Restart PXF on each Greenplum Database segment host as described in Restarting PXF.

5. Verify that you correctly deployed the Demo connector by creating and accessing Greenplum Database readable and writable external tables that specify the Demo connector plug-ins:

    1. Connect to a database in which you created the PXF extension as the gpadmin user. For example, to connect to a database named pxf_exampledb:

        gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin

    2. Create a readable Greenplum external table to exercise the Demo connector read operation. For example:

        pxf_exampledb=# CREATE EXTERNAL TABLE demo_tbl_read (a TEXT, b TEXT, c TEXT)
            LOCATION ('pxf://default/tmp/dummy1?FRAGMENTER=org.greenplum.pxf.example.demo.DemoFragmenter&ACCESSOR=org.greenplum.pxf.example.demo.DemoAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
            FORMAT 'TEXT' (DELIMITER ',');
        CREATE EXTERNAL TABLE

       (The Demo connector read operation returns static data. You could have specified any file path in the LOCATION clause; it will be ignored by the Demo connector.)

    3. Query the demo_tbl_read table:

        pxf_exampledb=# SELECT * from demo_tbl_read;
               a        |   b    |   c
        ----------------+--------+--------
         fragment2 row1 | value1 | value2
         fragment2 row2 | value1 | value2
         fragment1 row1 | value1 | value2
         fragment1 row2 | value1 | value2
         fragment3 row1 | value1 | value2
         fragment3 row2 | value1 | value2
        (6 rows)

    4. Create a writable Greenplum external table to exercise the Demo connector write operation. For example:

        pxf_exampledb=# CREATE WRITABLE EXTERNAL TABLE demo_tbl_write (a TEXT, b TEXT, c TEXT)
            LOCATION ('pxf://tmp/demo_write_1?ACCESSOR=org.greenplum.pxf.example.demo.DemoFileWritableAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
            FORMAT 'TEXT' (DELIMITER ',');
        CREATE EXTERNAL TABLE

    5. Write some text data into the demo_tbl_write table. For example:

        pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('x', 'y', 'z');
        INSERT 0 1
        pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('u', 'v', 'w');
        INSERT 0 1
        pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('r', 's', 't');
        INSERT 0 1

       Each INSERT command writes a file to the directory named /tmp/demo_write_1 on the local file system.

    6. View the contents of the /tmp/demo_write_1 directory on the local file system. For example:

        gpadmin@gpmaster$ cat /tmp/demo_write_1/*
        x,y,z
        u,v,w
        r,s,t
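If you repeat this verification often, the read check can be scripted. The sketch below compares the row count of the readable table with the six rows that the Demo connector returned in the query output above. The helper names table_row_count and expect_row_count are hypothetical. The psql options are standard: -A and -t select unaligned, tuples-only output so that only the number is printed.

```shell
# table_row_count DB TABLE
# Prints the number of rows in TABLE of database DB.
table_row_count() {
    psql -d "$1" -A -t -c "SELECT count(*) FROM $2;"
}

# expect_row_count DB TABLE EXPECTED
# Succeeds only if TABLE holds exactly EXPECTED rows.
# The ( ) body runs in a subshell, so the variables stay local.
expect_row_count() (
    actual=$(table_row_count "$1" "$2") || return 1
    if [ "$actual" != "$3" ]; then
        echo "$2: expected $3 rows, got $actual" >&2
        return 1
    fi
)

# Intended use as gpadmin on the master host (not executed here):
#   expect_row_count pxf_exampledb demo_tbl_read 6 && echo "read path ok"
```

The table name is interpolated into the SQL text without quoting, so pass only trusted names.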
You successfully deployed the Demo connector. If you choose, you may deploy read and write profiles for the Demo connector as a convenience to the end user. See Deploying a Profile for profile definition and deployment instructions.