Deploying a Connector
The Greenplum Database administrator deploys a connector to a Greenplum Database cluster by first registering the connector and its dependencies, and then copying relevant JAR and configuration files across the Greenplum Database cluster. After deploying the connector, the administrator must restart the PXF agent.
You must identify your connector’s run-time dependencies on other JAR files.
You must also identify any dependencies that your connector has on third-party commands or other components on the system. These commands and components must be installed on all Greenplum Database segment hosts. Any programs on which your connector depends must be executable by the gpadmin operating system user.
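One way to confirm the requirement above is a small check run as the gpadmin user on each segment host. The following is a minimal sketch, not part of PXF; the function name `check_commands` and the command names you pass to it are illustrative assumptions.

```shell
# Hypothetical helper (not a PXF utility): report whether each named external
# program can be found on PATH and is executable by the current user.
# Run it as gpadmin; pass your connector's own command names as arguments.
check_commands() (
    status=0
    for cmd in "$@"; do
        # command -v prints the resolved path of an external program
        path=$(command -v "$cmd" 2>/dev/null)
        if [ -n "$path" ] && [ -x "$path" ]; then
            echo "ok: $cmd ($path)"
        else
            echo "missing or not executable: $cmd" >&2
            status=1
        fi
    done
    # Non-zero exit status if any program failed the check
    exit "$status"
)
```

For example, `check_commands curl jq` exits with a non-zero status if either program is unavailable to the current user. The check targets external programs, so shell built-ins are reported as missing.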
The PXF agent determines certain runtime dependencies from configuration files found in the $GPHOME/pxf/conf directory. Only the Greenplum Database administrative user can register PXF dependencies.
The pxf-private.classpath configuration file identifies the runtime dependencies for the PXF agent itself. This file also identifies the runtime dependencies for the PXF built-in HDFS, Hive, and HBase connectors.
The Greenplum Database administrator registers a third-party connector and its runtime dependencies in the pxf-public.classpath configuration file. Entries in the pxf-public.classpath file identify the absolute path of a JAR file or a configuration directory, and are listed one per line.
The pxf-public.classpath file on a Greenplum Database segment host is shared by all third-party connectors running in the cluster.
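Because the file is shared and each entry must be an absolute path on its own line, it can help to script the registration step. The following is a minimal sketch under those two stated rules only; `add_classpath_entry` is a hypothetical helper name, not a PXF utility.

```shell
# Hypothetical helper: append an entry to a classpath file only if the entry
# is an absolute path and is not already listed (one entry per line).
add_classpath_entry() (
    file=$1
    entry=$2
    case $entry in
        /*) ;;  # absolute path, accepted
        *)  echo "not an absolute path: $entry" >&2; exit 1 ;;
    esac
    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$entry" "$file" 2>/dev/null; then
        echo "already registered: $entry"
    else
        # If the file does not end with a newline, add one first so the new
        # entry does not get joined to the last existing line.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s\n' "$entry" >> "$file"
        echo "registered: $entry"
    fi
)
```

For example, `add_classpath_entry "$GPHOME/pxf/conf/pxf-public.classpath" /my/jar/install/dir/my-connector.jar` leaves entries registered by other connectors untouched, and running it a second time does not create a duplicate line.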
When the Greenplum Database administrator deploys a new connector, they copy the connector JAR file and the JAR files of any dependencies to each Greenplum Database segment host. The administrator must also propagate any updates to the pxf-public.classpath configuration file to each segment host, and then restart the PXF agent on each host. Only the Greenplum Database administrative user can restart the PXF agent.
For example, if seghostfile contains a list, one host per line, of the segment hosts in your Greenplum Database cluster, these commands deploy the connector across the cluster:
gpadmin@gpmaster$ gpscp -v -f seghostfile my-connector.jar =:/my/jar/install/dir
gpadmin@gpmaster$ gpscp -v -f seghostfile connector-dependency.jar =:/dependency/install/dir
gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-public.classpath =:/usr/local/greenplum-db/pxf/conf/pxf-public.classpath
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
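Before running a deployment across the cluster, it can be useful to print the commands and review them first. The sketch below only prints text and runs nothing. The function name `print_deploy_commands` is an assumption, and the fallback install path is the one used in this document's examples.

```shell
# Hypothetical dry-run helper: print, but do not execute, the commands that
# copy one JAR to every segment host, propagate pxf-public.classpath, and
# restart the PXF agent.
# Arguments: host file, local JAR path, remote install directory.
print_deploy_commands() (
    hostfile=$1
    jar=$2
    destdir=$3
    # Fall back to the install path used in this document if GPHOME is unset
    pxf_home=${GPHOME:-/usr/local/greenplum-db}/pxf
    classpath=$pxf_home/conf/pxf-public.classpath
    printf 'gpscp -v -f %s %s =:%s\n' "$hostfile" "$jar" "$destdir"
    printf 'gpscp -v -f %s %s =:%s\n' "$hostfile" "$classpath" "$classpath"
    printf 'gpssh -e -v -f %s "%s/bin/pxf restart"\n' "$hostfile" "$pxf_home"
)
```

Calling `print_deploy_commands seghostfile my-connector.jar /my/jar/install/dir` prints three lines that you can inspect and then run by hand. Call it once per dependency JAR to list the additional copy commands.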
The administrator must also install any third-party commands or other components used by the connector on all Greenplum Database segment hosts and ensure that these programs are executable by the gpadmin operating system user.
To verify that you deployed a connector successfully, create a Greenplum Database external table and invoke SELECT and INSERT commands on the table to test read and write operations on the external data source.
The Greenplum Database end user accesses an external data source by invoking a CREATE EXTERNAL TABLE command specifying the pxf protocol. (Refer to the CREATE EXTERNAL TABLE reference page for more about this Greenplum Database command.)
The LOCATION clause of a CREATE EXTERNAL TABLE command specifying the pxf protocol is a URI that specifies the external data source and the location, path to, or name of the data. The query portion of the pxf protocol URI, introduced by a question mark (?), must include a profile name or the fully-qualified plug-in class names of the connector.
Creating an External Table using PXF in the PXF end user documentation details how to create an external table when specifying a profile name. The syntax for a CREATE EXTERNAL TABLE command that specifies the Java plug-in class names in the LOCATION clause follows:
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION('pxf://<path-to-data>?[FRAGMENTER=<fragmenter_class>&]ACCESSOR=<accessor_class>
    &RESOLVER=<resolver_class>[&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
The LOCATION clause identifies the Java plug-in classes that PXF will use to split (FRAGMENTER), read and/or write (ACCESSOR), and deserialize/serialize (RESOLVER) the external data.
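To make the shape of the URI concrete, the following sketch assembles a LOCATION string from its parts. The helper name `build_pxf_location` is hypothetical, and the sketch does not handle custom options or URL encoding.

```shell
# Hypothetical helper: assemble a pxf:// LOCATION URI from a data path and
# plug-in class names. Pass an empty string as the fragmenter to omit the
# optional FRAGMENTER parameter.
build_pxf_location() (
    path=$1
    fragmenter=$2
    accessor=$3
    resolver=$4
    query=""
    if [ -n "$fragmenter" ]; then
        query="FRAGMENTER=$fragmenter&"
    fi
    query="${query}ACCESSOR=$accessor&RESOLVER=$resolver"
    # The question mark introduces the query portion of the URI
    printf 'pxf://%s?%s\n' "$path" "$query"
)
```

For example, `build_pxf_location my/data/path '' com.example.MyAccessor com.example.MyResolver` prints `pxf://my/data/path?ACCESSOR=com.example.MyAccessor&RESOLVER=com.example.MyResolver`; the path and class names here are placeholders.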
In this exercise, you deploy the Demo connector, its dependencies, and its configuration files, and then verify the deployment.
Recall that your copy of the Demo connector resides in the
The Demo connector read operation uses the following plug-in classes:
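- Fragmenter - org.greenplum.pxf.example.demo.DemoFragmenter
- Read Accessor - org.greenplum.pxf.example.demo.DemoAccessor
- Read Resolver - org.greenplum.pxf.example.demo.DemoTextResolver
The Demo connector write operation uses the following plug-in classes:
- Write Accessor - org.greenplum.pxf.example.demo.DemoFileWritableAccessor
- Write Resolver - org.greenplum.pxf.example.demo.DemoTextResolver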
Notice that the Demo connector read and write operations share the same Resolver class but use different Accessor classes.
The Demo connector has a run-time dependency on the Apache Commons Logging JAR, commons-logging.jar. This JAR file is installed with a Hadoop distribution and should already be specified in the
Before attempting this exercise, ensure that you have:
- Built the Demo connector as described in Example: Building the Demo Connector JAR File.
- Administrative access to a running Greenplum Database cluster.
- Installed the Hadoop clients and initialized and started the PXF agent on each Greenplum Database segment host as described in Installing and Configuring PXF.
- Enabled the PXF extension in the database, and optionally granted specific Greenplum Database roles access to the pxf protocol; Enabling/Disabling PXF and Granting Access to PXF describe these procedures.
Perform the following procedure to deploy the Demo connector and to verify that you deployed the connector successfully:
Log in to the Greenplum Database master node as an administrative user and set up your environment:
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
Copy the Demo connector JAR file that you previously built to the Greenplum Database master host. For example, to copy the JAR file to the /tmp directory, replacing PXFDEV_BASE with the absolute path to your PXF development work area:
gpadmin@gpmaster$ scp user@devsystem:/PXFDEV_BASE/demo_example/build/libs/my-demo-connector.jar /tmp/
Open the pxf-public.classpath file in the editor of your choosing. For example:
gpadmin@gpmaster$ vi $GPHOME/pxf/conf/pxf-public.classpath
Add an entry for the Demo connector JAR file to the pxf-public.classpath file. This entry should specify the absolute path of the Demo connector JAR file on the system. For example, if my-demo-connector.jar resides in the /tmp directory, add the following:
/tmp/my-demo-connector.jar
Save the file and exit the editor.
Copy the updated pxf-public.classpath file to all segment hosts in your Greenplum Database cluster. For example, if seghostfile contains a list, one host per line, of the segment hosts in your Greenplum Database cluster:
gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-public.classpath =:/usr/local/greenplum-db/pxf/conf/pxf-public.classpath
Copy the Demo connector JAR file to all Greenplum Database segment hosts. Copy the JAR file to the location you specified in the pxf-public.classpath file. For example:
gpadmin@gpmaster$ gpscp -v -f seghostfile /tmp/my-demo-connector.jar =:/tmp/my-demo-connector.jar
Restart PXF on each segment host. For example:
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
Verify that you correctly deployed the Demo connector by creating and accessing Greenplum Database readable and writable external tables that specify the Demo connector plug-ins:
Connect to a database in which you created the PXF extension as the gpadmin user. For example, to connect to a database named pxf_exampledb:
gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin
Create a readable Greenplum external table to exercise the Demo connector read operation. For example:
pxf_exampledb=# CREATE EXTERNAL TABLE demo_tbl_read (a TEXT, b TEXT, c TEXT)
                  LOCATION ('pxf://default/tmp/dummy1?FRAGMENTER=org.greenplum.pxf.example.demo.DemoFragmenter&ACCESSOR=org.greenplum.pxf.example.demo.DemoAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
                  FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE
(The Demo connector read operation returns static data. You could have specified any file path in the LOCATION clause; it will be ignored by the Demo connector.)
pxf_exampledb=# SELECT * from demo_tbl_read;
       a        |   b    |   c
----------------+--------+--------
 fragment2 row1 | value1 | value2
 fragment2 row2 | value1 | value2
 fragment1 row1 | value1 | value2
 fragment1 row2 | value1 | value2
 fragment3 row1 | value1 | value2
 fragment3 row2 | value1 | value2
(6 rows)
Create a writable Greenplum external table to exercise the Demo connector write operation. For example:
pxf_exampledb=# CREATE WRITABLE EXTERNAL TABLE demo_tbl_write (a TEXT, b TEXT, c TEXT)
                  LOCATION ('pxf://tmp/demo_write_1?ACCESSOR=org.greenplum.pxf.example.demo.DemoFileWritableAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
                  FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE
Write some text data into the demo_tbl_write table. For example:
pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('x', 'y', 'z');
INSERT 0 1
pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('u', 'v', 'w');
INSERT 0 1
pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('r', 's', 't');
INSERT 0 1
Each INSERT command writes a file to the directory named /tmp/demo_write_1 on the local file system.
View the contents of the /tmp/demo_write_1 directory on the local file system. For example:
gpadmin@gpmaster$ cat /tmp/demo_write_1/*
x,y,z
u,v,w
r,s,t
You successfully deployed the Demo connector. If you choose, you may deploy read and write profiles for the Demo connector as a convenience to the end user. See Deploying a Profile for profile definition and deployment instructions.
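If you want to repeat the write verification without reading the output by eye, a small script can compare the written files against the rows you inserted. This is a minimal sketch. `verify_written_rows` is a hypothetical helper name, and it assumes you run it on the host whose local file system holds the write directory.

```shell
# Hypothetical helper: check that every expected row appears as a whole line
# in some file under the given directory.
# Arguments: directory, then one expected line per argument.
verify_written_rows() (
    dir=$1
    shift
    status=0
    for row in "$@"; do
        # -F fixed string, -x whole-line match, -q quiet
        if grep -Fxq -- "$row" "$dir"/* 2>/dev/null; then
            echo "found: $row"
        else
            echo "missing: $row" >&2
            status=1
        fi
    done
    # Non-zero exit status if any expected row was not found
    exit "$status"
)
```

For the rows inserted in this exercise, the call would be `verify_written_rows /tmp/demo_write_1 'x,y,z' 'u,v,w' 'r,s,t'`. The order of rows and the number of files do not affect the result.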