Deploying a Connector


The Greenplum Database administrator deploys a connector to a Greenplum Database cluster by first registering the connector and its dependencies, and then synchronizing the PXF configuration files across the Greenplum Database cluster. After deploying the connector, the administrator must restart the PXF agent.

Identifying Connector Run-Time Dependencies

You must identify your connector’s run-time dependencies on other JAR files.

You must also identify any dependencies that your connector has on third-party commands or other components on the system. These commands and components must be installed on all Greenplum Database segment hosts. Any programs on which your connector depends must be executable by the gpadmin operating system user.
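One way to confirm that third-party commands are available is to run a small check as the gpadmin user on each segment host. The following is a minimal sketch, not part of PXF: check_commands is a hypothetical helper, and the command names that you pass to it depend on your connector.

```shell
#!/bin/sh
# check_commands CMD...
# Hypothetical helper (not part of PXF). For each name, report whether it
# resolves to an executable file for the current user. Returns 1 if any
# command is missing, 0 otherwise.
check_commands() {
    missing=0
    for cmd in "$@"; do
        # command -v prints the resolved path for programs found on PATH
        path=$(command -v "$cmd" 2>/dev/null)
        if [ -n "$path" ] && [ -x "$path" ]; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

Running the function while logged in as gpadmin means that the result reflects that user's PATH and permissions, which is the requirement stated above.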

Registering Connector Run-Time Dependencies

The PXF agent determines certain run-time dependencies from files found in the $GPHOME/pxf/conf, $PXF_CONF/conf, and $PXF_CONF/lib directories. Only the Greenplum Database administrative user can register PXF dependencies.

The $GPHOME/pxf/conf/pxf-private.classpath configuration file identifies the run-time dependencies for the PXF agent itself. This file also identifies the run-time dependencies for the PXF built-in HDFS, Hive, and HBase connectors.

The Greenplum Database administrator registers a third-party connector and its run-time dependencies by copying the files to the $PXF_CONF/lib directory, synchronizing the PXF configuration across the cluster, and then restarting the PXF agent on each segment host.

Deploying to the Greenplum Database Cluster

When a Greenplum Database administrator deploys a new connector, they:

  • Copy the connector JAR file and the JAR files of any dependencies to the $PXF_CONF/lib directory on the Greenplum Database master host
  • Synchronize the PXF configuration to each Greenplum Database segment host
  • Restart the PXF agent on each segment host

Only the Greenplum Database administrative user can restart the PXF agent.

For example, if PXF_CONF=/usr/local/greenplum-pxf, these commands deploy the connector across the cluster:

gpadmin@gpmaster$ cp my-connector.jar $PXF_CONF/lib/
gpadmin@gpmaster$ cp connector-dependency.jar $PXF_CONF/lib/
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync

Restart PXF on each Greenplum Database segment host as described in Restarting PXF.
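The copy step in the example above can be wrapped in a small helper that stops when a file is missing or is not a JAR, so that an incomplete set of files is never synchronized. This is a sketch under stated assumptions: stage_jars is a hypothetical function, not a PXF command, and it performs only the local copy.

```shell
#!/bin/sh
# stage_jars LIB_DIR JAR...
# Hypothetical helper (not part of PXF). Copies each named JAR into LIB_DIR
# and stops at the first file that is missing or does not end in .jar.
stage_jars() {
    lib_dir=$1
    shift
    if [ ! -d "$lib_dir" ]; then
        echo "error: $lib_dir is not a directory" >&2
        return 1
    fi
    for jar in "$@"; do
        case $jar in
            *.jar) ;;
            *) echo "error: $jar is not a .jar file" >&2; return 1 ;;
        esac
        if [ ! -f "$jar" ]; then
            echo "error: $jar not found" >&2
            return 1
        fi
        cp "$jar" "$lib_dir/" || return 1
        echo "copied $(basename "$jar")"
    done
}
```

On the master host, the gpadmin user could call stage_jars "$PXF_CONF/lib" my-connector.jar connector-dependency.jar and run $GPHOME/pxf/bin/pxf cluster sync only if the function returns successfully.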

The administrator must also install any third-party commands or other components used by the connector on all Greenplum Database segment hosts and ensure that these programs are executable by the gpadmin operating system user.

Verifying Connector Deployment

To verify that you deployed a connector successfully, create a Greenplum Database external table and run SELECT and/or INSERT commands on the table to test read and write operations on the external data source.

The Greenplum Database end user accesses an external data source by invoking a CREATE EXTERNAL TABLE command specifying the pxf protocol. (Refer to the CREATE EXTERNAL TABLE reference page for more about this Greenplum Database command.)

The LOCATION clause of a CREATE EXTERNAL TABLE command specifying the pxf protocol is a URI that specifies the external data source and the location, path to, or name of the data. The query portion of the pxf protocol URI, introduced by a question mark (?), must include a profile name or the fully-qualified plug-in class names of the connector.
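The rule for the query portion can be checked mechanically before you run CREATE EXTERNAL TABLE. The sketch below is illustrative only: has_plugin_spec is a hypothetical helper, and the PROFILE option name is taken from the PXF end user documentation rather than from this section. It checks only that the option keys are present, not that the named classes or profile exist.

```shell
#!/bin/sh
# has_plugin_spec URI
# Hypothetical helper (not part of PXF). Succeeds if the query portion of a
# pxf:// URI contains a PROFILE= option, or both ACCESSOR= and RESOLVER=
# options. It does not validate the option values.
has_plugin_spec() {
    uri=$1
    # Require the pxf protocol and a query portion introduced by "?"
    case $uri in
        pxf://*\?*) ;;
        *) return 1 ;;
    esac
    # Prefix "&" so that every option key is preceded by "&"
    query="&${uri#*\?}"
    case $query in
        *"&PROFILE="*) return 0 ;;
    esac
    case $query in
        *"&ACCESSOR="*) ;;
        *) return 1 ;;
    esac
    case $query in
        *"&RESOLVER="*) return 0 ;;
    esac
    return 1
}
```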

Creating an External Table using PXF in the PXF end user documentation details how to create an external table when specifying a profile name. The syntax for a CREATE EXTERNAL TABLE command that specifies the Java plug-in class names in the LOCATION clause follows:

CREATE [WRITABLE] EXTERNAL TABLE <table_name>
        ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION('pxf://<path-to-data>?[FRAGMENTER=<fragmenter_class>
        &]ACCESSOR=<accessor_class>
        &RESOLVER=<resolver_class>
        [&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);

The LOCATION clause identifies the Java plug-in classes that PXF will use to split (FRAGMENTER), read and/or write (ACCESSOR), and deserialize/serialize (RESOLVER) the external data.
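Because the class names are long, it can help to assemble the LOCATION URI from its parts and paste the result into the CREATE EXTERNAL TABLE command. The following sketch assumes the option order shown in the syntax above. pxf_location is a hypothetical helper, not a PXF utility, and it performs no URL encoding.

```shell
#!/bin/sh
# pxf_location DATA_PATH ACCESSOR RESOLVER [FRAGMENTER]
# Hypothetical helper (not part of PXF). Prints a pxf:// URI that names the
# plug-in classes in the query portion. FRAGMENTER is optional and, when
# given, is placed first, as in the syntax shown above.
pxf_location() {
    data_path=$1
    accessor=$2
    resolver=$3
    fragmenter=${4:-}
    query="ACCESSOR=${accessor}&RESOLVER=${resolver}"
    if [ -n "$fragmenter" ]; then
        query="FRAGMENTER=${fragmenter}&${query}"
    fi
    printf 'pxf://%s?%s\n' "$data_path" "$query"
}
```

For example, with the placeholder class names com.example.MyAccessor and com.example.MyResolver, the call pxf_location data/in com.example.MyAccessor com.example.MyResolver prints pxf://data/in?ACCESSOR=com.example.MyAccessor&RESOLVER=com.example.MyResolver.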

Example: Deploying the Demo Connector

In this exercise, you deploy the Demo connector, its dependencies, and configuration files, and then verify the deployment.

About the Demo Connector

Recall that your copy of the Demo connector resides in the org.greenplum.pxf.example.demo package.

Plug-in Classes

The Demo connector read operation uses the following plug-in classes:

  • Fragmenter - org.greenplum.pxf.example.demo.DemoFragmenter
  • Read Accessor - org.greenplum.pxf.example.demo.DemoAccessor
  • Read Resolver - org.greenplum.pxf.example.demo.DemoTextResolver

The Demo connector write operation uses the following plug-in classes:

  • Write Accessor - org.greenplum.pxf.example.demo.DemoFileWritableAccessor
  • Write Resolver - org.greenplum.pxf.example.demo.DemoTextResolver

Notice that the Demo connector read and write operations share the same Resolver class but use different Accessor classes.

Run-Time Dependencies

The Demo connector has a run-time dependency on the Apache Commons Logging JAR, commons-logging.jar. This JAR file is installed with PXF and should already be specified in the pxf-private.classpath file.
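If you want to confirm this before deploying, you can search the classpath file for the JAR name. The sketch below assumes that the file is plain text in which lines beginning with # are comments. classpath_mentions is a hypothetical helper, not a PXF command.

```shell
#!/bin/sh
# classpath_mentions FILE NAME
# Hypothetical helper (not part of PXF). Succeeds if any line of FILE that
# is not a "#" comment contains the literal string NAME.
# Returns 2 if FILE cannot be read.
classpath_mentions() {
    file=$1
    name=$2
    if [ ! -r "$file" ]; then
        echo "error: cannot read $file" >&2
        return 2
    fi
    # Drop comment lines, then look for NAME as a fixed string
    grep -v '^[[:space:]]*#' "$file" | grep -F -q -- "$name"
}
```

A call such as classpath_mentions "$GPHOME/pxf/conf/pxf-private.classpath" commons-logging reports only whether the name appears literally. The file may reference the JAR through a wildcard or a versioned file name, so a failed match is not conclusive.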

Prerequisites

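Before attempting this exercise, ensure that you have built the Demo connector JAR file and created the PXF extension in the database that you will use for testing. The procedure relies on both.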

Procedure

Perform the following procedure to deploy the Demo connector and to verify that you deployed the connector successfully:

  1. Log in to the Greenplum Database master node as an administrative user:

    $ ssh gpadmin@<gpmaster>
    
  2. Copy the Demo connector JAR file that you previously built to the $PXF_CONF/lib directory on the Greenplum Database master host. Replace PXFDEV_BASE with the absolute path to your PXF development work area:

    gpadmin@gpmaster$ scp user@devsystem:/PXFDEV_BASE/demo_example/build/libs/my-demo-connector.jar $PXF_CONF/lib/my-demo-connector.jar
    
  3. Copy the JAR file to all Greenplum Database segment hosts by synchronizing the PXF configuration. For example:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
    
  4. Restart PXF on each Greenplum Database segment host as described in Restarting PXF.

  5. Verify that you correctly deployed the Demo connector by creating and accessing Greenplum Database readable and writable external tables that specify the Demo connector plug-ins:

    1. As the gpadmin user, connect to a database in which you created the PXF extension. For example, to connect to a database named pxf_exampledb:

      gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin
      
    2. Create a readable Greenplum external table to exercise the Demo connector read operation. For example:

      pxf_exampledb=# CREATE EXTERNAL TABLE demo_tbl_read (a TEXT, b TEXT, c TEXT)
          LOCATION ('pxf://default/tmp/dummy1?FRAGMENTER=org.greenplum.pxf.example.demo.DemoFragmenter&ACCESSOR=org.greenplum.pxf.example.demo.DemoAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
          FORMAT 'TEXT' (DELIMITER ',');
      CREATE EXTERNAL TABLE
      

      (The Demo connector read operation returns static data. You could have specified any file path in the LOCATION clause; the Demo connector ignores it.)

    3. Query the demo_tbl_read table:

      pxf_exampledb=# SELECT * FROM demo_tbl_read;
             a        |   b    |   c    
      ----------------+--------+--------
       fragment2 row1 | value1 | value2
       fragment2 row2 | value1 | value2
       fragment1 row1 | value1 | value2
       fragment1 row2 | value1 | value2
       fragment3 row1 | value1 | value2
       fragment3 row2 | value1 | value2
      (6 rows)
      
    4. Create a writable Greenplum external table to exercise the Demo connector write operation. For example:

      pxf_exampledb=# CREATE WRITABLE EXTERNAL TABLE demo_tbl_write (a TEXT, b TEXT, c TEXT)
          LOCATION ('pxf://tmp/demo_write_1?ACCESSOR=org.greenplum.pxf.example.demo.DemoFileWritableAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
          FORMAT 'TEXT' (DELIMITER ',');
      CREATE EXTERNAL TABLE
      
    5. Write some text data into the demo_tbl_write table. For example:

      pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('x', 'y', 'z');
      INSERT 0 1
      pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('u', 'v', 'w');
      INSERT 0 1
      pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('r', 's', 't');
      INSERT 0 1
      

      Each INSERT command writes a file to the directory named /tmp/demo_write_1 on the local file system.

    6. View the contents of the /tmp/demo_write_1 directory on the local file system. For example:

      gpadmin@gpmaster$ cat /tmp/demo_write_1/*
      x,y,z
      u,v,w
      r,s,t
      

You successfully deployed the Demo connector. If you choose, you may deploy read and write profiles for the Demo connector as a convenience to the end user. See Deploying a Profile for profile definition and deployment instructions.
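The manual check of the write directory in the final step of the procedure can also be scripted. The sketch below is illustrative: rows_written is a hypothetical helper that looks for each expected row as a complete line in any file under the directory, so it does not depend on how many files the INSERT commands produced or on their order.

```shell
#!/bin/sh
# rows_written DIR ROW...
# Hypothetical helper (not part of PXF). Succeeds if every ROW appears as a
# complete line in at least one file directly under DIR. Prints each row
# that is not found. Returns 2 if DIR is not a directory.
rows_written() {
    dir=$1
    shift
    if [ ! -d "$dir" ]; then
        echo "error: $dir is not a directory" >&2
        return 2
    fi
    status=0
    for row in "$@"; do
        # -F: fixed string, -x: match the whole line, -q: no output
        if ! grep -F -x -q -- "$row" "$dir"/* 2>/dev/null; then
            echo "missing row: $row"
            status=1
        fi
    done
    return "$status"
}
```

For the rows inserted in this exercise, the call would be rows_written /tmp/demo_write_1 'x,y,z' 'u,v,w' 'r,s,t'.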