Deploying a Connector

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 5.x documentation.

The Greenplum Database administrator deploys a connector to a Greenplum Database cluster by first registering the connector and its dependencies, and then copying relevant JAR and configuration files across the Greenplum Database cluster. After deploying the connector, the administrator must restart the PXF agent.

Identifying Connector Run-Time Dependencies

You must identify your connector’s run-time dependencies on other JAR files.

You must also identify any dependencies that your connector has on third-party commands or other components on the system. These commands and components must be installed on all Greenplum Database segment hosts. Any programs on which your connector depends must be executable by the gpadmin operating system user.

Registering Connector Run-Time Dependencies

The PXF agent determines certain runtime dependencies from configuration files found in the $GPHOME/pxf/conf directory. Only the Greenplum Database administrative user can register PXF dependencies.

The pxf-private.classpath configuration file identifies the runtime dependencies for the PXF agent itself. This file also identifies the runtime dependencies for the PXF built-in HDFS, Hive, and HBase connectors.

The Greenplum Database administrator registers a third-party connector and its runtime dependencies in the pxf-public.classpath configuration file. Entries in the pxf-public.classpath file identify the absolute path of a JAR file or a configuration directory, and are listed one per line.

Note: The pxf-public.classpath file on a Greenplum Database segment host is shared by all third-party connectors running in the cluster.

Deploying to the Greenplum Database Cluster

When the Greenplum Database administrator deploys a new connector, they copy the connector JAR file and the (JAR) files of any dependencies to each Greenplum Database segment host. The administrator must also propagate any updates to the pxf-public.classpath configuration file to each segment host, and then must restart the PXF agent on each host. Only the Greenplum Database administrative user can restart the PXF agent.

For example, if seghostfile contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster, these commands deploy the connector across the cluster:

gpadmin@gpmaster$ gpscp my-connector.jar -v -f seghostfile =:/my/jar/install/dir
gpadmin@gpmaster$ gpscp connector-dependency.jar -v -seghostfile /dependency/install/dir
gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-public.classpath =:/usr/local/greenplum-db/pxf/conf/pxf-public.classpath
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"

The administrator must also install any third-party commands or other components used by the connector on all Greenplum Database segment hosts and ensure that these programs are executable by the gpadmin operating system user.

Verifying Connector Deployment

To verify that you deployed a connector successfully, you create a Greenplum Database external table and invoke SELECT and/or INSERT commands on the table to test read and write operations on the external data source.

The Greenplum Database end user accesses an external data source by invoking a CREATE EXTERNAL TABLE command specifying the pxf protocol. (Refer to the CREATE EXTERNAL TABLE reference page for more about this Greenplum Database command.)

The LOCATION clause of a CREATE EXTERNAL TABLE command specifying the pxf protocol is a URI that specifies the external data source and the location, path to, or name of the data. The query portion of the pxf protocol URI, introduced by a question mark (?), must include a profile name or the fully-qualified plug-in class names of the connector.

Creating an External Table using PXF in the PXF end user documentation details how to create an external table when specifying a profile name. The syntax for a CREATE EXTERNAL TABLE command that specifies the Java plug-in class names in the LOCATION clause follows:

CREATE [WRITABLE] EXTERNAL TABLE <table_name>
        ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION('pxf://<path-to-data>?[FRAGMENTER=<fragmenter_class>
        &]ACCESSOR=<accessor_class>
        &RESOLVER=<resolver_class>
        [&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);

The LOCATION clause identifies the Java plug-in classes that PXF will use to split (FRAGMENTER), read and/or write (ACCESSOR), and deserialize/serialize (RESOLVER) the external data.

Example: Deploying the Demo Connector

In this exercise you deploy the Demo connector, its dependencies, and configuration files and verify the deploy operation.

About the Demo Connector

Recall that your copy of the Demo connector resides in the org.greenplum.pxf.example.demo package.

Plug-in Classes

The Demo connector read operation uses the following plug-in classes:

  • Fragmenter - org.greenplum.pxf.example.demo.DemoFragmenter
  • Read Accessor - org.greenplum.pxf.example.demo.DemoAccessor
  • Read Resolver - org.greenplum.pxf.example.demo.DemoTextResolver

The Demo connector write operation uses the following plug-in classes:

  • Write Accessor - org.greenplum.pxf.example.demo.DemoFileWritableAccessor
  • Write Resolver - org.greenplum.pxf.example.demo.DemoTextResolver

Notice that the Demo connector read and write operations share the same Resolver class but use different Accessor classes.

Run-Time Dependencies

The Demo connector has a run-time dependency on the Apache Commons Logging JAR, commons-logging.jar. This JAR file is installed with a Hadoop distribution and should already be specified in the pxf-private.classpath file.

Prerequisites

Before attempting this exercise, ensure that you have:

Procedure

Perform the following procedure to deploy the Demo connector and to verify that you deployed the connector successfully:

  1. Log in to the Greenplum Database master node as an administrative user and set up your environment:

    $ ssh gpadmin@<gpmaster>
    gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
    
  2. Copy the Demo connector JAR file that you previously built to the Greenplum Database master host. For example, to copy the JAR file to the /tmp directory, replace PXFDEV_BASE with the absolute path to your PXF development work area:

    gpadmin@gpmaster$ scp user@devsystem:/PXFDEV_BASE/demo_example/build/libs/my-demo-connector.jar /tmp/
    
  3. Open the pxf-public.classpath file in the editor of your choosing. For example:

    gpadmin@gpmaster$ vi $GPHOME/pxf/conf/pxf-public.classpath
    
  4. Add an entry for the Demo connector JAR file to the pxf-public.classpath file. This entry should specify the absolute path of the Demo connector JAR file on the system. For example, if my-demo-connector.jar resides in the /tmp directory, add the following:

    /tmp/my-demo-connector.jar
    
  5. Save the file and exit the editor.

  6. Copy the pxf-public.classpath file to all segment hosts in your Greenplum Database cluster. For example, if seghostfile contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster:

    gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-public.classpath =:/usr/local/greenplum-db/pxf/conf/pxf-public.classpath
    
  7. Copy the Demo connector JAR file to all Greenplum Database segment hosts. Copy the JAR file to the location you specified in the pxf-public.classpath file. For example:

    gpadmin@gpmaster$ gpscp -v -f seghostfile /tmp/my-demo-connector.jar =:/tmp/my-demo-connector.jar
    
  8. Restart PXF on each segment host. For example:

    gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
    
  9. Verify that you correctly deployed the Demo connector by creating and accessing Greenplum Database readable and writable external tables that specify the Demo connector plug-ins:

    1. Connect to a database in which you created the PXF extension as the gpadmin user. For example, to connect to a database named pxf_exampledb:

      gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin
      
    2. Create a readable Greenplum external table to exercise the Demo connector read operation. For example:

      pxf_exampledb=# CREATE EXTERNAL TABLE demo_tbl_read (a TEXT, b TEXT, c TEXT)
          LOCATION ('pxf://default/tmp/dummy1?FRAGMENTER=org.greenplum.pxf.example.demo.DemoFragmenter&ACCESSOR=org.greenplum.pxf.example.demo.DemoAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
          FORMAT 'TEXT' (DELIMITER ',');
      CREATE EXTERNAL TABLE
      

      (The Demo connector read operation returns static data. You could have specified any file path in the LOCATION clause; it will be ignored by the Demo connector.)

    3. Query the demo_tbl_read table:

      pxf_exampledb=# SELECT * from demo_tbl_read;
             a        |   b    |   c    
      ----------------+--------+--------
       fragment2 row1 | value1 | value2
       fragment2 row2 | value1 | value2
       fragment1 row1 | value1 | value2
       fragment1 row2 | value1 | value2
       fragment3 row1 | value1 | value2
       fragment3 row2 | value1 | value2
      (6 rows)
      
    4. Create a writable Greenplum external table to exercise the Demo connector write operation. For example:

      pxf_exampledb=# CREATE WRITABLE EXTERNAL TABLE demo_tbl_write (a TEXT, b TEXT, c TEXT)
          LOCATION ('pxf://tmp/demo_write_1?ACCESSOR=org.greenplum.pxf.example.demo.DemoFileWritableAccessor&RESOLVER=org.greenplum.pxf.example.demo.DemoTextResolver')
          FORMAT 'TEXT' (DELIMITER ',');
      CREATE EXTERNAL TABLE
      
    5. Write some text data into the demo_tbl_write table. For example:

      pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('x', 'y', 'z');
      INSERT 0 1
      pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('u', 'v', 'w');
      INSERT 0 1
      pxf_exampledb=# INSERT INTO demo_tbl_write VALUES ('r', 's', 't');
      INSERT 0 1
      

      Each INSERT command writes a file to the directory named /tmp/demo_write_1 on the local file system.

    6. View the contents of the /tmp/demo_write_1 directory on the local file system. For example:

      gpadmin@gpmaster$ cat /tmp/demo_write_1/*
      x,y,z
      u,v,w
      r,s,t
      

You successfully deployed the Demo connector. If you choose, you may deploy read and write profiles for the Demo connector as a convenience to the end user. See Deploying a Profile for profile definition and deployment instructions.