Deploying a Profile

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 5.x documentation.

A PXF profile encapsulates access to a specific format of data from a specific external data source using a simple name. PXF is installed with HDFS, Hive, and HBase connectors that expose a number of built-in profiles supporting specific data formats in these data stores. The PXF Profiles section in the PXF end user documentation identifies the list of PXF built-in profiles.

As a developer using the PXF SDK, you may extend the plug-in classes of PXF built-in connectors or you may implement your own plug-in classes. You or the Greenplum Database administrator may then choose to register and deploy one or more connector profile definitions as a convenience for the end user.

Constructing a Profile Definition

A profile definition for a connector is an XML block that identifies the name of the profile, a description, and the plug-in classes that implement access to the external data source.

<profile>
    <name>profile_name</name>
    <description>profile_description</description>
    <plugins>
        <fragmenter>fully.qualified.fragmenter.classname</fragmenter>
        <accessor>fully.qualified.accessor.classname</accessor>
        <resolver>fully.qualified.resolver.classname</resolver>
    </plugins>
</profile>

The profile <name> provides a simple mapping to the <plugins> classes. The Greenplum Database end user provides the profile name or the Java plug-in class names when creating an external table to access the external data.

Specifying the Plug-in Class Names

The profile <plugins> identify the fully-qualified names of the Java classes that PXF will use to split (<fragmenter>), read and/or write (<accessor>), and deserialize/serialize (<resolver>) the external data.

When you define a profile that supports a read operation from an external data store, you must provide one each of <fragmenter>, <accessor>, and <resolver> plug-in class names. You must provide both an <accessor> and a <resolver> plug-in class name for a profile that supports a write operation to an external data store.

You can use a profile or set of plug-ins for both reading and writing when the <accessor> plug-in class implements both the ReadAccessor and the WriteAccessor interfaces and the <resolver> plug-in class implements both the ReadResolver and the WriteResolver interfaces.

You can re-use a single plug-in class name in multiple profile definitions. For example, the PXF built-in HDFS connector HdfsTextSimple and HdfsTextMulti profiles use the same <fragmenter> and <resolver> plug-in classes.

<profile>
    <name>HdfsTextSimple</name>
    <description>This profile is suitable for using when reading
        delimited single line records from plain text files on HDFS
    </description>
    <plugins>
        <fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
        <accessor>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</accessor>
        <resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
    </plugins>
</profile>

<profile>
    <name>HdfsTextMulti</name>
    <description>This profile is suitable for using when reading delimited single
        or multi line records (with quoted linefeeds) from plain text files on HDFS.
        It is not splittable (non parallel) and slower than HdfsTextSimple.
    </description>
    <plugins>
        <fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
        <accessor>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
        <resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
    </plugins>
</profile>

Registering a New Profile

When you register a new profile with PXF, you add the profile definition to the $GPHOME/pxf/conf/pxf-profiles.xml file. Ensure that you add the profile definition between the <profiles> and <\profiles> (notice the s) keywords when you edit the pxf-profiles.xml file.

Note: In a typical Greenplum Database deployment, the pxf-profiles.xml file contains multiple third-party profile definitions.

Deploying to the Greenplum Database Cluster

When you deploy a new profile, you copy the updated pxf-profiles.xml file to each segment host, and then you must restart the PXF service on each host. Only the Greenplum Database administrative user can register profiles and restart the PXF service.

For example, if seghostfile contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster, these commands deploy the profile across the cluster:

gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-profiles.xml =:/usr/local/greenplum-db/pxf/conf/pxf-profiles.xml
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"

Verifying Profile Registration

To verify that you registered and deployed a new profile correctly, you create a Greenplum Database external table specifying the profile name and invoke SELECT and/or INSERT commands on the table to test read and write operations on the external data source.

Example: Defining and Deploying Demo Connector Profiles

In this exercise, you define and register both a read profile and a write profile for the Demo connector, deploy the profiles, and verify the deploy operation.

Prerequisites

Before attempting this exercise, ensure that you have completed the Example: Deploying the Demo Connector exercise.

Procedure

Perform the following procedure to define and register a read profile and a write profile for your Demo connector and verify that you deployed the profiles successfully:

  1. Log in to the Greenplum Database master node as an administrative user and set up your environment:

    $ ssh gpadmin@<gpmaster>
    gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
    
  2. Open the pxf-profiles.xml file in the editor of your choosing. For example:

    gpadmin@gpmaster$ vi $GPHOME/pxf/conf/pxf-profiles.xml
    
  3. Add a definition for a Demo connector read profile named DemoReadLocalFS. Copy/paste the following text to the pxf-profiles.xml file, making sure to paste inside the <profiles> and </profiles> keywords:

    <profile>
        <name>DemoReadLocalFS</name>
        <description>This profile reads text files on the local file system.
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.example.demo.DemoFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.example.demo.DemoAccessor</accessor>
            <resolver>org.greenplum.pxf.example.demo.DemoTextResolver</resolver>
        </plugins>
    </profile>
    
  4. Add a definition for a Demo connector write profile named DemoWriteLocalFS. Copy/paste the following text to the pxf-profiles.xml file, again making sure to paste between the <profiles> and </profiles> keywords:

    <profile>
        <name>DemoWriteLocalFS</name>
        <description>This profile writes a text file on the local file system.
        </description>
        <plugins>
            <accessor>org.greenplum.pxf.example.demo.DemoFileWritableAccessor</accessor>
            <resolver>org.greenplum.pxf.example.demo.DemoTextResolver</resolver>
        </plugins>
    </profile>
    
  5. Save the file and exit the editor.

  6. Copy the pxf-profiles.xml file to all segment hosts in your Greenplum Database cluster. For example, if seghostfile contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster:

    gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-profiles.xml =:/usr/local/greenplum-db/pxf/conf/pxf-profiles.xml
    
  7. Run the pxf restart command to restart PXF on each segment host. For example:

    gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
    
  8. Verify that you correctly deployed the Demo connector profiles by creating and accessing Greenplum Database external tables:

    1. Connect to a database in which you created the PXF extension as the gpadmin user. For example, to connect to a database named pxf_exampledb:

      gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin
      
    2. Create a readable Greenplum external table specifying the DemoReadLocalFS profile name. For example:

      pxf_exampledb=# CREATE EXTERNAL TABLE demo_tbl_read_wp (a TEXT, b TEXT, c TEXT)
          LOCATION ('pxf://default/tmp/dummy1?PROFILE=DemoReadLocalFS')
          FORMAT 'TEXT' (DELIMITER ',');
      CREATE EXTERNAL TABLE
      
    3. Query the demo_tbl_read_wp table:

      pxf_exampledb=# SELECT * from demo_tbl_read_wp;
             a        |   b    |   c    
      ----------------+--------+--------
       fragment2 row1 | value1 | value2
       fragment2 row2 | value1 | value2
       fragment1 row1 | value1 | value2
       fragment1 row2 | value1 | value2
       fragment3 row1 | value1 | value2
       fragment3 row2 | value1 | value2
      (6 rows)
      
    4. Create a writable Greenplum external table specifying the DemoWriteLocalFS profile name. For example:

      pxf_exampledb=# CREATE WRITABLE EXTERNAL TABLE demo_tbl_write_wp (a TEXT, b TEXT, c TEXT)
          LOCATION ('pxf://tmp/demo_write_2?PROFILE=DemoWriteLocalFS')
          FORMAT 'TEXT' (DELIMITER ',');
      CREATE EXTERNAL TABLE
      
    5. Write some text data into the demo_tbl_write_wp table. For example:

      pxf_exampledb=# INSERT INTO demo_tbl_write_wp VALUES ('x', 'y', 'z');
      INSERT 0 1
      pxf_exampledb=# INSERT INTO demo_tbl_write_wp VALUES ('u', 'v', 'w');
      INSERT 0 1
      pxf_exampledb=# INSERT INTO demo_tbl_write_wp VALUES ('r', 's', 't');
      INSERT 0 1
      
    6. View the contents of the /tmp/demo_write_2 directory on the local file system. For example:

      gpadmin@gpmaster$ cat /tmp/demo_write_2/*
      x,y,z
      u,v,w
      r,s,t
      

You successfully deployed Demo connector read and write profiles.