Introducing the PXF API

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 5.x documentation.

You use the PXF API to add support for a new external data store, data format, or data access API to Greenplum Database. You can also use the PXF API to extend existing external data stores, formats, and APIs. The PXF API defines the data structures, classes, and interfaces that you need to map external data into a tabular form suitable for Greenplum Database, and vice-versa.

The PXF API interfaces that you will implement or extend depend upon the external data store/API, data types, and operations (read, write) that your connector or plug-in will support, as well as the specific features you want to provide.

The PXF API defines classes including the following:

Class Name Description
Plugin The base class for all PXF plug-in types.
Fragmenter The abstract class that defines how to split data from an external source into fragments.
Fragment A subset of data that can be read in parallel.
OneRow One record or row in a Fragment.
OneField One deserialized or serialized field in a OneRow record.
InputData Input data from Greenplum Database and common configuration information available to all plug-ins, and the helper methods to access this information.

The PXF API exposes the following interfaces:

Interface Name Description
ReadAccessor Reads a Fragment and generates a list of OneRow records.
ReadResolver Deserializes a single OneRow record into a list of OneField objects.
ReadVectorizedResolver Deserializes a batch of OneRow records into tuples of OneField objects.
WriteResolver Serializes a list of OneField objects into a single OneRow record.
WriteAccessor Writes OneRow records to the external data source.

General PXF API Information

Package Name

The PXF API base package name is org.greenplum.pxf.api. All PXF API classes and interfaces reside in this package.

JAR File

You need the PXF API JAR file to develop with the PXF SDK. This file is named pxf-api-<version>.jar, where <version> is a dot-separated 4 digit version number. For example:

pxf-api-4.0.0.jar

PXF JAR files are not currently available from a remote repository. You can obtain the PXF API JAR file from your Greenplum Database installation here:

$GPHOME/pxf/lib/pxf-api-<version>.jar

If any plug-in that you implement extends a class from a built-in PXF connector (HDFS, Hive, HBase), you will also need the JAR file associated with that connector. The built-in PXF connector JAR files also reside in $GPHOME/pxf/lib:

pxf-hbase-<version>.jar
pxf-hdfs-<version>.jar
pxf-hive-<version>.jar
pxf-json-<version>.jar

You can also choose to build PXF JAR file(s) from source.