Introducing the PXF API

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 5.x documentation.

You use the PXF API to add support for a new external data store, data format, or data access API to Greenplum Database. You can also use the PXF API to extend existing external data stores, formats, and APIs. The PXF API defines the data structures, classes, and interfaces that you need to map external data into a tabular form suitable for Greenplum Database, and vice-versa.

The PXF API interfaces that you will implement or extend depend upon the external data store/API, data types, and operations (read, write) that your connector or plug-in will support, as well as the specific features you want to provide.

The PXF API defines classes including the following:

Class Name Description
Plugin The base class for all PXF plug-in types.
Fragmenter The abstract class that defines how to split data from an external source into fragments.
Fragment A subset of data that can be read in parallel.
OneRow One record or row in a Fragment.
OneField One deserialized or serialized field in a OneRow record.
InputData Input data from Greenplum Database and common configuration information available to all plug-ins, and the helper methods to access this information.

The PXF API exposes the following interfaces:

Interface Name Description
ReadAccessor Reads a Fragment and generates a list of OneRow records.
ReadResolver Deserializes a single OneRow record into a list of OneField objects.
ReadVectorizedResolver Deserializes a batch of OneRow records into tuples of OneField objects.
WriteResolver Serializes a list of OneField objects into a single OneRow record.
WriteAccessor Writes OneRow records to the external data source.

Refer to the PXF API JavaDocs for detailed information about the classes and interfaces exposed by the API.

General PXF API Information

Package Name

The PXF API base package name is org.apache.hawq.pxf.api. All PXF API classes and interfaces reside in this package.

JAR File

You need the PXF API JAR file to develop with the PXF SDK. This file is named pxf-api-<version>.jar, where <version> is a dot-separated 4 digit version number. For example:


PXF JAR files are not currently available from a remote repository. You can obtain the PXF API JAR file from your Greenplum Database installation here:


If any plug-in that you implement extends a class from a built-in PXF connector (HDFS, Hive, HBase), you will also need the JAR file associated with that connector. The built-in PXF connector JAR files also reside in $GPHOME/pxf/lib:


You can also choose to build PXF JAR file(s) from source.