PXF Developer Concepts


The PXF SDK provides the classes and interfaces that you use to implement support for new external data sources, data formats, and data access APIs in Greenplum Database. You can also use the PXF SDK to extend existing external data sources, formats, and APIs.

When you develop with the PXF SDK, you ultimately build a JAR file. The Greenplum Database administrator deploys your JAR file to the Greenplum Database cluster. The Greenplum Database end user accesses your external data source, format, or API through an external table created with a CREATE EXTERNAL TABLE command that specifies the pxf protocol.

This topic introduces concepts central to developing with the PXF SDK. Later sections of this guide discuss these concepts in detail.

Plug-ins, Connectors, and Profiles

The PXF API includes the Fragmenter class and the read and write Accessor and Resolver interfaces. You implement these classes and interfaces when you extend PXF to add support for a new external data source, data format, or data access API. The classes that you create are called plug-ins.

A connector is a set of plug-in classes that support read and/or write operation(s) to an external data source:

  • One Fragmenter, one Accessor, and one Resolver plug-in class together make up a read connector.
  • An Accessor and Resolver plug-in pair makes up a write connector.
  • A single Accessor or Resolver class may implement both read and write operations.
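The last point in the list above, a single class serving both read and write, can be sketched in Java. The interfaces below are simplified, hypothetical stand-ins written for this illustration; their names and method signatures are assumptions and are not the actual PXF SDK types.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectorSketch {
    // Hypothetical, simplified plug-in interfaces (NOT the real PXF SDK signatures).
    interface ReadAccessor {
        String readNextRecord();              // returns null when no records remain
    }

    interface WriteAccessor {
        void writeNextRecord(String record);
    }

    // One plug-in class that implements both the read and the write operation.
    static class InMemoryAccessor implements ReadAccessor, WriteAccessor {
        private final List<String> store = new ArrayList<>();
        private int cursor = 0;

        @Override
        public void writeNextRecord(String record) {
            store.add(record);
        }

        @Override
        public String readNextRecord() {
            return cursor < store.size() ? store.get(cursor++) : null;
        }
    }

    public static void main(String[] args) {
        InMemoryAccessor accessor = new InMemoryAccessor();
        accessor.writeNextRecord("1,alice");
        accessor.writeNextRecord("2,bob");

        // Read back everything that was written.
        String record;
        while ((record = accessor.readNextRecord()) != null) {
            System.out.println(record);
        }
    }
}
```

In this sketch the same class could be named as the Accessor of both a read connector and a write connector.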

A profile is a simple name mapping to a set of connector plug-in class names. You or the Greenplum Database administrator may choose to configure one or more profiles for your connector as a convenience for the Greenplum Database end user.
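Conceptually, a profile behaves like a lookup from one short name to the plug-in class names of a connector. The Java sketch below models only that idea. The profile name, the class names, and the ProfileSketch type are hypothetical, and the sketch does not show how PXF itself stores or loads profiles.

```java
import java.util.HashMap;
import java.util.Map;

public class ProfileSketch {
    // The plug-in class names that one profile name stands for.
    static final class Plugins {
        final String fragmenter;
        final String accessor;
        final String resolver;

        Plugins(String fragmenter, String accessor, String resolver) {
            this.fragmenter = fragmenter;
            this.accessor = accessor;
            this.resolver = resolver;
        }
    }

    private final Map<String, Plugins> profiles = new HashMap<>();

    void register(String profileName, Plugins plugins) {
        profiles.put(profileName, plugins);
    }

    Plugins resolve(String profileName) {
        Plugins plugins = profiles.get(profileName);
        if (plugins == null) {
            throw new IllegalArgumentException("Unknown profile: " + profileName);
        }
        return plugins;
    }

    public static void main(String[] args) {
        ProfileSketch registry = new ProfileSketch();
        // Hypothetical profile and class names, for illustration only.
        registry.register("demo-text", new Plugins(
                "com.example.pxf.DemoFragmenter",
                "com.example.pxf.DemoAccessor",
                "com.example.pxf.DemoResolver"));

        // The end user supplies one name instead of three class names.
        Plugins plugins = registry.resolve("demo-text");
        System.out.println(plugins.accessor);
    }
}
```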

Summary of terms:

Term        Description
Plug-in     A Fragmenter, Accessor, or Resolver class implementation.
Connector   A set of plug-in classes that support the read and/or write operation to an external data source.
Profile     A name mapping to a set of connector plug-in class names.

Data Flow

PXF in Greenplum Database has two components:

  • A C shared library that is loaded into Greenplum Database when the CREATE EXTENSION pxf command is invoked on a database.
  • A Java service, referred to as the PXF agent, that runs as a single JVM process on each Greenplum Database segment host. You start the PXF agent when you run pxf cluster start.

Operations on Greenplum Database external tables are routed first to the PXF C shared library extension and then on to the PXF agent.

The PXF C library validates PXF-specific parameters in the CREATE EXTERNAL TABLE command. The PXF agent invokes a connector’s plug-in classes only after the end user performs a SELECT (read) or INSERT (write) operation on the external table.

The PXF agent initiates a read operation on the external data source when the user runs a SELECT command on a PXF external table. The PXF agent spawns a thread that invokes the connector Fragmenter, which splits data from an external data source into a list of fragments that can be read in parallel. A read Accessor reads a single fragment from an external data source and produces a list of records/rows. The read Resolver deserializes a record/row into fields. Finally, PXF translates these fields into Greenplum Database table column values.
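The read path described above (Fragmenter, then Accessor, then Resolver) can be sketched as a small Java pipeline. This is a minimal illustration under stated assumptions: the three interfaces and their method names are hypothetical simplifications, not the PXF SDK signatures, and a parallel stream stands in for however PXF actually processes fragments in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReadFlowSketch {
    // Hypothetical, simplified plug-in interfaces (NOT the real PXF SDK signatures).
    interface Fragmenter { List<String> getFragments(String source); }
    interface Accessor   { List<String> readFragment(String fragment); }
    interface Resolver   { List<String> getFields(String record); }

    // Fragment the source, read each fragment into records, resolve records into fields.
    static List<List<String>> read(String source, Fragmenter fragmenter,
                                   Accessor accessor, Resolver resolver) {
        return fragmenter.getFragments(source)
                .parallelStream()                                        // fragments can be read in parallel
                .flatMap(fragment -> accessor.readFragment(fragment).stream())
                .map(resolver::getFields)
                .collect(Collectors.toList());                           // keeps encounter order
    }

    public static void main(String[] args) {
        // Toy format: ';' separates fragments, '|' separates records, ',' separates fields.
        Fragmenter fragmenter = source -> Arrays.asList(source.split(";"));
        Accessor accessor = fragment -> Arrays.asList(fragment.split("\\|"));
        Resolver resolver = record -> Arrays.asList(record.split(","));

        List<List<String>> rows = read("1,alice|2,bob;3,carol", fragmenter, accessor, resolver);
        System.out.println(rows); // prints [[1, alice], [2, bob], [3, carol]]
    }
}
```

The final step in the prose, translating fields into Greenplum Database column values, is performed by PXF itself and is therefore left out of the sketch.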

The PXF agent initiates a write operation to the external data source when the user runs an INSERT or similar command on a PXF writable external table. When writing to an external data source, PXF translates Greenplum Database table column values into fields and invokes the connector write Resolver. The write Resolver serializes these fields into a record. The write Accessor writes a record directly to the external data source.
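The write path runs in the opposite direction: fields go to the write Resolver, and the resulting record goes to the write Accessor. The Java sketch below illustrates that ordering only. The interfaces, method names, and the in-memory list used as the "external data source" are assumptions made for this example, not the PXF SDK API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WriteFlowSketch {
    // Hypothetical, simplified plug-in interfaces (NOT the real PXF SDK signatures).
    interface Resolver { String setFields(List<String> fields); }
    interface Accessor { void writeRecord(String record); }

    // For each row: serialize the fields into a record, then write the record out.
    static void write(List<List<String>> rows, Resolver resolver, Accessor accessor) {
        for (List<String> fields : rows) {
            String record = resolver.setFields(fields);
            accessor.writeRecord(record);
        }
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the external data source.
        List<String> sink = new ArrayList<>();
        Resolver resolver = fields -> String.join(",", fields);
        Accessor accessor = sink::add;

        write(Arrays.asList(
                Arrays.asList("1", "alice"),
                Arrays.asList("2", "bob")), resolver, accessor);

        System.out.println(sink); // prints [1,alice, 2,bob]
    }
}
```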