Using PXF with External Data

The Greenplum Platform Extension Framework (PXF) provides parallel, high-throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. This Greenplum Database extension is based on PXF from Apache HAWQ (incubating).
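As a sketch of that mapping, a PXF external table definition in Greenplum 5.x takes roughly the following shape; the table name, column list, and HDFS path below are illustrative assumptions, not taken from this page:

```sql
-- Hypothetical example: expose a comma-delimited text file in HDFS
-- as a readable Greenplum external table via the pxf protocol.
-- The path (data/sales/2018) and columns are assumptions for illustration.
CREATE EXTERNAL TABLE sales_ext (id int, amount numeric, region text)
  LOCATION ('pxf://data/sales/2018?PROFILE=HdfsTextSimple')
  FORMAT 'TEXT' (DELIMITER ',');
```

Here the PROFILE option in the LOCATION URI selects the built-in connector that PXF uses to read the underlying data; the later topics in this section cover the available connectors and their profiles in detail.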

  • PXF Architecture

    This topic describes the architecture of PXF and its integration with Greenplum Database.

  • Installing and Configuring PXF

    This topic details the installation, configuration, and startup procedures for PXF and supporting clients.

  • Upgrading PXF

    This topic describes the procedure that you must perform to upgrade PXF when you install a new version of Greenplum Database.

  • Using PXF to Read and Write External Data

    This topic describes important PXF procedures and concepts, including enabling PXF for use in a database, the pxf protocol, and external table definitions.

  • Reading Data from HDFS

    This topic describes how to use the PXF HDFS connector and related profiles to read Text and Avro format HDFS files.

  • Writing Data to HDFS

    This topic describes how to use the PXF HDFS connector and related profiles to write data to HDFS files in Text and SequenceFile (binary) formats.

  • Accessing Hive Table Data

    This topic describes how to use the PXF Hive connector and related profiles to read Hive tables stored in TextFile, RCFile, Parquet, and ORC storage formats.

  • Accessing HBase Table Data

    This topic describes how to use the PXF HBase connector to read HBase table data.

  • Troubleshooting PXF

    This topic details the service-level and database-level logging configuration procedures for PXF. It also identifies some common PXF errors and describes how to address PXF memory issues.