Using PXF with External Data

The Greenplum Platform Extension Framework (PXF) provides parallel, high-throughput data access and federated queries across heterogeneous data sources. It does this through built-in connectors that map a Greenplum Database external table definition to an external data source. This Greenplum Database extension is based on PXF from Apache HAWQ (incubating).
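A PXF external table identifies its data source with a `pxf://` URI in the table's `LOCATION` clause. This URI typically takes the form `pxf://<path-to-data>?PROFILE=<profile_name>[&<custom-option>=<value>...]`, where the profile selects the connector. The Python sketch below is illustrative only and is not part of PXF. It assumes that URI form and splits a location into its data path and option map. The helper `parse_pxf_location` and the sample paths are hypothetical; the Using PXF topic documents the authoritative syntax.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_pxf_location(location: str) -> tuple[str, dict[str, str]]:
    """Split a pxf:// LOCATION string into (path-to-data, options).

    Hypothetical helper; assumes the form
    pxf://<path-to-data>?PROFILE=<profile_name>[&<option>=<value>...].
    """
    parts = urlsplit(location)
    if parts.scheme != "pxf":
        raise ValueError(f"not a pxf:// location: {location!r}")
    # urlsplit treats the first path component as the "netloc"; rejoin it
    # with the remaining path to recover the full path-to-data.
    path = parts.netloc + parts.path
    # Options such as PROFILE appear as query parameters.
    options = dict(parse_qsl(parts.query))
    return path, options


# Hypothetical HDFS text file read through the HdfsTextSimple profile.
path, opts = parse_pxf_location(
    "pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple"
)
print(path)  # data/pxf_examples/pxf_hdfs_simple.txt
print(opts)  # {'PROFILE': 'HdfsTextSimple'}
```

The same split applies to a Hive table reference such as `pxf://default.sales_info?PROFILE=Hive`, which yields the path `default.sales_info` and the `Hive` profile.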

  • PXF Architecture

    This topic describes the architecture of PXF and its integration with Greenplum Database.

  • Installing and Configuring PXF

    This topic details the PXF installation, configuration, and startup procedures.

  • Using PXF

    This topic describes important PXF procedures and concepts, including how to enable PXF for use in a database, the PXF protocol, and PXF external table definitions.

  • Reading HDFS File Data

    This topic describes how to use the PXF HDFS connector and related profiles to read Text and Avro format HDFS files.

  • Accessing Hive Table Data

    This topic describes how to use the PXF Hive connector and related profiles to read Hive tables stored in Text, RCFile, Parquet, and ORC storage formats.

  • Troubleshooting PXF

    This topic details the service-level and database-level logging configuration procedures for PXF. It also identifies some common PXF errors.