You can use the PXF Extension Framework pxf:// protocol to access data on external HDFS and Hive systems.
The PXF Extension Framework pxf protocol is packaged as a Greenplum Database extension. The protocol supports reading HDFS file and Hive table data; it does not yet support writing to HDFS or Hive data stores.
When you use the pxf protocol to query HDFS and Hive systems, you specify the HDFS file or Hive table you want to access. PXF requests the data from HDFS and delivers the relevant portions in parallel to each Greenplum Database segment instance serving the query.
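As a minimal sketch of what specifying an HDFS file looks like, the statements below define a readable external table over a comma-delimited text file and then query it. The table name, column list, and HDFS path are hypothetical and chosen only for illustration; the example also assumes the HdfsTextSimple profile is the appropriate one for the file.

```sql
-- Hypothetical table name, columns, and HDFS path, for illustration only.
-- The table is readable-only because the pxf protocol does not yet
-- support writing to HDFS or Hive.
CREATE EXTERNAL TABLE pxf_hdfs_sales_example (
    location   text,
    month      text,
    num_orders int,
    total_sales float8
)
LOCATION ('pxf://data/pxf_examples/sales_example.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Each segment instance serving this query receives its portion of the data.
SELECT location, total_sales
FROM pxf_hdfs_sales_example
WHERE num_orders > 100;
```

The LOCATION clause carries both the path to the external data and the profile that tells PXF how to read it, so the same table definition pattern applies to other files by changing those two values.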
You must explicitly initialize and start the PXF Extension Framework before you can use the pxf protocol to read external data. You must also enable PXF in each database in which you want to create external tables to access external data, and grant permissions on the pxf protocol to the roles that will use it.
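The per-database steps might look like the following sketch. It assumes PXF has already been initialized and started, that you are connected to the target database as a superuser, and that a role named bill exists; the role name is hypothetical.

```sql
-- Enable PXF in the current database by registering the extension.
-- Repeat in every database that will hold pxf external tables.
CREATE EXTENSION pxf;

-- Allow the (hypothetical) role bill to create readable external
-- tables that use the pxf protocol.
GRANT SELECT ON PROTOCOL pxf TO bill;
```

Only the SELECT privilege is shown because the protocol supports reading external data only.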
For detailed information about configuring and using the PXF Extension Framework and the pxf protocol, refer to Accessing HDFS and Hive Data with PXF.