You can use the Greenplum Platform Extension Framework (PXF) pxf:// protocol to access data on external HDFS, Hive, and HBase systems.
The pxf protocol is packaged as a Greenplum Database extension. It supports reading from HDFS, Hive, and HBase data stores, and writing text and binary data to HDFS.
When you use the pxf protocol to query HDFS, Hive, or HBase systems, you specify the HDFS file or Hive or HBase table that you want to access. PXF requests the data from the data store and delivers the relevant portions in parallel to each Greenplum Database segment instance serving the query.
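As an illustration, a readable external table over an HDFS text file might be defined as in the following sketch. The table name, column list, and HDFS path are hypothetical, and the example assumes a comma-delimited file and the HdfsTextSimple profile; substitute the profile and format that match your data store.

```sql
-- Hypothetical table and HDFS path, for illustration only.
-- The LOCATION clause uses the pxf:// protocol and names the HDFS file to read;
-- PROFILE selects the PXF profile (HdfsTextSimple is assumed here for delimited text).
CREATE EXTERNAL TABLE pxf_hdfs_sales_example (
    location    text,
    month       text,
    num_orders  int,
    total_sales float8
)
LOCATION ('pxf://data/pxf_examples/sales.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Query the external table like any other table; PXF fetches the data
-- from HDFS and delivers it to the segment instances serving the query.
SELECT location, sum(total_sales)
FROM pxf_hdfs_sales_example
GROUP BY location;
```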
You must explicitly initialize and start PXF before you can use the pxf protocol to read external data. You must also grant permissions to the pxf protocol and enable PXF in each database in which you want to create external tables to access external data.
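The database-side steps can be sketched as follows, assuming PXF is already initialized and running on the segment hosts. The role name bill is a placeholder; run the statements as a superuser in each database that needs PXF access.

```sql
-- Enable PXF in the current database (repeat in each database that uses it).
CREATE EXTENSION pxf;

-- Allow the hypothetical role "bill" to create readable external tables
-- that use the pxf protocol.
GRANT SELECT ON PROTOCOL pxf TO bill;

-- Allow the same role to create writable external tables
-- that use the pxf protocol.
GRANT INSERT ON PROTOCOL pxf TO bill;
```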
For detailed information about configuring and using PXF and the pxf protocol, refer to Accessing External Data with PXF.