Installing and Configuring PXF

The Greenplum Platform Extension Framework (PXF) provides connectors to Hadoop, Hive, HBase, and external SQL data stores. To use the Hadoop, Hive, and HBase connectors, you must install the corresponding clients on each Greenplum Database segment host, as described in this one-time installation and configuration procedure:

PXF accesses Hadoop services on behalf of Greenplum Database end users. By default, PXF tries to access data source services (HDFS, Hive, HBase) using the identity of the Greenplum Database user account that logs into Greenplum Database. To support this user impersonation, you must configure proxy settings for Hadoop, and also for Hive and HBase if you intend to use those PXF connectors. Follow procedures in:

to configure user impersonation and proxying for Hadoop services, or to turn off PXF user impersonation.
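
As a rough illustration of the Hadoop proxy settings mentioned above, Hadoop's standard proxy-user mechanism is controlled by the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` properties in `core-site.xml`. The fragment below is a minimal sketch, not the documented procedure. It assumes that the PXF service runs as an operating system user named `gpadmin`, and the wildcard values are placeholders chosen for brevity.

```xml
<!-- core-site.xml fragment (illustrative sketch only) -->
<!-- Assumption: the PXF service runs as the OS user "gpadmin". -->
<!-- Substitute the actual user name in both property names.    -->

<!-- Hosts from which the proxy user may submit impersonated requests. -->
<!-- "*" allows any host; a comma-separated list of segment hosts is   -->
<!-- a tighter choice.                                                 -->
<property>
  <name>hadoop.proxyuser.gpadmin.hosts</name>
  <value>*</value>
</property>

<!-- Groups whose members the proxy user is allowed to impersonate. -->
<!-- "*" allows any group; a comma-separated list restricts it.     -->
<property>
  <name>hadoop.proxyuser.gpadmin.groups</name>
  <value>*</value>
</property>
```

Changes to `core-site.xml` generally take effect only after the affected Hadoop services reload their configuration. The Hive and HBase proxy settings are separate and are not shown here.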

You must also configure and initialize PXF itself, and start the PXF service on each segment host:

If your Hadoop cluster is secured with Kerberos, you must configure PXF and generate Kerberos principals and keytabs for each segment host: