A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 5.x documentation.
You can use the Greenplum Platform Extension Framework (PXF) pxf:// protocol to access data residing on external Hadoop systems (HDFS, Hive, HBase), object store systems (Azure, Google Cloud Storage, Minio, S3), and SQL databases.
The PXF pxf protocol is packaged as a Greenplum Database extension. The pxf protocol supports reading from external data stores. You can also write text, binary, and parquet-format data with the pxf protocol.
When you use the pxf protocol to query an external data store, you specify the directory, file, or table that you want to access. PXF requests the data from the data store and delivers the relevant portions in parallel to each Greenplum Database segment instance serving the query.
You must explicitly initialize and start PXF before you can use the pxf protocol to read or write external data. You must also enable PXF in each database in which you want to allow users to create external tables to access external data, and grant permissions on the pxf protocol to those Greenplum Database users.
For detailed information about configuring and using PXF and the pxf protocol, refer to Accessing External Data with PXF.