Configuring User Impersonation and Proxying
PXF accesses Hadoop services on behalf of Greenplum Database end users. By default, PXF tries to access data source services (HDFS, Hive, HBase) using the identity of the Greenplum Database user account that logs into Greenplum Database and performs an operation using a PXF connector profile. Keep in mind that PXF uses only the login identity of the user when accessing Hadoop services. For example, if a user logs into Greenplum Database as the user jane and then executes SET ROLE or SET SESSION AUTHORIZATION to assume a different user identity, all PXF requests still use the identity jane to access Hadoop services.
With the default PXF configuration, you must explicitly configure each Hadoop data source (HDFS, Hive, HBase) to allow the PXF process owner (usually gpadmin) to act as a proxy for impersonating users or groups. See Configure Hadoop Proxying, Hive User Impersonation, and HBase User Impersonation.
As an alternative, you can disable PXF user impersonation. With user impersonation disabled, PXF executes all Hadoop service requests as the PXF process owner (usually gpadmin). This behavior matches earlier releases of PXF, but it provides no means to control Hadoop service access separately for different Greenplum Database users. It requires that the gpadmin user have access to all files and directories in HDFS, and to all tables in Hive and HBase, that are referenced in PXF external table definitions. See Configure PXF User Impersonation for information about disabling user impersonation.
Configure PXF User Impersonation
Perform the following procedure to turn PXF user impersonation on or off in your Greenplum Database cluster. User impersonation is enabled by default, so if you are configuring PXF for the first time and want to keep it enabled, you need not perform this procedure.
Log in to your Greenplum Database master node as the administrative user:
$ ssh gpadmin@<gpmaster>
Recall the location of the PXF user configuration directory ($PXF_CONF). Open the $PXF_CONF/conf/pxf-env.sh configuration file in a text editor. For example:
gpadmin@gpmaster$ vi $PXF_CONF/conf/pxf-env.sh
Locate the PXF_USER_IMPERSONATION setting in the pxf-env.sh file. Set the value to true to turn PXF user impersonation on, or false to turn it off. For example:
PXF_USER_IMPERSONATION="true"
Use the pxf cluster sync command to copy the updated pxf-env.sh file to the Greenplum Database cluster. For example:
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
If you have previously started PXF, restart it on each Greenplum Database segment host as described in Restarting PXF to apply the new setting.
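The edit to pxf-env.sh described above can also be scripted. The following is a minimal sketch, not part of PXF: the helper name set_pxf_impersonation and the scratch file contents (including OTHER_SETTING) are hypothetical, and it assumes the setting appears on a single line, possibly commented out or prefixed with export. It runs against a temporary file so that it can be tried without touching a real configuration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PXF): set PXF_USER_IMPERSONATION
# in a pxf-env.sh style file.
# Usage: set_pxf_impersonation <path-to-file> <true|false>
set_pxf_impersonation() {
    env_file=$1
    value=$2
    case "$value" in
        true|false) ;;
        *) echo "value must be true or false" >&2; return 1 ;;
    esac
    [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
    tmp_file=$(mktemp) || return 1
    # Assumption: the setting sits on one line, optionally commented out
    # and optionally prefixed with "export". Drop any such line.
    grep -v -E '^[[:space:]]*#?[[:space:]]*(export[[:space:]]+)?PXF_USER_IMPERSONATION=' \
        "$env_file" > "$tmp_file" || :
    # Append the new assignment in the form shown in the procedure.
    printf 'PXF_USER_IMPERSONATION="%s"\n' "$value" >> "$tmp_file"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp_file" > "$env_file" && rm -f "$tmp_file"
}

# Demonstration on a scratch file; the contents are placeholders.
demo_file=$(mktemp)
printf '# export PXF_USER_IMPERSONATION=true\nOTHER_SETTING=1\n' > "$demo_file"
set_pxf_impersonation "$demo_file" false
cat "$demo_file"
```

Against a real installation, the file argument would be $PXF_CONF/conf/pxf-env.sh, followed by the pxf cluster sync command and the PXF restart described in the procedure.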
Configure Hadoop Proxying
When PXF user impersonation is enabled (the default), you must configure the Hadoop core-site.xml configuration file to permit user impersonation for PXF. Follow these steps:
On your Hadoop cluster, open the core-site.xml configuration file using a text editor, or use Ambari to add or edit the Hadoop property values described in this procedure.
Set the property hadoop.proxyuser.<name>.hosts to specify the list of PXF host names from which proxy requests are permitted. Substitute the PXF proxy user (generally gpadmin) for <name>, and provide multiple PXF host names in a comma-separated list. For example:
<property>
    <name>hadoop.proxyuser.gpadmin.hosts</name>
    <value>pxfhost1,pxfhost2,pxfhost3</value>
</property>
Set the property hadoop.proxyuser.<name>.groups to specify the list of HDFS groups that PXF can impersonate. You should limit this list to only those groups that require access to HDFS data from PXF. For example:
<property>
    <name>hadoop.proxyuser.gpadmin.groups</name>
    <value>group1,group2</value>
</property>
After changing core-site.xml, you must restart Hadoop for your changes to take effect.
Copy the updated core-site.xml file to the PXF Hadoop server configuration directory $PXF_CONF/servers/<server_name> on the master and synchronize the configuration to the standby master and each Greenplum Database segment host.
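Because the two proxy properties differ only in the proxy user, host list, and group list, the fragments can be generated rather than typed by hand. The sketch below is illustrative only: the function name proxyuser_properties is hypothetical, and the host and group values are the placeholder values used in the examples above. It prints the fragments for review and does not modify core-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: print the two hadoop.proxyuser.* property elements
# for a given proxy user, host list, and group list.
# Usage: proxyuser_properties <proxy_user> <comma-separated hosts> <comma-separated groups>
proxyuser_properties() {
    proxy_user=$1
    proxy_hosts=$2
    proxy_groups=$3
    cat <<EOF
<property>
    <name>hadoop.proxyuser.${proxy_user}.hosts</name>
    <value>${proxy_hosts}</value>
</property>
<property>
    <name>hadoop.proxyuser.${proxy_user}.groups</name>
    <value>${proxy_groups}</value>
</property>
EOF
}

# Placeholder values taken from the examples in this procedure.
proxyuser_properties gpadmin "pxfhost1,pxfhost2,pxfhost3" "group1,group2"
```

The printed elements would still need to be placed inside the configuration element of core-site.xml, or entered through Ambari, as the procedure describes.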
Hive User Impersonation
The PXF Hive connector uses the Hive MetaStore to determine the HDFS locations of Hive tables, and then accesses the underlying HDFS files directly. No specific impersonation configuration is required for Hive, because the Hadoop proxy configuration in core-site.xml also applies to Hive tables accessed in this manner.
HBase User Impersonation
For user impersonation to work with HBase, you must enable the AccessController coprocessor in the HBase configuration and restart the cluster. See 61.3 Server-side Configuration for Simple User Access Operation in the Apache HBase Reference Guide for the required hbase-site.xml configuration settings.
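As a quick sanity check after editing hbase-site.xml, you can confirm that the file mentions the AccessController class at all. This is a rough sketch under stated assumptions: the function name has_access_controller is hypothetical, the scratch file shows only one coprocessor property as a placeholder, and a text match does not prove that the coprocessor is loaded or that every setting listed in the Apache HBase Reference Guide is present.

```shell
#!/bin/sh
# Hypothetical check: does the given hbase-site.xml mention the
# AccessController coprocessor class anywhere?
has_access_controller() {
    grep -q 'org\.apache\.hadoop\.hbase\.security\.access\.AccessController' "$1"
}

# Demonstration on a scratch file. This is a placeholder fragment, not a
# complete hbase-site.xml.
demo_site=$(mktemp)
cat > "$demo_site" <<'EOF'
<configuration>
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.security.access.AccessController</value>
  </property>
</configuration>
EOF

if has_access_controller "$demo_site"; then
    echo "AccessController is referenced"
else
    echo "AccessController is not referenced"
fi
```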