Configuring User Impersonation and Proxying
A newer version of this documentation is available in the most up-to-date release of the Greenplum 5.x documentation.
PXF accesses Hadoop services on behalf of Greenplum Database end users. By default, PXF tries to access data source services (HDFS, Hive, HBase) using the identity of the Greenplum Database user account that logs into Greenplum Database and performs an operation using a PXF connector profile. Keep in mind that PXF uses only the login identity of the user when accessing Hadoop services. For example, if a user logs into Greenplum Database as the user jane and then executes SET ROLE or SET SESSION AUTHORIZATION to assume a different user identity, all PXF requests still use the identity jane to access Hadoop services.
With the default PXF configuration, you must explicitly configure each Hadoop data source (HDFS, Hive, HBase) to allow the PXF process owner (usually gpadmin) to act as a proxy for impersonating users or groups. See Configuring Hadoop Proxying, Hive User Impersonation, and HBase User Impersonation.
As an alternative, you can disable PXF user impersonation. With user impersonation disabled, PXF executes all Hadoop service requests as the PXF process owner (usually gpadmin). This behavior matches earlier releases of PXF, but it provides no way to control access to Hadoop services per Greenplum Database user. It also requires that the gpadmin user have access to all files and directories in HDFS, and to all tables in Hive and HBase, that you want to access as PXF external tables. See Configuring PXF User Impersonation for information about disabling user impersonation.
Configuring PXF User Impersonation
Perform the following procedure to turn PXF user impersonation on or off in your Greenplum Database cluster. User impersonation is enabled by default.
1. Log in to your Greenplum Database master node as the administrative user and set up the environment:
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
2. Open the $GPHOME/pxf/conf/pxf-env.sh file in a text editor. For example:
gpadmin@gpmaster$ vi $GPHOME/pxf/conf/pxf-env.sh
3. Locate the PXF_USER_IMPERSONATION setting in the pxf-env.sh file. Set the value to true to turn PXF user impersonation on, or false to turn it off. For example:
PXF_USER_IMPERSONATION=true
4. Copy the updated pxf-env.sh file to each Greenplum Database segment host. For example, if seghostfile lists the segment hosts in your Greenplum Database cluster, one host per line:
gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-env.sh =:/usr/local/greenplum-db/pxf/conf/pxf-env.sh
5. Restart PXF on each Greenplum Database segment host to apply the new setting. For example:
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
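If you prefer to change the setting non-interactively (for example, from a provisioning script) instead of editing the file in vi, a shell function such as the one below could handle step 3. This is a minimal sketch, not a PXF utility: set_pxf_impersonation is a hypothetical name, and the function assumes that pxf-env.sh holds the setting as a single PXF_USER_IMPERSONATION= assignment line, optionally prefixed with export. Check the layout of your own pxf-env.sh before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PXF): set PXF_USER_IMPERSONATION in a pxf-env.sh file.
# ASSUMPTION: the file sets the value on one line such as
#   PXF_USER_IMPERSONATION=true   or   export PXF_USER_IMPERSONATION=...
set_pxf_impersonation() {
  local env_file="$1" value="$2"

  # Accept only the two values described in the procedure.
  case "$value" in
    true|false) ;;
    *) echo "set_pxf_impersonation: value must be true or false" >&2; return 1 ;;
  esac

  if [ ! -f "$env_file" ]; then
    echo "set_pxf_impersonation: no such file: $env_file" >&2
    return 1
  fi

  if grep -Eq '^[[:space:]]*(export[[:space:]]+)?PXF_USER_IMPERSONATION=' "$env_file"; then
    # Rewrite the existing assignment, keeping any leading "export ".
    # Commented-out lines (starting with #) are left alone.
    local tmp
    tmp="$(mktemp)" || return 1
    sed -E "s/^([[:space:]]*(export[[:space:]]+)?PXF_USER_IMPERSONATION=).*/\1${value}/" \
      "$env_file" > "$tmp" && cat "$tmp" > "$env_file"
    local status=$?
    rm -f "$tmp"
    return "$status"
  else
    # No assignment found: append one.
    printf 'PXF_USER_IMPERSONATION=%s\n' "$value" >> "$env_file"
  fi
}
```

For example, running set_pxf_impersonation "$GPHOME/pxf/conf/pxf-env.sh" false on the master replaces step 3; you would still copy the file and restart PXF with the gpscp and gpssh commands from steps 4 and 5. The function writes through a temporary file and copies the result back with cat rather than using sed -i, which behaves differently in GNU and BSD sed, and copying back keeps the original file's permissions.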
Configuring Hadoop Proxying
When PXF user impersonation is enabled (the default), you must configure the Hadoop core-site.xml configuration file to permit user impersonation for PXF. Follow these steps:
1. Open the core-site.xml configuration file using a text editor, or use Ambari to add or edit the property values described in this procedure.
2. Set the property hadoop.proxyuser.<name>.hosts to specify the list of PXF host names (the Greenplum Database segment hosts running PXF) from which proxy requests are permitted. Substitute the PXF user (generally gpadmin) for <name>, and provide multiple host names in a comma-separated list. For example:
<property>
    <name>hadoop.proxyuser.gpadmin.hosts</name>
    <value>pxfhost1,pxfhost2,pxfhost3</value>
</property>
3. Set the property hadoop.proxyuser.<name>.groups to specify the list of HDFS groups that PXF can impersonate. Limit this list to only those groups that require access to HDFS data from PXF. For example:
<property>
    <name>hadoop.proxyuser.gpadmin.groups</name>
    <value>group1,group2</value>
</property>
4. After changing core-site.xml, restart Hadoop for your changes to take effect.
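Because the hosts and groups properties always come as a pair for the same PXF user, you might generate them rather than type them by hand. The function below is a hypothetical helper (proxyuser_properties is not a Hadoop or PXF tool) and only prints the XML; you would still paste the output inside the <configuration> element of core-site.xml, or enter the same names and values in Ambari.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Hadoop or PXF tool): print the two
# hadoop.proxyuser.* properties for core-site.xml. It does not edit any file.
# Arguments: PXF user, comma-separated PXF host list, comma-separated group list.
proxyuser_properties() {
  local user="$1" hosts="$2" groups="$3"

  if [ -z "$user" ] || [ -z "$hosts" ] || [ -z "$groups" ]; then
    echo "usage: proxyuser_properties USER HOSTS GROUPS" >&2
    return 1
  fi

  # Same layout as the examples in the procedure above.
  cat <<EOF
<property>
    <name>hadoop.proxyuser.${user}.hosts</name>
    <value>${hosts}</value>
</property>
<property>
    <name>hadoop.proxyuser.${user}.groups</name>
    <value>${groups}</value>
</property>
EOF
}
```

For example, proxyuser_properties gpadmin pxfhost1,pxfhost2,pxfhost3 group1,group2 prints the same two properties shown in steps 2 and 3. Keeping the user name in one argument avoids the common mistake of setting hosts for one user and groups for another.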
Hive User Impersonation
The PXF Hive connector uses the Hive MetaStore to determine the HDFS locations of Hive tables, and then accesses the underlying HDFS files directly. No specific impersonation configuration is required for Hive, because the Hadoop proxy configuration in core-site.xml also applies to Hive access.
HBase User Impersonation
For user impersonation to work with HBase, you must enable the AccessController coprocessor in the HBase configuration and restart the cluster. See 61.3 Server-side Configuration for Simple User Access Operation in the Apache HBase Reference Guide for the required hbase-site.xml configuration settings.
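Before restarting HBase, a quick text search can catch an obviously missing coprocessor entry. The sketch below is hypothetical and deliberately rough. It assumes that the coprocessor is registered through the hbase.coprocessor.master.classes and hbase.coprocessor.region.classes properties, using the class name org.apache.hadoop.hbase.security.access.AccessController. It is not an XML parser, and it does not replace the full list of settings in the HBase Reference Guide, which may require additional properties.

```shell
#!/usr/bin/env bash
# Rough, hypothetical sanity check for hbase-site.xml (text search, not an XML parser).
# ASSUMPTION: each property is written as <property><name>N</name><value>V</value></property>;
# line breaks and indentation between tags are fine because all whitespace is removed.
ACCESS_CONTROLLER='org.apache.hadoop.hbase.security.access.AccessController'

# Print the <value> of property $2 from file $1 (prints nothing if not found).
site_property_value() {
  local site_file="$1" prop="$2"
  tr -d ' \t\n\r' < "$site_file" |
    awk -v key="<name>${prop}</name><value>" '{
      start = index($0, key)            # literal match, no regex escaping needed
      if (start == 0) exit
      rest = substr($0, start + length(key))
      stop = index(rest, "</value>")
      if (stop > 0) print substr(rest, 1, stop - 1)
    }'
}

# Succeed only if both coprocessor properties mention the AccessController class.
hbase_access_controller_configured() {
  local site_file="$1" prop value
  [ -f "$site_file" ] || { echo "no such file: $site_file" >&2; return 1; }
  for prop in hbase.coprocessor.master.classes hbase.coprocessor.region.classes; do
    value="$(site_property_value "$site_file" "$prop")"
    case "$value" in
      *"$ACCESS_CONTROLLER"*) ;;
      *) echo "$prop does not list $ACCESS_CONTROLLER" >&2; return 1 ;;
    esac
  done
}
```

For example, pass the path to your cluster's hbase-site.xml to hbase_access_controller_configured; a non-zero exit status and a message on stderr name the property that is missing the class. Stripping all whitespace keeps the matching simple and is safe here because Java class lists contain no spaces.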