Configuring PXF for Secure HDFS
A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 6.x documentation.
When Kerberos is enabled for your HDFS filesystem, PXF, as an HDFS client, requires a principal and keytab file to authenticate access to HDFS. To read or write files on a secure HDFS, you must create and deploy Kerberos principals and keytabs for PXF, and ensure that Kerberos authentication is enabled and functioning.
Prerequisites
Before you configure PXF for access to a secure HDFS filesystem, ensure that you have:
Configured the Hadoop connectors using the default PXF server configuration.
Initialized, configured, and started PXF as described in Configuring PXF, including enabling PXF and Hadoop user impersonation.
Enabled Kerberos for your Hadoop cluster per the instructions for your specific distribution and verified the configuration.
Verified that the HDFS configuration parameter
dfs.block.access.token.enable
is set totrue
. You can find this setting in thehdfs-site.xml
configuration file on a host in your Hadoop cluster.Noted the host name or IP address of each Greenplum Database segment host (<seghost>) and the Kerberos Key Distribution Center (KDC) <kdc-server> host.
Noted the name of the Kerberos <realm> in which your cluster resides.
Installed the Kerberos client packages on each Greenplum Database segment host if they are not already installed. You must have superuser permissions to install operating system packages. For example:
root@gphost$ rpm -qa | grep krb root@gphost$ yum install krb5-libs krb5-workstation
Procedure
There are different procedures for configuring PXF for secure HDFS with a Microsoft Active Directory KDC Server vs. with an MIT Kerberos KDC Server.
Configuring PXF with a Microsoft Active Directory Kerberos KDC Server
When you configure PXF for secure HDFS using an AD Kerberos KDC server, you will perform tasks on both the KDC server host and the Greenplum Database master host.
Perform the following steps the Active Directory domain controller:
- Start Active Directory Users and Computers.
- Expand the forest domain and the top-level UNIX organizational unit that describes your Greenplum user domain.
- Select Service Accounts, right-click, then select New->User.
- Type a name, eg.
ServiceGreenplumPROD1
, and change the login name togpadmin
. Note that the login name should be in compliance with POSIX standard and match hadoop.proxyuser..hosts/groups in the Hadoop core-site.xml
andPXF_PRINCIPAL
in$PXF_CONF/conf/pxf-env.sh
. - Type and confirm the Active Directory service account password. Select the User cannot change password and Password never expires check boxes, then click Next. For security reasons, if you can’t have Password never expires checked, you will need to generate new keytab file (step 7) every time you change the password of the service account.
- Click Finish to complete the creation of the new user principal.
Open Powershell or a command prompt and run the
ktpass
command to generate the keytab file. For example:powershell#>ktpass -out pxf.service.keytab -princ gpadmin@EXAMPLE.COM -mapUser ServiceGreenplumPROD1 -pass ******* -crypto all -ptype KRB5_NT_PRINCIPAL
With Active Directory, the principal and the keytab file are shared by all Greenplum Database segment hosts.
Copy the
pxf.service.keytab
file to the Greenplum master host.
Perform the following steps on the Greenplum Database master host:
Log in to the Greenplum Database master host. For example:
$ ssh gpadmin@<gpmaster>
Open the
$PXF_CONF/conf/pxf-env.sh
file in an editor. Update thePXF_KEYTAB
andPXF_PRINCIPAL
settings, if required, specifying the location of the keytab file and the Kerberos principal. ReplaceEXAMPLE.COM
with your Kerberos realm.export PXF_KEYTAB="${PXF_CONF}/keytabs/pxf.service.keytab" export PXF_PRINCIPAL="gpadmin@EXAMPLE.COM"
Save the file and exit the editor.
Synchronize the PXF configuration to your Greenplum Database cluster and restart PXF. For example:
gpadmin@master$ $GPHOME/pxf/bin/pxf cluster sync gpadmin@master$ $GPHOME/pxf/bin/pxf cluster stop gpadmin@master$ $GPHOME/pxf/bin/pxf cluster start
Step 5 does not synchronize the keytabs in
$PXF_CONF
. You must distribute the keytab file to$PXF_CONF/keytabs/
. Locate the keytab file, copy the file to the$PXF_CONF
user configuration directory, and set required permissions. For example:gpadmin@gpmaster$ gpscp -f hostfile_all pxf.service.keytab =:$PXF_CONF/keytabs/ gpadmin@gpmaster$ gpssh -f hostfile_all chmod 400 $PXF_CONF/keytabs/pxf.service.keytab
Configuring PXF with an MIT Kerberos KDC Server
When you configure PXF for secure HDFS using an MIT Kerberos KDC server, you will perform tasks on both the KDC server host and the Greenplum Database master host.
Perform the following steps on the MIT Kerberos KDC server host:
Log in to the Kerberos KDC server as the
root
user.$ ssh root@<kdc-server> root@kdc-server$
Distribute the
/etc/krb5.conf
Kerberos configuration file on the KDC server host to each segment host in your Greenplum Database cluster if not already present. For example:root@kdc-server$ scp /etc/krb5.conf seghost:/etc/krb5.conf
Use the
kadmin.local
command to create a Kerberos PXF service principal for each Greenplum Database segment host. The service principal should be of the formgpadmin/<seghost>@<realm>
where <seghost> is the DNS resolvable, fully-qualified hostname of the segment host system (output of thehostname -f
command).For example, these commands create PXF service principals for the hosts named host1.example.com, host2.example.com, and host3.example.com in the Kerberos realm named
EXAMPLE.COM
:root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host1.example.com@EXAMPLE.COM" root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host2.example.com@EXAMPLE.COM" root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host3.example.com@EXAMPLE.COM"
Generate a keytab file for each PXF service principal that you created in the previous step. Save the keytab files in any convenient location (this example uses the directory
/etc/security/keytabs
). You will deploy the keytab files to their respective Greenplum Database segment host machines in a later step. For example:root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host1.service.keytab gpadmin/host1.example.com@EXAMPLE.COM" root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host2.service.keytab gpadmin/host2.example.com@EXAMPLE.COM" root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host3.service.keytab gpadmin/host3.example.com@EXAMPLE.COM"
Repeat the
xst
command as necessary to generate a keytab for each PXF service principal that you created in the previous step.List the principals. For example:
root@kdc-server$ kadmin.local -q "listprincs"
Copy the keytab file for each PXF service principal to its respective segment host. For example, the following commands copy each principal generated in step 4 to the PXF default keytab directory on the segment host when
PXF_CONF=/usr/local/greenplum-pxf
:root@kdc-server$ scp /etc/security/keytabs/pxf-host1.service.keytab host1.example.com:/usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ scp /etc/security/keytabs/pxf-host2.service.keytab host2.example.com:/usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ scp /etc/security/keytabs/pxf-host3.service.keytab host3.example.com:/usr/local/greenplum-pxf/keytabs/pxf.service.keytab
Note the file system location of the keytab file on each PXF host; you will need this information for a later configuration step.
Change the ownership and permissions on the
pxf.service.keytab
files. The files must be owned and readable by only thegpadmin
user. For example:root@kdc-server$ ssh host1.example.com chown gpadmin:gpadmin /usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ ssh host1.example.com chmod 400 /usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ ssh host2.example.com chown gpadmin:gpadmin /usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ ssh host2.example.com chmod 400 /usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ ssh host3.example.com chown gpadmin:gpadmin /usr/local/greenplum-pxf/keytabs/pxf.service.keytab root@kdc-server$ ssh host3.example.com chmod 400 /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
Perform the following steps on the Greenplum Database master host:
Log in to the master host. For example:
$ ssh gpadmin@<gpmaster>
Open the PXF
pxf-env.sh
user configuration file in the editor of your choice. For example, to open the file withvi
whenPXF_CONF=/usr/local/greenplum-pxf
:gpadmin@seghost$ vi /usr/local/greenplum-pxf/conf/pxf-env.sh
Update the
PXF_KEYTAB
andPXF_PRINCIPAL
settings, if required. Specify the location of the keytab file and the Kerberos principal, substituting your realm. The default values for these settings are identified below:export PXF_KEYTAB="${PXF_CONF}/keytabs/pxf.service.keytab" export PXF_PRINCIPAL="gpadmin/_HOST@EXAMPLE.COM"
PXF automatically replaces
_HOST
with the FQDN of the segment host.Save the file and exit the editor.
Synchronize the PXF configuration to your Greenplum Database cluster and restart PXF. For example:
gpadmin@seghost$ $GPHOME/pxf/bin/pxf cluster sync gpadmin@master$ $GPHOME/pxf/bin/pxf cluster stop gpadmin@master$ $GPHOME/pxf/bin/pxf cluster start