Upgrading PXF

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 5.x documentation.

If you are using PXF in your current Greenplum Database installation, you must upgrade the PXF service when you upgrade to a new version of Greenplum Database.

The PXF upgrade procedure describes how to upgrade PXF in your Greenplum Database installation. This procedure uses PXF.from to refer to your currently-installed PXF version and PXF.to to refer to the PXF version installed when you upgrade to the new version of Greenplum Database.

Most PXF installations do not require modifications to PXF configuration files and should experience a seamless upgrade.

Note: Starting in Greenplum Database version 5.12, PXF no longer requires a Hadoop client installation. PXF now bundles all of the JAR files on which it depends, and loads these JARs at runtime.

The PXF upgrade procedure has two parts. You perform one procedure before, and one procedure after, you upgrade to a new version of Greenplum Database:

Step 1: PXF Pre-Upgrade Actions

Perform this procedure before you upgrade to a new version of Greenplum Database:

  1. Log in to the Greenplum Database master node. For example:

    $ ssh gpadmin@<gpmaster>
    
  2. Stop PXF on each segment host as described in Stopping PXF.

  3. If you are upgrading from Greenplum Database version 5.14 or earlier:

    1. Back up the PXF.from configuration files found in the $GPHOME/pxf/conf/ directory. These files should be the same on all segment hosts, so you need only copy from one of the hosts. For example:

      gpadmin@gpmaster$ mkdir -p /save/pxf-from-conf
      gpadmin@gpmaster$ scp gpadmin@seghost1:/usr/local/greenplum-db/pxf/conf/* /save/pxf-from-conf/
      
    2. Note the locations of any custom JAR files that you may have added to your PXF.from installation. Save a copy of these JAR files.

  4. Upgrade to the new version of Greenplum Database and then continue your PXF upgrade with Step 2: Upgrading PXF.

Step 2: Upgrading PXF

After you upgrade to the new version of Greenplum Database, perform the following procedure to upgrade and configure the PXF.to software:

  1. Log in to the Greenplum Database master node. For example:

    $ ssh gpadmin@<gpmaster>
    
  2. Initialize PXF on each segment host as described in Initializing PXF.

  3. PXF user impersonation is on by default in Greenplum Database version 5.5.0 and later. If you are upgrading from an older PXF.from version, you must configure user impersonation for the underlying Hadoop services. Refer to Configuring User Impersonation and Proxying for instructions, including the configuration procedure to turn off PXF user impersonation.

  4. If you are upgrading from Greenplum Database version 5.14 or earlier:

    1. If you updated the pxf-env.sh configuration file in your PXF.from installation, re-apply those changes to $PXF_CONF/conf/pxf-env.sh, and copy the updated pxf-env.sh to all segment hosts. For example, if seghostfile contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster and PXF_CONF=/etc/pxf/usercfg:

      gpadmin@gpmaster$ vi $PXF_CONF/conf/pxf-env.sh
         <update the file>
      gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-env.sh =:/etc/pxf/usercfg/conf/pxf-env.sh
      
    2. If you updated the pxf-profiles.xml configuration file in your PXF.from installation, re-apply those changes to $PXF_CONF/conf/pxf-profiles.xml and copy the updated file to all segment hosts. Refer to Step 4a above for a similar gpscp command.

      Note: Starting in Greenplum Database version 5.12, the package name for PXF classes was changed to use the prefix org.greenplum.*. If you are upgrading from an older PXF.from version and you customized the pxf-profiles.xml file, you must change any org.apache.hawq.pxf.* references to org.greenplum.pxf.* when you re-apply your changes.

    3. If you updated the pxf-log4j.properties configuration file in your PXF.from installation, re-apply those changes to $PXF_CONF/conf/pxf-log4j.properties and copy the updated file to all segment hosts. Refer to Step 4a above for a similar gpscp command.

    4. If you updated the pxf-public.classpath configuration file in your PXF.from installation, copy every JAR referenced in the file to $PXF_CONF/lib on each segment host. Refer to Step 4a above for a similar gpscp command.

    5. If you added additional JAR files to your PXF.from installation, copy them to $PXF_CONF/lib on all segment hosts.

    6. Starting in Greenplum Database version 5.15, PXF requires that the Hadoop configuration files reside in the $PXF_CONF/servers/default directory. If you configured PXF Hadoop connectors in your PXF.from installation, copy the Hadoop configuration files in /etc/<hadoop_service>/conf to $PXF_CONF/servers/default on the Greenplum Database master and on all segment hosts.

    7. Starting in Greenplum Database version 5.15, the default Kerberos keytab file location for PXF is $PXF_CONF/keytabs. If you previously configured PXF for secure HDFS and the PXF keytab file is located in a PXF.from installation directory (for example, $GPHOME/pxf/conf), consider relocating the keytab file to $PXF_CONF/keytabs. Alternatively, update the PXF_KEYTAB property setting in the $PXF_CONF/conf/pxf-env.sh file to reference your keytab file. Be sure to propagate any updated files to each segment host.

  5. Start PXF on each segment host as described in Starting PXF.