Upgrading PXF

If you are using PXF in your current Greenplum Database installation, you must upgrade the PXF service when you upgrade to a new version of Greenplum Database.

The PXF upgrade procedure describes how to upgrade PXF in your Greenplum Database installation. This procedure uses PXF.from to refer to your currently installed PXF version and PXF.to to refer to the PXF version installed when you upgrade to the new version of Greenplum Database.

Most PXF installations do not require modifications to PXF configuration files and should experience a seamless upgrade.

Note: Starting in Greenplum Database version 5.12, PXF no longer requires a Hadoop client installation. PXF now bundles all of the JAR files on which it depends, and loads these JARs at runtime.

The PXF upgrade procedure has two parts: one that you perform before, and one that you perform after, you upgrade to a new version of Greenplum Database:

Step 1: PXF Pre-Upgrade Actions

Perform this procedure before you upgrade to a new version of Greenplum Database:

  1. Log in to the Greenplum Database master node. For example:

    $ ssh gpadmin@<gpmaster>
    
  2. Stop PXF on each segment host as described in Stopping PXF.
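
    For example, one way to stop PXF on a single segment host, assuming the PXF.from pxf command resides in $GPHOME/pxf/bin:

    gpadmin@seghost$ $GPHOME/pxf/bin/pxf stop

    Depending on your PXF.from version, the pxf cluster stop command, run from the master host, may also be available to stop PXF on all segment hosts at once.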

  3. If you are upgrading from Greenplum Database version 5.14 or earlier:

    1. Back up the PXF.from configuration files found in the $GPHOME/pxf/conf/ directory. These files should be the same on all segment hosts, so you need only copy from one of the hosts. For example:

      gpadmin@gpmaster$ mkdir -p /save/pxf-from-conf
      gpadmin@gpmaster$ scp gpadmin@seghost1:/usr/local/greenplum-db/pxf/conf/* /save/pxf-from-conf/
      
    2. Note the locations of any custom JAR files that you may have added to your PXF.from installation. Save a copy of these JAR files.
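
      For example, assuming a hypothetical custom JAR file named my-custom-connector.jar in the PXF.from lib directory:

      gpadmin@gpmaster$ mkdir -p /save/pxf-from-jars
      gpadmin@gpmaster$ scp gpadmin@seghost1:/usr/local/greenplum-db/pxf/lib/my-custom-connector.jar /save/pxf-from-jars/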

  4. Upgrade to the new version of Greenplum Database and then continue your PXF upgrade with Step 2: Upgrading PXF.

Step 2: Upgrading PXF

After you upgrade to the new version of Greenplum Database, perform the following procedure to upgrade and configure the PXF.to software:

  1. Log in to the Greenplum Database master node. For example:

    $ ssh gpadmin@<gpmaster>
    
  2. Initialize PXF on each segment host as described in Initializing PXF.
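
    For example, assuming you choose /usr/local/greenplum-pxf as the PXF user configuration directory ($PXF_CONF):

    gpadmin@gpmaster$ PXF_CONF=/usr/local/greenplum-pxf $GPHOME/pxf/bin/pxf cluster init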

  3. PXF user impersonation is on by default in Greenplum Database version 5.5.0 and later. If you are upgrading from an older PXF.from version, you must configure user impersonation for the underlying Hadoop services. Refer to Configuring User Impersonation and Proxying for instructions, including the configuration procedure to turn off PXF user impersonation.
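
    For example, a sketch of one way to turn off PXF user impersonation, assuming your PXF.to version reads the PXF_USER_IMPERSONATION setting from pxf-env.sh (refer to the documentation referenced above for the complete procedure for your version):

    gpadmin@gpmaster$ vi $PXF_CONF/conf/pxf-env.sh
       <set PXF_USER_IMPERSONATION=false>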

  4. If you are upgrading from Greenplum Database version 5.14 or earlier:

    1. If you updated the pxf-env.sh configuration file in your PXF.from installation, re-apply those changes to $PXF_CONF/conf/pxf-env.sh. For example:

      gpadmin@gpmaster$ vi $PXF_CONF/conf/pxf-env.sh
         <update the file>
      
    2. Similarly, if you updated the pxf-profiles.xml configuration file in your PXF.from installation, re-apply those changes to $PXF_CONF/conf/pxf-profiles.xml on the master host.

      Note: Starting in Greenplum Database version 5.12, the package name for PXF classes was changed to use the prefix org.greenplum.*. If you are upgrading from an older PXF.from version and you customized the pxf-profiles.xml file, you must change any org.apache.hawq.pxf.* references to org.greenplum.pxf.* when you re-apply your changes.
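
      For example, one way to update the references in place (back up the file first; the -i.bak option shown below works with both GNU and BSD sed):

      gpadmin@gpmaster$ sed -i.bak 's/org\.apache\.hawq\.pxf/org.greenplum.pxf/g' $PXF_CONF/conf/pxf-profiles.xml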

    3. If you updated the pxf-log4j.properties configuration file in your PXF.from installation, re-apply those changes to $PXF_CONF/conf/pxf-log4j.properties on the master host.

    4. If you updated the pxf-public.classpath configuration file in your PXF.from installation, copy every JAR referenced in the file to $PXF_CONF/lib on the master host.

    5. If you added additional JAR files to your PXF.from installation, copy them to $PXF_CONF/lib on the master host.
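
      For example, if you saved copies of your custom JAR files to a /save/pxf-from-jars directory during the pre-upgrade procedure:

      gpadmin@gpmaster$ cp /save/pxf-from-jars/*.jar $PXF_CONF/lib/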

    6. Starting in Greenplum Database version 5.15, PXF requires that the Hadoop configuration files reside in the $PXF_CONF/servers/default directory. If you configured PXF Hadoop connectors in your PXF.from installation, copy the Hadoop configuration files from /etc/<hadoop_service>/conf to $PXF_CONF/servers/default on the Greenplum Database master host.
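
      For example, assuming the core Hadoop configuration files reside in /etc/hadoop/conf (copy additional *-site.xml files as appropriate for the connectors you use):

      gpadmin@gpmaster$ cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml $PXF_CONF/servers/default/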

    7. Starting in Greenplum Database version 5.15, the default Kerberos keytab file location for PXF is $PXF_CONF/keytabs. If you previously configured PXF for secure HDFS and the PXF keytab file is located in a PXF.from installation directory (for example, $GPHOME/pxf/conf), consider relocating the keytab file to $PXF_CONF/keytabs. Alternatively, update the PXF_KEYTAB property setting in the $PXF_CONF/conf/pxf-env.sh file to reference your keytab file.
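
      For example, assuming a hypothetical keytab file named pxf.service.keytab in the PXF.from configuration directory:

      gpadmin@gpmaster$ cp /usr/local/greenplum-db/pxf/conf/pxf.service.keytab $PXF_CONF/keytabs/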

  5. If you are upgrading from Greenplum Database version 5.18 or earlier:

    1. PXF now bundles Hadoop version 2.9.2 dependent JAR files. If you registered additional Hadoop-related JAR files in $PXF_CONF/lib, ensure that these libraries are compatible with Hadoop version 2.9.2.
    2. The PXF JDBC connector now supports file-based server configuration. If you choose to use this new feature with existing external tables that reference an external SQL database, refer to the configuration instructions in Configuring the JDBC Connector for additional information.
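
      For example, a sketch of creating a named JDBC server configuration from the template that PXF initialization installs (the server name mydb is hypothetical):

      gpadmin@gpmaster$ mkdir -p $PXF_CONF/servers/mydb
      gpadmin@gpmaster$ cp $PXF_CONF/templates/jdbc-site.xml $PXF_CONF/servers/mydb/
      gpadmin@gpmaster$ vi $PXF_CONF/servers/mydb/jdbc-site.xml
         <update the file>
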
    3. The PXF JDBC connector now supports statement query timeouts. This feature requires explicit support from the JDBC driver. The default query timeout value is 0, which instructs the driver to wait forever. Some JDBC drivers accept a 0 timeout value without fully supporting the statement query timeout feature. Ensure that any JDBC driver that you have registered supports the default timeout or, better yet, fully supports this feature. You may need to update your JDBC driver version to obtain this support. Refer to the configuration instructions in JDBC Driver JAR Registration for information about registering JAR files with PXF.
    4. If you plan to use Hive partition filtering with integral types, you must set the hive.metastore.integral.jdo.pushdown Hive property in your Hadoop cluster’s hive-site.xml and in your PXF user configuration $PXF_CONF/servers/default/hive-site.xml file. See Updating Hadoop Configuration.
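
      For example, following the procedure referenced above, you would add a property block similar to the following to each hive-site.xml file:

      gpadmin@gpmaster$ vi $PXF_CONF/servers/default/hive-site.xml
         <add the following property>
         <property>
             <name>hive.metastore.integral.jdo.pushdown</name>
             <value>true</value>
         </property>
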
  6. If you are upgrading from Greenplum Database version 5.21.1 or earlier: The PXF Hive connector no longer supports providing a DELIMITER=<delim> option in the LOCATION URI when you create an external table that specifies the HiveText or HiveRC profiles. If you previously created an external table that specified a DELIMITER in the LOCATION URI, you must drop the table and recreate it, omitting the DELIMITER from the LOCATION. You are still required to provide a non-default delimiter in the external table formatting options.
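
    For example, a sketch of recreating such a table in a psql session; the table name, columns, and Hive table path shown are hypothetical:

    postgres=# DROP EXTERNAL TABLE sales_hivetext;
    postgres=# CREATE EXTERNAL TABLE sales_hivetext (location text, month text, num_orders int)
                 LOCATION ('pxf://default.sales?PROFILE=HiveText')
               FORMAT 'TEXT' (delimiter=E',');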

  7. Synchronize the PXF configuration from the master host to the standby master and each Greenplum Database segment host. For example:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
    
  8. Start PXF on each segment host as described in Starting PXF.
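
    For example, assuming your PXF.to version supports the pxf cluster start subcommand (otherwise, run pxf start on each segment host individually):

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start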