Upgrading from an Earlier PXF 6 Release

If you installed PXF 6.x from an rpm or deb package, and you have configured and are using PXF in your current Greenplum Database 5.21.2+ or 6.x installation, you must perform the upgrade actions described here when you install a new version of PXF 6.x.

The PXF upgrade procedure has three steps: a pre-install procedure, the install itself, and a post-install procedure that completes the upgrade to the new PXF 6.x version.

Step 1: Perform the PXF Pre-Upgrade Actions

Perform this procedure before you upgrade to a new version of PXF 6.x:

  1. Log in to the Greenplum Database master node. For example:

    $ ssh gpadmin@<gpmaster>
  2. Identify and note the version of PXF currently running in your Greenplum cluster:

    gpadmin@gpmaster$ pxf version
  3. Stop PXF on each Greenplum host as described in Stopping PXF:

    gpadmin@gpmaster$ pxf cluster stop
  4. (Optional, Recommended) Back up the PXF user configuration files; for example, if PXF_BASE=/usr/local/pxf-gp6:

    gpadmin@gpmaster$ cp -avi /usr/local/pxf-gp6 pxf_base.bak
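
The optional backup in step 4 can be made repeatable with a small shell helper. This is a minimal sketch, not part of PXF: the function name backup_pxf_base and the timestamped destination naming are assumptions chosen for illustration, and it assumes PXF has already been stopped.

```shell
# Hypothetical helper (not a PXF command): copy a PXF_BASE directory to a
# timestamped backup directory and print the backup path.
#   $1 = directory to back up (for example, /usr/local/pxf-gp6)
#   $2 = parent directory for the backup (defaults to $HOME)
backup_pxf_base() {
    src="$1"
    parent="${2:-$HOME}"
    if [ ! -d "$src" ]; then
        echo "backup_pxf_base: $src is not a directory" >&2
        return 1
    fi
    dest="$parent/pxf_base.$(date +%Y%m%d%H%M%S).bak"
    # -a preserves permissions, ownership, and timestamps where possible.
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Example (assumes PXF_BASE=/usr/local/pxf-gp6, as in step 4):
#   backup_pxf_base /usr/local/pxf-gp6
```

Including a timestamp in the backup name avoids overwriting an earlier backup if the upgrade is repeated.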

Step 2: Installing the New PXF 6.x

Install the new PXF 6.x package, then identify and note the new PXF version number.

Step 3: Completing the Upgrade to a Newer PXF 6.x

After you install the new version of PXF, perform the following procedure:

  1. Log in to the Greenplum Database master node. For example:

    $ ssh gpadmin@<gpmaster>
  2. PXF 6.x includes a new version of the pxf extension. Register the extension files with Greenplum Database (see pxf cluster register). $GPHOME must be set when you run this command:

    gpadmin@gpmaster$ pxf cluster register
  3. You must update the pxf extension in every Greenplum database in which you are using PXF. A database superuser or the database owner must run the following SQL command in the psql subsystem or in an SQL script: ALTER EXTENSION pxf UPDATE;

  4. If you are upgrading from PXF version 6.0.x:

    • If you previously set the pxf.connection.timeout property to change the write/upload timeout, you must now set the pxf.connection.upload-timeout property for this purpose.
    • Existing external tables that access Avro arrays and JSON objects will continue to work as-is. If you want to take advantage of the new Avro array read/write functionality or the new JSON object support, create a new external table with the adjusted DDL. If you can access the data with the new external table as you expect, you may choose to drop and recreate the existing external table.
  5. If you are upgrading to PXF version 6.2.0 to resolve an erroneous replay attack issue in a Kerberos-secured environment:

    1. If you want to change the default value of the new pxf.sasl.connection.retries property, set this property in the pxf-site.xml file for your PXF server. The property specifies the number of retries to perform when a SASL connection is refused by a Namenode due to a 'GSS initiate failed' error.
    2. (Recommended) Configure PXF to use a host-specific Kerberos principal for each segment host. If the pxf.service.kerberos.principal property value in the PXF server’s pxf-site.xml file contains _HOST in place of the host name, PXF automatically replaces _HOST with the FQDN of the segment host.

  6. Synchronize the PXF configuration from the master host to the standby master and each Greenplum Database segment host. For example:

    gpadmin@gpmaster$ pxf cluster sync
  7. Start PXF on each Greenplum host as described in Starting PXF:

    gpadmin@gpmaster$ pxf cluster start
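
Steps 2 and 3 of this procedure lend themselves to scripting when PXF is used in several databases. The sketch below is illustrative only: the helper names require_gphome and update_pxf_extension and the DRY_RUN variable are hypothetical, and it assumes that psql can connect to each named database as a superuser or the database owner, and that the extension update statement is ALTER EXTENSION pxf UPDATE.

```shell
# Hypothetical helper: fail early if GPHOME is unset, because
# `pxf cluster register` requires it.
require_gphome() {
    if [ -z "${GPHOME:-}" ]; then
        echo "GPHOME is not set" >&2
        return 1
    fi
}

# Hypothetical helper: update the pxf extension in each named database.
# With DRY_RUN=1 the psql commands are printed instead of executed.
update_pxf_extension() {
    for db in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'psql -d %s -c "ALTER EXTENSION pxf UPDATE;"\n' "$db"
        else
            psql -d "$db" -c "ALTER EXTENSION pxf UPDATE;" || return 1
        fi
    done
}

# Example sequence on the master host (database names are placeholders):
#   require_gphome && pxf cluster register
#   update_pxf_extension <db1> <db2>
#   pxf cluster sync
#   pxf cluster start
```

Running with DRY_RUN=1 first is a cheap way to review the list of databases before any extension is changed.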