Pivotal Greenplum Platform Extension Framework 5.x Release Notes

The Pivotal Greenplum Platform Extension Framework (PXF) is included in the Pivotal Greenplum Database distribution in Greenplum versions up to and including 5.28 and 6.11. PXF for Red Hat/CentOS and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database starting with PXF version 5.13.0. You may need to download and install the PXF package to obtain the most recent version of this component.

Supported Platforms

The independent PXF distribution is compatible with these operating system platforms and Greenplum versions:

OS Version            | Greenplum Version
RHEL 7.x, CentOS 7.x  | 5.21.2+, 6.x
RHEL 6.x, CentOS 6.x  | 5.21.2+
OEL 7.x               | 6.x

Starting in 6.x, Greenplum does not bundle cURL and instead loads the system-provided library. PXF requires cURL version 7.29.0 or newer. The officially-supported cURL for the CentOS 6.x and Red Hat Enterprise Linux 6.x operating systems is version 7.19.*. Greenplum Database 6 does not support running PXF on CentOS 6.x or RHEL 6.x due to this limitation.
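
The cURL requirement above amounts to a version comparison. The following is a minimal sketch of such a check, not part of PXF; the function names are hypothetical, and the sample strings only imitate the first line that `curl --version` typically prints.

```python
import re

# Minimum cURL version stated in these release notes.
MIN_CURL_VERSION = (7, 29, 0)


def parse_version(text):
    """Extract the first dotted x.y.z version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def curl_meets_pxf_minimum(version_text):
    """Return True when the reported cURL version is 7.29.0 or newer."""
    return parse_version(version_text) >= MIN_CURL_VERSION


# Illustrative inputs only; real output varies by platform.
print(curl_meets_pxf_minimum("curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0"))  # True
print(curl_meets_pxf_minimum("curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7"))  # False
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "7.9.0" would sort after "7.29.0" lexicographically.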

PXF is compatible with these Java and Hadoop component versions:

PXF Version         | Java Versions | Hadoop Versions | Hive Server Versions | HBase Server Version
5.15.x, 5.14, 5.13  | 8, 11         | 2.x, 3.1+       | 1.x, 2.x, 3.1+       | 1.3.2

Release 5.15.1

Release Date: September 11, 2020

Changes

PXF 5.15.1 includes these changes:

  • PXF bundles a new version of Tomcat, 7.0.105.
  • PXF improves the performance of Parquet write operations (see Resolved Issues 30788 and 30779) by:
    • No longer splitting files that are over 128MB in size.
    • Bundling Parquet version 1.11.1 libraries.
    • Providing a new ENABLE_DICTIONARY option to enable/disable dictionary encoding when PXF writes Parquet data.
    • Using the Parquet logical date and physical int32 types when writing dates. See Resolved Issue 174433819.
  • When dictionary encoding is enabled, the default DICTIONARY_PAGE_SIZE that PXF uses when writing Parquet data is now 1 * 1024 * 1024 bytes (1 MB); it was previously 1 * 512 * 1024 bytes (512 KB).
  • PXF provides integrated native library registration support by exposing the new user configuration directory $PXF_CONF/lib/native and a template for setting the LD_LIBRARY_PATH option. See Resolved Issue 264 and Registering PXF Library Dependencies.

Resolved Issues

PXF 5.15.1 resolves these issues:

Issue # Summary
264 Resolves an issue where it was not clear how to register a native library with PXF. PXF now provides integrated native library registration support and related documentation.
30788, 30779 Improves PXF performance when writing Parquet data by not splitting files larger than 128MB, using newer Parquet libraries, and exposing a new ENABLE_DICTIONARY option.
174433819 Resolves an issue where PXF used the utf8 string logical type and binary physical type to write a Parquet date. PXF now stores dates using the Parquet logical date and physical int32 types.
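
To illustrate the date representation described in Issue 174433819: in the Parquet format, the date logical type annotates an int32 value that holds the number of days since the Unix epoch (1970-01-01). The sketch below shows that encoding in plain Python. It is not PXF code, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Parquet's date logical type counts days from the Unix epoch.
UNIX_EPOCH = date(1970, 1, 1)


def date_to_parquet_days(value):
    """Encode a calendar date as the day count stored in the int32 physical type."""
    return (value - UNIX_EPOCH).days


def parquet_days_to_date(days):
    """Decode a stored day count back into a calendar date."""
    return UNIX_EPOCH + timedelta(days=days)


encoded = date_to_parquet_days(date(2020, 1, 1))
print(encoded)                        # 18262
print(parquet_days_to_date(encoded))  # 2020-01-01
```

Storing a date as a 4-byte integer instead of a UTF-8 string makes the column smaller and lets readers that understand the logical type treat the values as dates rather than text.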

Release 5.15.0

Release Date: August 25, 2020

New and Changed Features

PXF 5.15.0 includes these new and changed features:

  • PXF bundles the opencsv library to satisfy a missing transitive dependency that is required when PXF reads Hive tables created with the OpenCSVSerde serialization/deserialization class.
  • PXF bundles newer jackson-core, jackson-databind, and jackson-annotations supporting libraries.
  • PXF supports bzip2 and xz compression when reading from or writing to Avro files.
  • PXF introduces a new option named SKIP_HEADER_COUNT=<N> that you can use to instruct PXF to skip the first N lines in the first split of a text file.
  • PXF includes improvements to Hive error handling and error surfacing.
  • PXF no longer restricts operations using bzip2 compression to a single thread.
  • PXF 5.15.0 deprecates and ignores the THREAD-SAFE custom option setting. All query and write operations on a PXF external table are now always thread-safe.
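
The SKIP_HEADER_COUNT behavior listed above applies only to the first split of a file, so later splits are read in full. The following is a minimal sketch of that rule under simplifying assumptions: each split is modeled as a list of lines, and `read_split` is a hypothetical helper, not a PXF function.

```python
def read_split(lines, split_index, skip_header_count=0):
    """Return the lines of one split, dropping header lines only from the first split."""
    if skip_header_count < 0:
        raise ValueError("skip_header_count must be non-negative")
    if split_index == 0:
        # Only the first split of the file contains the header lines.
        return list(lines[skip_header_count:])
    return list(lines)


# A text file modeled as two splits; the first begins with a one-line header.
splits = [
    ["id,name", "1,alice", "2,bob"],
    ["3,carol", "4,dave"],
]
rows = [
    row
    for index, split in enumerate(splits)
    for row in read_split(split, index, skip_header_count=1)
]
print(rows)  # ['1,alice', '2,bob', '3,carol', '4,dave']
```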

Resolved Issues

PXF 5.15.0 resolves these issues:

Issue # Summary
30788 Resolves a PXF performance degradation issue that was encountered when writing very wide (greater than 1MB) rows.
30787 PXF did not surface a meaningful error when it encountered a problem accessing Hive 1.x. This issue is resolved.
30767 There was no way to instruct PXF to skip reading one or more lines at the beginning of a text file. This issue is resolved; PXF now exposes the SKIP_HEADER_COUNT option to specify the number of lines to skip when processing the first split of the file.
173344237 Resolves a NoClassDefFoundError by adding the missing serialization/deserialization library to the PXF download package and installation.

Release 5.14.0

Release Date: July 7, 2020

New and Changed Features

PXF 5.14.0 includes these new and changed features:

  • PXF supports the deflate and snappy compression codecs when writing Avro data to an external data store. By default, PXF now compresses all Avro data with the deflate codec before writing it to the external store.
  • Before writing Avro data, PXF converts smallint-type columns to the int data type. You must specify an int-type column in an external table definition to read this data.
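
Because smallint columns are written as int, a readable external table over the same Avro data needs int in place of smallint. The sketch below derives those column definitions from a writable table's columns. Only the smallint-to-int conversion comes from these release notes; the helper name is hypothetical, and all other types are passed through unchanged purely for illustration.

```python
def readable_columns(writable_columns):
    """Map (name, type) pairs of a writable table to the types needed to read the data back.

    Assumption: only smallint changes; every other type is returned as-is.
    """
    return [
        (name, "int" if column_type.lower() == "smallint" else column_type)
        for name, column_type in writable_columns
    ]


writable = [("id", "smallint"), ("name", "text"), ("total", "bigint")]
print(readable_columns(writable))
# [('id', 'int'), ('name', 'text'), ('total', 'bigint')]
```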

Resolved Issues

PXF 5.14.0 resolves these issues:

Issue # Summary
30708 PXF can now compress Avro data before writing it to an external data store.
30671 PXF fixes an issue where it did not correctly handle writing Avro data when the external table definition included a smallint type column.

Release 5.13.0

Release Date: June 30, 2020

New and Changed Features

PXF 5.13.0 includes these new and changed features since PXF 5.12.0:

  • PXF 5.13.0 is the first standalone release of PXF for Red Hat/CentOS that is distributed separately from Greenplum Database.

Resolved Issues

PXF 5.13.0 resolves these issues:

Issue # Summary
364 PXF fixes an issue where it did not correctly read from an external table when the LOCATION clause included back-quoted text.
30640 The use of the pxf.service.user.name property was not clear. The PXF pxf-site.xml template file now includes an enhanced description of the property.

Deprecated Features

Deprecated features may be removed in a future major release of PXF. PXF version 5.x deprecates:

  • The THREAD-SAFE custom option setting. All query and write operations are thread-safe (deprecated since PXF version 5.15.0).
  • The PXF_USER_IMPERSONATION, PXF_PRINCIPAL, and PXF_KEYTAB settings in the pxf-env.sh file. You can use the pxf-site.xml file to configure Kerberos and impersonation settings for your new Hadoop server configurations (deprecated since PXF version 5.10.0).
  • The pxf.impersonation.jdbc property setting in the jdbc-site.xml file. You can use the pxf.service.user.impersonation property to configure user impersonation for a new JDBC server configuration (deprecated since PXF version 5.10.0).
  • The HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since PXF version 5.0.1). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.

Known Issues and Limitations

PXF 5.x has these known issues and limitations:

Issue # Description
168957894 The PXF Hive Connector does not support using the Hive* profiles to access Hive transactional tables.
Workaround: Use the PXF JDBC Connector to access Hive.