Tanzu Greenplum Platform Extension Framework 5.x Release Notes
The Tanzu Greenplum Platform Extension Framework (PXF) is included in the Tanzu Greenplum distribution for Greenplum versions 5.28 and older and 6.11 and older. Starting with PXF version 5.13.0, PXF for RedHat/CentOS and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database. You may need to download and install the PXF package to obtain the most recent version of this component.
Supported Platforms
The independent PXF distribution is compatible with these operating system platforms and Greenplum Database versions:
OS Version | Greenplum Version |
---|---|
RHEL 6.x, CentOS 6.x | 5.21.2+ |
RHEL 7.x, CentOS 7.x | 5.21.2+, 6.x |
OEL 7.x, Ubuntu 18.04 LTS | 6.x |
PXF does not bundle cURL and instead loads the system-provided library. PXF requires cURL version 7.29.0 or newer. The officially-supported cURL for the CentOS 6.x and Red Hat Enterprise Linux 6.x operating systems is version 7.19.*. Due to this limitation, Greenplum Database 6 does not support running PXF on CentOS 6.x or RHEL 6.x.

PXF is compatible with these Java and Hadoop component versions:
PXF Version | Java Versions | Hadoop Versions | Hive Server Versions | HBase Server Version |
---|---|---|---|---|
5.16.x, 5.15.x, 5.14, 5.13 | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
Release 5.16.2
Release Date: February 25, 2021
Resolved Issues
PXF 5.16.2 resolves these issues:
Issue # | Summary |
---|---|
31253 | Resolves an issue where LATIN1-encoded data read by PXF displayed incorrectly because PXF performed a string conversion without taking the encoding into account. PXF now lets Greenplum Database handle the encoding conversion. |
31219 | Resolves an issue where an insert from a PXF external table defined with a SEGMENT REJECT LIMIT failed when PXF encountered an error while reading the last tuple. PXF now more robustly handles the case where all external data has been consumed, but Greenplum requests additional data. |
176987367 | Performance improvement that avoids a reverse DNS lookup. |
Release 5.16.1
Release Date: February 5, 2021
Resolved Issues
PXF 5.16.1 resolves this issue:
Issue # | Summary |
---|---|
31105 | Resolves an issue where PXF returned improperly formatted data from Hive when it accessed a nested struct that contained strings with escaped special characters. PXF now correctly resolves and escapes strings in nested complex data types that it reads from Hive. |
Release 5.16
Release Date: November 6, 2020
New and Changed Features
PXF 5.16.0 includes these new and changed features:
- Version 5.16 is the first standalone PXF release that includes a `.deb` installation package for Ubuntu 18.04 LTS systems.
- PXF adds support for reading files from, and writing files to, a network file system mounted on each Greenplum Database segment host. See Accessing Files on a Network File System with PXF for prerequisites, configuration, and usage information for this feature.
- PXF disallows specifying relative paths and environment variables in the `CREATE EXTERNAL TABLE` `LOCATION` clause file path.
- PXF adds the new `pxf.fs.basePath` property to the PXF `pxf-site.xml` template file. The property is commented out by default; you set this property to specify the base path from which PXF accesses the path that you specify in the `CREATE EXTERNAL TABLE` `LOCATION` clause. See About the pxf.fs.basePath Property for more information.
- The `pxf.service.user.name` property in the PXF `pxf-site.xml` template file is now commented out by default, and the file now includes an enhanced description of the property. Additionally, the documentation has been enhanced to provide specific use cases and configuration scenarios for access to secured and non-secured Hadoop clusters. See secured cluster use cases and non-secured cluster use cases.
- The default value of the `jdbc.pool.property.maximumPoolSize` property in the `jdbc-site.xml` template file was increased from `5` to `15` to better support out-of-the-box reads and writes of large amounts of data.
- PXF now supports reading from, and writing to, external tables altered by dropping columns.
- PXF adds the new `--skip-register` flag to the `pxf [cluster] init` command. This flag instructs PXF to skip the initialization step that copies the PXF extension files to the Greenplum installation on the host(s).
- The PXF `Hive` profile now supports column projection and predicate pushdown when you use it to access Hive tables `STORED AS Parquet`.
- Column projection and predicate pushdown are now enabled by default when you use PXF and the `Hive` profile to access Hive tables `STORED AS` `Parquet`, `RCFile`, and `ORC`.
- PXF now supports using the `Hive` profile to read data from a Hive table `STORED AS Parquet` when the underlying Parquet file(s) has a different column order than the defining Hive table.
- PXF now reduces and optimizes its memory usage during fragmentation for `Hive*` profiles.
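The `pxf.fs.basePath` property and the network file system feature work together: PXF resolves the `LOCATION` path relative to the configured base path. The following is a minimal sketch; the server name `nfssrv`, the mount point `/mnt/nfsshare`, and the table and path names are hypothetical, and it assumes the `file:text` profile described in the network file system documentation.

```sql
-- In $PXF_CONF/servers/nfssrv/pxf-site.xml, pxf.fs.basePath would be
-- set to the mount point, for example /mnt/nfsshare. PXF then resolves
-- the LOCATION path below against that base path.
CREATE EXTERNAL TABLE ext_nfs_sales (id int, amount numeric)
  LOCATION ('pxf://sales/2020/?PROFILE=file:text&SERVER=nfssrv')
FORMAT 'TEXT' (delimiter ',');
```

Because relative paths and environment variables are now disallowed in the `LOCATION` clause itself, the base path belongs in the server configuration rather than in the table definition.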
Resolved Issues
PXF 5.16 resolves these issues:
Issue # | Summary |
---|---|
459 | Resolves an issue where PXF failed to write NULL bytea- and smallint-type Avro data values. |
30987 | PXF generated a buffer that exceeded a 1GB limit during fragmentation of a user query that specified the Hive profile. PXF now reduces and optimizes its memory usage during fragmentation for Hive* profiles. |
30953, 30855 | Resolves an issue where PXF failed to release some resources when it encountered an error during filter execution. |
30930 | The use of the pxf.service.user.name property was not clear. The PXF pxf-site.xml template file now comments out this property by default and includes an enhanced description of the property. |
30905 | Resolves an issue where PXF returned an error when it was used to access a readable external table in which one or more columns had been dropped. PXF now supports both reading from and writing to external tables that have had columns dropped. |
30638 | Resolves an issue where PXF failed to read a Hive table STORED AS Parquet when the Greenplum Database external table specified the Hive profile, the table was backed by one or more Parquet files, and an underlying Parquet file had a different column order than the defining Hive table. PXF can now read such a Hive table. |
Release 5.15.1
Release Date: September 11, 2020
Changes
PXF 5.15.1 includes these changes:
- PXF bundles a new version of Tomcat, 7.0.105.
- PXF improves the performance of Parquet write operations (see Resolved Issues 30788, 30779) by:
  - No longer splitting files that are over 128MB in size.
  - Bundling Parquet version 1.11.1 libraries.
  - Providing a new `ENABLE_DICTIONARY` option to enable/disable dictionary encoding when PXF writes Parquet data.
  - Using the Parquet logical `date` and physical `int32` types when writing dates. See Resolved Issue 174433819.
- When dictionary encoding is enabled, the default `DICTIONARY_PAGE_SIZE` that PXF uses when writing Parquet data is now `1 * 1024 * 1024` (it was previously `1 * 512 * 1024`).
- PXF provides integrated native library registration support by exposing the new user configuration directory `$PXF_CONF/lib/native` and a template for setting the `LD_LIBRARY_PATH` option. See Resolved Issue 264 and Registering PXF Library Dependencies.
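The new `ENABLE_DICTIONARY` option is passed as a custom option in the `LOCATION` clause of a writable external table. A minimal sketch, with a hypothetical table name and HDFS path:

```sql
-- Disable dictionary encoding for this Parquet write; the hdfs:parquet
-- profile and pxfwritable_export formatter are the standard PXF idioms
CREATE WRITABLE EXTERNAL TABLE ext_parquet_out (id int, total numeric)
  LOCATION ('pxf://data/out/sales?PROFILE=hdfs:parquet&ENABLE_DICTIONARY=false')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```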
Resolved Issues
PXF 5.15.1 resolves these issues:
Issue # | Summary |
---|---|
264 | Resolves an issue where it was not clear how to register a native library with PXF. PXF now provides integrated native library registration support and related documentation. |
30788, 30779 | Improves PXF performance when writing Parquet data by not splitting files larger than 128MB, using newer parquet libraries, and exposing a new ENABLE_DICTIONARY option. |
174433819 | Resolves an issue where PXF used the utf8 string logical type and binary physical type to write a Parquet date. PXF now stores dates using the Parquet logical date and physical int32 types. |
Release 5.15.0
Release Date: August 25, 2020
New and Changed Features
PXF 5.15.0 includes these new and changed features:
- PXF bundles the `opencsv` library to satisfy a missing transitive dependency that is required when PXF reads Hive tables created with the `OpenCSVSerde` serialization/deserialization class.
- PXF bundles newer `jackson-core`, `jackson-databind`, and `jackson-annotations` supporting libraries.
- PXF supports `bzip2` and `xz` compression when reading from or writing to Avro files.
- PXF introduces a new option named `SKIP_HEADER_COUNT=<N>` that you can use to instruct PXF to skip the first `N` lines in the first split of a text file.
- PXF includes improvements to Hive error handling and error surfacing.
- PXF no longer restricts operations using `bzip2` compression to a single thread.
- PXF 5.15.0 deprecates and ignores the `THREAD-SAFE` custom option setting. All query and write operations on a PXF external table are now always thread-safe.
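The `SKIP_HEADER_COUNT` option is supplied as a custom option in the `LOCATION` clause. A minimal sketch, with a hypothetical file path, that skips a single header line:

```sql
-- Skip the first line (the header) in the first split of the file
CREATE EXTERNAL TABLE ext_csv_no_hdr (id int, name text)
  LOCATION ('pxf://data/file.csv?PROFILE=hdfs:text&SKIP_HEADER_COUNT=1')
FORMAT 'CSV';
```

Note that the option applies only to the first split of the file, which is where the header resides.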
Resolved Issues
PXF 5.15.0 resolves these issues:
Issue # | Summary |
---|---|
30788 | Resolves a PXF performance degradation issue that was encountered when writing very wide (greater than 1MB) rows. |
30787 | PXF did not surface a meaningful error when it encountered a problem accessing Hive 1.x. This issue is resolved. |
30767 | There was no way to instruct PXF to skip reading one or more lines at the beginning of a text file. This issue is resolved; PXF now exposes the SKIP_HEADER_COUNT option to specify the number of lines to skip when processing the first split of the file. |
173344237 | Resolves a NoClassDefFoundError error by adding the missing serialization/deserialization library to the PXF download package and installation. |
Release 5.14.0
Release Date: July 7, 2020
New and Changed Features
PXF 5.14.0 includes these new and changed features:
- PXF supports the `deflate` and `snappy` compression codecs when writing Avro data to an external data store. By default, PXF now compresses all Avro data with the `deflate` codec before writing it to the external store.
- Before writing Avro data, PXF converts `smallint`-type columns to the `int` data type. You must specify an `int`-type column in an external table definition to read this data.
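A compression codec for an Avro write is typically selected with the `COMPRESSION_CODEC` custom option in the `LOCATION` clause. A minimal sketch, with a hypothetical table and path; note the `qty` column is declared `int` even if the source data was `smallint`, per the conversion described above:

```sql
-- Write Avro data compressed with snappy instead of the default deflate
CREATE WRITABLE EXTERNAL TABLE ext_avro_out (id int, qty int)
  LOCATION ('pxf://data/out/orders?PROFILE=hdfs:avro&COMPRESSION_CODEC=snappy')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```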
Resolved Issues
PXF 5.14.0 resolves these issues:
Issue # | Summary |
---|---|
30708 | PXF can now compress Avro data before writing it to an external data store. |
30671 | PXF fixes an issue where it did not correctly handle writing Avro data when the external table definition included a smallint type column. |
Release 5.13.0
Release Date: June 30, 2020
New and Changed Features
PXF 5.13.0 includes these new and changed features since PXF 5.12.0:
- PXF 5.13.0 is the first standalone release of PXF for RedHat/CentOS that is distributed separately from Greenplum Database.
Resolved Issues
PXF 5.13.0 resolves these issues:
Issue # | Summary |
---|---|
364 | PXF fixes an issue where it did not correctly read from an external table when the LOCATION clause included back-quoted text. |
30640 | The use of the pxf.service.user.name property was not clear. The PXF pxf-site.xml template file now includes an enhanced description of the property. |
Deprecated Features
Deprecated features may be removed in a future major release of PXF. PXF version 5.x deprecates:
- The `THREAD-SAFE` custom option setting. All query and write operations are thread-safe (deprecated since PXF version 5.15.0).
- The `PXF_USER_IMPERSONATION`, `PXF_PRINCIPAL`, and `PXF_KEYTAB` settings in the `pxf-env.sh` file. You can use the `pxf-site.xml` file to configure Kerberos and impersonation settings for your new Hadoop server configurations (deprecated since PXF version 5.10.0).
- The `pxf.impersonation.jdbc` property setting in the `jdbc-site.xml` file. You can use the `pxf.service.user.impersonation` property to configure user impersonation for a new JDBC server configuration (deprecated since PXF version 5.10.0).
- The HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since PXF version 5.0.1). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.
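For the deprecated `pxf-env.sh` settings, the per-server `pxf-site.xml` replacements might look like the following sketch. The server name, principal, and keytab path are placeholders; property names follow the PXF server configuration documentation:

```xml
<!-- Sketch of $PXF_CONF/servers/<server_name>/pxf-site.xml -->
<!-- Replaces PXF_PRINCIPAL -->
<property>
    <name>pxf.service.kerberos.principal</name>
    <value>gpadmin/_HOST@EXAMPLE.COM</value>
</property>
<!-- Replaces PXF_KEYTAB -->
<property>
    <name>pxf.service.kerberos.keytab</name>
    <value>/home/gpadmin/pxf.service.keytab</value>
</property>
<!-- Replaces PXF_USER_IMPERSONATION -->
<property>
    <name>pxf.service.user.impersonation</name>
    <value>true</value>
</property>
```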
Known Issues and Limitations
PXF 5.x has these known issues and limitations:
Issue # | Description |
---|---|
168957894 | The PXF Hive Connector does not support using the Hive* profiles to access Hive 3 managed (CRUD and insert-only transactional, and temporary) tables. Workaround: Use the PXF JDBC Connector to access Hive 3 managed tables. |
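The JDBC workaround for Hive 3 managed tables might be sketched as follows. The server name `hivejdbc` and the table and column names are hypothetical; the sketch assumes a JDBC server configuration whose `jdbc-site.xml` points at the Hive JDBC driver and connection URL:

```sql
-- Read a Hive 3 managed table through the PXF JDBC connector
-- instead of the unsupported Hive* profiles
CREATE EXTERNAL TABLE pxf_hive3_managed (id int, amt numeric)
  LOCATION ('pxf://hive_schema.managed_table?PROFILE=jdbc&SERVER=hivejdbc')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```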