Tanzu Greenplum Platform Extension Framework 6.x Release Notes
The Tanzu Greenplum Platform Extension Framework (PXF) is included in the Tanzu Greenplum Database distribution in Greenplum version 6.x and in version 5.28 and older. PXF for Redhat/CentOS and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database starting with PXF version 5.13.0. PXF version 5.16.0 is the first independent release that includes an Ubuntu distribution. You may need to download and install the PXF package to obtain the most recent version of this component.
The independent PXF 6.x distribution is compatible with these operating system platforms and Greenplum Database versions:
|OS Version||Greenplum Version|
|RHEL 7.x, CentOS 7.x||5.21.2+, 6.x|
|OEL 7.x, Ubuntu 18.04 LTS||6.x|
PXF is compatible with these Java and Hadoop component versions:
|PXF Version||Java Versions||Hadoop Versions||Hive Server Versions||HBase Server Version|
|6.1.0, 6.0.x||8, 11||2.x, 3.1+||1.x, 2.x, 3.1+||1.3.2|
|5.16.x, 5.15.x, 5.14, 5.13||8, 11||2.x, 3.1+||1.x, 2.x, 3.1+||1.3.2|
Release Date: June 24, 2021
PXF 6.1.0 includes these new and changed features:
- PXF now natively supports reading and writing Avro arrays.
- PXF adds support for reading JSON objects, such as embedded arrays, as text. The data returned by PXF is a valid JSON string that you can manipulate with the existing Greenplum Database JSON functions and operators.
- PXF improves its error reporting by displaying the exception class when there is no error message available.
- PXF introduces a new property that you can use to configure the connection timeout for data upload/write operations to an external datastore. This property is named pxf.connection.upload-timeout, and is located in the pxf-application.properties file.
- PXF now uses the pxf.connection.timeout configuration property to set the connection timeout only for read operations. If you previously set this property to specify the write timeout, you should now use
- PXF bundles a newer gp-common-go-libs supporting library along with its dependencies.
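Because pxf.connection.timeout now applies only to reads, an existing configuration that sets it may be worth reviewing after the upgrade. The following is a minimal Python sketch of such a review, not a PXF tool. It assumes that pxf-application.properties uses plain key=value lines; the helper names (parse_properties, timeout_review) and the sample values such as 10m are illustrative assumptions.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and comments are skipped.

    Assumption: only the plain `key=value` form is used in the file.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def timeout_review(text):
    """Return notes about timeout settings that may need attention in PXF 6.1.0."""
    props = parse_properties(text)
    notes = []
    read_set = "pxf.connection.timeout" in props
    write_set = "pxf.connection.upload-timeout" in props
    if read_set and not write_set:
        notes.append(
            "pxf.connection.timeout is set but pxf.connection.upload-timeout "
            "is not; the former now covers read operations only"
        )
    return notes


if __name__ == "__main__":
    # Hypothetical file content; the value format is an assumption.
    sample = "# PXF server settings\npxf.connection.timeout=10m\n"
    for note in timeout_review(sample):
        print(note)
```

A file that sets both properties produces no notes.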
PXF 6.1.0 resolves these issues:
|31389||Resolves an issue where certain
|31317||PXF did not support writing Avro arrays. PXF 6.1.0 includes native support for reading and writing Avro arrays. (Resolved by PR-636.)|
Release Date: May 11, 2021
PXF 6.0.1 resolves these issues:
|Resolves an issue where PXF returned wrong results for batches of ORC data that were shorter than the default batch size. (Resolved by PR-630.)|
|Resolves an issue where PXF threw a
|178013439||Resolves an issue where using the profile
|31409||Resolves an issue where PXF intermittently failed with the error
Release Date: March 29, 2021
PXF 6.0.0 includes these new and changed features:
Architecture and Bundled Libraries
PXF 6.0.0 is built on the Spring Boot framework:
- PXF distributes a single JAR file that includes all of its dependencies.
- PXF no longer installs and uses a standalone Tomcat server; it uses the Tomcat version 9.0.43 embedded in the PXF Spring Boot application.
PXF bundles the postgresql-42.2.14.jar PostgreSQL driver JAR file.
PXF library dependencies have changed with new, updated, and removed libraries.
The PXF API has changed. If you are upgrading from PXF 5.x, you must update the PXF extension in each database in which it is registered as described in Upgrading from PXF 5.
PXF 6 moves fragment allocation from its C extension to the PXF Service running on each segment host.
The PXF Service now also runs on the Greenplum Database master and standby master hosts. If you used PXF 5.x to access Kerberos-secured HDFS, you must now generate principals and keytabs for the master and standby master as described in Upgrading from PXF 5.
Files, Configuration, and Commands
- PXF 6 uses the $PXF_BASE environment variable to identify its runtime configuration directory; it no longer uses $PXF_CONF for this purpose.
- By default, PXF installs its executables and runtime configuration into the same directory, PXF_BASE=$PXF_HOME. See About the PXF Installation and Configuration Directories for the new installation file layout.
- You can relocate the $PXF_BASE runtime configuration directory to a different directory after you install PXF by running the new pxf [cluster] prepare command as described in Relocating $PXF_BASE.
- PXF template server configuration files now reside in $PXF_HOME/templates; they were previously located in the
pxf [cluster] register command now copies only the PXF pxf.control extension file to the Greenplum Database installation. Run this command after your first installation of PXF, and/or after you upgrade your Greenplum Database installation.
- PXF 6 no longer requires initialization, and deprecates the
pxf [cluster] init is now equivalent to pxf [cluster] register, and pxf [cluster] reset is a no-op.
PXF 6 includes new and changed configuration; see About the PXF Configuration Files for more information:
- PXF 6 integrates with Apache Log4j 2; the PXF logging configuration file is now named pxf-log4j2.xml, and is in
PXF 6 adds a new configuration file for the PXF server application, pxf-application.properties; this file includes:
- New properties to configure the PXF streaming thread pool.
pxf.log.level property to set the PXF logging level.
Configuration properties moved from the PXF 5 pxf-env.sh file and renamed:
|pxf-env.sh Property Name||pxf-application.properties Property Name|
|PXF_MAX_THREADS||pxf.max.threads|
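When upgrading, the renamed property can be carried over mechanically. The following is a minimal Python sketch under stated assumptions, not a PXF utility: it assumes the PXF 5 pxf-env.sh file sets the property with a plain NAME=value or export NAME=value line, and the helper name migrate_env_settings is hypothetical.

```python
import re
from typing import List

# Old pxf-env.sh name -> new pxf-application.properties name (from the table above).
RENAMED = {"PXF_MAX_THREADS": "pxf.max.threads"}

# Matches `NAME=value` with an optional leading `export`.
ASSIGNMENT = re.compile(r"(?:export\s+)?(\w+)=(.*)$")


def migrate_env_settings(env_text: str) -> List[str]:
    """Return pxf-application.properties lines for renamed settings found in env_text."""
    lines = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # commented-out settings are not migrated
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in RENAMED:
            value = match.group(2).strip().strip("\"'")
            lines.append(f"{RENAMED[match.group(1)]}={value}")
    return lines


if __name__ == "__main__":
    # Hypothetical PXF 5 pxf-env.sh content.
    old_env = 'export PXF_MAX_THREADS=150\nexport PXF_JVM_OPTS="-Xmx2g"\n'
    print("\n".join(migrate_env_settings(old_env)))
```

Settings that were not renamed are ignored, so the output contains only lines that belong in pxf-application.properties.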
PXF 6 adds new configuration environment variables to pxf-env.sh to simplify the registration of external library dependencies:
|New Property Name||Description|
|PXF_LOADER_PATH||Additional directories and JARs for PXF to class-load.|
|LD_LIBRARY_PATH||Additional directories and native libraries for PXF to load.|
See Registering PXF Library Dependencies for more information.
PXF 6 deprecates the PXF_FRAGMENTER_CACHE configuration property; fragment metadata caching is no longer configurable and is now always enabled.
PXF 6 introduces new profile names and deprecates some older profile names. The old profile names still work, but it is highly recommended to switch to using the new profile names:
|New Profile Name||Old/Deprecated Profile Name|
|hive||Hive|
|hive:rc||HiveRC|
|hive:orc||HiveORC|
|hive:orc||HiveVectorizedORC (note 1)|
|hive:text||HiveText|
|jdbc||Jdbc|
|hbase||HBase|
1 To use the HiveVectorizedORC profile in PXF 6, specify the hive:orc profile name with the new
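The profile renames can be applied to existing external table definitions with a simple lookup. The following is a minimal Python sketch, not part of PXF: the mapping is copied from the table above, the helper names (modernize_profile, rewrite_location) are hypothetical, and the sample LOCATION URI (table name and server name) is an illustrative assumption.

```python
import re

# Deprecated profile name -> new profile name (from the table above).
DEPRECATED_PROFILES = {
    "Hive": "hive",
    "HiveRC": "hive:rc",
    "HiveORC": "hive:orc",
    "HiveVectorizedORC": "hive:orc",
    "HiveText": "hive:text",
    "Jdbc": "jdbc",
    "HBase": "hbase",
}

# Renaming alone is not enough here: note 1 says an additional setting is needed.
NEEDS_MANUAL_REVIEW = {"HiveVectorizedORC"}

# The value of the PROFILE option inside a pxf:// LOCATION URI.
PROFILE_VALUE = re.compile(r"(?<=[?&]PROFILE=)[^&]+")


def modernize_profile(name: str) -> str:
    """Return the new name for a deprecated profile; other names pass through."""
    return DEPRECATED_PROFILES.get(name, name)


def rewrite_location(uri: str) -> str:
    """Replace a deprecated PROFILE value in a LOCATION URI with its new name."""
    return PROFILE_VALUE.sub(lambda m: modernize_profile(m.group(0)), uri)


if __name__ == "__main__":
    old_uri = "pxf://default.sales?PROFILE=HiveORC&SERVER=hdp"  # hypothetical table
    print(rewrite_location(old_uri))
```

Profile names listed in NEEDS_MANUAL_REVIEW should be checked by hand after the rewrite, because the new name by itself does not reproduce the old behavior.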
PXF adds support for natively reading an ORC file located in Hadoop, an object store, or a network file system. See the Hadoop ORC and Object Store ORC documentation for prerequisites and usage information.
PXF adds support for reading and writing comma-separated value form text data located in Hadoop, an object store, or a network file system through a separate CSV profile. See the Hadoop Text and Object Store Text documentation for usage information.
PXF supports predicate pushdown on
PXF supports predicate pushdown for the IN operator when you specify one of the *:parquet profiles to read a Parquet file.
PXF supports specifying a codec short name (alias) rather than the Java class name when you create a writable external table for a *:SequenceFile profile that includes a
- PXF now supports monitoring of the PXF Service process at runtime. Refer to About PXF Service Runtime Monitoring for more information.
- PXF improves the display of error messages in the psql client, in some cases including a HINT that provides possible error resolution actions.
- When PXF is configured to auto-terminate on detection of an out of memory condition, it now logs messages to
PXF version 6.0.0 removes:
THREAD-SAFE external table custom option (deprecated since 5.10.0).
PXF_KEYTAB configuration properties in pxf-env.sh (deprecated since 5.10.0).
jdbc.user.impersonation configuration property in jdbc-site.xml (deprecated since 5.10.0).
- The Hadoop profile names
SequenceWritable (deprecated since 5.0.1).
PXF 6.0.0 resolves these issues:
|30987||Resolves an issue where PXF returned an
Deprecated features may be removed in a future major release of PXF. PXF version 6.x deprecates:
PXF_FRAGMENTER_CACHE configuration property (deprecated since PXF version 6.0.0).
pxf [cluster] init commands (deprecated since PXF version 6.0.0).
pxf [cluster] reset commands (deprecated since PXF version 6.0.0).
- The Hive profile names
HiveVectorizedORC (deprecated since PXF version 6.0.0). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for the new profile names.
HBase profile name (now hbase) (deprecated since PXF version 6.0.0).
Jdbc profile name (now jdbc) (deprecated since PXF version 6.0.0).
- Specifying a COMPRESSION_CODEC using the Java class name; use the codec short name instead.
PXF 6.x has these known issues and limitations:
|178013439||(Resolved in 6.0.1) Using the deprecated
Workaround: Use the
|31409||(Resolved in 6.0.1) PXF can intermittently fail with the following error when it accesses Hive tables
Workaround: Use vectorized query execution by adding the
|168957894||The PXF Hive Connector does not support using the
Workaround: Use the PXF JDBC Connector to access Hive 3 managed tables.