Pivotal Greenplum 6.5 Release Notes
This document contains pertinent release information about Pivotal Greenplum Database 6.5 releases. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy.
Pivotal Greenplum 6 software is available for download from the Pivotal Greenplum page on Pivotal Network.
Pivotal Greenplum 6 is based on the open source Greenplum Database project code.
Release Date: 2020-03-20
Features
Greenplum Database 6.5.0 includes these new and changed features:
- When creating a user-defined function, you can specify the attribute EXECUTE
ON INITPLAN to indicate that the function contains an SQL command that
dispatches queries to the segment instances and requires special processing on the
master instance by Greenplum Database. When possible, Greenplum Database handles the
function on the master instance in the following manner:
- First, Greenplum Database executes the function as part of an InitPlan node on the master instance and holds the function output temporarily.
- Then, in the MainPlan of the query plan, the function is called in an EntryDB (a special query executor (QE) that runs on the master instance) and Greenplum Database returns the data that was captured when the function was executed as part of the InitPlan node. The function is not executed in the MainPlan.
For more information about the attribute and limitations when using the attribute, see CREATE FUNCTION.
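As a minimal sketch of the attribute (the table and column names here are hypothetical), a function marked EXECUTE ON INITPLAN might look like this:

```sql
-- Hypothetical example: a function that dispatches a query to the segments.
-- EXECUTE ON INITPLAN tells Greenplum to run it in an InitPlan node on the
-- master and return the captured output in the MainPlan.
CREATE FUNCTION get_data()
RETURNS TABLE (id int, msg text) AS
$$
  SELECT id, msg FROM source_table;
$$
LANGUAGE SQL EXECUTE ON INITPLAN;

-- Possible usage: populate a table from the function's captured output
CREATE TABLE target_table AS
  SELECT * FROM get_data()
  DISTRIBUTED BY (id);
```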
- GPORCA introduces a new costing model for bitmap indexes. The new model is designed to
choose faster bitmap nested loop joins instead of hash joins. The new costing model is
implemented as a Beta feature, and it is used only if you enable it by
setting the configuration parameter:
set optimizer_cost_model = experimental
The optimizer_cost_model parameter is required only during the Beta test period for this cost model. After further testing and validation, the new cost model will be enabled by default.
- Greenplum Database includes the server configuration parameter plan_cache_mode, which controls whether a prepared statement (either
explicitly prepared or implicitly generated, for example by PL/pgSQL) is executed
using a custom plan or a generic plan.
Custom plans are created for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions. By default, the choice between these options is made automatically, but it can be overridden by setting this parameter. If the prepared statement has no parameters, a generic plan is always used. The allowed values are auto (the default), force_custom_plan, and force_generic_plan. This setting is considered when a cached plan is to be executed, not when it is prepared.
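As a sketch of how the parameter might be used in a session (the table and prepared statement here are hypothetical):

```sql
-- Force custom plans, so each EXECUTE is planned with its actual parameter values
SET plan_cache_mode = force_custom_plan;

PREPARE region_count (int) AS
  SELECT count(*) FROM sales WHERE region_id = $1;

EXECUTE region_count(42);  -- planned using the value 42

RESET plan_cache_mode;     -- return to automatic plan selection
```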
- PXF version 5.11.1 is included, which introduces new and changed features and bug fixes. See PXF Version 5.11.1 below.
- The s3 external table protocol automatically recognizes any file it reads that has a .deflate extension and uncompresses it as deflate format.
- Greenplum Database introduces the Greenplum R Client Beta, an interactive
in-database data analytics tool. Refer to the Greenplum Database R Client (Beta) documentation for
installation and usage information for this tool.
The Greenplum R Client (GreenplumR) is currently a Beta feature, and is not supported for production environments.
- gpload adds the --max_retries option to specify the number of times the utility attempts to connect to Greenplum Database after a connection timeout. With the default value, 0, the utility does not retry the connection after a timeout.
- Greenplum Database introduces PL/Container version 3.0 Beta, which:
- Provides support for the new GreenplumR interface.
- Reduces the number of processes created by PL/Container, in order to save system resources.
- Supports more containers running concurrently.
- Includes improved log messages to help diagnose problems.
PL/Container 3 is currently a Beta feature, and is not supported for production environments. It provides only an R Docker image for executing functions; Python images are not yet available. See PL/Container Language for installation changes related to PL/Container 3.
PXF Version 5.11.1
PXF includes the following new and changed features:
- PXF provides a restart command to stop, and then restart, all PXF server instances in the cluster. See Restarting PXF.
- The pxf [cluster] sync command now recognizes a [-d | --delete] option. When specified, PXF deletes files on the remote host(s) that are not present in the PXF user configuration on the Greenplum Database master host. Refer to pxf and pxf cluster.
- PXF supports filter predicate pushdown for Parquet data that you access with the Hadoop and Object Store Connectors. Parquet Data Type Mapping describes filter pushdown support for Parquet data types in PXF.
- PXF includes improvements to error handling and error surfacing.
- PXF bundles newer guava and Google Cloud Storage hadoop2 libraries.
- The PXF pxf-log4j.properties template file updates a log filter and changes the level from INFO to WARN.
- PXF removes unused and default Tomcat applications and files, hardening its default Tomcat security.
- PXF no longer requires a $JAVA_HOME setting in gpadmin's .bashrc file on the master, standby master, and segment hosts. You can now specify JAVA_HOME before or during PXF initialization. Refer to the Initialization Overview in the PXF initialization documentation.
Resolved Issues
Pivotal Greenplum 6.5.0 resolves these issues:
- 307 - PXF
- PXF did not correctly handle an external table that was created with the ESCAPE 'OFF' or DELIMITER 'OFF' formatting options. This issue is resolved. PXF now correctly neither escapes nor adds delimiters when reading external data with an external table created with these options.
- 30155 - gpstart
- On systems that use a custom tablespace or filespace, gpstart could
fail to start a cluster if the standby master host was down (for example, if the
standby was taken offline for maintenance), showing the error:
Error occured while stopping the standby master: ExecutionError: 'non-zero rc: 255' occured.
This problem occurred because gpstart was attempting to check and sync the filespace or tablespace on the unavailable standby master host. gpstart was modified to skip filespace and tablespace checks when the standby server is not available.
- 30255 - Query Optimizer
- The GPORCA cost model for bitmap indexes would sometimes cost bitmap nested loop joins higher than hash joins, resulting in poor query performance. Greenplum Database 6.5 introduces a revised cost model for bitmap indexes to address this issue. See Features.
- 30287 - Server: Execution
- When GPORCA was enabled, queries against an append-only, column-oriented table could cause a PANIC due to shared memory corruption. The code was modified to guard against out-of-bound writes that caused the memory corruption.
- 30367 - Query Optimizer
- For GPORCA, query performance was poor for some queries against tables with columns that are defined with the citext datatype. The poor performance was because GPORCA did not gather statistics and calculate cardinalities for those columns. Now GPORCA gathers statistics and calculates cardinalities for columns defined with the citext datatype.
- 30369 - Query Execution
- Greenplum Database generated a PANIC when executing a query that contains a JOIN LATERAL and the LATERAL subquery contains a LIMIT clause. Now the specified type of query completes.
- 30379 - ANALYZE
- In some cases, performing an ANALYZE operation on a table with a column that is defined with the citext datatype returns the error permission denied for schema <name>. The error was generated when the user performing the operation did not have USAGE privilege in the schema where the citext datatype was defined with the CREATE EXTENSION citext command. Greenplum Database has been modified to not require USAGE privilege in the citext datatype schema for ANALYZE operations.
- 30382 - VACUUM,TRUNCATE
- In some cases, performing a VACUUM FULL operation on the pg_class catalog table and concurrently performing a TRUNCATE operation on a user created heap table returned the error updated tuple is already HEAP_MOVED_OFF and caused the database to become unavailable. The TRUNCATE command did not properly manage the heap table entry in pg_class during the TRUNCATE operation. This issue is resolved.
- 30390 - gprecoverseg
- In some cases, performance was poor when performing an incremental recovery with the gprecoverseg utility on a system with a large number of segment instances. Performance is improved; the utility now performs some recovery operations in parallel.
- 30405 - gpcheckcat
- The gpcheckcat utility failed when the dbid of Greenplum Database master was not 1. Now the master dbid is not required to be 1.
- 30426 - Query Execution
- Some queries that use the window function cume_dist() returned the error Backward scanning of tuplestores are not supported if the query generated spill files. This issue is resolved; backward scanning of tuplestore spill files is now allowed during query execution.
- 30437 - Query Optimizer
- Queries using dynamic partition elimination (DPE) with range predicates ran slowly. This issue has been fixed by allowing only equality comparisons with DPE.
- 30438 - Catalog and Metadata
- If the server configuration parameter gp_use_legacy_hashops was set to on, Greenplum Database incorrectly used the non-legacy opclasses when redistributing a table with an ALTER TABLE...(REORGANIZE = TRUE) command if the command contained a DISTRIBUTED BY clause that did not specify an opclass. This caused SELECT commands against a table with redistributed data to return incorrect results.
- 30441 - analyzedb
- The analyzedb utility could fail with an error similar to ERROR: relation "pg_aoseg.pg_aocsseg_xxxxxx" does not exist if a table was dropped during the analyzedb operation. This problem was resolved by ensuring that analyzedb skips any dropped tables when determining the list of tables to analyze.
- 30450 - PXF
- PXF initialization and reset failed when the default system Java version differed from that specified in PXF's $JAVA_HOME. This issue is resolved; PXF has added flexibility to the specification of the $JAVA_HOME setting.
- 30452 - Dispatch
- If the server configuration parameter check_function_bodies was set in a session on the master, the parameter setting did not persist when a related segment instance session was reset. This caused some functions to fail. Now the parameter setting persists when a segment instance session is reset.
- 30464 - Query Optimizer
- GPORCA incorrectly determined that the plan for a query with a filter on a window function over the distribution key column was direct dispatchable. Now direct dispatch requires the filter to be on a table scan.
- 30471, 8987 - Postgres Planner
- When the Postgres Planner executed some queries that contain a subquery containing both a distinct-qualified aggregate expression and a GROUP BY clause, the Postgres Planner returned the error could not find pathkey item to sort. The error was returned when the Postgres Planner did not properly manage information used for sorting.
- 30474 - Query Execution
- In some specific situations, certain types of queries generated a Greenplum Database PANIC. The PANIC occurred when Greenplum Database did not properly handle skew optimization for multi-batch hash joins. A query could trigger the PANIC when all of the following were true: the query contains a join, the join key has segment-local statistics (such as a catalog table), the join key is one of the most common values, and the query plan is a rescannable multi-batch hash join.
- 30477 - Query Analyze
- While gathering statistics for a partitioned table, the pg_class columns relpages and reltuples were not populated for the root partition, only for leaf partitions. This issue has been fixed by changing the method used to determine whether a partitioned table is empty.
- 30485 - gpinitsystem
- When initializing a Greenplum Database system, the gpinitsystem utility failed to set the password for a user name when the name was numeric.
- 30493 - analyzedb
- When used with the --config-file option, analyzedb did not enumerate the leaf partitions of a partitioned table and processed the root partition as a non-partitioned table. For heap tables this produced an error. For append-optimized tables, no error was raised, but DML changes to leaf partitions were not tracked properly. This issue has been resolved. Using the --config-file option correctly analyzes partitioned tables.
- 168199193 - COPY
- In some cases, performance of the COPY command in Greenplum Database 6 was poor when compared to Greenplum Database 5. The performance of the COPY command is improved.
- 168828451, 8677 - Planner
- Some queries returned incorrect results when the queries contain subqueries that perform a join and also contain one or more equality predicates, and optionally contain an IS NULL predicate. Incorrect results were returned when either a merge join or a nested loop join did not correctly process the predicates. This issue is resolved.
- 169030090 - Server
- Superusers were limited to 3 connections by default, causing "too many clients" errors when users ran maintenance scripts. The maximum number of superuser connections is set with the superuser_default_connections server configuration parameter. This issue is resolved. The default value for this parameter has changed from 3 to 10.
- 170745356, 9407 - Query Execution
- With gp_enable_global_deadlock_detector set to on, concurrent updates to the same table could produce an incorrect query result. This issue is resolved. Segments report waited-for transaction IDs to the master so that the master has the same transaction order as the segments.
- 170861600 - Server
- Using ALTER TABLE tablename SPLIT PARTITION could cause rows to be assigned to the wrong partition, or could cause a crash, if one or more columns before the partition key were dropped. This issue is resolved.
- 171481916 - gpinitstandby
- In some cases, utilities that checked for host IP addresses, such as gpinitstandby, failed. A Python utility (ifaddrs) that is used by those Greenplum Database utilities caused the failure. ifaddrs has been updated.
- 171596248, 9679 - Query Execution
- A Greenplum Database segment instance could generate a PANIC when a query that joins tables with a compound data type generated a query plan that performs a data motion and contains Nested Loop joins. The PANIC occurred due to an error in the prefetch logic for the motion. The prefetch logic issue has been corrected.
Upgrading to Greenplum 6.5.0
See Upgrading from an Earlier Greenplum 6 Release to upgrade your existing Greenplum 6.x software to Greenplum 6.5.0.
Deprecated Features
Deprecated features will be removed in a future major release of Greenplum Database. Pivotal Greenplum 6.x deprecates:
- The analyzedb option --skip_root_stats (deprecated).
If the option is specified, a warning is issued stating that the option will be ignored.
- The server configuration parameter gp_statistics_use_fkeys (deprecated since 6.2).
- The following PXF configuration properties (deprecated since 6.2):
- The PXF_USER_IMPERSONATION, PXF_PRINCIPAL, and PXF_KEYTAB settings in the pxf-env.sh file. You can use the pxf-site.xml file to configure Kerberos and impersonation settings for your new Hadoop server configurations.
- The pxf.impersonation.jdbc property setting in the jdbc-site.xml file. You can use the pxf.service.user.impersonation property to configure user impersonation for a new JDBC server configuration.
- The server configuration parameter gp_ignore_error_table
(deprecated since 6.0).
To avoid a Greenplum Database syntax error, set the value of this parameter to true when you run applications that execute CREATE EXTERNAL TABLE or COPY commands that include the now removed Greenplum Database 4.3.x INTO ERROR TABLE clause.
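As a sketch, an application that still issues the removed 4.3.x clause could set the parameter first (the table names and error-handling clause shown here are hypothetical, in the 4.3.x style):

```sql
-- Ignore the removed 4.3.x INTO ERROR TABLE clause instead of raising a syntax error
SET gp_ignore_error_table = true;

-- Legacy 4.3.x-style command: the INTO err_records part is now ignored
COPY sales FROM '/data/sales.txt'
    LOG ERRORS INTO err_records SEGMENT REJECT LIMIT 10 ROWS;
```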
- Specifying => as an operator name in the CREATE OPERATOR command (deprecated since 6.0).
- The Greenplum external table C API (deprecated since 6.0).
Any developers using this API are encouraged to use the new Foreign Data Wrapper API in its place.
- Commas placed between a SUBPARTITION TEMPLATE clause and its
corresponding SUBPARTITION BY clause, and between consecutive
SUBPARTITION BY clauses in a CREATE TABLE
command (deprecated since 6.0).
Using this undocumented syntax will generate a deprecation warning message.
- The timestamp format YYYYMMDDHH24MISS (deprecated since 6.0).
This format could not be parsed unambiguously in previous Greenplum Database releases, and is not supported in PostgreSQL 9.4.
- The createlang and droplang utilities (deprecated since 6.0).
- The pg_resqueue_status system view (deprecated since 6.0).
Use the gp_toolkit.gp_resqueue_status view instead.
- The GLOBAL and LOCAL modifiers when
creating a temporary table with the CREATE TABLE and
CREATE TABLE AS commands (deprecated since 6.0).
These keywords are present for SQL standard compatibility, but have no effect in Greenplum Database.
- The Greenplum Platform Extension Framework (PXF) HDFS profile names
for the Text, Avro, JSON, Parquet, and SequenceFile data formats
(deprecated since 5.16).
Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.
- Using WITH OIDS or oids=TRUE to assign an OID system column when creating or altering a table (deprecated since 6.0).
- Allowing superusers to specify the SQL_ASCII encoding
regardless of the locale settings (deprecated since 6.0).
This choice may result in misbehavior of character-string functions when data that is not encoding-compatible with the locale is stored in the database.
- The @@@ text search operator (deprecated since 6.0).
This operator is currently a synonym for the @@ operator.
- The unparenthesized syntax for option lists in the VACUUM command
(deprecated since 6.0).
This syntax requires that the options to the command be specified in a specific order.
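To illustrate the difference (the table name is hypothetical): the deprecated form fixes the option order, while the parenthesized form does not.

```sql
-- Deprecated unparenthesized form: options must appear in this fixed order
VACUUM FULL VERBOSE ANALYZE mytable;

-- Parenthesized form: options may appear in any order
VACUUM (ANALYZE, VERBOSE, FULL) mytable;
```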
- The plain pgbouncer authentication type (auth_type = plain) (deprecated since 4.x).
Migrating Data to Greenplum 6
See Migrating Data from Greenplum 4.3 or 5 for guidelines and considerations for migrating existing Greenplum data to Greenplum 6, using standard backup and restore procedures.
Known Issues and Limitations
Pivotal Greenplum 6 has these limitations:
- Upgrading a Greenplum Database 4 or 5 release, or Greenplum 6 Beta release, to Pivotal Greenplum 6 is not supported.
- MADlib, GPText, and PostGIS are not yet provided for installation on Ubuntu systems.
- Greenplum 6 is not supported for installation on DCA systems.
- Greenplum for Kubernetes is not yet provided with this release.
The following table lists key known issues in Pivotal Greenplum 6.x.
|n/a||gpperfmon||The Ubuntu build of Greenplum Database 6.5.0 does not include the gpperfmon database, which is required for using Greenplum Command Center. Customers deploying to Ubuntu should not install or upgrade to Greenplum Database 6.5 until a maintenance release is provided to resolve this issue.|
|n/a||Materialized Views||By default, certain gp_toolkit views do not display data for materialized views. If you want to include this information in gp_toolkit view output, you must redefine a gp_toolkit internal view as described in Including Data for Materialized Views.|
|N/A||Greenplum Client/Load Tools on Windows||The Greenplum Database client and load tools on Windows have not been tested with Active Directory Kerberos authentication.|
|n/a||PXF||pxf [cluster] init may fail to recognize a new JAVA_HOME setting when the value is provided via the shell environment. Workaround: Edit $PXF_CONF/conf/pxf-env.sh and manually set JAVA_HOME to the new value, run pxf cluster sync to synchronize this configuration change across the Greenplum cluster, and then re-run pxf [cluster] init.|
|170824967||gpfdists||For Greenplum Database 6.x, a command that accesses an external table that uses the gpfdists protocol fails if the external table does not use an IP address when specifying a host system in the LOCATION clause of the external table definition.|
|170202002||Greenplum-Kafka Integration||Updating the METADATA:SCHEMA property and restarting a previously-run load job could cause gpkafka to re-read Kafka messages published to the topic, and load duplicate messages into Greenplum Database.|
|169200795||Greenplum Stream Server||When loading Kafka data into Greenplum Database in UPDATE and MERGE modes, GPSS requires that a MAPPING exist for each column name identified in the MATCH_COLUMNS and UPDATE_COLUMNS lists.|
|168957894||PXF||The PXF Hive Connector does not support using the Hive* profiles to access Hive transactional tables. Workaround: Use the PXF JDBC Connector to access Hive.|
|168689202||PXF||PXF fails to run any query on Java 11 that specifies a Hive* profile due to this Hive known issue: ClassCastException when initializing HiveMetaStoreClient on JDK10. Workaround: Run PXF on Java 8 or use the PXF JDBC Connector to access Hive.|
|168548176||gpbackup||When using gpbackup to back up a Greenplum Database 5.7.1 or earlier 5.x release with resource groups enabled, gpbackup returns a column not found error for t6.value AS memoryauditor.|
|164791118||PL/R||PL/R cannot be installed using the deprecated createlang utility, and displays the error createlang: language installation failed: ERROR: no schema has been selected to create in. Workaround: Use CREATE EXTENSION to install PL/R, as described in the documentation.|
Differences Compared to Open Source Greenplum Database
- Product packaging and installation script
- Support for QuickLZ compression. QuickLZ compression is not provided in the open source version of Greenplum Database due to licensing restrictions.
- Support for data connectors:
- Greenplum-Spark Connector
- Greenplum-Informatica Connector
- Greenplum-Kafka Integration
- Greenplum Stream Server
- Data Direct ODBC/JDBC Drivers
- gpcopy utility for copying or migrating objects between Greenplum systems
- Support for managing Greenplum Database using Pivotal Greenplum Command Center
- Support for full text search and text analysis using Pivotal GPText
- Greenplum backup plugin for DD Boost
- Backup/restore storage plugin API (Beta)