Pivotal Greenplum 6.0.0 Beta Release Notes

Updated: August 15, 2019

This document contains pertinent release information about Pivotal Greenplum Database 6.0.0 Beta. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy.

Pivotal Greenplum 6 software is available for download from the Pivotal Greenplum page on Pivotal Network.

Pivotal Greenplum 6 is based on the open source Greenplum Database project code.

Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Changes in Beta-7

The Pivotal Greenplum 6 Beta-7 release includes the following significant fixes and changes. (See the Github Report for a complete list of all pull requests merged since the previous Beta release.)

  • The MADlib extension package is provided with this Beta release. Greenplum 6 is compatible with MADlib 1.16, with support for deep learning.

Changes in Beta-6

Note: Pivotal Greenplum 6.0.0 Beta-5 was tagged but not released.
The Pivotal Greenplum 6 Beta-6 release includes the following significant fixes and changes. (See the Github Report for a complete list of all pull requests merged since the previous Beta release.)
Note: This Beta release contains system catalog updates. Perform a new install to use Greenplum 6 Beta-6, as upgrading from previous Beta releases is not possible.
  • GPText version 3.3 is provided with Greenplum 6.
  • Greenplum Command Center 6.0 is provided with Greenplum 6.
  • Greenplum Stream Server (GPSS) is provided for the Ubuntu platform.
  • The Greenplum Client and Loader Tools Package is available for the Ubuntu platform.
  • The following Greenplum extension packages are now provided with this Beta release: MADlib, R and Python Data Science Module packages, and PL/Container
  • PXF version 5.7.0 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.7.0 below.

PXF Version 5.7.0

New Features

PXF includes the following new features:

  • The PXF JDBC Connector now supports JDBC connection pooling, and exposes properties that you can set to configure pool connection reuse and timeouts. Refer to About JDBC Connection Pooling.

Changed Features

PXF includes the following changes:

  • PXF now supports JDBC named queries that are specified both with and without an ending semicolon. PXF previously required that you not specify the ending semicolon.
  • PXF includes a newer version of the PostgreSQL JDBC driver JAR file, postgresql-42.2.5.jar.
  • The PXF JDBC Connector now uses the PARTITION_BY, RANGE, and INTERVAL partition settings to aid in partition construction, and always returns the complete dataset. PXF previously returned only the data set identified by the RANGE, and did not include NULLs or the RANGE end value.
  • PXF now solely uses the com.fasterxml.jackson JSON parser library for Java. PXF previously used both the fasterxml and the org.codehaus.jackson libraries.
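As an illustration of the named query support mentioned above, a PXF external table can reference a query defined in a text file in the JDBC server configuration directory. The server name `mydb`, query file `report.sql`, and column list below are illustrative assumptions, not taken from the release notes:

```sql
-- Hypothetical use of a PXF JDBC named query: the query text lives in
-- $PXF_CONF/servers/mydb/report.sql and is referenced as query:report.
-- A trailing semicolon in report.sql is now accepted.
CREATE EXTERNAL TABLE pxf_report (name text, total numeric)
LOCATION ('pxf://query:report?PROFILE=Jdbc&SERVER=mydb')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```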

Resolved Issues

PXF includes the following resolved issues:

166491258 - PXF
PXF returned an error when it encountered a query configured in a JDBC named query file that ended in a semicolon.
This issue has been resolved. PXF now supports JDBC named queries that are specified both with and without an ending semicolon.
170 - PXF
When a PARTITION_BY column, and partition RANGE and INTERVAL were specified, the PXF JDBC Connector did not include NULL values or the RANGE end value in the query constraints. Additionally, the Connector returned only the data set identified by the RANGE.
This issue has been resolved. PXF now uses the JDBC partition options as a hint when it constructs partitions. PXF generates fragments with the constraints necessary to read the complete data set, including NULLs and the RANGE end value.
29973 - PXF
PXF did not return an error when an external table specifying the JSON profile was used to read JSON-format data into an integer field, but the data was a non-numeric type. PXF converted the non-numeric type to the integer 0.
This issue has been resolved. PXF now recognizes malformed JSON and data type mismatches, and returns an error.
167077097 - PXF
PXF returned an error when a writable external table with a LOCATION clause that specified a compression codec alias (rather than the complete class name) was used to INSERT parquet data.
This issue has been resolved. PXF now correctly supports compression codec aliases when writing parquet data.

Changes in Beta-4

The Pivotal Greenplum 6 Beta-4 release includes the following significant fixes and changes.
Note: This Beta release contains system catalog updates. Perform a new install to use Greenplum 6 Beta-4, as upgrading from previous Beta releases is not possible.
  • Greenplum Database is now provided for Ubuntu 18.04 systems. Note that only the Greenplum Database package is currently provided for Ubuntu, and PXF is not included; other Greenplum packages for Ubuntu will be made available at a later time.
  • The Greenplum-Spark connector is now available for Greenplum 6.
  • The diskquota and auto_explain contrib modules are included as part of the Greenplum 6 installation.
  • The DataDirect ODBC and JDBC drivers are now provided for download.
  • The gpcheck utility is no longer included in Greenplum Database 6.
  • The input file format for the gpmovemirrors, gpaddmirrors, gprecoverseg and gpexpand utilities has changed. Instead of using a colon character (:) as a separator, the new file format uses a pipe character (|). For example, in previous releases a line in a gpexpand input file would resemble:
    sdw5:sdw5-1:50011:/gpdata/primary/gp9:11:9:p
    The updated file format is:
    sdw5|sdw5-1|50011|/gpdata/primary/gp9|11|9|p
    In addition, gpaddmirrors removes the mirror prefix from lines in its input file. Whereas a line from the previous release might resemble:
    mirror0=0:sdw1:sdw1-1:52001:53001:54001:/gpdata/mir1/gp0
    The revised format is:
    0=0|sdw1|sdw1-1|52001|53001|54001|/gpdata/mir1/gp0
  • The PL/R package now includes the R language dependency that was missing in previous Beta releases; you no longer need to manually install R before you install PL/R with gppkg.
  • The Greenplum-Kafka Integration has reinstated support for LZMA/XZ compression for the Kafka Avro data format.
  • Greenplum Stream Server (GPSS) version 1.2.5 is included, which introduces these new features:
    • GPSS supports disabling external table reuse at the gpss service instance level. When you specify ReuseTables: false in the gpss.json configuration file, GPSS creates a new external table when any job submitted to the service instance is restarted.
    • GPSS supports configuring the gpfdist bind address via the new BindAddress property in the gpss.json configuration file. GPSS uses this property to identify the address where it binds the gpfdist port.

    Refer to the gpss.json reference page for more information about these new gpss configuration options.

  • Greenplum Database includes these new resource group features:
    • You no longer are required to specify a MEMORY_LIMIT when you configure a Greenplum Database resource group. When you specify MEMORY_LIMIT=0, Greenplum Database will use the resource group global shared memory pool to service queries running in the group.
    • When you specify MEMORY_SPILL_RATIO=0, Greenplum Database will now use the statement_mem server configuration parameter setting to identify the initial amount of query operator memory.

    When used together to configure a resource group (MEMORY_LIMIT=0 and MEMORY_SPILL_RATIO=0), these new capabilities provide a memory management scheme similar to that provided by Greenplum Database resource queues.

  • PXF version 5.5.1 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.5.1 below.
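The two resource group settings described above can be combined to approximate resource queue memory behavior. A minimal sketch; the group name and CPU_RATE_LIMIT value are illustrative:

```sql
-- With MEMORY_LIMIT=0, queries in the group draw from the resource group
-- global shared memory pool; with MEMORY_SPILL_RATIO=0, initial operator
-- memory is governed by the statement_mem server configuration parameter.
CREATE RESOURCE GROUP queue_like_rg WITH (
    CPU_RATE_LIMIT=20,
    MEMORY_LIMIT=0,
    MEMORY_SPILL_RATIO=0
);
```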

PXF Version 5.5.1

New Features

PXF includes the following new features:

  • The new PXF JDBC Connector named query feature enables you to specify a statically-defined query to run against the remote SQL database. Refer to JDBC Named Query Configuration and About Using Named Queries.
  • PXF supports reading a multi-line text or JSON file stored in HDFS or an object store as a single row. This feature enables you to read a directory of files into a single external table. Refer to Reading a Multi-Line Text File into a Single Table Row.
  • You can configure PXF to auto-kill the server or dump the Java heap when the PXF JVM detects an out of memory condition. Refer to Configuring Out of Memory Condition Actions.
  • PXF supports per-server, per-Greenplum-user configuration. With this feature, you can configure different Greenplum Database users with different remote data store access credentials or properties in a single PXF server definition. Refer to Configuring a PXF User.
  • Enhancements to the PXF JDBC Connector include:
    • PXF has improved the INSERT performance of the JDBC Connector to an external SQL database.
    • Support for PXF user impersonation at the JDBC server level. With this feature, you can use the Connector to access external SQL databases as the current Greenplum Database user. Refer to About JDBC User Impersonation.
    • Improved support for accessing a Hive service via JDBC configured with and without Hive impersonation and utilizing Kerberos authentication. Refer to Configuring Hive Access.
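For example, the multi-line read feature described above can pull each file in a directory into one table row. The HDFS directory path below is illustrative; the profile name and FILE_AS_ROW option follow the PXF documentation:

```sql
-- Each file under the directory is read as a single row into one text column.
CREATE EXTERNAL TABLE whole_json_files (file_contents text)
LOCATION ('pxf://data/json_dir?PROFILE=hdfs:text:multi&FILE_AS_ROW=true')
FORMAT 'CSV';
```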

Changed Features

PXF includes the following changes:

  • PXF now exposes the configuration of the maximum number of Tomcat threads via the $PXF_CONF/conf/pxf-env.sh user configuration file; changes to the default value will survive PXF upgrades. PXF also decreased the default maximum number of Tomcat threads from 300 to 200. Refer to the Another Option for Resource-Constrained PXF Segment Hosts section in the PXF troubleshooting topic for more information about this configuration procedure.

Resolved Issues

PXF includes the following resolved issue:

29916
The PXF JDBC Connector returned an error when you accessed a date field in an Oracle database, you specified a filter with a timestamp type cast, and the filter value included an hour outside of the range 0-11 inclusive.
This issue has been resolved. PXF now uses 24 hour timestamp format when accessing date fields in an Oracle database.

Changes in Beta-3

The Pivotal Greenplum 6 Beta-3 release includes several significant fixes and changes:
  • The on-disk representation of tablespaces has changed in the Beta-3 release. If you have a previously installed Beta release, this change requires that you first drop all existing tablespaces before you upgrade to Beta-3. Recreate the tablespaces after the upgrade to use the current on-disk representation. See Creating and Managing Tablespaces for information about dropping and creating tablespaces.
  • The gpseginstall utility is no longer included. You must install the Greenplum software RPM on each segment host, as described in Installing the Greenplum Database Software.
  • PXF version 5.3.2 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.3.2 below.

See the Github Comparison Page for a full listing of pull requests that were merged into the Beta-3 release.

PXF Version 5.3.2

New Features

PXF 5.3.2 includes the following new features:

  • The PXF JDBC Connector now supports specifying per-connection, per-session, and per-statement properties via the JDBC server configuration. See JDBC Server Configuration for detailed information about setting and using these properties.
  • The PXF JDBC Connector also supports specifying the connection transaction isolation mode via the JDBC server configuration. Refer to Connection Transaction Isolation Property for information about this property.
  • The pxf cluster init and sync subcommands now also perform the operation on the Greenplum Database standby master host. With this new feature, PXF will continue to function after fail over from the Greenplum Database master host to the standby host.
  • PXF now supports the pxf cluster status subcommand. This command displays the status of the PXF service instance on each segment host in the Greenplum Database cluster. Refer to the pxf cluster reference page for more information.
  • PXF has improved memory usage by caching external file metadata per query within each PXF JVM. This caching is on by default. PXF Fragment Metadata Caching further describes this feature and its configuration procedure.

Changed Features

PXF 5.3.2 includes the following changes:

  • The PXF Hive Connector hive-site.xml template file now includes the hive.metastore.integral.jdo.pushdown property. This property is required to enable partition filter pushdown for Hive integral types. Refer to Partition Filter Pushdown for more information about the PXF Hive Connector's support for this feature.
  • The pxf cluster sync command no longer verifies that it is run on the Greenplum Database master host.
  • PXF has improved the pxf cluster sync command. The command now uses rsync to synchronize configuration from the master to the segments. Previously, the segment hosts were initiating the rsync operation.

Resolved Issues

PXF 5.3.2 includes the following resolved issues:

29896 - PXF
In some situations, the PXF JDBC Connector returned an out of memory error when you queried a PXF external table that referenced a large table in the external SQL database.
This issue has been resolved. The PXF JDBC Connector now supports a configurable read fetch size property. The availability and default value of this property mitigates potential memory issues.
165265354 - PXF
In some cases, when you performed a query with an OR or NOT predicate on a PXF external table that referenced a partitioned Hive table, the PXF Hive Connector may have incorrectly pruned partitions and returned an incomplete data set.
This issue has been resolved. The PXF Hive Connector now correctly handles OR predicates, and does not push down a predicate in which it encounters a relational or logical filter operator that it does not support. Note that the Connector does not currently optimize pushdown of NOT predicates.

Changes in Beta-2

The Pivotal Greenplum 6 Beta-2 release includes these changes:
  • PXF version 5.2.1 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.2.1 below.
  • Recursive WITH Queries (Common Table Expressions) are no longer considered a Beta feature, and are now enabled by default. See WITH Queries (Common Table Expressions).
  • A new warning message is displayed when executing pg_create_physical_replication_slot() to indicate that the utility is not MPP-aware. In general, replication slots should only be created using Greenplum utilities such as gpaddmirrors and gprecoverseg.
  • The gpinitsystem option to specify the standby master data directory changed from -F to -S. The -S option no longer specifies spread mirroring. A new gpinitsystem option is introduced to specify the mirroring configuration: --mirror-mode={group|spread}.
  • The default value of the server configuration parameter log_rotation_size has changed from 0 to 1GB. This changes the default log rotation behavior so that a new log file is opened when more than 1GB has been written to the current log file or when the current log file has been open for 24 hours.
  • The gpcopy utility can use Kerberos authentication if the Greenplum Database system is Kerberos-enabled.
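The gpinitsystem option change above can be sketched as follows; the configuration file name and standby data directory are illustrative:

```shell
# Greenplum 5: -S requested spread mirroring.
# Greenplum 6: -S names the standby master data directory, and
# --mirror-mode selects group or spread mirroring.
gpinitsystem -c gpinitsystem_config -S /data/standby_master --mirror-mode=spread
```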

PXF Version 5.2.1

New Features

PXF 5.2.1 includes the following new features:

  • PXF now supports column projection. This enables connectors to project only the specific columns requested by the user. The JDBC, Parquet, and HiveORC profiles leverage this PXF feature to support column projection. See About Column Projection in PXF for information about PXF column projection support.
  • PXF now supports file-based server configuration for the JDBC Connector. When you initialize PXF, it copies a jdbc-site.xml template file to the $PXF_CONF/templates directory. You can use this template as a guide for PXF JDBC server configuration. Refer to Configuring the JDBC Connector for more information.
  • The PXF JDBC Connector now provides the QUOTE_COLUMNS option to control whether PXF should quote column names.
  • PXF exposes new configuration options that you can specify when you write Parquet data to HDFS or to an object store. These options allow you to set the row group size, page size, dictionary page size, and the Parquet version. Refer to the Parquet Custom Options topic for syntax and usage information.
  • You can specify the host name and/or port number of the PXF agent. The configuration procedure is described in Configuring the PXF Agent Host and Port.

Changed Features

PXF 5.2.1 includes the following changes:

  • PXF now bundles Hadoop 2.9.2 libraries.
  • PXF has increased the date precision to microseconds when reading Parquet files from an external system.
  • PXF now maps date and timestamp data types between Greenplum Database and Parquet as follows:
    • PXF previously serialized a repeated Greenplum Database date type to/from JSON as a numeric timestamp. PXF now serializes a repeated date type to/from JSON as a string.
    • When writing a Greenplum Database timestamp type field to Parquet, PXF previously converted the timestamp directly to an int96. PXF now localizes the timestamp to the current system timezone and converts it to universal time (UT) before finally converting to int96.

Resolved Issues

PXF 5.2.1 includes the following resolved issues:

29832
The PXF JDBC Connector in some cases did not propagate an INSERT error to the user. As a result, the operation appeared to succeed when it actually had not.
This issue has been resolved. The PXF JDBC Connector now correctly propagates the INSERT error to the user.
164473961
In some cases, a SELECT query on a PXF external table that referenced a Hive table incorrectly generated a Hive MetaStore connection error when the Hive service was running in a Kerberos-secured Hadoop cluster. This situation occurred because PXF did not instantiate the connect request with the Hive configuration settings.
This issue has been resolved. The PXF Hive Connector now properly includes the Hive configuration settings.

New Features

PostgreSQL Core Features

Pivotal Greenplum 6 incorporates several new features from PostgreSQL versions 8.4 through 9.4.

INTERVAL Data Type Handling

PostgreSQL 8.4 improves the parsing of INTERVAL literals to align with the SQL standard. This changes the output for queries that use INTERVAL literals between versions 5.x and 6.x. For example, in Greenplum 5:
$ psql
psql (8.3.23)
Type "help" for help.

gpadmin=# select INTERVAL '1' YEAR;
 interval
----------
 00:00:00
(1 row)
The same query in Greenplum 6 returns:

$ psql
psql (9.2beta2)
Type "help" for help.

gpadmin=# select INTERVAL '1' YEAR;
 interval
----------
 1 year
(1 row)

See Date/Time Types for more information.

Additional PostgreSQL Features

Greenplum Database 6.0 also includes these features and changes from PostgreSQL:
  • Support for user-defined I/O conversion casts. (PostgreSQL 8.4).
  • Support for column-level privileges (PostgreSQL 8.4).
  • The pg_db_role_setting catalog table, which provides support for setting server configuration parameters for a specific database and role combination (PostgreSQL 9.0).
  • Values in the relkind column of the pg_class catalog table were changed to match entries in PostgreSQL 9.3.
  • Support for the GIN index method (PostgreSQL 8.3).
  • Support for jsonb data type (PostgreSQL 9.4).
  • DELETE, INSERT, and UPDATE support the WITH clause for common table expressions (CTEs) (PostgreSQL 9.1).
  • Collation support to specify sort order and character classification behavior for data at the column level (PostgreSQL 9.1).
    Note: GPORCA supports collation only when all columns in the query use the same collation. If columns in the query use different collations, then Greenplum uses the Postgres Planner.

Zstandard Compression Algorithm

Greenplum Database 6.0 adds support for zstd (Zstandard) compression for some database operations. See Enabling Compression.

Relaxed Rules for Specifying Table Distribution Columns

In previous releases, if you specified both a UNIQUE constraint and a DISTRIBUTED BY clause in a CREATE TABLE statement, then the DISTRIBUTED BY clause was required to be equal to or a left-subset of the UNIQUE columns. Greenplum 6.x relaxes this rule so that any subset of the UNIQUE columns is accepted.

This change also affects the rules for how Greenplum 6.x selects a default distribution key. If gp_create_table_random_default_distribution is off (the default) and you do not include a DISTRIBUTED BY clause, then Greenplum chooses the table distribution key based on the command:
  • If a LIKE or INHERITS clause is specified, then Greenplum copies the distribution key from the source or parent table.
  • If PRIMARY KEY or UNIQUE constraints are specified, then Greenplum chooses the largest subset of all the key columns as the distribution key.
  • If neither constraints nor a LIKE or INHERITS clause is specified, then Greenplum chooses the first suitable column as the distribution key. (Columns with geometric or user-defined data types are not eligible as Greenplum distribution key columns.)
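Under the relaxed rule, for example, a distribution key that is a subset but not a left-subset of the UNIQUE columns is now accepted. The table and column names are illustrative:

```sql
CREATE TABLE orders (
    order_id    int,
    customer_id int,
    UNIQUE (order_id, customer_id)
)
-- customer_id is a subset of the UNIQUE columns but not a left-subset;
-- this was rejected before Greenplum 6.x and is accepted now.
DISTRIBUTED BY (customer_id);
```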

PL/pgSQL Procedural Language Enhancements

PL/pgSQL in Greenplum Database 6.0 includes support for the following new features:

  • Attaching DETAIL and HINT text to user-thrown error messages. You can also specify the SQLSTATE and SQLERRMSG codes to return on a user-thrown error (PostgreSQL 8.4).
  • The RETURN QUERY EXECUTE statement, which specifies a query to execute dynamically (PostgreSQL 8.4).
  • Conditional execution using the CASE statement (PostgreSQL 8.4). See Conditionals in the PostgreSQL documentation.
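A minimal sketch combining these PL/pgSQL additions; the function name, messages, and SQLSTATE code are illustrative:

```sql
CREATE OR REPLACE FUNCTION check_qty(qty int) RETURNS void AS $$
BEGIN
    -- Conditional execution with the CASE statement (PostgreSQL 8.4).
    CASE
        WHEN qty < 0 THEN
            -- Attach DETAIL and HINT text, and a SQLSTATE code, to a
            -- user-thrown error (PostgreSQL 8.4).
            RAISE EXCEPTION 'quantity cannot be negative'
                USING DETAIL  = 'Received value: ' || qty,
                      HINT    = 'Supply a value of zero or more.',
                      ERRCODE = '22003';
        ELSE
            RAISE NOTICE 'quantity % is valid', qty;
    END CASE;
END;
$$ LANGUAGE plpgsql;
```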

Replicated Table Data

The CREATE TABLE command supports DISTRIBUTED REPLICATED as a distribution policy. If this distribution policy is specified, Greenplum Database distributes all rows of the table to all segment instances in the Greenplum Database system.
Note: The hidden system columns (ctid, cmin, cmax, xmin, xmax, and gp_segment_id) cannot be referenced in user queries on replicated tables because they have no single, unambiguous value. Greenplum Database returns a column does not exist error for the query.
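For example, a small dimension table might be replicated to every segment instance; the table and column names are illustrative:

```sql
-- Every segment holds a complete copy of the table's rows.
CREATE TABLE country_codes (
    code char(2),
    name text
) DISTRIBUTED REPLICATED;
```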

Concurrency Improvements in Greenplum 6

Greenplum Database 6 includes the following concurrency improvements:

  • Global Deadlock Detector - Previous versions of Greenplum Database prevented global deadlock by holding exclusive table locks for UPDATE and DELETE operations. While this strategy did prevent deadlocks, it came at the cost of poor performance on concurrent updates. Greenplum Database 6 includes a global deadlock detector. This backend process collects and analyzes lock waiting data in the Greenplum cluster. If the Global Deadlock Detector determines that deadlock exists, it breaks the deadlock by cancelling one or more backend processes. By default, the global deadlock detector is disabled and table-level exclusive locks are held for table updates. When the global deadlock detector is enabled, Greenplum Database holds row-level exclusive locks and concurrent updates are allowed. See Global Deadlock Detector.
  • Transaction Lock Optimization - Greenplum Database 6 optimizes transaction lock usage both when you BEGIN and COMMIT a transaction. This benefits highly concurrent mixed workloads.
  • Upstream PostgreSQL Features - Greenplum 6 includes upstream PostgreSQL features, including those for fastpath lock, which reduce lock contention. This benefits concurrent short queries and mixed workloads.
  • VACUUM can more easily skip pages it cannot lock. This reduces the frequency of a vacuum appearing to be "stuck," which occurs when VACUUM waits to lock a block for cleanup and another session has held a lock on the block for a long time. Now VACUUM skips a block it cannot lock and retries the block later.
  • VACUUM rechecks block visibility after it has removed dead tuples. If all remaining tuples in the block are visible to current and future transactions, the block is marked as all-visible.
  • Tables that are part of a partitioned table hierarchy, but that do not contain data, are age-frozen so that they do not have to be vacuumed separately and do not affect calculation of the number of remaining transaction IDs before wraparound occurs. These tables include the root and intermediate tables in the partition hierarchy and, if they are append-optimized, their associated metadata tables. This makes it unnecessary to vacuum the root partition to reduce the table's age, and eliminates the possibly needless vacuuming of all of the child tables.
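Enabling the Global Deadlock Detector is done through a server configuration parameter followed by a cluster restart. A sketch using the parameter name documented for Greenplum 6; verify it against your release:

```shell
# Enable the global deadlock detector, then restart the cluster.
gpconfig -c gp_enable_global_deadlock_detector -v on
gpstop -ra
```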

Additional Contrib Modules

Greenplum Database 6 is distributed with these additional PostgreSQL and Greenplum contrib modules:

Additional Greenplum Database Features

Greenplum Database 6.0 also includes these features and changes from version 5.x:
  • VACUUM was updated to more easily skip pages that cannot be locked. This change should greatly reduce the incidence of VACUUM getting "stuck" while waiting for other sessions.
  • appendoptimized alias for the appendonly table storage option.
  • New gp_resgroup_status_per_host and gp_resgroup_status_per_segment gp_toolkit views to display resource group CPU and memory usage on a per-host and/or per-segment basis.
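For example, the new views can be queried directly:

```sql
-- Resource group CPU and memory usage, reported per host and per segment.
SELECT * FROM gp_toolkit.gp_resgroup_status_per_host;
SELECT * FROM gp_toolkit.gp_resgroup_status_per_segment;
```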

Beta Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several Beta features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Beta features are not recommended or supported for production deployments.
Key experimental features in Greenplum Database 6 include:
  • Storage plugin API for gpbackup and gprestore. Partners, customers, and OSS developers can develop plugins to use in conjunction with gpbackup and gprestore.

    For information about the storage plugin API, see Backup/Restore Storage Plugin API.

  • Using the Greenplum Platform Extension Framework (PXF) connectors to write Parquet data is a Beta feature.

Changed Features

Greenplum Database 6 includes these feature changes:
  • The performance characteristics of Greenplum Database under heavy loads have changed in version 6 as compared to previous versions. In particular, you may notice increased I/O operations on primary segments for changes related to GPSS, WAL replication, and other features. All customers are encouraged to perform load testing with real-world data to ensure that the new Greenplum 6 cluster configuration meets their performance needs.
  • gpbackup and gprestore are no longer installed with Greenplum Database 6, but are available separately on Pivotal Network and can be upgraded separately from the core database installation.
  • Greenplum 6 uses a new jump consistent hash algorithm to map hashed data values to Greenplum segments. The new algorithm ensures that, after new segments are added to the Greenplum 6 cluster, only those rows that hash to the new segment need to be moved. Greenplum 6 hashing has performance characteristics similar to earlier Greenplum releases, but should enable faster database expansion. Note that the new algorithm is more CPU intensive than the previous algorithm, so COPY performance may degrade somewhat on CPU-bound systems.
  • The older, legacy hash functions are represented as non-default hash operator classes, named cdbhash_*_ops. The non-default operator classes are used when upgrading from Greenplum Database earlier than 6.0. The legacy operator classes are compatible with each other, but if you mix the legacy operator classes with the new ones, queries will require Redistribute Motions.

    The server configuration parameter gp_use_legacy_hashops controls whether the legacy or default hash functions are used when creating tables that are defined with a distribution column.

    The gp_distribution_policy system table now contains more information about Greenplum Database tables and the policy for distributing table data across the segments including the operator class of the distribution hash functions.

  • Greenplum uses direct dispatch to target queries that use IS NULL, similar to queries that filter on the table distribution key column(s).
  • In the pg_stat_activity and pg_stat_replication system views, the procpid column was renamed to pid to match the associated change in PostgreSQL 9.2.
  • In the pg_proc system table, the proiswin column was renamed to proiswindow and relocated in the table to match the pg_proc system table in PostgreSQL 8.4.
  • Queries that use SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer necessarily return sorted output. Previously these queries always removed duplicate rows by using Sort/Unique processing. They now implement hashing to conform to behavior introduced in PostgreSQL 8.4; this method does not produce sorted output. If your application requires sorted output for these queries, alter the queries to use an explicit ORDER BY clause. Note that SELECT DISTINCT ON never uses hashing, so its behavior is unchanged from previous versions.
  • The pg_database system table datconfig column was removed. Greenplum Database now uses the pg_db_role_setting system table to keep track of per-database and per-role server configuration settings (PostgreSQL 9.0).
  • The pg_authid system table rolconfig column was removed. Greenplum Database now uses the pg_db_role_setting system table to keep track of per-database and per-role server configuration settings (PostgreSQL 9.0).
  • When creating and altering a table that has a distribution column, you can now specify the hash function used to distribute data across segment instances.
  • Pivotal Greenplum Database 6 removes the RECHECK option from ALTER OPERATOR FAMILY and CREATE OPERATOR CLASS DDL (PostgreSQL 8.4). Greenplum now determines whether an index operator is "lossy" on-the-fly at runtime.
  • Operator-related system catalog tables are modified to support operator families, compatibility, and types (ordering or search).
  • System catalog table entries for HyperLogLog (HLL) functions, aggregates, and types are modified to prefix names with gp_. Renaming the HLL functions prevents name collisions with external Greenplum Database extensions that use HLL. Any user code written to use the built-in Greenplum Database HLL functions must be updated to use the new gp_ names.
  • The "legacy optimizer" from previous releases of Greenplum is now referred to as the Postgres optimizer in both the code and documentation.
  • The transaction isolation levels in Greenplum Database 6.0 are changed to align with PostgreSQL transaction isolation levels since the introduction of the serializable snapshot isolation (SSI) mode in PostgreSQL 9.1. The new SSI mode, which is not implemented in Greenplum Database, provides true serializability by monitoring concurrent transactions and rolling back transactions that could introduce a serialization anomaly. The existing snapshot isolation (SI) mode guarantees that transactions operate on a single, consistent snapshot of the database, but does not guarantee a consistent result when a set of concurrent transactions is executed in any given sequence.

    Greenplum Database 6.0 now allows the REPEATABLE READ keywords with SQL statements such as BEGIN and SET TRANSACTION. A SERIALIZABLE transaction in PostgreSQL 9.1 or later uses the new SSI mode. A SERIALIZABLE transaction in Greenplum Database 6.0 falls back to REPEATABLE READ, using the SI mode. The following table shows the SQL standard compliance for each transaction isolation level in Greenplum Database 6.0 and PostgreSQL 9.1.

    Table 1. Transaction Level Compliance with SQL Standard

    Requested Transaction Isolation Level   Greenplum Database 6.0 Compliance    PostgreSQL 9.1 Compliance
    READ UNCOMMITTED                        READ COMMITTED                       READ COMMITTED
    READ COMMITTED                          READ COMMITTED                       READ COMMITTED
    REPEATABLE READ                         REPEATABLE READ (SI)                 REPEATABLE READ (SI)
    SERIALIZABLE                            Falls back to REPEATABLE READ (SI)   SERIALIZABLE (SSI)
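    The fallback behavior can be sketched as follows (no error or warning syntax is implied beyond what the text above describes):

    ```sql
    -- Requesting SERIALIZABLE in Greenplum Database 6.0 runs the
    -- transaction at REPEATABLE READ (snapshot isolation), not SSI:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- ... statements here run under SI ...
    COMMIT;

    -- REPEATABLE READ may now be requested explicitly:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    COMMIT;
    ```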
  • The CREATE TABLESPACE command has changed.
    • The command no longer requires a filespace created with the gpfilespace utility.
    • The FILESPACE clause has been removed.
    • The WITH clause has been added to allow specifying a tablespace location for a specific segment instance.
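    A sketch of the revised command; the tablespace name and all directory paths are hypothetical, and the content0/content1 keys assume segments with those content IDs exist:

    ```sql
    -- Create a tablespace; LOCATION is the default directory for all
    -- instances, and WITH overrides it for specific segment content IDs.
    CREATE TABLESPACE fastspace LOCATION '/data/master/fast'
        WITH (content0='/data0/fast', content1='/data1/fast');
    ```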
  • The ALTER SEQUENCE SQL command has new clauses START [WITH] start and OWNER TO new_owner (PostgreSQL 8.4). The START clause sets the start value that will be used by future ALTER SEQUENCE RESTART commands, but does not change the current value of the sequence. The OWNER TO clause changes the sequence's owner.
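    For example (the sequence and role names are hypothetical):

    ```sql
    -- Set the value that a future ALTER SEQUENCE ... RESTART will use;
    -- the sequence's current value is unchanged by START WITH alone:
    ALTER SEQUENCE order_seq START WITH 1000;
    ALTER SEQUENCE order_seq RESTART;   -- restarts the sequence at 1000

    -- Reassign ownership of the sequence:
    ALTER SEQUENCE order_seq OWNER TO sales_admin;
    ```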
  • The ALTER TABLE SQL command has a SET WITH OIDS clause to add an oid system column to a table (PostgreSQL 8.4). Note that using oids with Greenplum Database tables is strongly discouraged.
  • The CREATE DATABASE SQL command has new parameters LC_COLLATE and LC_CTYPE to specify the collation order and character classification for the new database.
  • The CREATE FUNCTION SQL command has a new keyword WINDOW, which indicates that the function is a window function rather than a plain function (PostgreSQL 8.4).
  • Specifying the index name in the CREATE INDEX SQL command is now optional. Greenplum Database constructs a default index name from the table name and indexed columns.
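    For example (table and column names are hypothetical):

    ```sql
    -- No index name given; Greenplum Database constructs a default
    -- name from the table and column (e.g. films_title_idx):
    CREATE INDEX ON films (title);
    ```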
  • In the CREATE TABLE command, the Greenplum Database parser allows commas to be placed between a SUBPARTITION TEMPLATE clause and its corresponding SUBPARTITION BY clause, and between consecutive SUBPARTITION BY clauses. These undocumented commas are deprecated and will generate a deprecation warning message.
  • Superuser privileges are now required to create a protocol. See CREATE PROTOCOL.
  • The CREATE TYPE SQL command has a new LIKE=type clause that copies the new type's representation (INTERNALLENGTH, PASSEDBYVALUE, ALIGNMENT, and STORAGE) from an existing type (PostgreSQL 8.4).
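    A sketch of the new clause; the type name and its I/O functions mytype_in/mytype_out are hypothetical:

    ```sql
    -- Copy INTERNALLENGTH, PASSEDBYVALUE, ALIGNMENT, and STORAGE
    -- from the existing point type instead of specifying them:
    CREATE TYPE mytype (
        INPUT  = mytype_in,
        OUTPUT = mytype_out,
        LIKE   = point
    );
    ```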
  • The GRANT SQL command has new syntax to grant privileges on truncate, foreign data wrappers, and foreign data servers (PostgreSQL 8.4).
  • The LOCK SQL command has an optional ONLY keyword (PostgreSQL 8.4). When specified, the table is locked without locking any tables that inherit from it.
  • Using the LOCK table statement outside of a transaction raises an error in Greenplum Database 6.0. In earlier releases, the statement executed, although it is only useful when executed inside of a transaction.
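    The two LOCK changes above can be sketched together (the table name is hypothetical):

    ```sql
    -- LOCK must now run inside a transaction block:
    BEGIN;
    LOCK TABLE ONLY parent_tbl IN EXCLUSIVE MODE;  -- inheriting child tables are not locked
    COMMIT;

    -- Outside a transaction block, the same LOCK statement raises an error.
    ```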
  • The SELECT and VALUES SQL commands support the SQL 2008 OFFSET and FETCH syntax (PostgreSQL 8.4). These clauses provide an alternative syntax for limiting the results returned by a query.
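    For example (table and column names are hypothetical):

    ```sql
    -- SQL:2008 syntax: skip the first 10 rows, return the next 5:
    SELECT id, name
    FROM   customers
    ORDER  BY name
    OFFSET 10 ROWS
    FETCH FIRST 5 ROWS ONLY;
    ```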
  • The FROM clause can be omitted from a SELECT command, but Greenplum Database no longer allows queries that omit the FROM clause and also reference database tables.
  • The ROWS and RANGE SQL keywords have changed from reserved to unreserved, and may be used as table or column names without quoting.
  • In Greenplum 6, a query on an external table with descendants will by default recurse into the descendant tables. This is a change from previous Greenplum Database versions, which never recursed into descendants. To get the previous behavior in Greenplum 6, you must include the ONLY keyword in the query to restrict the query to the parent table.
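    For example (the external table name is hypothetical):

    ```sql
    -- Greenplum 6 default: queries the parent external table and
    -- recurses into its descendant tables:
    SELECT * FROM ext_sales;

    -- Restrict the query to the parent table only (the pre-6 behavior):
    SELECT * FROM ONLY ext_sales;
    ```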
  • The TRUNCATE SQL command has an optional ONLY keyword (PostgreSQL 8.4). When specified, the table is truncated without truncating any tables that inherit from it.
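    For example (the table name is hypothetical):

    ```sql
    -- Truncate the parent table but leave inheriting child tables intact:
    TRUNCATE TABLE ONLY parent_tbl;
    ```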
  • The createdb command-line utility has new options -l (--locale), --lc-collate, and --lc-ctype to specify the locale and character classification for the database (PostgreSQL 8.4).
  • The pg_dump, pg_dumpall, and pg_restore utilities have a new --role=rolename option that instructs the utility to execute SET ROLE rolename after connecting to the database and before starting the dump or restore operation (PostgreSQL 8.4).
  • The pg_dump and pg_dumpall command-line utilities have a new option --lock-wait-timeout=timeout (PostgreSQL 8.4). When specified, instead of waiting indefinitely the dump fails if the utility cannot acquire shared table locks within the specified number of milliseconds.
  • The -d and -D command-line options are removed from the pg_dump and pg_dumpall utilities. The corresponding long versions, --inserts and --column-inserts, are still supported. A new --binary-upgrade option is added, for use by in-place upgrade utilities.
  • The -w (--no-password) option was added to the pg_dump, pg_dumpall, and pg_restore utilities.
  • The -D option is removed from the gpexpand utility. The expansion schema will be created in the postgres database.
  • The gpstate utility has a new -x option, which displays details of an in-progress system expansion. gpstate -s and gpstate with no options specified also report if a system expansion is in progress.
  • The pg_restore utility has a new option -j (--number-of-jobs) parameter. This option can reduce time to restore a large database by running tasks such as loading data, creating indexes, and creating constraints concurrently.
  • The vacuumdb utility has a new -F (--freeze) option to freeze row transaction information.
  • ALTER DATABASE includes the SET TABLESPACE clause to change the default tablespace.
  • CREATE DATABASE includes the LC_COLLATE and LC_CTYPE options for setting the collation order and character classification of the new database.
  • The server configuration parameter gp_workfile_compress_algorithm has been changed to gp_workfile_compression. When workfile compression is enabled, Greenplum Database uses Zstandard compression.
  • The Oracle Compatibility Functions are now available in Greenplum Database as an extension, based on the PostgreSQL orafce project at https://github.com/orafce/orafce. Instead of executing a SQL script to install the compatibility functions in a database, you now execute the SQL command CREATE EXTENSION orafce. The Greenplum Database 6.0 orafce extension is based on the orafce 3.7 release. See Oracle Compatibility Functions for information about differences between the Greenplum Database compatibility functions and the PostgreSQL orafce extension.
  • Greenplum Database 6 supports specifying a table column of the citext data type as a distribution key.
  • Greenplum Database 6 provides a single client and loader tool package that you can download and install on a client system. Previous Greenplum releases provided separate client and loader packages. For more information about the Greenplum 6 Clients package, refer to Client Tools in the supported platforms documentation.
  • Greenplum Database 6 includes both PostgreSQL-sourced and Greenplum-sourced contrib modules. Most of these modules are now packaged as extensions, and you register an extension in Greenplum with the CREATE EXTENSION name command. Refer to Installing Additional Supplied Modules for more information about registering contrib modules in Greenplum Database 6.

Removed and Deprecated Features

Pivotal Greenplum Database 6 removes these features:
  • The gptransfer utility is no longer included; use gpcopy for all functionality that was provided with gptransfer.
  • The gp_fault_strategy system table is no longer used. Greenplum Database now uses the gp_segment_configuration system table to determine if mirroring is enabled.
  • Pivotal Greenplum Database 6 removes the gpcrondump, gpdbrestore, and gpmfr management utilities. Use gpbackup and gprestore to back up and restore Greenplum Database.
  • Pivotal Greenplum Database 6 no longer supports Veritas NetBackup.
  • Pivotal Greenplum Database 6 no longer supports the use of direct I/O to bypass the buffering of memory within the file system cache for backup.
  • Pivotal Greenplum Database 6 no longer supports the gphdfs external table protocol to access a Hadoop system. Use the Greenplum Platform Extension Framework (PXF) to access Hadoop in version 6. Refer to pxf:// Protocol for information about using the pxf external table protocol.
  • Pivotal Greenplum Database 6 no longer supports SSLv3.
  • Pivotal Greenplum Database 6 removes the following server configuration parameters:
    • gp_analyze_relative_error
    • gp_backup_directIO
    • gp_backup_directIO_read_chunk_mb
    • gp_connections_per_thread
    • gp_enable_sequential_window_plans
    • gp_idf_deduplicate
    • gp_snmp_community
    • gp_snmp_monitor_address
    • gp_snmp_use_inform_or_trap
    • gp_workfile_checksumming
  • The undocumented gp_cancel_query() function, and the configuration parameters gp_cancel_query_print_log and gp_cancel_query_delay_time, are removed in Greenplum Database 6.
  • Pivotal Greenplum Database 6 no longer supports the ability to configure a Greenplum Database system to trigger SNMP (Simple Network Management Protocol) alerts or send email notifications to system administrators if certain database events occur. Use Pivotal Greenplum Command Center alerts to detect and respond to events that occur in a Greenplum system.
  • Pivotal Greenplum Database 6 removes the gpfilespace utility. The CREATE TABLESPACE command no longer requires a filespace created with the utility.
  • The Greenplum-Kafka Integration has removed support of LZMA/XZ compression for the Kafka Avro data format. The Greenplum-Kafka Integration continues to support libz- and snappy-compressed Avro data from Kafka.
Pivotal Greenplum Database 6 deprecates these features:
  • The server configuration parameter gp_ignore_error_table is deprecated and will be removed in the next major release.

    You can set the value of this parameter to true to avoid a Greenplum Database error when you run applications that execute CREATE EXTERNAL TABLE or COPY commands that include the Greenplum Database 4.3.x INTO ERROR TABLE clause.

    When this parameter is removed, Greenplum Database always returns an error when a CREATE EXTERNAL TABLE or COPY command contains the INTO ERROR TABLE clause.

  • Specifying => as an operator name in the CREATE OPERATOR command is deprecated.
  • The Greenplum external table C API is deprecated. Any developers using this API are encouraged to use the new Foreign Data Wrapper API in its place.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 6.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for data connectors:
    • Greenplum-Spark Connector
    • Greenplum-Informatica Connector
    • Greenplum-Kafka Integration
    • Gemfire-Greenplum Connector
    • Greenplum Stream Server
  • Data Direct ODBC/JDBC Drivers
  • gpcopy utility for copying or migrating objects between Greenplum systems.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for full text search and text analysis using Pivotal GPText.

Upgrading

Note: This Beta release contains system catalog updates. Perform a new install to use Greenplum 6 Beta-4, as upgrading from previous Beta releases is not possible.

Known Issues and Limitations

Pivotal Greenplum 6 Beta has these limitations:

  • Upgrading a Greenplum Database 4 or 5 release to Pivotal Greenplum 6 is not supported. See Upgrading.
  • gpcopy cannot yet copy data from Greenplum 4 or 5 to Greenplum 6.
  • Greenplum 6 Beta is not provided for installation on DCA systems.
  • The Windows Client connectivity package is not provided with this Beta release.
  • Greenplum for Kubernetes is not provided with this Beta.
  • The PostGIS and MADlib extension packages are not provided with this Beta release.

The following table lists key known issues in Pivotal Greenplum 6.x.

Table 2. Key Known Issues in Pivotal Greenplum 6.x

Issue      Category  Description
164791118  PL/R      PL/R cannot be installed using the deprecated createlang utility, and displays the error:

                     createlang: language installation failed: ERROR:  no schema has been selected to create in

                     Workaround: Use CREATE EXTENSION to install PL/R, as described in the documentation.