Pivotal Greenplum 6.0.0 Beta Release Notes

Pivotal Greenplum 6.0.0 Beta Release Notes

Updated: May 8, 2019

Welcome to Pivotal Greenplum 6

Pivotal Greenplum Database is a massively parallel processing (MPP) database server that supports next generation data warehousing and large-scale analytics processing. By automatically partitioning data and running parallel queries, it allows a cluster of servers to operate as a single database supercomputer performing tens or hundreds times faster than a traditional database. It supports SQL, MapReduce parallel processing, and data volumes ranging from hundreds of gigabytes, to hundreds of terabytes.

Note: This document contains pertinent release information about Pivotal Greenplum Database 6.0.0. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy.

Pivotal Greenplum 6 software is available for download from the Pivotal Greenplum page on Pivotal Network.

Pivotal Greenplum 6 is based on the open source Greenplum Database project code.

Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Changes in Beta-3

The Pivotal Greenplum 6 Beta-3 release includes several significant fixes and changes:
  • The on-disk representation of tablespaces has changed in the Beta-3 release. If you have a previously-installed Beta release, this change requires that you first drop all existing tablespaces before you upgrade to Beta-3. Recreate the tablespaces after upgrade to use the current on-disk represenatation. See Creating and Managing Tablespaces for information about dropping and creating tablespaces.
  • The gpseginstall utility is no longer included. You must install the Greenplum software RPM on each segment host, as described in Installing the Greenplum Database Software.
  • PXF version 5.3.2 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.3.2 below.

See the Github Comparison Page for a full listing of pull requests that were merged into the Beta-3 release.

PXF Version 5.3.2

New Features

PXF 5.3.2 includes the following new features:

  • The PXF JDBC Connector now supports specifying per-connection, per-session, and per-statement properties via the JDBC server configuration. See JDBC Server Configuration for detailed information about setting and using these properties.
  • The PXF JDBC Connector also supports specifying the connection transaction isolation mode via the JDBC server configuration. Refer to Connection Transaction Isolation Property for information about this property.
  • The pxf cluster init and sync subcommands now also perform the operation on the Greenplum Database standby master host. With this new feature, PXF will continue to function after fail over from the Greenplum Database master host to the standby host.
  • PXF now supports the pxf cluster status subcommand. This command displays the status of the PXF service instance on each segment host in the Greenplum Database cluster. Refer to the pxf cluster reference page for more information.
  • PXF has improved memory usage by caching external file metadata per query within each PXF JVM. This caching is on by default. PXF Fragment Metadata Caching further describes this feature and its configuration procedure.

Changed Features

PXF 5.3.2 includes the following changes:

  • The PXF Hive Connector hive-site.xml template file now includes the hive.metastore.integral.jdo.pushdown property. This property is required to enable partition filter pushdown for Hive integral types. Refer to Partition Filter Pushdown for more information about the PXF Hive Connector's support for this feature.
  • The pxf cluster sync command no longer verifies that it is run on the Greenplum Database master host.
  • PXF has improved the pxf cluster sync command. The command now uses rsync to synchronize configuration from the master to the segments. Previously, the segment hosts were initiating the rsync operation.

Resolved Issues

PXF 5.3.2 includes the following resolved issues:

29896 - PXF
In some situations, the PXF JDBC Connector returned an out of memory error when you queried a PXF external table that referenced a large table in the external SQL database.
This issue has been resolved. The PXF JDBC Connector now supports a configurable read fetch size property. The availability and default value of this property mitigates potential memory issues.
165265354 - PXF
In some cases, when you performed a query with an OR or NOT predicate on a PXF external table that referenced a partitioned Hive table, the PXF Hive Connector may have incorrectly pruned partitions and returned an incomplete data set.
This issue has been resolved. The PXF Hive Connector now correctly handles OR predicates, and does not push down a predicate in which it encounters a relational or logical filter operator that it does not support. Note that the Connector does not currently optimize pushdown of NOT predicates.

Changes in Beta-2

The Pivotal Greenplum 6 Beta-2 release includes these changes:
  • PXF version 5.2.1 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.2.1 below.
  • Recursive WITH Queries (Common Table Expressions) are no longer considered a Beta feature, and are now enabled by default. See WITH Queries (Common Table Expressions).
  • A new warning message is displayed when executing pg_create_physical_replication_slot() to indicate that the utility is not MPP-aware. In general, replication slots should only be created using Greenplum utilities such as gpaddmirrors and gprecoverseg.
  • The gpinitsystem option to specify the standby master data directory changed from -F to -S. The -S option no longer specifies spread mirroring. A new gpinitsystem option is introduced to specify the mirroring configuration: --mirror-mode={group|spread}.
  • The default value of the server configuration parameter log_rotation_size has changed from 0 to 1GB. This changes the default log rotation behavior so that a new log file is opened when more than 1GB has been written to the current log file or when the current log file has been open for 24 hours.
  • The gpcopy utility can use Kerberos authentication if the Greenplum Database system is Kerberos-enabled.

PXF Version 5.2.1

New Features

PXF 5.2.1 includes the following new features:

  • PXF now supports column projection. This enables connectors to project only the specific columns requested by the user. The JDBC, Parquet, and HiveORC profiles leverage this PXF feature to support column projection. See About Column Projection in PXF for information about PXF column projection support.
  • PXF now supports file-based server configuration for the JDBC Connector. When you initialize PXF, it copies a jdbc-site.xml template file to the $PXF_CONF/templates directory. You can use this template as a guide for PXF JDBC server configuration. Refer to Configuring the JDBC Connector for more information.
  • The PXF JDBC Connector now provides the QUOTE_COLUMNS option to control whether PXF should quote column names.
  • PXF exposes new configuration options that you can specify when you write Parquet data to HDFS or to an object store. These options allow you to set the row group size, page size, dictionary page size, and the Parquet version. Refer to the Parquet Custom Options topic for syntax and usage information.
  • You can specify the host name and/or port number of the PXF agent. The configuration procedure is described in Configuring the PXF Agent Host and Port.

Changed Features

PXF 5.2.1 includes the following changes:

  • PXF now bundles Hadoop 2.9.2 libraries.
  • PXF has increased the date precision to microseconds when reading Parquet files from an external system.
  • PXF now maps date and timestamp data types between Greenplum Database and Parquet as follows:
    • PXF previously serialized a repeated Greenplum Database date type to/from JSON as a numeric timestamp. PXF now serializes a repeated date type to/from JSON as a string.
    • When writing a Greenplum Database timestamp type field to Parquet, PXF previously converted the timestamp directly to an int96. PXF now localizes the timestamp to the current system timezone and converts it to universal time (UT) before finally converting to int96.

Resolved Issues

PXF 5.2.1 includes the following resolved issues:

29832
The PXF JDBC Connector in some cases did not propagate an INSERT error to the user. As a result, the operation appeared to succeed when it actually had not.
This issue has been resolved. The PXF JDBC Connector now correctly propagates the INSERT error to the user.
164473961
In some cases, a SELECT query on a PXF external table that referenced a Hive table incorrectly generated a Hive MetaStore connection error when the Hive service was running in a Kerberos-secured Hadoop cluster. This situation occurred because PXF did not instantiate the connect request with the Hive configuration settings.
This issue has been resolved. The PXF Hive Connector now properly includes the Hive configuration settings.

New Features

PostgreSQL Core Features

Pivotal Greenplum 6 incorporates several new features from PostgreSQL versions 8.4 through version 9.4.

INTERVAL Data Type Handling

PostgreSQL 8.4 improves the parsing of INTERVAL literals to align with SQL standards. This changes the output for queries that use INTERVAL labels between versions 5.x and 6.x. For example:
$ psql
psql (8.3.23)
Type "help" for help.

gpadmin=# select INTERVAL '1' YEAR;
 interval
----------
 00:00:00
(1 row)
```
``` sql
$ psql
psql (9.2beta2)
Type "help" for help.

gpadmin=# select INTERVAL '1' YEAR;
 interval
----------
 1 year
(1 row)

See Date/Time Types for more information.

Additional PostgreSQL Features

Greenplum Database 6.0 also includes these features and changes from PostgreSQL:
  • Support for user-defined I/O conversion casts. (PostgreSQL 8.4).
  • Support for column-level privileges (PostgreSQL 8.4).
  • The pg_db_role_setting catalog table, which provides support for setting server configuration parameters for a specific database and role combination (PostgreSQL 9.0).
  • Values in the relkind column of the pg_class catalog table were changed to match entries in PostgreSQL 9.3.
  • Support for GIN index method (PostgreSQL (8.3).
  • Support for jsonb data type (PostgreSQL 9.4).
  • DELETE, INSERT, and UPDATE supports the WITH clause, CTE (common table expression) (PostgreSQL 9.1).
  • Collation support to specify sort order and character classification behavior for data at the column level (PostgreSQL 9.1).
    Note: GPORCA supports collation only when all columns in the query use the same collation. If columns in the query use different collations, then Greenplum uses the Postgres query planner.

Zstandard Compression Algorithm

Greenplum Database 6.0 adds support for zstd (Zstandard) compression for some database operations. See Enabling Compression.

Relaxed Rules for Specifying Table Distribution Columns

In previous releases, if you specified both a UNIQUE constraint and a DISTRIBUTED BY clause in a CREATE TABLE statement, then the DISTRIBUTED BY clause was required to be equal to or a left-subset of the UNIQUE columns. Greenplum 6.x relaxes this rule so that any subset of the UNIQUE columns is accepted.

This change also affects the rules for how Greenplum 6.x selects a default distribution key. If gp_create_table_random_default_distribution is off (the default) and you do not include a DISTRIBUTED BY clause, then Greenplum chooses the table distribution key based on the command:
  • If a LIKE or INHERITS clause is specified, then Greenplum copies the distribution key from the source or parent table.
  • If a PRIMARY KEY or UNIQUE constraints are specified, then Greenplum chooses the largest subset of all the key columns as the distribution key.
  • If neither constraints nor a LIKE or INHERITS clause is specified, then Greenplum chooses the first suitable column as the distribution key. (Columns with geometric or user-defined data types are not eligible as Greenplum distribution key columns.)

PL/pgSQL Procedural Language Enhancements

PL/pgSQL in Greenplum Database 6.0 includes support for the following new features:

  • Attaching DETAIL and HINT text to user-thrown error messages. You can also specify the SQLSTATE and SQLERRMSG codes to return on a user-thrown error (PostgreSQL 8.4).
  • The RETURN QUERY EXECUTE statement, which specifies a query to execute dynamically (PostgreSQL 8.4).
  • Conditional execution using the CASE statement (PostgreSQL 8.4). See Conditionals in the PostgreSQL documentation.

Replicated Table Data

The CREATE TABLE command supports DISTRIBUTED REPLICATED as a distribution policy. If this distribution policy is specified, Greenplum Database distributes all rows of the table to all segment instances in the Greenplum Database system.

Concurrency Improvements in Greenplum 6

Greenplum Database 6 includes the following concurrency improvements:

  • Global Deadlock Detector - Previous versions of Greenplum Database prevented global deadlock by holding exclusive table locks for UPDATE and DELETE operations. While this strategy did prevent deadlocks, it came at the cost of poor performance on concurrent updates. Greenplum Database 6 includes a global deadlock detector. This backend process collects and analyzes lock waiting data in the Greenplum cluster. If the Global Deadlock Detector determines that deadlock exists, it breaks the deadlock by cancelling one or more backend processes. By default, the global deadlock detector is disabled and table-level exclusive locks are held for table updates. When the global deadlock detector is enabled, Greenplum Database holds row-level exclusive locks and concurrent updates are allowed. See Global Deadlock Detector.
  • Transaction Lock Optimization - Greenplum Database 6 optimizes transaction lock usage both when you BEGIN and COMMIT a transaction. This benefits highly concurrent mixed workloads.
  • Upstream PostgreSQL Features - Greenplum 6 includes upstream PostgreSQL features, including those for fastpath lock, which reduce lock contention. This benefits concurrent short queries and mixed workloads.
  • VACUUM can more easily skip pages it cannot lock. This reduces the frequency of a vacuum appearing to be "stuck," which occurs when VACUUM waits to lock a block for cleanup and another session has held a lock on the block for a long time. Now VACUUM skips a block it cannot lock and retries the block later.
  • VACUUM rechecks block visibility after it has removed dead tuples. If all remaining tuples in the block are visible to current and future transactions, the block is marked as all-visible.
  • The tables that are part of a partitioned table hierarchy, but that do not contain data, are age-frozen so that they do not have to be vacuumed separately and do not affect calculation of the number of remaining transaction IDs before wraparound occurs. These tables include the root and intermediate tables in the partition heirarchy and, if they are append-optimized, their associated meta-data tables. This makes it unnecessary to vacuum the root partition to reduce the table's age, and eliminates the possibly needless vacuuming of all of the child tables.

Additional Greenplum Database Features

Greenplum Database 6.0 also includes these features and changes from version 5.x:
  • VACUUM was updated to more easily skip pages that cannot be locked. This change should greatly reduce the incidence of VACUUM getting "stuck" while waiting for other sessions.
  • appendoptimized alias for the appendonly table storage option.
  • New gp_resgroup_status_per_host and gp_resgroup_status_per_segment gp_toolkit views to display resource group CPU and memory usage on a per-host and/or per-segment basis.

Beta Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several Beta features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Beta features are not recommended or supported for production deployments.
Key experimental features in Greenplum Database 6 include:
  • Storage plugin API for gpbackup and gprestore. Partners, customers, and OSS developers can develop plugins to use in conjunction with gpbackup and gprestore.

    For information about the storage plugin API, see Backup/Restore Storage Plugin API.

  • Using the Greenplum Platform Extension (PXF) connectors to write Parquet data is a Beta feature.

Changed Features

Greenplum Database 6 includes these feature changes:
  • The performance characteristics of Greenplum Database under heavy loads have changed in version 6 as compared to previous versions. In particular, you may notice increased I/O operations on primary segments for changes related to GPSS, WAL replication, and other features. All customers are encouraged to perform load testing with real-world data to ensure that the new Greenplum 6 cluster configuration meets their performance needs.\u0000
  • gpbackup and gprestore are no longer installed with Greenplum Database 6, but are available separately on Pivotal Network and can be upgraded separately from the core database installation.
  • Greenplum 6 uses a new jump consistent hash algorithm to map hashed data values to Greenplum segments. The new algorithm ensures that, after new segments are added to the Greenplum 6 cluster, only those rows that hash to the new segment need to be moved. Greenplum 6 hashing has performance characteristics similar to earlier Greenplum releases, but should enable faster database expansion. Note that the new algorithm is more CPU intensive than the previous algorithm, so COPY performance may degrade somewhat on CPU-bound systems.
  • The older, legacy hash functions are represented as non-default hash operator classes, named cdbhash_*_ops. The non-default operator classes are used when upgrading from Greenplum Database earlier than 6.0. The legacy operator classes are compatible with each other, but if you mix the legacy operator classes with the new ones, queries will require Redistribute Motions.

    The server configuration parameter gp_use_legacy_hashops controls whether the legacy or default hash functions are used when creating tables that are defined with a distribution column.

    The gp_distribution_policy system table now contains more information about Greenplum Database tables and the policy for distributing table data across the segments including the operator class of the distribution hash functions.

  • Greenplum uses direct dispatch to target queries that use IS NULL, similar to queries that filter on the table distribution key column(s).
  • In the pg_proc system table, the proiswin column was renamed to proiswindow and relocated in the table to match the pg_proc system table in PostgreSQL 8.4.
  • Queries that use SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer necessarily return sorted output. Previously these queries always removed duplicate rows by using Sort/Unique processing. They now implement hashing to conform to behavior introduced in PostgreSQL 8.4; this method does not produce sorted output. If your application requires sorted output for these queries, alter the queries to use an explicit ORDER BY clause. Note that SELECT DISTINCT ON never uses hashing, so its behavior is unchanged from previous versions.
  • The pg_database system table datconfig column was removed. Greenplum Database now uses the pg_db_role_setting system table to keep track of per-database and per-role server configuration settings (PostgreSQL 9.0).
  • The pg_authid system table rolconfig column was removed. Greenplum Database now uses the pg_db_role_setting system table to keep track of per-database and per-role server configuration settings (PostgreSQL 9.0).
  • When creating and altering a table that has a distribution column, you can now specify the hash function used to distribute data across segment instances.
  • Pivotal Greenplum Database 6 removes the RECHECK option from ALTER OPERATOR FAMILY and CREATE OPERATOR CLASS DDL (PostgreSQL 8.4). Greenplum now determines whether an index operator is "lossy" on-the-fly at runtime.
  • Operator-related system catalog tables are modified to support operator families, compatibilty, and types (ordering or search).
  • System catalog table entries for HyperLogLog (HLL) functions, aggregates, and types are modified to prefix names with gp_. Renaming the HLL functions prevents name collisions with external Greenplum Database extensions that use HLL. Any user code written to use the built-in Greenplum Database HLL functions must be updated to use the new gp_ names.
  • The "legacy optimizer" from previous releases of Greenplum is now referred to as the Postgres optimizer in both the code and documentation.
  • The transaction isolation levels in Greenplum Database 6.0 are changed to align with PostgreSQL transaction isolation levels since the introduction of the serializable snapshot isolation (SSI) mode in PostgreSQL 9.1. The new SSI mode, which is not implemented in Greenplum Database, provides true serializability by monitoring concurrent transactions and rolling back transactions that could introduce a serialization anomaly. The existing snapshot isolation (SI) mode guarantees that transactions operate on a single, consistent snapshot of the database, but does not guarantee a consistent result when a set of concurrent transactions is executed in any given sequence.

    Greenplum Database 6.0 now allows the REPEATABLE READ keywords with SQL statements such as BEGIN and SET TRANSACTION. A SERIALIZABLE transaction in PostgreSQL 9.1 or later uses the new SSI mode. A SERIALIZABLE transaction in Greenplum Database 6.0 falls back to REPEATABLE READ, using the SI mode. The following table shows the SQL standard compliance for each transaction isolation level in Greenplum Database 6.0 and PostgreSQL 9.1.

    Table 1. Transaction Level Compliance with SQL Standard
    Requested Transaction Isolation Level Greenplum Database 6.0 Compliance PostgreSQL 9.1 Compliance
    READ UNCOMMITTED READ COMMITTED READ COMMITTED
    READ COMMITTED READ COMMITTED READ COMMITTED
    REPEATABLE READ REPEATABLE READ (SI) REPEATABLE READ (SI)
    SERIALIZABLE Falls back to REPEATABLE READ (SI) SERIALIZABLE (SSI)
  • The CREATE TABLESPACE command has changed.
    • The command no longer requires a filespace created with the gpfilespace utility.
    • The FILESPACE clause has been removed.
    • The WITH clause has been added to allow specifying a tablespace location for a specific segment instance.
  • The ALTER SEQUENCE SQL command has new clauses START [WITH] start and OWNER TO new_owner (PostgreSQL 8.4). The START clause sets the start value that will be used by future ALTER SEQUENCE RESTART commands, but does not change the current value of the sequence. The OWNER TO clause changes the sequence's owner.
  • The ALTER TABLE SQL command has a SET WITH OIDS clause to add an oid system column to a table (PostgreSQL 8.4). Note that using oids with Greenplum Database tables is strongly discouraged.
  • The CREATE DATABASE SQL command has new parameters LC_COLLATE and LC_CTYPE to specify the collation order and character classification for the new database.
  • The CREATE FUNCTION SQL command has a new keyword WINDOW, which indicates that the function is a window function rather than a plain function (PostgreSQL 8.4).
  • Specifying the index name in the CREATE INDEX SQL command is now optional. Greenplum Database constructs a default index name from the table name and indexed columns.
  • In the CREATE TABLE command, the Greenplum Database parser allows commas to be placed between a SUBPARTITION TEMPLATE clause and its cooresponding SUBPARTITION BY clause, and between consecutive SUBPARTITION BY clauses. These undocumented commas are deprecated and will generate a deprecation warning message.
  • Superuser privileges are now required to create a protocol. See CREATE PROTOCOL.
  • The CREATE TYPE SQL command has a new LIKE=type clause that copies the new type's representation (INTERNALLENGTH, PASSEDBYVALUE, ALIGNMENT, and STORAGE) from an existing type (PostgreSQL 8.4).
  • The GRANT SQL command has new syntax to grant privileges on truncate, foreign data wrappers, and foreign data servers (PostgreSQL 8.4).
  • The LOCK SQL command has an optional ONLY keyword (PostgreSQL 8.4). When specified, the table is locked without locking any tables that inherit from it.
  • Using the LOCK table statement outside of a transaction raises an error in Greenplum Database 6.0. In earlier releases, the statement executed, although it is only useful when executed inside of a transaction.
  • The SELECT and VALUES SQL commands support the SQL 2008 OFFSET and FETCH syntax (PostgreSQL 8.4). These clauses provide an alternative syntax for limiting the results returned by a query.
  • The FROM clause can be omitted from a SELECT command, but Greenplum Database no longer allows queries that omit the FROM clause and also reference database tables.
  • The ROWS and RANGE SQL keywords have changed from reserved to unreserved, and may be used as table or column names without quoting.
  • In Greenplum 6, a query on an external table with descendants will by default recurse into the descendant tables. This is a change from previous Greenplum Database versions, which never recursed into descendants. To get the previous behavior in Greenplum 6, you must include the ONLY keyword in the query to restrict the query to the parent table.
  • The TRUNCATE SQL command has an optional ONLY keyword (PostgreSQL 8.4). When specified, the table is truncated without truncating any tables that inherit from it.
  • The createdb command-line utility has new options -l (--locale), --lc-collate, and --lc-ctype to specify the locale and character classification for the database (PostgreSQL 8.4).
  • The pg_dump, pg_dumpall, and pg_restore utilities have a new --role=rolename option that instructs the utility to execute SET ROLE rolename after connecting to the database and before starting the dump or restore operation (PostgreSQL 8.4).
  • The pg_dump and pg_dumpall command-line utilities have a new option --lock-wait-timeout=timeout (PostgreSQL 8.4). When specified, instead of waiting indefinitely the dump fails if the utility cannot acquire shared table locks within the specified number of milliseconds.
  • The -d and -D command-line options are removed from the pg_dump and pg_dumpall utilities. The corresponding long versions, --inserts and --column-inserts are still supported. A new --binary-upgrade option is added, for use by in-place upgrade utilities.
  • The -w (--no-password) option was added to the pg_dump, pg_dumpall, and pg_restore utilities.
  • The -D option is removed from the gpexpand utility. The expansion schema will be created in the postgres database.
  • The gpstate utility has a new -x option, which displays details of an in-progress system expansion. gpstate -s and gpstate with no options specified also report if a system expansion is in progress.
  • The pg_restore utility has a new option -j (--number-of-jobs) parameter. This option can reduce time to restore a large database by running tasks such as loading data, creating indexes, and creating constraints concurrently.
  • The vacuumdb utility has a new -F (--freeze) option to freeze row transaction information.
  • ALTER DATABASE includes the SET TABLESPACE clause to change the default tablespace.
  • CREATE DATABASE includes the COLLATE and CTYPE options for setting the collation order and character classification of the new database.
  • The server configuration parameter gp_workfile_compress_algorithm has been changed to gp_workfile_compression. When workfile compression is enabled, Greenplum Database uses Zstandard compression.
  • The Oracle Compatibility Functions are now available in Greenplum Database as an extension, based on the PostgreSQL orafce project at https://github.com/orafce/orafce. Instead of executing a SQL script to install the compatibility functions in a database, you now execute the SQL command CREATE EXTENSION orafce. The Greenplum Database 6.0 orafce extension is based on the orafce 3.7 release. See Oracle Compatibility Functions for information about differences between the Greenplum Database compatibility functions and the PostgreSQL orafce extension.
  • Greenplum Database 6 supports specifying a table column of the citext data type as a distribution key.
  • Greenplum Database 6 provides a single client and loader tool package that you can download and install on a client system. Previous Greenplum releases provided separate client and loader packages. For more information about the Greenplum 6 Clients package, refer to Client Tools in the supported platforms documentation.

Removed and Deprecated Features

Pivotal Greenplum Database 6 removes these features:
  • The gptransfer utility is no longer included; use gpcopy for all functionality that was provided with gptransfer.
  • The gp_fault_strategy system table is no longer used. Greenplum Database now uses the gp_segment_configuration system table to determine if mirroring is enabled.
  • Pivotal Greenplum Database 6 removes the gpcrondump, gpdbrestore, and gpmfr management utilities. Use gpbackup and gprestore to back up and restore Greenplum Database.
  • Pivotal Greenplum Database 6 no longer supports Veritas NetBackup.
  • Pivotal Greenplum Database 6 no longer supports the use of direct I/O to bypass the buffering of memory within the file system cache for backup.
  • Pivotal Greenplum Database 6 no longer supports the gphdfs external table protocol to access a Hadoop system. Use the Greenplum Platform Extension Framework (PXF) to access Hadoop in version 6. Refer to pxf:// Protocol for information about using the pxf external table protocol.
  • Pivotal Greenplum Database 6 no longer supports SSLv3.
  • Pivotal Greenplum Database 6 removes the following server configuration parameters:
    • gp_analyze_relative_error
    • gp_backup_directIO
    • gp_backup_directIO_read_chunk_mb
    • gp_connections_per_thread
    • gp_enable_sequential_window_plans
    • gp_idf_deduplicate
    • gp_snmp_community
    • gp_snmp_monitor_address
    • gp_snmp_use_inform_or_trap
    • gp_workfile_checksumming
  • The undocumented gp_cancel_query() function, and the configuration parameters gp_cancel_query_print_log and gp_cancel_query_delay_time, are removed in Greenplum Database 6.
  • Pivotal Greenplum Database 6 no longer supports the ability to configure a Greenplum Database system to trigger SNMP (Simple Network Management Protocol) alerts or send email notifications to system administrators if certain database events occur.
  • Pivotal Greenplum Database 6 removes the gpfilespace utility. The CREATE TABLESPACE command no longer requires a filespace created with the utility.
  • The Greenplum-Kafka Integration has removed support of LZMA/XZ compression for the Kafka Avro data format. The Greenplum-Kafka Integration continues to support libz- and snappy-compressed Avro data from Kafka.
Pivotal Greenplum Database 6 deprecates these features:
  • The server configuration parameter gp_ignore_error_table is deprecated and will be removed in the next major release.

    You can set this value of this parameter to true to avoid the Greenplum Database error when you run applications that execute CREATE EXTERNAL TABLE or COPY commands that include the Greenplum Database 4.3.x INTO ERROR TABLE clause.

    When this parameter is removed, Greenplum Database always returns an error when a CREATE EXTERNAL TABLE or COPY command contains the INTO ERROR TABLE clause.

  • Specifying => as an operator name in the CREATE OPERATOR command is deprecated.
  • The Greenplum external table C API is deprecated. Any developers using this API are encouraged to use the new Foreign Data Wrapper API in its place.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 6.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for data connectors:
    • Greenplum-Spark Connector
    • Greenplum-Informatica Connector
    • Greenplum-Kafka Integration
    • Gemfire-Greenplum Connector
    • Greenplum Stream Server
  • Data Direct ODBC/JDBC Drivers
  • gpcopy utility for copying or migrating objects between Greenplum systems.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for full text search and text analysis using Pivotal GPText.

Upgrading

Note: The on-disk representation of tablespaces has changed in the Beta-3 release. If you have a previously-installed Beta release, drop all existing tablespaces before you upgrade to Beta-3. Recreate the tablespaces after upgrade to use the current on-disk represenatation. See Creating and Managing Tablespaces for information about dropping and creating tablespaces.
Upgrade to a new releases of the Greenplum 6 Beta software (for example, to Beta-3) by installing the new RPM with yum upgrade. For example:
$ yum upgrade ./greenplum-db-6.0.0-beta.3-rhel7-x86_64.rpm

Upgrading a Pivotal Greenplum Database 4 or 5 system directly to Pivotal Greenplum Database 6 is not currently supported.

Migrating Greenplum 5 data to Greenplum Database 6 using gpcrondump and gpdbrestore, gpbackup and gprestore, or by using gpcopy, has not been tested. However, because of syntax and implementation changes in Greenplum 6, it is anticipated that further tooling changes and/or preparation of your Greenplum 5 data will be required before the data can be migrated into a Greenplum 6 system.

Known Issues and Limitations

Pivotal Greenplum 6 Beta has these limitations:

  • #164791118 - PL/R cannot be installed using the deprecated createlang utility, and displays the error:
    createlang: language installation failed: ERROR:  no schema has been selected to create in
    Workaround: Use CREATE EXTENSION to install PL/R, as described in the documentation.
  • The PL/Java and PL/R procedural language packages are not provided with the first Beta release.
  • The gpaddmirrors utility is not provided with the first Beta release.
  • The gpseginstall utility does not use the rpm package to install Greenplum 6 on remote hosts, and does not install the required software dependencies on those hosts. All Beta customers should use the yum utility to install Greenplum 6 on each segment host instead of using gpseginstall.
  • gppkg does not automatically install software dependencies to target Greenplum systems when installing PL/R. Before you install PL/R, execute these commands on each host as the root user (or use sudo to aquire root permission):
    $ yum install -y epel-release
    $ yum update -y
    $ yum install -y R

    After installing R on each host system, you can install PL/R using the instructions in Greenplum PL/R Language Extension.

  • Upgrading a Greenplum Database 4 or 5 release to Pivotal Greenplum 6 is not supported. See Upgrading.
  • gpcopy cannot yet copy data from Greenplum 4 or 5 to Greenplum 6.
  • Greenplum 6 Beta is not provided for installation on DCA systems.
  • Several Greenplum extension packages are not provided with this Beta release: MADlib, PostGIS, Python Data Science Module package, PL/Container, AutoExplain contrib module.
  • Connectivity packages are not provided with this Beta release: ODBC driver, JDBC driver, Windows Client and Loaders package.
  • Greenplum connectors are not yet supported in this Beta release: Greenplum-Spark Connector, Greenplum-Informatica Connector.
  • Greenplum Command Center, Greenplum Text, and Greenplum for Kubernetes are not provided with this Beta.
  • Greenplum Database 4 and 5 packages are not compatible with Pivotal Greenplum 6.

The following table lists key known issues in Pivotal Greenplum 6.x.

Table 2. Key Known Issues in Pivotal Greenplum 6.x
Issue Category Description
29139 DML In some cases for an append-optimized partitioned table, Greenplum Database acquires a ROW EXCLUSIVE lock on all leaf partitions of the table when inserting data directly into one of the leaf partitions of the table. The locks are acquired the first time Greenplum Database performs validation on the leaf partitions. When inserting data into one leaf partition, the locks are not acquired on the other leaf partitions as long as the validation information remains in memory.

The issue does not occur for heap-storage partitioned tables.

5489   The RETURNING clause is not supported in a DELETE statement that executes against an append-only table, and yields an error similar to:
ERROR:  DELETE RETURNING is not supported on appendonly tables  (seg2 slice1 127.0.0.1:40002 pid=6618)