Pivotal Greenplum 5.0.0 Release Notes

Updated: September, 2017

Welcome to Pivotal Greenplum 5.0.0

Pivotal Greenplum is a massively parallel processing (MPP) database server that supports next-generation data warehousing and large-scale analytics processing. By automatically partitioning data and running parallel queries, it allows a cluster of servers to operate as a single database supercomputer, performing tens or hundreds of times faster than a traditional database. It supports SQL, MapReduce parallel processing, and data volumes ranging from hundreds of gigabytes to hundreds of terabytes.

Pivotal Greenplum 5.0.0 is a major new release, and is the first Pivotal Greenplum release based on the open source Greenplum Database project code. Pivotal Greenplum 5.0.0 includes many new features and product changes as compared to prior releases.

Pivotal Greenplum 5.0.0 software is available for download from Pivotal Network.

Important: Pivotal Greenplum 5.0.0 is not yet certified for running on DCA systems. Contact your DCA representative for information about the availability of Greenplum Database 5.0.0 support on the DCA.
Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum is supported by Pivotal Support.

New Features

PostgreSQL Core Features

Heap Data Checksums

Pivotal Greenplum 5.0.0 now includes a checksum feature to detect corruption in the I/O system for heap storage. Checksums are computed when data pages are flushed to disk and verified when pages are re-read from storage. If checksum verification fails, the data page is not allowed to be read back into memory. Checksums are calculated for all heap pages, in all databases, including pages that store heap tables, system catalogs, indexes, and database metadata.

Note that Greenplum Database append-only storage has its own built-in checksum protection separate from the feature described here.

In Greenplum Database, heap checksums are enabled by default. Disabling checksums is strongly discouraged, but it can be done by setting the HEAP_CHECKSUMS parameter to off in the cluster configuration file supplied to the gpinitsystem management utility. Once a Greenplum Database system has been initialized, the heap checksums setting cannot be changed without reinitializing the system. To change the checksums setting, you must dump the databases, reinitialize the cluster, and then restore the databases. See gpinitsystem for information about setting HEAP_CHECKSUMS in the cluster configuration file.
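For example, a cluster configuration file might include the following line to disable checksums (discouraged; this excerpt is illustrative only):

HEAP_CHECKSUMS=off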

You can determine if heap checksums are enabled for a Greenplum Database cluster by checking the read-only data_checksums server configuration parameter:

$ gpconfig -s data_checksums

If a checksum verification fails, the page is not read into memory, so no transaction can read or write to it. An error is generated and the transaction is aborted.

A new system configuration parameter, ignore_checksum_failure, can be used to alter the system's behavior when a checksum verification fails. When this parameter is set to on, a failed checksum verification generates a warning, but the page is read into memory. If a transaction writes to the page and it is subsequently flushed to storage, corrupted data can be propagated to the mirror.
Warning: Because of the potential for data loss, the ignore_checksum_failure parameter should only be enabled when necessary to recover after data corruption has been detected.

New Datatype Support

Greenplum Database adds support for the following built-in datatypes:
  • UUID—Universally Unique Identifiers (RFC 4122, ISO/IEC 9834-8:2005)
  • JSON—Variable, unlimited length JSON data. Greenplum Database also includes new built-in functions for supporting the JSON datatype. See Working with JSON Data.

You can now create enumerated datatypes using the CREATE TYPE name AS ENUM ( 'label' [, ... ] ) syntax. See CREATE TYPE.
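For example, a minimal sketch of creating and using an enumerated type (the mood type and person table are hypothetical):

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (name text, current_mood mood) DISTRIBUTED BY (name);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT name FROM person WHERE current_mood = 'happy';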

Greenplum Database now supports arrays of arbitrary, complex compound types. See "Pseudo-Types" in Greenplum Database Data Types.
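For example, a composite type can now be used as an array element (the complex_num type and waveform table are hypothetical):

CREATE TYPE complex_num AS (r float8, i float8);
CREATE TABLE waveform (id int, samples complex_num[]) DISTRIBUTED BY (id);
INSERT INTO waveform VALUES (1, ARRAY[ROW(1.0, 0.5)::complex_num, ROW(2.0, -0.5)::complex_num]);
SELECT samples[1] FROM waveform WHERE id = 1;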

Hashing is supported for the NUMERIC datatype.

Improved XML Datatype Support

XML datatype built-in functions and SQL commands from PostgreSQL 9.1 are now included. The new functions and function-like expressions include:
  • cursor_to_xml
  • cursor_to_xmlschema
  • database_to_xml
  • database_to_xmlschema
  • database_to_xml_and_xmlschema
  • query_to_xml
  • query_to_xml_and_xmlschema
  • query_to_xmlschema
  • schema_to_xml
  • schema_to_xmlschema
  • schema_to_xml_and_xmlschema
  • table_to_xml
  • table_to_xmlschema
  • table_to_xml_and_xmlschema
  • XMLCONCAT
  • XMLELEMENT
  • XMLFOREST
  • XMLPI
  • XMLROOT
  • XMLPARSE
  • XMLSERIALIZE

The command SET XML OPTION { DOCUMENT | CONTENT } is supported to control XML data validation. The command is equivalent to setting the server configuration parameter xmloption.

In Greenplum Database 5.0.0, XML data is validated as an XML content fragment; in Greenplum Database 4.3, XML data was validated as an XML document. The value of the server configuration parameter xmloption controls how XML data is validated. The default value for xmloption is content.

Note: With the default XML OPTION setting, you cannot directly cast a character string to type xml if the string contains a document type declaration. The definition of an XML content fragment does not allow a document type declaration. To cast such a string, either use the XMLPARSE function or change the XML option.
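For example, one form of the workaround uses XMLPARSE (the document content shown is illustrative only):

SELECT XMLPARSE(DOCUMENT '<?xml version="1.0"?><!DOCTYPE book SYSTEM "book.dtd"><book><title>Manual</title></book>');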

See Working with XML Data.

Anonymous Blocks

Anonymous blocks are now supported for Greenplum Database procedural languages (PostgreSQL 9.0 feature). See the DO command reference.
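For example, the following anonymous block runs PL/pgSQL code (the default DO language) without creating a function:

DO $$
DECLARE
    tcount integer;
BEGIN
    SELECT count(*) INTO tcount FROM pg_class WHERE relkind = 'r';
    RAISE NOTICE 'ordinary relations in catalog: %', tcount;
END
$$;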

dblink Module

The dblink module is provided for making easy connections to other databases either on the same database host, or on a remote host. Greenplum Database provides dblink support for database users to perform short ad hoc queries in other databases. dblink is not intended as a replacement for external tables or for administrative tools such as gptransfer. In this release, dblink has several limitations:
  • The dblink_send_query(), dblink_is_busy(), and dblink_get_result() functions are not supported.
  • Statements that modify table data cannot use named or implicit dblink connections. Instead, you must provide the connection string directly in the dblink function for such queries.

See dblink Functions for basic information about using dblink to query other databases. See dblink in the PostgreSQL documentation for more information about individual functions.
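For example, a minimal sketch of a short ad hoc query through a named connection (the connection string values are hypothetical):

SELECT dblink_connect('conn1', 'dbname=postgres host=localhost user=gpadmin');
SELECT * FROM dblink('conn1', 'SELECT datname FROM pg_database') AS t(datname name);
SELECT dblink_disconnect('conn1');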

hstore Data Type and Functions

The hstore module is provided, which implements a data type and associated functions for storing sets of (key,value) pairs within a single Greenplum Database data field. See hstore Functions for more information about installing and using this optional module.
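For example, assuming hstore has been installed in the database, a column can hold arbitrary (key,value) pairs (the books table is hypothetical):

CREATE TABLE books (id integer, attrs hstore) DISTRIBUTED BY (id);
INSERT INTO books VALUES (1, 'author=>"Anne", pages=>"212"');
SELECT attrs -> 'author' FROM books WHERE attrs ? 'pages';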

Cached Plan Invalidation

Greenplum Database 5.0.0 invalidates cached query plans when any of the relations referenced by the plan are dropped or altered. An individual cached query plan is invalidated when:
  • a definition changes in any of the relations on which the plan depends
  • any user-defined functions used in the plan are modified
  • statistics are updated for any table on which the plan depends (ANALYZE)

Additionally, all cached plans are invalidated when schemas, operators, or operator classes are modified.

The query is replanned if and when the next demand for the cached plan occurs.
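For example, a prepared statement is transparently replanned after a dependent relation changes (the orders table is hypothetical):

PREPARE count_orders AS SELECT count(*) FROM orders;
EXECUTE count_orders;
ALTER TABLE orders ADD COLUMN note text;  -- invalidates the cached plan
EXECUTE count_orders;                     -- replanned on next use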

Ordering Results with NULL Values

The SELECT command ORDER BY clause now supports ordering null values first or last. See the SELECT command reference.
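For example (the emp table is hypothetical):

SELECT name, bonus FROM emp ORDER BY bonus DESC NULLS LAST;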

Transaction ID Functions

New built-in functions are provided to return the transaction ID used by the current session. See the txid_ functions described at System Information Functions in the PostgreSQL documentation.
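For example:

SELECT txid_current();
SELECT txid_current_snapshot();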

Additional PostgreSQL Features

Greenplum Database 5.0.0 also includes these features from PostgreSQL 8.3 and later:
  • Heap-only tuples (HOT) support improves the performance of updating non-indexed columns on tables that have indexes.
  • Support for setting configuration parameters, cost, and default and variadic arguments on a per-function basis.
  • Support for setting type modifier input and output functions on a per-type basis.
  • Support for the CREATE/ALTER/DROP EXTENSION commands (PostgreSQL 9.1 feature).

Python 2.7

Pivotal Greenplum 5.0.0 upgrades the installed Python version to version 2.7. PL/Python and core Python management utilities are now based on version 2.7.

gpdbrestore Support for CASTs

gpdbrestore now supports restoring CASTs when using table and schema filters and the --change-schema flag. In previous releases, any restore that filtered based on a table or schema name did not restore user-created CASTs because the CASTs were not included in the filter.

Enhanced Session State Monitoring

The Greenplum Database session_state.session_level_memory_consumption view has been enhanced to include session idle time information. This information can be used from applications such as Greenplum Database Workload Manager and user-defined scripts to determine how long a session has been idle.

To use the enhanced view, you must run SQL scripts that are provided in the Greenplum Database installation contrib directory, $GPHOME/share/postgresql/contrib:
  1. (Optional) For each database in which the view was previously registered, uninstall the older view first. Run the uninstall_gp_session_state.sql SQL script to drop the view and related database objects. This example uninstalls the view from a database named testdb.
     psql -d testdb -f $GPHOME/share/postgresql/contrib/uninstall_gp_session_state.sql
  2. For all databases in which you want to register the new view, run the $GPHOME/share/postgresql/contrib/gp_session_state.sql script to create the new view definition and related database objects. This example installs the view in a database named testdb.
     psql -d testdb -f $GPHOME/share/postgresql/contrib/gp_session_state.sql
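For example, an application might locate idle sessions with a query similar to the following sketch; the column names shown (sess_id, usename, idle_start) are assumptions, so check the view definition for the actual names:

SELECT sess_id, usename, idle_start
FROM session_state.session_level_memory_consumption
WHERE idle_start IS NOT NULL;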

For additional information about Workload Manager, refer to the Pivotal Greenplum Command Center and Workload Manager documentation.

Python Data Science Module Package

Greenplum Database now includes a Python Data Science Module package that you can optionally install. This package includes a set of commonly-used, open source data science Python modules. The Python Data Science Module package is available for download in .gppkg format from Pivotal Network. This package includes the following Python modules:
Table 1. Python Data Science Modules
Module Name Version
Beautiful Soup 4.6.0
Gensim 2.2.0
Keras 2.0.6
Lifelines 0.11.1
lxml 3.8.0
NLTK 3.2.4
NumPy 1.13.1
Pandas 0.20.3
Pattern-en 2.6
pyLDAvis 2.1.1
PyMC3 3.1
scikit-learn 0.18.2
SciPy 0.19.1
spaCy 1.8.2
StatsModels 0.8.0
Tensorflow 1.1.0
XGBoost 0.6a2

See the Python Data Science Module Package installation guide for additional information about this package.

R Data Science Library Package

Greenplum Database now includes an R Data Science Library package that you can optionally install. This package includes a set of commonly-used, open source data science R libraries. The R Data Science Library package is available for download in .gppkg format from Pivotal Network.

Refer to the R Data Science Library Package installation guide for additional information about this package, including the list of libraries provided.

COPY Command ON SEGMENT Clause

The COPY command now supports the ON SEGMENT clause, which enables you to copy data to or from multiple, segment-oriented files that are stored on the segment hosts. These files can be used for migrating data between clusters or for performing a backup. This differs from the default COPY behavior (COPY without specifying the ON SEGMENT clause), which copies data to or from a single file located on the Greenplum Database master host. See COPY.
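For example, a minimal sketch that writes one file per segment and reads the files back; the path must contain the <SEGID> token, and the sales table is hypothetical:

COPY sales TO '/tmp/sales_backup<SEGID>.out' ON SEGMENT;
COPY sales FROM '/tmp/sales_backup<SEGID>.out' ON SEGMENT;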

Experimental Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several experimental features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Experimental features are not recommended or supported for production deployments. These features may change in or be removed from future versions of the product based on further testing and feedback. Moreover, any features that may be visible in the open source code but that are not described in the product documentation should be considered experimental and unsupported for production use.
Key experimental features in Greenplum Database 5.0.0 include:
  • Recursive WITH Queries (Common Table Expressions)—The RECURSIVE keyword for the WITH clause can be enabled by setting the server configuration parameter gp_recursive_cte_prototype to on. The keyword allows a subquery in the WITH clause of a SELECT [INTO] command to reference itself. See WITH Queries (Common Table Expressions), and see the example following this list.
  • Workload Management with Resource Groups—Resource groups provide fine-grained workload management of concurrent transactions, memory utilization, and CPU resources using groups that you assign to Greenplum Database roles. The resource group implementation is an experimental feature provided as an alternative to resource queue-based workload management. You can configure both resource queues and resource groups, but only one method of workload management can be active at a given time. See Using Resource Groups. See also Known Issues and Limitations for information about issues associated with resource groups.
  • The pgAdmin 4 tool is compatible with Pivotal Greenplum Database 5.0.0, but is not officially supported by Pivotal. You can use pgAdmin 4 to query Greenplum Database tables and to view Greenplum Database table DDL, but some functionality is either unavailable or not fully implemented when used with Greenplum Database:
    • Partitioned tables can cause pgAdmin performance to slow down.
    • External tables are not supported.
    • Graphical explain analyze is not supported.
    • Append-optimized tables are expected to work with pgAdmin 4.

    See https://www.pgadmin.org/ for information about installing and using the tool.
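A minimal sketch of the experimental recursive WITH feature described in the first item of this list:

SET gp_recursive_cte_prototype = on;

WITH RECURSIVE t(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM t WHERE n < 5
)
SELECT sum(n) FROM t;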

Changed Features

GPORCA as Default Optimizer

GPORCA is the default query optimizer. Greenplum Database uses GPORCA to generate an execution plan for a query when possible. If GPORCA cannot be used, the legacy query optimizer is used. In previous releases, the legacy query optimizer was the default query optimizer.

The server configuration parameter optimizer controls whether GPORCA or the legacy optimizer is the default optimizer. The default value for optimizer is on.
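For example, to check which optimizer is active for a session, or to fall back to the legacy planner for the remainder of the session:

SHOW optimizer;
SET optimizer = off;  -- use the legacy query optimizer for this session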

Legacy Optimizer Changes

For queries against partitioned tables, the legacy query optimizer now uses a PartitionSelector node to select partitions. The old method relied on pattern matching and duplicated Joins in the query plan; it was inefficient and could return incorrect results for queries that also contained volatile expressions.

Escape Characters in String Literals

Handling of backslashes (\) in string literals (strings enclosed in single quotes, '...') has changed. Backslashes appearing in string literals are now interpreted as literal backslashes instead of escape characters, as specified in the SQL standard. In previous releases, backslashes in string literals were treated as escape characters.

The server configuration parameter standard_conforming_strings controls the handling of backslashes (\) in string literals. The default setting for the server configuration parameter standard_conforming_strings has changed to on. Set this parameter to off to treat backslashes in string literals as escape characters instead of literal backslashes. Applications can check this parameter to determine how string literals are processed.
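For example, with standard_conforming_strings set to on:

SELECT 'C:\temp';        -- the backslash is literal: C:\temp
SELECT E'line1\nline2';  -- the E'...' syntax still processes escape sequences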

Implicit Casting of Text

Implicit casting between text data types and time or numerical data types has changed to match PostgreSQL 8.3 behavior. PostgreSQL 8.3 limits implicit casts between text and non-text data types to minimize unexpected results and to minimize coercing data incorrectly.

When upgrading to Greenplum Database 5.0.0, SQL commands and user-defined functions that depend on implicit casts between text and non-text data types should be checked and explicit casts should be applied as necessary.

This example demonstrates the change. The WHERE clause in the SELECT command works in Greenplum Database 4.3.x, but fails in Greenplum Database 5.0.0.

CREATE TABLE test AS SELECT generate_series(1,10)::text as myid, generate_series(11,20) as mynum 
  DISTRIBUTED RANDOMLY ;

SELECT myid || mynum FROM test WHERE myid = 3 ;

In Greenplum Database 5.0.0, the equality comparison requires an explicit CAST.

SELECT myid || mynum FROM test WHERE myid::INT = 3 ;

Logging for Formatting Errors

In Greenplum Database 4.3.x releases, you could capture formatting errors generated during CREATE EXTERNAL TABLE and COPY operations and log them either internally or to a user-specified Greenplum Database table. Greenplum Database 5.0.0 stores this error information only internally and no longer supports using the [INTO error-table] clause in CREATE EXTERNAL TABLE and COPY command LOG ERRORS statements.

Upgrade Action Required: Identify any Greenplum 4.3.x database scripts you have written in which you included the [INTO error-table] clause. Remove the clause from the script(s). You must also replace any SQL commands associated with the examination or manipulation of error-tables. You can use the gp_read_error_log() and gp_truncate_error_log() built-in functions to view and manage the internal error log associated with your table.
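For example (ext_expenses is a hypothetical external table name):

SELECT * FROM gp_read_error_log('ext_expenses');
SELECT gp_truncate_error_log('ext_expenses');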

ANALYZE Command

The ANALYZE command uses the faster PostgreSQL implementation to gather table statistics, improving its performance for both heap and append-optimized tables.

A sample of rows is collected in a single query, and a calculation of statistics for each column is performed in memory. Previously, separate queries for each column were issued.

During the analyze operation, a table is no longer created to hold the sample. This is more efficient for small tables.

Query Dispatcher

Greenplum Database now uses an asynchronous dispatcher by default when processing SQL queries. The server configuration parameter gp_connections_per_thread controls the generation of Greenplum Database query dispatcher (QD) worker threads when processing SQL queries.

How Greenplum Database interprets the value of the server configuration parameter gp_connections_per_thread has changed. For Greenplum Database 5.0.0, the default value is 0. With the default value, a query dispatcher generates two types of threads: a main thread that manages the dispatch of query plan work, and an interconnect thread. The main thread also acts as a worker thread. For information about how Greenplum Database manages threads during query processing, see the server configuration parameter gp_connections_per_thread in the Greenplum Database Reference Guide.

Datatype Storage

The on-disk representation of the NUMERIC data type has changed. This change affects the size of existing data on disk.

The MONEY data type has changed from 32-bit to 64-bit. This change affects the on-disk size of existing data.

S3 and Custom Protocols

You can now specify the ON MASTER clause when creating s3 and custom protocol readable and writable external tables. This clause restricts all table operations to the master segment. ON MASTER is not supported on external tables you create using the gpfdist, gpfdists, gphdfs, and file protocols.
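For example, the following sketch creates an s3 readable external table whose operations are restricted to the master segment; the bucket URL and configuration file path are hypothetical:

CREATE READABLE EXTERNAL TABLE ext_expenses (name text, amount numeric)
    LOCATION ('s3://s3-us-west-2.amazonaws.com/mybucket/expenses config=/home/gpadmin/s3.conf') ON MASTER
    FORMAT 'csv';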

The s3 protocol now supports S3 server-side encryption using Amazon S3-Managed Keys (SSE-S3) for both readable and writable external tables. See s3 Protocol AWS Server-Side Encryption Support for additional information about this feature.

System Catalog

In Greenplum Database 5.0.0, many system catalog tables and views have changed as a result of:
  • Upgrading the version of PostgreSQL on which Greenplum Database is based (PostgreSQL 8.3)
  • New features or bug fixes that required catalog changes

Existing queries that access system catalog information may require adjustments due to changes in the Greenplum Database 5.0.0 system catalog tables and views.

Some specific system catalog changes in Greenplum Database 5.0.0 include:
  • The pg_proc table now defines the get_ao_compression_ratio() function as a strict function, in order to fix 26591.
  • The pg_proc table now includes the following columns:
    • procost - the estimated execution cost of the function
    • provariadic - the data type of the variadic array parameter's elements
    • pronargdefaults - the number of arguments that have default values
    • proargdefaults - expression trees representing default argument values
  • The pg_type table now includes the typemodin and typemodout columns to support type modifier input and output functions.

pg_dumpall Utility

The pg_dumpall command now supports the --roles-only and -t | --tablespaces-only options to dump only roles or tablespaces, respectively. Additionally, pg_dumpall no longer supports the -r option to dump resource queue data; you must specify the --resource-queues option for this feature.

psql Utility

The psql \dx meta-command no longer displays Greenplum Database external tables; it now displays Greenplum Database extensions. This change was made for compatibility with PostgreSQL psql. Use the \dE meta-command to display Greenplum Database external tables.

gpstart Utility

The gpstart command now supports the --skip-heap-checksum-validation option that disables checking the consistency of the heap checksum setting among the Greenplum Database master and segment instances during startup. For information about heap checksums, see Heap Data Checksums.

PL/Python Multi-Dimensional Arrays

PL/Python now supports multi-dimensional arrays as function arguments and return values. This feature makes a backwards-incompatible change to the handling of arrays of composite types. Previously, you could return an array of composite types as "[[col1, col2], [col1, col2]]". PL/Python now interprets this as a two-dimensional array. Composite types in arrays must now be returned as Python tuples, not lists, to resolve the ambiguity; for example, "[(col1, col2), (col1, col2)]".

In the typical usage pattern, you specify lists ([]) for arrays and tuples (()) for composite types. However, the previous usage is still accepted in these contexts:
  • [] is still accepted for composite types that are not array elements.
  • () is still accepted for arrays at the top level, but such arrays are always treated as single-dimensional.
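For example, a minimal sketch of the tuple requirement described above, assuming the plpythonu language is registered in the database (the named_value type is hypothetical):

CREATE TYPE named_value AS (name text, value integer);

CREATE OR REPLACE FUNCTION make_pairs() RETURNS named_value[] AS $$
    # composite types inside an array must be returned as Python tuples
    return [('a', 1), ('b', 2)]
$$ LANGUAGE plpythonu;

SELECT make_pairs();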

Updatable Cursors

Support for updatable cursors in Greenplum Database is clarified. UPDATE/DELETE WHERE CURRENT OF... cursor statements can be executed on the server, for example in an interactive psql session or in a script executed with psql. Extensions such as PL/pgSQL and PL/Java do not support updatable cursors. A statement in the documentation that said Greenplum Database does not support updatable cursors has been removed.

Creating Trusted Languages

Database owners (instead of only superusers) can create trusted languages.

pgcrypto Removes FIPS Support

The pgcrypto extension package provided with Greenplum Database 5.0.0 no longer supports Federal Information Processing Standard (FIPS) 140-2 mode. See Removed and Deprecated Features.

Locking for ALTER TABLE RENAME

For the ALTER TABLE RENAME command, renaming a relation acquires an Exclusive lock on the relation.

Free Tuple Handling for Persistent Tables

Previous releases of Greenplum Database used complex code and on-disk structures to manage free tuples for persistent tables. In version 5.0.0, the tuple deletion code has been refactored to use the native heap delete and vacuum framework. This change should not affect any existing scripts or administration tasks. However, the change resolves several system failure scenarios associated with the earlier implementation.

Transaction ID Assignment

When performing database transactions, Greenplum Database assigns a transaction ID (XID) value only to a transaction that involves a DDL or DML operation, which is typically the only transaction that requires an XID. An XID is not assigned to a read-only query. Reducing the use of XID values reduces the risk of XID wrap-around when performing a high number of transactions and can reduce the need to vacuum a database.

Default Master Data Directory for Utilities

For the Greenplum Database utilities gpactivatestandby and gpdeletesystem, the option -d standby_master_datadir is no longer required. The default value for the -d option is the value of the MASTER_DATA_DIRECTORY environment variable.

Independent relfilenode and OID

In Greenplum Database 5.0, for the pg_class system catalog table, the relfilenode and (internal) OID values for a given relation are completely decoupled. If you created any scripts that assume relfilenode and OID are the same, then you must modify those scripts. Additionally, any assumptions that you may have made about the equivalency of relfilenodes across the Greenplum Database master and segment hosts are no longer valid. As in previous Greenplum Database releases, a relfilenode is guaranteed to be unique only across a tablespace in Greenplum Database version 5.0.

Partner Connector

Installing Greenplum Database 5.0.0 also installs the Greenplum Partner Connector library. In previous Greenplum Database versions, you installed the Partner Connector from a separate .gppkg.

GPORCA Supports Indexes on Leaf Child Partitions

If an index exists on a leaf child partition, GPORCA considers the index when generating a query plan for queries against the leaf child partition. In previous releases, GPORCA did not consider the index when generating plans even though using the index would benefit the query that directly accesses the leaf child partition.

Requirement for ZLIB Compression

Pivotal Greenplum Database no longer packages and installs the libz.so library required for zlib compression and decompression. Instead, this library is part of the requirements of the underlying operating system. If necessary, install zlib on each Pivotal Greenplum host. For example:
$ sudo yum install -y zlib

See System Requirements for complete information about operating system requirements.

Removed and Deprecated Features

Pivotal Greenplum 5.0.0 removes these features:
  • The utilities gpmigrator and gpmigrator_mirror are no longer used.
  • The pgcrypto extension package provided with Greenplum Database 5.0.0 no longer supports Federal Information Processing Standard (FIPS) 140-2 mode.

    The server configuration parameter pgcrypto.fips has been removed.

The server configuration parameter password_hash_algorithm does not support the value sha-256-fips.

    The server configuration parameter custom_variable_classes no longer requires the pgcrypto value that enables FIPS 140-2. The value can be removed from the parameter.

  • The following analytics functions were deprecated in Greenplum Database 4.3 and have been removed from Greenplum Database 5.0:
    • matrix_add()
    • matrix_multiply()
    • matrix_transpose()
    • pinv()
    • mregr_coef()
    • mregr_r2()
    • mregr_pvalues()
    • mregr_tstats()
    • nb_classify()
    • nb_probabilities()

    The Greenplum MADlib Extension for Analytics is recommended for performing analytics in Greenplum Database.

  • Greenplum Database 5.0.0 no longer supports the [INTO error-table] clause of the CREATE EXTERNAL TABLE and COPY command LOG ERRORS statements; formatting error information is now stored only internally. For details and the required upgrade actions, see Logging for Formatting Errors under Changed Features.

  • The following parameters were never included in the Greenplum Database documentation, and have been removed from the product. This removal notice is for developers who may have learned about the parameters from the open source content:
    • gp_hashagg_compress_spill_files
    • gp_eager_hashtable_release
    • gp_workfile_compress_algorithm
    • max_work_mem
    • gp_hash_index
  • The workfile caching feature has been removed. The feature was never documented. This removal notice is for developers who may have learned about the feature from the open source content.

The gpcrondump and gpdbrestore utilities are deprecated and will not be supported after the end of Greenplum Database 5.x Support Life.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 5.0.0 includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for QuickLZ compression. QuickLZ compression is not provided in the open source version of Greenplum Database due to licensing restrictions.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for monitoring and managing queries with Pivotal Greenplum Workload Manager.
  • Support for full text search and text analysis using Pivotal GPText.

Supported Platforms

Pivotal Greenplum 5.0.0 runs on the following platforms:

  • Red Hat Enterprise Linux 64-bit 7.x (See the following Note)
  • Red Hat Enterprise Linux 64-bit 6.x
  • SuSE Linux Enterprise Server 64-bit 11 SP4 (See the following Note)
  • CentOS 64-bit 7.x
  • CentOS 64-bit 6.x
Important: Pivotal Greenplum 5.0.0 is not yet certified for running on DCA systems. Contact your DCA representative for information about the availability of Greenplum Database 5.0.0 support on the DCA.
Note: Greenplum Database on SuSE Linux Enterprise systems does not support the PL/Perl procedural language or the gpmapreduce tool.
Note: For Greenplum Database installed on Red Hat Enterprise Linux 7.x or CentOS 7.x prior to 7.3, an operating system issue might cause Greenplum Database to hang when running large workloads. The issue is caused by Linux kernel bugs.

RHEL 7.3 and CentOS 7.3 resolve the issue.

Pivotal Greenplum 5.0.0 supports these Java versions:
  • 8.xxx
  • 7.xxx

Greenplum Database 5.0.0 software that runs on Linux systems uses OpenSSL 1.0.2l (with FIPS 2.0.16), cURL 7.54, OpenLDAP 2.4.44, and Python 2.6.9.

Greenplum Database client software that runs on Windows and AIX systems uses OpenSSL 0.9.8zg.

The Greenplum Database s3 external table protocol supports these data sources:
  • Amazon Simple Storage Service (Amazon S3)
  • Dell EMC Elastic Cloud Storage (ECS), an Amazon S3 compatible service

Pivotal Greenplum 5.0.0 supports Data Domain Boost on Red Hat Enterprise Linux.

This table lists the versions of Data Domain Boost SDK and DDOS supported by Pivotal Greenplum 5.0.0.

Table 2. Data Domain Boost Compatibility
Pivotal Greenplum Data Domain Boost DDOS
5.0.0 3.3 6.0 (all versions)
Note: In addition to the DDOS versions listed in the previous table, Pivotal Greenplum 5.0.0 and later supports all minor patch releases (fourth digit releases) later than the certified version.
Note: Pivotal Greenplum 5.0.0 does not support the ODBC driver for Cognos Analytics V11.

Connecting to IBM Cognos software with an ODBC driver is not supported. Greenplum Database supports connecting to IBM Cognos software with the DataDirect JDBC driver for Pivotal Greenplum. This driver is available as a download from Pivotal Network.

Veritas NetBackup

Pivotal Greenplum 5.0.0 supports backup with Veritas NetBackup version 7.7.3. See Backing Up Databases with Veritas NetBackup.

Supported Platform Notes

The following notes describe platform support for Pivotal Greenplum. Please send any questions or comments to Pivotal Support at https://support.pivotal.io.

  • The only file system supported for running Greenplum Database is the XFS file system. All other file systems are explicitly not supported by Pivotal.
  • Greenplum Database is supported on all 1U and 2U commodity servers with local storage. Special purpose hardware that is not commodity may be supported at the full discretion of Pivotal Product Management based on the general similarity of the hardware to commodity servers.
  • Greenplum Database is supported on network or shared storage if the shared storage is presented as a block device to the servers running Greenplum Database and the XFS file system is mounted on the block device. Network file systems are not supported. When using network or shared storage, Greenplum Database mirroring must be used in the same way as with local storage, and no modifications may be made to the mirroring scheme or the recovery scheme of the segments. Other features of the shared storage such as de-duplication and/or replication are not directly supported by Pivotal Greenplum Database, but may be used with support of the storage vendor as long as they do not interfere with the expected operation of Greenplum Database at the discretion of Pivotal.
  • Greenplum Database is supported when running on virtualized systems, as long as the storage is presented as block devices and the XFS file system is mounted for the storage of the segment directories.
  • A minimum 10-gigabit network is required for a system configuration to be supported by Pivotal.
  • Greenplum Database is supported on Amazon Web Services (AWS) servers using either Amazon instance store (Amazon uses the volume names ephemeral[0-20]) or Amazon Elastic Block Store (Amazon EBS) storage. If using Amazon EBS storage, the storage should be a RAID of Amazon EBS volumes mounted with the XFS file system for it to be a supported configuration.
  • For Red Hat Enterprise Linux 7.2 or CentOS 7.2, the default systemd setting RemoveIPC=yes removes IPC connections when non-system users logout. This causes the Greenplum Database utility gpinitsystem to fail with semaphore errors. To avoid this issue, see "Setting the Greenplum Recommended OS Parameters" in the Greenplum Database Installation Guide.

Pivotal Greenplum Tools and Extensions Compatibility

Client Tools

Greenplum releases a number of client tool packages on various platforms that can be used to connect to Greenplum Database and the Greenplum Command Center management tool. The following table describes the compatibility of these packages with this Greenplum Database release.

Tool packages are available from Pivotal Network.

Table 3. Pivotal Greenplum 5.0.0 Tools Compatibility
Tool Description of Contents Tool Version Server Versions
Pivotal Greenplum Clients Greenplum Database Command-Line Interface (psql) 5.0 5.0.0
Pivotal Greenplum Loaders Greenplum Database Parallel Data Loading Tools (gpfdist, gpload) 5.0 5.0.0
Pivotal Greenplum Command Center Greenplum Database management tool 3.2.2 and later 5.0.0
Pivotal Greenplum Workload Manager Greenplum Database query monitoring and management tool 1.8.0 5.0.0

The Greenplum Database Client Tools and Load Tools are supported on the following platforms:

  • AIX 7.2 (64-bit) (Client and Load Tools only)
  • Red Hat Enterprise Linux x86_64 6.x (RHEL 6)
  • SuSE Linux Enterprise Server x86_64 SLES 11
  • Windows 10 (32-bit and 64-bit)
  • Windows 8 (32-bit and 64-bit)
  • Windows Server 2012 (32-bit and 64-bit)
  • Windows Server 2012 R2 (32-bit and 64-bit)
  • Windows Server 2008 R2 (32-bit and 64-bit)

Extensions

Table 4. Pivotal Greenplum 5.0.0 Extensions Compatibility
Pivotal Greenplum Extension Version(s)
MADlib machine learning for Greenplum Database 5.0.x (see Note 1) MADlib 1.12, 1.11
PL/Java for Greenplum Database 5.0.x PL/Java 1.4.2, 1.4.0
PL/R for Greenplum Database 5.0.x 2.3.0, 2.2.0
PostGIS Spatial and Geographic Objects for Greenplum Database 5.0.x 2.1.5
Python Data Science Module Package for Greenplum Database 5.0.x 1.0.0, 1.1.0
R Data Science Library Package for Greenplum Database 5.0.x 1.0.0
Note 1: For information about MADlib support and upgrade information, see the MADlib FAQ.

Pivotal GPText Compatibility

Pivotal Greenplum Database 5.0.0 is compatible with Pivotal GPText version 2.1.3 or later.

Hadoop Distribution Compatibility

This table lists the supported Hadoop distributions:

Table 5. Supported Hadoop Distributions
Hadoop Distribution Version gp_hadoop_target_version
Cloudera CDH 5.2, 5.3, 5.4.x - 5.8.x cdh5
CDH 5.0, 5.1 cdh4.1
Hortonworks Data Platform HDP 2.1, 2.2, 2.3, 2.4, 2.5 hdp2
MapR MapR 4.x, MapR 5.x gpmr-1.2
Apache Hadoop 2.x hadoop2
Note: MapR requires the MapR client. For MapR 5.x, only TEXT and CSV are supported in the FORMAT clause of the CREATE EXTERNAL TABLE command.

Migrating Data to Pivotal Greenplum 5.0.0

Note: Upgrading a Pivotal Greenplum Database 4.x system directly to Pivotal Greenplum Database 5.0.0 is not supported.

You can migrate existing data to Greenplum Database 5.0.0 using standard backup and restore procedures (gpcrondump and gpdbrestore) or by using gptransfer if both clusters will run side-by-side.

Follow these general guidelines for migrating data:
  • Make sure that you have a complete backup of all data in the Greenplum Database 4.3.x cluster, and that you can successfully restore the Greenplum Database 4.3.x cluster if necessary.
  • You must install and initialize a new Greenplum Database 5.0.0 cluster using the version 5.0.0 gpinitsystem utility.
    Note: Unless you modify file locations manually, gpdbrestore only supports restoring data to a cluster that has an identical number of hosts and an identical number of segments per host, with each segment having the same content_id as the segment in the original cluster. If you initialize the Greenplum Database 5.0.0 cluster using a configuration that is different from the version 4.3 cluster, then follow the steps outlined in Restoring to a Different Greenplum System Configuration to manually update the file locations.
  • If you intend to install Greenplum Database 5.0.0 on the same hardware as your 4.3.x system, you will need enough disk space to accommodate over 5 times the original data set (2 full copies of the primary and mirror data sets, plus the original backup data in ASCII format). Keep in mind that the ASCII backup data will require more disk space than the original data, which may be stored in compressed binary format. Offline backup solutions such as Dell EMC Data Domain or Veritas NetBackup can reduce the required disk space on each host.
  • Use the version 5.0.0 gpdbrestore utility to load the 4.3.x backup data into the new cluster.
  • If the Greenplum Database 5.0.0 cluster resides on separate hardware from the 4.3.x cluster, you can optionally use the version 5.0.0 gptransfer utility to migrate the 4.3.x data. You must initiate the gptransfer operation from the version 5.0.0 cluster, pulling the older data into the newer system.
  • After migrating data you may need to modify SQL scripts, administration scripts, and user-defined functions as necessary to account for changes in Greenplum Database version 5.0.0. Look for Upgrade Action Required entries in these release notes for features that may necessitate post-migration tasks.

Resolved Issues

The following issues discovered during the Beta program have been resolved in Greenplum Database 5.0.0.
  • GPORCA did not properly handle a NULL while deriving propagation expressions that involved default list partitions with indexes. This caused a database crash. Greenplum Database now falls back to the planner in this scenario.

    SHA: gporca/commit/de0a7729

  • In some cases, running the gpdbrestore -s dbname command failed when the db_dumps directory contained more than one backup directory with a date stamp YYYYMMDD as the directory name. This failure occurred when the utility searched for the latest timestamp report file in the wrong directory. The utility now searches in the correct directory.

    SHA: gpdb/commit/aaa88b21

    SHA: gpdb/commit/d3e16577

  • For queries that have a bitmap and btree index in a qual clause, the executor could crash or error out when using the planner plan. This issue has been resolved.

    SHA: gpdb/commit/eda18c68

  • Running gppkg --remove did not work if more than one package was installed. This issue has been fixed.

    SHA: gpdb/commit/1cb80806

Known Issues and Limitations

Pivotal Greenplum 5.0.0 has these limitations:

  • Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.0.0 is not supported. See Migrating Data to Pivotal Greenplum 5.0.0.
  • Pivotal Greenplum 5.0.0 is not yet certified for running on DCA systems. Contact your DCA representative for information about the availability of Greenplum Database 5.0.0 support on the DCA.
  • Greenplum Database resource groups and recursive WITH Queries (Common Table Expressions) are considered work-in-progress and are experimental features. Pivotal does not support using experimental features in a production environment.
  • Greenplum Database 4.3.x packages are not compatible with Pivotal Greenplum 5.0.0.

The following table lists key known issues in Pivotal Greenplum 5.0.0.

Table 6. Key Known Issues in Pivotal Greenplum 5.0.0
Issue Category Description
151468673 Client and Loader Tools For the Greenplum Database 5.0.0 Client and Loader tools for RHEL and SLES platforms, the default installation path is not valid. The default installation path contains a colon (:) character that is not handled properly.

Workaround: When installing a Client or Loader tool, specify an installation path, for example /usr/local/greenplum-loaders-5.0.0. Do not use the default installation directory.

151135629 COPY command When the ON SEGMENT clause is specified, the COPY command does not support specifying a SELECT statement in the COPY TO clause. For example, this command completes successfully, but the files are not created on the segment hosts:
COPY (SELECT * FROM testtbl) TO '/tmp/mytst<SEGID>' ON SEGMENT
29064 Storage: DDL The money datatype accepts out-of-range values as negative values, and no error message is displayed.

Workaround: Use only in-range values for the money datatype (32-bit for Greenplum Database 4.x, or 64-bit for Greenplum Database 5.x). Or, use an alternative datatype such as numeric or decimal.

3290 JSON The to_json() function is not implemented as a callable function. Attempting to call the function results in an error. For example:
tutorial=# select to_json('Fred said "Hi."'::text);
ERROR:  function to_json(text) does not exist
LINE 1: select to_json('Fred said "Hi."'::text);
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Workaround: Greenplum Database invokes to_json() internally when casting to the json data type, so perform a cast instead. For example:
SELECT '{"foo":"bar"}'::json;
Greenplum Database also provides the array_to_json() and row_to_json() functions.

148119917 Resource Groups (Experimental) Testing of the experimental resource groups feature has found that a kernel panic can occur when using the default kernel in RHEL/CentOS systems. The problem is caused by a bug in the kernel cgroups implementation, and results in a kernel panic backtrace similar to:
[81375.325947] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [81375.325986] IP: [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326014] PGD 0 [81375.326025]
      Oops: 0000 [#1] SMP [81375.326041] Modules linked in: veth ipt_MASQUERADE
      nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype
      iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc intel_powerclamp coretemp
      intel_rapl dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio kvm_intel kvm crc32_pclmul
      ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt
      iTCO_vendor_support ses enclosure ipmi_ssif pcspkr lpc_ich sg sb_edac mfd_core edac_core
      mei_me ipmi_si mei wmi ipmi_msghandler shpchp acpi_power_meter acpi_pad ip_tables xfs
      libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect crct10dif_pclmul
      sysimgblt crct10dif_common crc32c_intel drm_kms_helper ixgbe ttm mdio ahci igb libahci drm ptp
      pps_core libata dca i2c_algo_bit [81375.326369]  i2c_core megaraid_sas dm_mirror
      dm_region_hash dm_log dm_mod [81375.326396] CPU: 17 PID: 0 Comm: swapper/17 Not tainted
      3.10.0-327.el7.x86_64 #1 [81375.326422] Hardware name: Cisco Systems Inc
      UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.8b.0.080620151546 08/06/2015 [81375.326459] task:
      ffff88140ecec500 ti: ffff88140ed10000 task.ti: ffff88140ed10000 [81375.326485] RIP:
      0010:[<ffffffff812f94b1>]  [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326514] RSP:
      0018:ffff88140ed13e10  EFLAGS: 00010046 [81375.326534] RAX: 0000000000000000 RBX:
      0000000000000000 RCX: 0000000000000000 [81375.326559] RDX: ffff88282f1d4800 RSI:
      ffff88280bc0f140 RDI: 0000000000000010 [81375.326584] RBP: ffff88140ed13e58 R08:
      0000000000000000 R09: 0000000000000001 [81375.326609] R10: 0000000000000000 R11:
      0000000000000001 R12: ffff88280b0e7000 [81375.326634] R13: 0000000000000000 R14:
      0000000000000000 R15: 0000000000b6f979 [81375.326659] FS:  0000000000000000(0000)
      GS:ffff88282f1c0000(0000) knlGS:0000000000000000 [81375.326688] CS:  0010 DS: 0000 ES: 0000
      CR0: 0000000080050033 [81375.326708] CR2: 0000000000000010 CR3: 000000000194a000 CR4:
      00000000001407e0 [81375.326733] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000 [81375.326758] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
      0000000000000400 [81375.326783] Stack: [81375.326792]  ffff88140ed13e58 ffffffff810bf539
      ffff88282f1d4780 ffff88282f1d4780 [81375.326826]  ffff88140ececae8 ffff88282f1d4780
      0000000000000011 ffff88140ed10000 [81375.326861]  0000000000000000 ffff88140ed13eb8
      ffffffff8163a10a ffff88140ecec500 [81375.326895] Call Trace: [81375.326912]
      [<ffffffff810bf539>] ? pick_next_task_fair+0x129/0x1d0 [81375.326940]  [<ffffffff8163a10a>]
      __schedule+0x12a/0x900 [81375.326961]  [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
      [81375.326987]  [<ffffffff810d6244>] cpu_startup_entry+0x184/0x290 [81375.327011]
      [<ffffffff810475fa>] start_secondary+0x1ba/0x230 [81375.327032] Code: e5 48 85 c0 75 07 eb 19 66
      90 48 89 d0 48 8b 50 10 48 85 d2 75 f4 48 8b 50 08 48 85 d2 75 eb 5d c3 31 c0 5d c3 0f 1f 44
      00 00 55 <48> 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb [81375.327157] RIP
      [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.327179]  RSP <ffff88140ed13e10> [81375.327192] CR2:
      0000000000000010

Workaround: Upgrade to the latest-available kernel for your Red Hat or CentOS release to avoid the above system panic.

149789783 Resource Groups Significant performance degradation has been observed when enabling resource group-based workload management on Red Hat, CentOS, and SuSE systems using the default kernel.

Workaround: Performance problems associated with resource groups can be partially mitigated by upgrading to the latest supported kernel for the respective Red Hat, CentOS, or SuSE release.

149970753 Resource Groups The gpcrondump and gpdbrestore utilities do not backup or restore configuration information for the experimental Resource Groups feature.

Workaround: If you need to preserve your resource groups configuration, store the associated SQL commands in a separate file and manually apply the commands after restoring from a backup.

150906510 Backup and Restore Greenplum Database 4.3.15.0 and later backups contain the following line in the backup files:
SET gp_strict_xml_parse = false;

However, Greenplum Database 5.0.0 does not have a parameter named gp_strict_xml_parse. When you restore the 4.3 backup set to the 5.0.0 cluster, you may see the warning:

[WARNING]:-gpdbrestore finished but ERRORS were found, please check the restore report file for details

Also, the report file may contain the error:

ERROR:  unrecognized configuration parameter "gp_strict_xml_parse"

These warnings and errors do not affect the restoration procedure, and can be ignored.