Pivotal Greenplum 5.1.0 Release Notes

Updated: November, 2017

Welcome to Pivotal Greenplum 5.1.0

Pivotal Greenplum Database is a massively parallel processing (MPP) database server that supports next generation data warehousing and large-scale analytics processing. By automatically partitioning data and running parallel queries, it allows a cluster of servers to operate as a single database supercomputer performing tens or hundreds of times faster than a traditional database. It supports SQL, MapReduce parallel processing, and data volumes ranging from hundreds of gigabytes to hundreds of terabytes.

Note: This document contains pertinent release information about Pivotal Greenplum Database 5.1.0. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy.

Pivotal Greenplum 5.x software is available for download from Pivotal Network.

Important: Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management (an experimental feature) on RedHat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

If you use RedHat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue. See known issue 149789783.

Pivotal Greenplum 5.x is based on the open source Greenplum Database project code.

Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Greenplum Database 5.1.0 is a minor release that includes product enhancements and changes, and resolves some known issues.

New Features

GPORCA Enhanced Short-Running Query Performance

In Greenplum Database 5.1.0, GPORCA reduces optimization time by retrieving column statistics only for the columns required for cardinality estimation. Eliminating the overhead of retrieving and generating the statistics for columns that are not used in cardinality estimation improves the performance of short running queries.

In previous releases, when only column width information was required by GPORCA, the column statistics were also retrieved.

GPORCA Performance Enhancements

Greenplum Database 5.1.0 includes these GPORCA performance enhancements.

  • GPORCA has reduced the maximum number of join combinations that are evaluated during query optimization. This reduces the time spent evaluating join combinations without significantly reducing query performance.

    In previous releases, for queries that contain a large number of joins, GPORCA spent a significant amount of time evaluating all possible multiple join combinations to determine the most efficient join tree to use in the query plan.

  • For some queries that contain a correlated subquery and the subquery contains a window function, GPORCA de-correlates the query and produces an efficient plan that uses joins.

    In previous releases, GPORCA generated a suboptimal query plan with correlated execution.

GPORCA Supports Indexes on Leaf Child Partitions

In Greenplum Database 5.1.0, if an index exists on a leaf child partition, GPORCA considers the index when generating a query plan for queries against the leaf child partition. In previous releases, GPORCA did not consider the index when generating plans even though using the index would benefit the query that directly accesses the leaf child partition.

COPY Table Data to or from a Program

In Greenplum Database 5.1.0, the COPY command supports the PROGRAM 'command' clause to copy table data to or from a program. The command must be specified from the viewpoint of the Greenplum Database master host system, and must be executable by the Greenplum Database administrator user (gpadmin). For the COPY FROM command, the input is read from standard output of the command, and for the COPY TO command, the output is written to the standard input of the command.
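For example, a minimal sketch of both directions (the table name, file paths, and shell commands here are illustrative, not taken from the product documentation):

COPY sales FROM PROGRAM 'gunzip -c /data/sales.csv.gz' WITH CSV;
COPY sales TO PROGRAM 'gzip > /data/sales_backup.csv.gz' WITH CSV;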

For more information about the command, see the COPY command.

gptransfer Supports SHA-256 Data Validation

In Greenplum Database 5.1.0, the Greenplum Database gptransfer utility supports validating transferred table data with SHA-256 checksums when copying database objects between source and destination systems. Specify the --validate option with the value sha256 to compare SHA-256 checksum values between source and destination table data.

On a Greenplum Database system with FIPS enabled, the option md5 is not supported. Use the option count or sha256 to validate table data.
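For example, a hedged sketch of transferring one table with SHA-256 validation (the database, table, and host names are placeholders; see the gptransfer reference for the complete option list):

gptransfer -t proddb.public.sales --dest-host mdw-new.example.com --validate=sha256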

Note: Validating table data with SHA-256 requires the Greenplum Database pgcrypto extension. The extension is included with Pivotal Greenplum 5.x. When copying data from a supported Pivotal Greenplum 4.3.x system, the extension package must be installed on the 4.3.x system. You do not need to run pgcrypto.sql to install the pgcrypto functions in a Greenplum 4.3.x database.

For more information about the utility, see gptransfer.

gprecoverseg Performance Enhancement

In Greenplum Database 5.1.0, the performance of the Greenplum Database gprecoverseg utility has been improved when performing directory deletes as part of a full recovery (gprecoverseg -F command). gprecoverseg now uses rsync instead of an rm -rf operation against directories, which improves delete performance with a large number of files.

PXF Extension Framework

The PXF Extension Framework (PXF) is integrated with Greenplum Database 5.1.0. PXF enables you to access external HDFS and Hive data stores from a Greenplum Database external table, essentially providing an ANSI-compliant SQL interface to virtually any dataset. The Greenplum Database PXF Extension Framework is based on PXF from Apache HAWQ (incubating).

Note: PXF does not currently support filter predicate pushdown to HDFS and Hive.

For information about configuring and using the PXF Extension Framework in your Greenplum Database deployment, refer to Accessing HDFS and Hive Data with PXF in the Greenplum Database Administrator Guide.
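For example, a minimal sketch of a readable external table that uses the pxf protocol to access an HDFS text file (the host, port, path, columns, and profile name are illustrative; confirm the exact LOCATION syntax in the PXF documentation):

CREATE EXTERNAL TABLE pxf_hdfs_sales (id int, amount numeric)
  LOCATION ('pxf://namenode:51200/data/pxf/sales.csv?PROFILE=HdfsTextSimple')
  FORMAT 'CSV';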

New Parameters

Greenplum Database 5.1.0 includes these new server configuration parameters.

For information about Greenplum Database server configuration parameters, see the Greenplum Database Reference Guide.

gp_interconnect_debug_retry_interval

Specifies the interval, in seconds, to log Greenplum Database interconnect debugging messages when the server configuration parameter gp_log_interconnect is set to DEBUG. The default is 10 seconds.

The log messages contain information about the interconnect communication between Greenplum Database segment instance worker processes. The information can be helpful when debugging network issues between segment instances.

Value Range | Default | Set Classifications
1 <= integer < 4096 | 10 | master, session, reload

gp_log_interconnect

Controls the amount of information that is written to the log file about communication between Greenplum Database segment instance worker processes. The default value is terse. The log information is written to both the master and segment instance logs.

Increasing the amount of logging could affect performance and increase disk space usage.

Value Range | Default | Set Classifications
off, terse, verbose, debug | terse | master, session, reload
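For example, a hedged sketch of enabling debug-level interconnect logging in the current session (the retry interval value is illustrative):

SET gp_log_interconnect = 'debug';
SET gp_interconnect_debug_retry_interval = 30;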

optimizer_join_arity_for_associativity_commutativity

The value is an optimization hint to limit the number of join associativity and join commutativity transformations explored during query optimization. The limit controls the alternative plans that GPORCA considers during query optimization. For example, the default value of 7 is an optimization hint for GPORCA to stop exploring join associativity and join commutativity transformations when an n-ary join operator has more than 7 children during optimization.

For a query with a large number of joins, specifying a lower value improves query performance by limiting the number of alternate query plans that GPORCA evaluates. However, setting the value too low might cause GPORCA to generate a query plan that performs sub-optimally.

This parameter can be set for a database system or a session.

Value Range | Default | Set Classifications
integer > 0 | 7 | local, system, reload
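For example, a hedged sketch of lowering the limit for the current session before running a query with many joins (the value 5 is illustrative):

SET optimizer_join_arity_for_associativity_commutativity = 5;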

Experimental Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several experimental features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Experimental features are not recommended or supported for production deployments. These features may change in or be removed from future versions of the product based on further testing and feedback. Moreover, any features that may be visible in the open source code but that are not described in the product documentation should be considered experimental and unsupported for production use.
Key experimental features in Greenplum Database 5.1.0 include:
  • Recursive WITH Queries (Common Table Expressions). See WITH Queries (Common Table Expressions).
  • Resource Management with Resource Groups. See Using Resource Groups.
    Important: Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management (an experimental feature) on RedHat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

    If you use RedHat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

    SuSE 11 does not have a kernel version that resolves this issue. See known issue 149789783.

    See Known Issues and Limitations for information about issues associated with resource groups.

  • The pgAdmin 4 tool is compatible with Pivotal Greenplum Database 5.1.0, but is not officially supported by Pivotal. You can use pgAdmin 4 to query Greenplum Database tables and to view Greenplum Database table DDL. Append-optimized tables are expected to work with pgAdmin 4.

    Some functionality is either unavailable or not fully implemented when used with Greenplum Database:

    • Partitioned tables can cause pgAdmin 4 performance to slow down.
    • External tables are not supported.
    • The graphical EXPLAIN feature of the Query tool is not supported.

    See https://www.pgadmin.org/ for information about installing and using the tool.

Changed Features

Greenplum Database 5.1.0 includes these feature changes.
  • The Greenplum Database utilities gpcrondump and gpdbrestore have changed.
    • If you specify a gpcrondump option to back up schemas (-s, -S, --schema-file, or --schema-exclude-file), procedural languages that are installed in the database are also backed up even though they are not schema specific. External items such as shared libraries that are used by a language are not backed up.
    • If you specify the gpdbrestore option -S to restore a schema, procedural languages that are in a database backup are also restored even though they are not schema specific.
      Note: When restoring procedural languages, gpdbrestore logs non-fatal messages if the languages that are being restored already exist in the target database.
    In previous releases, functions, which are schema specific, were backed up and restored for a schema-specific operation. However, only the PL/Java language was backed up and restored.
  • The meta-command \copy for the Greenplum Database psql utility has changed. When copying data with \copy, enclosing the source or destination with single quotes (') indicates that the source or destination is a file. For example, this \copy command copies data to the file stdout.
    \copy testbl to 'stdout'

    Without the single quotes, the output is sent to command output.
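    For example, this unquoted form writes the same table's data to the psql client's standard output rather than to a file named stdout:

    \copy testbl to stdout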

    For information about the psql utility, see the Greenplum Database Utility Guide.

  • MADlib 1.12 is available with Greenplum Database 5.1.0. When using MADlib 1.12, the server configuration parameter optimizer_control can be set to either on (the default) or off.

    When using MADlib 1.11 with Greenplum Database, some MADlib functions would not work if the parameter was set to off.

    For information about the MADlib extension for analytics, see the Greenplum Database Reference Guide.

  • The Greenplum Database PL/R extension package version has been updated to 2.3.1.

    After installing the PL/R extension package, the installation directory is $GPHOME/ext/R-3.3.3. The directory name matches the version of R that is installed, 3.3.3. In the previous package, the installation directory was $GPHOME/ext/R-3.3.1.

    For information about the PL/R extension package, see the Greenplum Database Reference Guide.
  • The Greenplum Database Client and Loader Tools for AIX 7 have been updated for Greenplum Database 5.1.0.
    Note: The Clients for AIX7 package is not available for download with Greenplum Database 5.1.0; download this package from the Greenplum Database 5.0.0 file collection on Pivotal Network.
  • A PostGIS extension package for Red Hat Enterprise Linux 6.x is available on Pivotal Network.

Deprecated Features

Pivotal Greenplum 5.1.0 deprecates these unused catalog tables. They will be removed in a future release.
  • gp_configuration
  • gp_db_interfaces
  • gp_interfaces

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 5.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for QuickLZ compression. QuickLZ compression is not provided in the open source version of Greenplum Database due to licensing restrictions.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for monitoring and managing queries with Pivotal Greenplum Workload Manager.
  • Support for full text search and text analysis using Pivotal GPText.

Supported Platforms

Pivotal Greenplum 5.1.0 runs on the following platforms:

  • Red Hat Enterprise Linux 64-bit 7.x (See the following Note)
  • Red Hat Enterprise Linux 64-bit 6.x
  • SuSE Linux Enterprise Server 64-bit 11 SP4 (See the following Note)
  • CentOS 64-bit 7.x
  • CentOS 64-bit 6.x
Important: Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management (an experimental feature) on RedHat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

If you use RedHat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue. See known issue 149789783.

Note: For Greenplum Database installed on Red Hat Enterprise Linux 7.x or CentOS 7.x prior to 7.3, an operating system issue might cause a Greenplum Database system that is running large workloads to hang. The issue is caused by Linux kernel bugs.

RHEL 7.3 and CentOS 7.3 resolve the issue.

Note: Greenplum Database on SuSE Linux Enterprise systems does not support the PL/Perl procedural language or the gpmapreduce tool.
Greenplum Database support on Dell EMC DCA:
  • Pivotal Greenplum 5.1.0 is supported on DCA systems that are running DCA software version 3.4 or greater.
    Note: Pivotal Greenplum 5.1.0 is not supported on DCA systems with FIPS enabled.

    Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of Greenplum Database are not supported.

Pivotal Greenplum 5.1.0 supports these Java versions:
  • 8.xxx
  • 7.xxx

Greenplum Database 5.1.0 software that runs on Linux systems uses OpenSSL 1.0.2l (with FIPS 2.0.16), cURL 7.54, OpenLDAP 2.4.44, and Python 2.7.12.

Greenplum Database client software that runs on Windows and AIX systems uses OpenSSL 0.9.8zg.

The Greenplum Database s3 external table protocol supports these data sources:

Pivotal Greenplum 5.1.0 supports Data Domain Boost on Red Hat Enterprise Linux.

This table lists the versions of Data Domain Boost SDK and DDOS supported by Pivotal Greenplum 5.x.

Table 1. Data Domain Boost Compatibility
Pivotal Greenplum | Data Domain Boost | DDOS
5.1.0, 5.0.0 | 3.3, 3.0.0.3 | 6.0 (all versions)
Note: In addition to the DDOS versions listed in the previous table, Pivotal Greenplum 5.0.0 and later supports all minor patch releases (fourth digit releases) later than the certified version.
Note: Pivotal Greenplum 5.1.0 does not support the ODBC driver for Cognos Analytics V11.

Connecting to IBM Cognos software with an ODBC driver is not supported. Greenplum Database supports connecting to IBM Cognos software with the DataDirect JDBC driver for Pivotal Greenplum. This driver is available as a download from Pivotal Network.

Veritas NetBackup

Pivotal Greenplum 5.1.0 supports backup with Veritas NetBackup version 7.7.3. See Backing Up Databases with Veritas NetBackup.

Supported Platform Notes

The following notes describe platform support for Pivotal Greenplum. Please send any questions or comments to Pivotal Support at https://support.pivotal.io.

  • The only file system supported for running Greenplum Database is the XFS file system. All other file systems are explicitly not supported by Pivotal.
  • Greenplum Database is supported on all 1U and 2U commodity servers with local storage. Special purpose hardware that is not commodity may be supported at the full discretion of Pivotal Product Management based on the general similarity of the hardware to commodity servers.
  • Greenplum Database is supported on network or shared storage if the shared storage is presented as a block device to the servers running Greenplum Database and the XFS file system is mounted on the block device. Network file systems are not supported. When using network or shared storage, Greenplum Database mirroring must be used in the same way as with local storage, and no modifications may be made to the mirroring scheme or the recovery scheme of the segments. Other features of the shared storage such as de-duplication and/or replication are not directly supported by Pivotal Greenplum Database, but may be used with support of the storage vendor as long as they do not interfere with the expected operation of Greenplum Database at the discretion of Pivotal.
  • Greenplum Database is supported when running on virtualized systems, as long as the storage is presented as block devices and the XFS file system is mounted for the storage of the segment directories.
  • A minimum of 10-gigabit network is required for a system configuration to be supported by Pivotal.
  • Greenplum Database is supported on Amazon Web Services (AWS) servers using either Amazon instance store (Amazon uses the volume names ephemeral[0-20]) or Amazon Elastic Block Store (Amazon EBS) storage. If using Amazon EBS storage, the storage should be a RAID of Amazon EBS volumes mounted with the XFS file system for it to be a supported configuration.
  • For Red Hat Enterprise Linux 7.2 or CentOS 7.2, the default systemd setting RemoveIPC=yes removes IPC connections when non-system users logout. This causes the Greenplum Database utility gpinitsystem to fail with semaphore errors. To avoid this issue, see "Setting the Greenplum Recommended OS Parameters" in the Greenplum Database Installation Guide.
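A hedged sketch of one commonly applied mitigation for the RemoveIPC issue (confirm the exact setting and restart procedure in the Greenplum Database Installation Guide for your OS version):

# In /etc/systemd/logind.conf on each Greenplum Database host:
RemoveIPC=no

# Then, as root, restart the login service:
service systemd-logind restart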

Pivotal Greenplum Tools and Extensions Compatibility

Client Tools

Greenplum releases a number of client tool packages on various platforms that can be used to connect to Greenplum Database and the Greenplum Command Center management tool. The following table describes the compatibility of these packages with this Greenplum Database release.

Tool packages are available from Pivotal Network.

Table 2. Pivotal Greenplum 5.0.0 Tools Compatibility
Tool | Description of Contents | Tool Version | Server Versions
Pivotal Greenplum Clients | Greenplum Database Command-Line Interface (psql) | 5.0 | 5.0.0
Pivotal Greenplum Loaders | Greenplum Database Parallel Data Loading Tools (gpfdist, gpload) | 5.0 | 5.0.0
Pivotal Greenplum Command Center | Greenplum Database management tool | 3.2.2 and later | 5.0.0
Pivotal Greenplum Workload Manager | Greenplum Database query monitoring and management tool | 1.8.0 | 5.0.0

The Greenplum Database Client Tools and Load Tools are supported on the following platforms:

  • AIX 7.2 (64-bit) (Client and Load Tools only)
    Note: The Clients for AIX7 package is not available for download with Greenplum Database 5.1.0; download this package from the Greenplum Database 5.0.0 file collection on Pivotal Network.
  • Red Hat Enterprise Linux x86_64 6.x (RHEL 6)
  • SuSE Linux Enterprise Server x86_64 SLES 11
  • Windows 10 (32-bit and 64-bit)
  • Windows 8 (32-bit and 64-bit)
  • Windows Server 2012 (32-bit and 64-bit)
  • Windows Server 2012 R2 (32-bit and 64-bit)
  • Windows Server 2008 R2 (32-bit and 64-bit)

Extensions

Table 3. Pivotal Greenplum 5.0.0 Extensions Compatibility
Pivotal Greenplum Extension | Versions
MADlib machine learning for Greenplum Database 5.0.x (see note 1) | MADlib 1.12, 1.11
PL/Java for Greenplum Database 5.0.x | PL/Java 1.4.2, 1.4.0
PL/R for Greenplum Database 5.0.x | 2.3.0, 2.2.0
PostGIS Spatial and Geographic Objects for Greenplum Database 5.0.x | 2.1.5
Python Data Science Module Package for Greenplum Database 5.0.x (see note 2) | 1.0.0, 1.1.0
R Data Science Library Package for Greenplum Database 5.0.x (see note 3) | 1.0.0
Note: (1) For information about MADlib support and upgrade information, see the MADlib FAQ.

(2) For information about the Python package, including the modules provided, see the Python Data Science Module Package.

(3) For information about the R package, including the libraries provided, see the R Data Science Library Package.

These Greenplum Database extensions are installed with Pivotal Greenplum Database:
  • Fuzzy String Match Extension
  • PL/Python Extension
  • pgcrypto Extension

Pivotal Greenplum Data Connectors

  • PXF Extension Framework - The PXF Extension Framework, integrated with Greenplum Database 5.1.0, provides access to HDFS and Hive external data stores. Refer to Accessing HDFS and Hive Data with PXF in the Greenplum Database Administrator Guide for PXF configuration and usage information.
  • Greenplum-Spark Connector - The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer from Greenplum Database to an Apache Spark cluster. The Greenplum-Spark Connector is available as a separate download from Pivotal Network. Refer to the Greenplum-Spark Connector documentation for compatibility and usage information.
  • Gemfire-Greenplum Connector - The Pivotal Gemfire-Greenplum Connector supports the transfer of data between a GemFire region and a Greenplum Database cluster. The Gemfire-Greenplum Connector is available as a separate download from Pivotal Network. Refer to the Gemfire-Greenplum Connector documentation for compatibility and usage information.

Pivotal GPText Compatibility

Pivotal Greenplum Database 5.1.0 is compatible with Pivotal GPText version 2.1.3 and later.

Pivotal Greenplum Command Center

See the Greenplum Command Center documentation site for GPCC and Greenplum Workload Manager compatibility information.

Hadoop Distribution Compatibility

Greenplum Database provides access to HDFS with gphdfs and PXF.

gphdfs Hadoop Distribution Compatibility

The supported Hadoop distributions for gphdfs are listed below:

Table 4. Supported gphdfs Hadoop Distributions
Hadoop Distribution | Version | gp_hadoop_target_version
Cloudera | CDH 5.2, 5.3, 5.4.x - 5.8.x | cdh5
Cloudera | CDH 5.0, 5.1 | cdh4.1
Hortonworks Data Platform | HDP 2.1, 2.2, 2.3, 2.4, 2.5 | hdp2
MapR | MapR 4.x, MapR 5.x | gpmr-1.2
Apache Hadoop | 2.x | hadoop2
Note: MapR requires the MapR client. For MapR 5.x, only TEXT and CSV are supported in the FORMAT clause of the CREATE EXTERNAL TABLE command.

PXF Hadoop Distribution Compatibility

The PXF Extension Framework supports Cloudera, Hortonworks Data Platform, and generic Apache Hadoop distributions.

Upgrading to Greenplum Database 5.1.0

The upgrade path supported for this release is Greenplum Database 5.0.0 to Greenplum Database 5.1.0. Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.x is not supported. See Migrating Data to Pivotal Greenplum 5.x.

Note: If you are upgrading Greenplum Database on a DCA system, see Pivotal Greenplum on DCA Systems.

Prerequisites

Before starting the upgrade process, Pivotal recommends performing the following checks.

  • Verify the health of the Greenplum Database host hardware, and verify that the hosts meet the requirements for running Greenplum Database. The Greenplum Database gpcheckperf utility can assist you in confirming the host requirements.
    Note: If you need to run the gpcheckcat utility, Pivotal recommends running it a few weeks before the upgrade and that you run gpcheckcat during a maintenance period. If necessary, you can resolve any issues found by the utility before the scheduled upgrade.

    The utility is in $GPHOME/bin. Pivotal recommends that Greenplum Database be in restricted mode when you run the gpcheckcat utility. See the Greenplum Database Utility Guide for information about the gpcheckcat utility.

    If gpcheckcat reports catalog inconsistencies, you can run gpcheckcat with the -g option to generate SQL scripts to fix the inconsistencies (a sketch of this workflow appears after this list).

    After you run the SQL scripts, run gpcheckcat again. You might need to repeat the process of running gpcheckcat and creating SQL scripts to ensure that there are no inconsistencies. Pivotal recommends that the SQL scripts generated by gpcheckcat be run on a quiescent system. The utility might report false alerts if there is activity on the system.

    Important: If the gpcheckcat utility reports errors, but does not generate a SQL script to fix the errors, contact Pivotal support. Information for contacting Pivotal Support is at https://support.pivotal.io.
  • During the migration process from Greenplum Database 5.0.0, a backup is made of some files and directories in $MASTER_DATA_DIRECTORY. Pivotal recommends that files and directories that are not used by Greenplum Database be backed up, if necessary, and removed from the $MASTER_DATA_DIRECTORY before migration. For information about the Greenplum Database migration utilities, see the Greenplum Database Documentation.
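A hedged sketch of the gpcheckcat repair workflow referenced above (the database name and output directory are placeholders, and the generated script names vary):

$ gpcheckcat -g /home/gpadmin/gpcheckcat_repair mydatabase
$ psql -d mydatabase -f /home/gpadmin/gpcheckcat_repair/<generated_script>.sql
$ gpcheckcat mydatabase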

For information about supported versions of Greenplum Database extensions, see Pivotal Greenplum Tools and Extensions Compatibility.

If you are utilizing Data Domain Boost, you must re-enter your DD Boost credentials after the upgrade, as follows:

gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
  --ddboost-backupdir backup_directory
Note: If you do not reenter your login credentials after an upgrade, your backup will never start because the Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

Upgrading from 5.0.0 to 5.1.0

An upgrade from 5.0.0 to 5.1.0 involves stopping Greenplum Database, updating the Greenplum Database software binaries, and restarting Greenplum Database. If you are using Greenplum Database extension packages, there are additional requirements. See Prerequisites in the previous section.

Note: If the Greenplum Command Center database gpperfmon is installed in your Greenplum Database system, the migration process changes the distribution key of the Greenplum Database log_alert_* tables to the logtime column. The redistribution of the table data might take some time the first time you start Greenplum Database after migration. The change occurs only the first time you start Greenplum Database after a migration.
  1. Log in to your Greenplum Database master host as the Greenplum administrative user:
    $ su - gpadmin
  2. Perform a smart shutdown of your current Greenplum Database 5.x system (there can be no active connections to the database). This example uses the -a option to disable confirmation prompts:
    $ gpstop -a
  3. Run the binary installer for 5.1.0 on the Greenplum Database master host.
    When prompted, choose an installation location in the same base directory as your current installation. For example:
    /usr/local/greenplum-db-5.1.0

    If you install Greenplum Database with the rpm (as root), the installation directory is /usr/local/greenplum-db-5.1.0.

    For the rpm installation, update the permissions for the new installation. For example, run this command as root to change user and group of the installed files to gpadmin.

    # chown -R gpadmin:gpadmin /usr/local/greenplum*
  4. If your Greenplum Database deployment uses LDAP authentication, manually edit the /usr/local/greenplum-db/greenplum_path.sh file to add the line:
    export LDAPCONF=/etc/openldap/ldap.conf
  5. Edit the environment of the Greenplum Database superuser (gpadmin) and make sure you are sourcing the greenplum_path.sh file for the new installation. For example change the following line in .bashrc or your chosen profile file:
    source /usr/local/greenplum-db-5.0.0/greenplum_path.sh

    to:

    source /usr/local/greenplum-db-5.1.0/greenplum_path.sh

    Or if you are sourcing a symbolic link (/usr/local/greenplum-db) in your profile files, update the link to point to the newly installed version. For example:

    $ rm /usr/local/greenplum-db
    $ ln -s /usr/local/greenplum-db-5.1.0 /usr/local/greenplum-db
  6. Source the environment file you just edited. For example:
    $ source ~/.bashrc
  7. Run the gpseginstall utility to install the 5.1.0 binaries on all the segment hosts specified in the hostfile. For example:
    $ gpseginstall -f hostfile
  8. Use the Greenplum Database gppkg utility to install Greenplum Database extensions. If you were previously using any Greenplum Database extensions such as pgcrypto, PL/R, PL/Java, PL/Perl, and PostGIS, download the corresponding packages from Pivotal Network, and install using this utility. See the Greenplum Database Documentation for gppkg usage details.
  9. After all segment hosts have been upgraded, you can log in as the gpadmin user and restart your Greenplum Database system:
    # su - gpadmin
    $ gpstart
  10. If you are utilizing Data Domain Boost, you have to re-enter your DD Boost credentials after upgrading from Greenplum Database 5.0.0 to 5.1.0 as follows:
    gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
      --ddboost-backupdir backup_directory
Note: If you do not reenter your login credentials after an upgrade, your backup will never start because the Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

Troubleshooting a Failed Upgrade

If you experience issues during the migration process and have active entitlements for Greenplum Database that were purchased through Pivotal, contact Pivotal Support. Information for contacting Pivotal Support is at https://support.pivotal.io.

Be prepared to provide the following information:

  • A completed Upgrade Procedure.
  • Log output from gpcheckcat (located in ~/gpAdminLogs)

Migrating Data to Pivotal Greenplum 5.x

Upgrading a Pivotal Greenplum Database 4.x system directly to Pivotal Greenplum Database 5.x is not supported.

You can migrate existing data to Greenplum Database 5.x using standard backup and restore procedures (gpcrondump and gpdbrestore) or by using gptransfer if both clusters will run side-by-side.

Follow these general guidelines for migrating data:
  • Make sure that you have a complete backup of all data in the Greenplum Database 4.3.x cluster, and that you can successfully restore the Greenplum Database 4.3.x cluster if necessary.
  • You must install and initialize a new Greenplum Database 5.x cluster using the version 5.x gpinitsystem utility.
    Note: Unless you modify file locations manually, gpdbrestore only supports restoring data to a cluster that has an identical number of hosts and an identical number of segments per host, with each segment having the same content_id as the segment in the original cluster. If you initialize the Greenplum Database 5.x cluster using a configuration that is different from the version 4.3 cluster, then follow the steps outlined in Restoring to a Different Greenplum System Configuration to manually update the file locations.
  • If you intend to install Greenplum Database 5.x on the same hardware as your 4.3.x system, you will need enough disk space to accommodate over 5 times the original data set (2 full copies of the primary and mirror data sets, plus the original backup data in ASCII format). Keep in mind that the ASCII backup data will require more disk space than the original data, which may be stored in compressed binary format. Offline backup solutions such as Dell EMC Data Domain or Veritas NetBackup can reduce the required disk space on each host.
  • Use the version 5.x gpdbrestore utility to load the 4.3.x backup data into the new cluster.
  • If the Greenplum Database 5.x cluster resides on separate hardware from the 4.3.x cluster, you can optionally use the version 5.x gptransfer utility to migrate the 4.3.x data. You must initiate the gptransfer operation from the version 5.x cluster, pulling the older data into the newer system.

    On a Greenplum Database system with FIPS enabled, validating table data with MD5 (specifying the option --validate=md5) is not available. Use the option sha256 to validate table data.

    Validating table data with SHA-256 (specifying the option --validate=sha256) requires the Greenplum Database pgcrypto extension. The extension is included with Pivotal Greenplum 5.x. The extension package must be installed on supported Pivotal Greenplum 4.3.x systems. Support for pgcrypto functions in a Greenplum 4.3.x database is not required.

  • After migrating data you may need to modify SQL scripts, administration scripts, and user-defined functions as necessary to account for changes in Greenplum Database version 5.x. Look for Upgrade Action Required entries in the Pivotal Greenplum 5.0.0 Release Notes for features that may necessitate post-migration tasks.

Pivotal Greenplum on DCA Systems

On supported Dell EMC DCA systems, you can install Pivotal Greenplum 5.1.0, or you can upgrade from Pivotal Greenplum 5.x to 5.1.0.

Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of Greenplum Database are not supported.

Important: Upgrading Pivotal Greenplum Database 4.3.x to Pivotal Greenplum 5.1.0 is not supported. See Migrating Data to Pivotal Greenplum 5.x.

Installing the Pivotal Greenplum 5.1.0 Software Binaries on DCA Systems

Important: This section is for installing Pivotal Greenplum 5.1.0 only on DCA systems. Also, see the information on the Dell EMC support site (requires login).

For information about installing Pivotal Greenplum on non-DCA systems, see the Greenplum Database Installation Guide.

Prerequisites

  • Ensure your DCA system supports Pivotal Greenplum 5.1.0. See Supported Platforms.
    Note: GPDB 5.1.0 is not supported on DCA systems with FIPS enabled.
  • Ensure Greenplum Database 4.3.x is not installed on your system.

    Installing Pivotal Greenplum 5.1.0 on a DCA system with an existing Greenplum Database 4.3.x installation is not supported. For information about uninstalling Greenplum Database software, see your Dell EMC DCA documentation.

Installing Pivotal Greenplum 5.1.0

  1. Download or copy the Greenplum Database DCA installer file greenplum-db-appliance-5.1.0-RHEL6-x86_64.bin to the Greenplum Database master host.
  2. As root, run the DCA installer for 5.1.0 on the Greenplum Database master host and specify the file hostfile that lists all hosts in the cluster, one host name per line. If necessary, copy hostfile to the directory containing the installer before running the installer.

    This example command runs the installer for Greenplum Database 5.1.0.

    # ./greenplum-db-appliance-5.1.0-RHEL6-x86_64.bin hostfile

Upgrading from 5.x to 5.1.0 on DCA Systems

Upgrading Pivotal Greenplum from 5.x to 5.1.0 on a Dell EMC DCA system involves stopping Greenplum Database, updating the Greenplum Database software binaries, and restarting Greenplum Database.

Important: This section is only for upgrading to Pivotal Greenplum 5.1.0 on DCA systems. For information about upgrading on non-DCA systems, see Upgrading to Greenplum Database 5.1.0.
  1. Log in to your Greenplum Database master host as the Greenplum administrative user (gpadmin):
    # su - gpadmin
  2. Download or copy the installer file greenplum-db-appliance-5.1.0-RHEL6-x86_64.bin to the Greenplum Database master host.
  3. Perform a smart shutdown of your current Greenplum Database 5.x system (there can be no active connections to the database). This example uses the -a option to disable confirmation prompts:
    $ gpstop -a
  4. As root, run the Greenplum Database DCA installer for 5.1.0 on the Greenplum Database master host and specify the file hostfile that lists all hosts in the cluster. If necessary, copy hostfile to the directory containing the installer before running the installer.

    This example command runs the installer for Greenplum Database 5.1.0 for Redhat Enterprise Linux 6.x.

    # ./greenplum-db-appliance-5.1.0-RHEL6-x86_64.bin hostfile

    The file hostfile is a text file that lists all hosts in the cluster, one host name per line.

  5. Install Greenplum Database extension packages. For information about installing a Greenplum Database extension package, see gppkg in the Greenplum Database Utility Guide.
  6. After all segment hosts have been upgraded, you can log in as the gpadmin user and restart your Greenplum Database system:
    # su - gpadmin
    $ gpstart
  7. If you are utilizing Data Domain Boost, you have to re-enter your DD Boost credentials after upgrading to Greenplum Database 5.1.0 as follows:
    gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
      --ddboost-backupdir backup_directory
Note: If you do not reenter your login credentials after an upgrade, your backup will never start because the Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

Resolved Issues

The following issues are resolved in Pivotal Greenplum Database 5.1.0.

For issues resolved in prior 5.x releases, refer to the corresponding release notes. Release notes are available from the Pivotal Greenplum page on Pivotal Network or on the Pivotal Greenplum Database documentation site at Release Notes.

29069 - Query Optimizer
GPORCA query optimization performance was poor for queries that contain an IN operator with a large number of values and the values require an implicit CAST.

This issue has been resolved. Query optimization performance has been improved for the specified type of queries.

29061 - gpcheckcat
In some cases, the Greenplum Database gpcheckcat utility returned an error stating that a value was out of range for type integer because the utility did not handle large OID values correctly (relfilenode values exceeding 2^31, more than 2 billion).

This issue has been resolved. Now the utility handles large OID values correctly.

29042 - Query Planner
The legacy query optimizer generated a Greenplum Database PANIC when executing some aggregation queries. The queries contain column aliases with the same name as the table columns, the queries contain subqueries that reference the column alias, and grouping is applied on the column alias. In some cases, the legacy optimizer caused a PANIC when it attempted to generate a plan that contained an inconsistent target list for the aggregation plan.

This issue has been resolved. The legacy optimizer generates the correct query plan for the specified type of aggregation queries.

29039 - Query Optimizer
GPORCA generated a PANIC when a query attempted to use an index on a table column after the table was altered by dropping columns that are listed earlier in the table definition. For example, the index is defined on column 10, and columns 3 and 4 were deleted from the table. GPORCA did not correctly determine the index associated with the column.

This issue has been resolved. Now GPORCA correctly determines the column index in the specified situation.

29030 - Query Optimizer
For queries that contain a large number of joins, GPORCA spent a significant amount of time evaluating all possible multiple join combinations to determine the most efficient join tree to use in the query plan.

This issue has been resolved. The maximum number of join combinations that are evaluated has been reduced. See GPORCA Performance Enhancements.

29025 - Backup and Restore
Some restore operations performed by the Greenplum Database utility gpdbrestore that used DD Boost failed because of issues parsing DD Boost path information in the backup report file.

This issue has been resolved. Now the utility correctly parses DD Boost path information.

29009 - gprecoverseg
Running the Greenplum Database gprecoverseg utility with the option -r to rebalance segments did not return errors if the operation generated errors on some segments and completed successfully on other segments.

This issue has been resolved. Now if any errors occur during a rebalance operation, the user is notified and is referred to the logs for details.

29005 - Query Planner
For some DELETE or UPDATE commands that contain subqueries that reference the same table, the Greenplum Database legacy query optimizer generated a plan that did not correctly redistribute some rows to the appropriate segments. This caused a Greenplum Database PANIC when attempting to update or delete a row on an incorrect segment.

This issue has been resolved. Now the plan generated by the legacy optimizer correctly redistributes rows in the specified situation.

26993 - gptransfer
The Greenplum Database utility gptransfer failed intermittently when transferring large amounts of data. When exporting data, Greenplum Database did not properly check whether a write operation had completed. This might cause intermittent failures when exporting large amounts of data in a single operation.

This issue has been resolved. Checking the completion of write operations during data export has been improved.

151498144 - gpcrondump
When backing up and restoring database objects with the Greenplum Database utilities gpcrondump and gpdbrestore and specifying a schema level option such as the gpcrondump option -s or -S, functions (which are schema specific) were backed up and restored. However, only the PL/Java language was backed up and restored.

This issue has been resolved. Now for schema level back up and restore operations, procedural languages are included. See Changed Features.

151344210 - Query Planner

For an aggregation query, the Greenplum Database legacy query optimizer might return incorrect results when the aggregation is over a column, a CAST is defined on the column, and the original column (without the CAST) is in a GROUP BY clause. The legacy optimizer did not handle the CAST correctly in the query.

This issue has been resolved. Now the legacy optimizer handles the CAST correctly for the specified queries.

151341000 - Backup and Restore
In some cases for a Greenplum Database system with 10 or more segment instances, the Greenplum Database gpdbrestore utility did not perform a restore operation correctly. When restoring from a version 4.3.11.3 or earlier backup to version 4.3.12.0 or later, the utility did not handle the backup filenames correctly. The format of backup filenames changed in 4.3.12.0.

This issue has been resolved. Now the utility handles the backup filenames correctly in the specified situation.

151016845 - Query Optimizer
For queries involving catalog tables, GPORCA falls back to the Greenplum Database legacy query optimizer. The fallback event was set to the log severity level LOG and was captured in the Greenplum Database logs. In some cases, for example when a workload executed a large number of queries against catalog tables, these messages bloated the Greenplum Database log files.

The issue has been resolved. Now the log level for the fallback event is DEBUG1. This level does not log the fallback events by default. The log level can be set to log the events for debugging purposes.

150988530 - Query Optimizer
In some cases, GPORCA generated incorrect results for a query when the query predicate contains a [NOT] EXISTS clause that contains both an aggregate function and a GROUP BY clause where a grouping column is an outer reference that is not in the aggregate function. GPORCA did not process the predicate correctly.

This issue has been resolved. Now GPORCA handles the specified type of predicate correctly.

150906152 - Query Optimizer
For some queries that contain a correlated subquery and the subquery contains a window function, GPORCA generated a query plan that returned incorrect results. The plan incorrectly performed the join after the window function was applied.

This issue has been resolved. Now the plan generated by GPORCA correctly applies the window function before the join.

149513137 - gpstart
For Greenplum Database 5.0.0 systems installed on Linux hosts, the Greenplum system did not start when the host operating system had FIPS mode enabled. This issue was caused by the use of the Python MD5 hash algorithm by Greenplum Database.

This issue has been resolved. Greenplum Database 5.1.0 uses the Python SHA-256 hash algorithm.

Known Issues and Limitations

Pivotal Greenplum 5.x has these limitations:

  • Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.x is not supported. See Migrating Data to Pivotal Greenplum 5.x.
  • These features are considered work-in-progress and are experimental features. Pivotal does not support using experimental features in a production environment.
    • Greenplum Database resource groups
    • Recursive WITH Queries (Common Table Expressions)
  • Greenplum Database 4.3.x packages are not compatible with Pivotal Greenplum 5.x.

The following table lists key known issues in Pivotal Greenplum 5.x.

Table 5. Key Known Issues in Pivotal Greenplum 5.x
Issue | Category | Description
149628145 | DCA | Pivotal Greenplum 5.x is not supported on DCA systems with FIPS enabled.
151468673 | Client and Loader Tools | For the Greenplum Database 5.0.0 Client and Loader tools for RHEL and SLES platforms, the default installation path is not valid. The default installation path contains a : that is not handled properly.

Workaround: When installing a Client or Loader tool, specify an installation path, for example /usr/local/greenplum-loaders-5.0.0. Do not use the default installation directory.

151135629 | COPY command | When the ON SEGMENT clause is specified, the COPY command does not support specifying a SELECT statement in the COPY TO clause. For example, this command is not supported.
COPY (SELECT * FROM testtbl) TO '/tmp/mytst<SEGID>' ON SEGMENT
29064 | Storage: DDL | The money data type accepts out-of-range values as negative values, and no error message is displayed.

Workaround: Use only in-range values for the money data type (32-bit for Greenplum Database 4.x, or 64-bit for Greenplum Database 5.x). Or, use an alternative data type such as numeric or decimal.
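A hedged illustration of the alternative data type (the table and column names are hypothetical):

CREATE TABLE price_list (id int, amount numeric(18,2)) DISTRIBUTED BY (id);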

3290 | JSON | The to_json() function is not implemented as a callable function. Attempting to call the function results in an error. For example:
tutorial=# select to_json('Fred said "Hi."'::text); 
ERROR: function to_json(text) does not exist
LINE 1: select to_json('Fred said "Hi."'::text);
^
HINT: No function matches the given name and argument types. 
You might need to add explicit type casts.

Workaround: Greenplum Database invokes to_json() internally when casting to the json data type, so perform a cast instead. For example: SELECT '{"foo":"bar"}'::json; Greenplum Database also provides the array_to_json() and row_to_json() functions.

148119917 | Resource Groups (Experimental) | Testing of the experimental resource groups feature has found that a kernel panic can occur when using the default kernel in a RHEL/CentOS system. The problem occurs due to a problem in the kernel cgroups implementation, and results in a kernel panic backtrace similar to:
[81375.325947] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [81375.325986] IP: [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326014] PGD 0 [81375.326025]
      Oops: 0000 [#1] SMP [81375.326041] Modules linked in: veth ipt_MASQUERADE
      nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype
      iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc intel_powerclamp coretemp
      intel_rapl dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio kvm_intel kvm crc32_pclmul
      ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt
      iTCO_vendor_support ses enclosure ipmi_ssif pcspkr lpc_ich sg sb_edac mfd_core edac_core
      mei_me ipmi_si mei wmi ipmi_msghandler shpchp acpi_power_meter acpi_pad ip_tables xfs
      libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect crct10dif_pclmul
      sysimgblt crct10dif_common crc32c_intel drm_kms_helper ixgbe ttm mdio ahci igb libahci drm ptp
      pps_core libata dca i2c_algo_bit [81375.326369]  i2c_core megaraid_sas dm_mirror
      dm_region_hash dm_log dm_mod [81375.326396] CPU: 17 PID: 0 Comm: swapper/17 Not tainted
      3.10.0-327.el7.x86_64 #1 [81375.326422] Hardware name: Cisco Systems Inc
      UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.8b.0.080620151546 08/06/2015 [81375.326459] task:
      ffff88140ecec500 ti: ffff88140ed10000 task.ti: ffff88140ed10000 [81375.326485] RIP:
      0010:[<ffffffff812f94b1>]  [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326514] RSP:
      0018:ffff88140ed13e10  EFLAGS: 00010046 [81375.326534] RAX: 0000000000000000 RBX:
      0000000000000000 RCX: 0000000000000000 [81375.326559] RDX: ffff88282f1d4800 RSI:
      ffff88280bc0f140 RDI: 0000000000000010 [81375.326584] RBP: ffff88140ed13e58 R08:
      0000000000000000 R09: 0000000000000001 [81375.326609] R10: 0000000000000000 R11:
      0000000000000001 R12: ffff88280b0e7000 [81375.326634] R13: 0000000000000000 R14:
      0000000000000000 R15: 0000000000b6f979 [81375.326659] FS:  0000000000000000(0000)
      GS:ffff88282f1c0000(0000) knlGS:0000000000000000 [81375.326688] CS:  0010 DS: 0000 ES: 0000
      CR0: 0000000080050033 [81375.326708] CR2: 0000000000000010 CR3: 000000000194a000 CR4:
      00000000001407e0 [81375.326733] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000 [81375.326758] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
      0000000000000400 [81375.326783] Stack: [81375.326792]  ffff88140ed13e58 ffffffff810bf539
      ffff88282f1d4780 ffff88282f1d4780 [81375.326826]  ffff88140ececae8 ffff88282f1d4780
      0000000000000011 ffff88140ed10000 [81375.326861]  0000000000000000 ffff88140ed13eb8
      ffffffff8163a10a ffff88140ecec500 [81375.326895] Call Trace: [81375.326912]
      [<ffffffff810bf539>] ? pick_next_task_fair+0x129/0x1d0 [81375.326940]  [<ffffffff8163a10a>]
      __schedule+0x12a/0x900 [81375.326961]  [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
      [81375.326987]  [<ffffffff810d6244>] cpu_startup_entry+0x184/0x290 [81375.327011]
      [<ffffffff810475fa>] start_secondary+0x1ba/0x230 [81375.327032] Code: e5 48 85 c0 75 07 eb 19 66
      90 48 89 d0 48 8b 50 10 48 85 d2 75 f4 48 8b 50 08 48 85 d2 75 eb 5d c3 31 c0 5d c3 0f 1f 44
      00 00 55 <48> 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb [81375.327157] RIP
      [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.327179]  RSP <ffff88140ed13e10> [81375.327192] CR2:
      0000000000000010

Workaround: Upgrade to the latest-available kernel for your Red Hat or CentOS release to avoid the above system panic.

149789783 | Resource Groups | Significant Pivotal Greenplum performance degradation has been observed when enabling resource group-based workload management (an experimental feature) on RedHat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

When resource groups are enabled on systems with an affected kernel, there can be a delay of 1 second or longer when starting a transaction or a query. The delay is caused by a Linux cgroup kernel bug where a synchronization mechanism called synchronize_sched is abused when a process is attached to a cgroup. See http://www.spinics.net/lists/cgroups/msg05708.html and https://lkml.org/lkml/2013/1/14/97 for more information.

The issue causes single attachment operations to take longer and also causes all concurrent attachments to be executed in sequence. For example, one process attachment could take about 0.01 second. When concurrently attaching 100 processes, the fastest process attachment takes 0.01 second and the slowest takes about 1 second. Pivotal Greenplum performs process attachments when a transaction or queries are started. So the performance degradation depends on the number of concurrently started transactions or queries, and is not related to the number of concurrently running queries. Also, Pivotal Greenplum has optimizations to bypass the re-attaching when a QE is reused by multiple queries in the same session.

Workaround: This bug does not affect CentOS 7.x and Red Hat 7.x systems.

If you use RedHat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue.

149970753 | Resource Groups | The gpcrondump and gpdbrestore utilities do not back up or restore configuration information for the experimental Resource Groups feature.

Workaround: If you need to preserve your resource groups configuration, store the associated SQL commands in a separate file and manually apply the commands after restoring from a backup.
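For example, a hedged sketch of re-creating a resource group and re-assigning a role after a restore (the group name, limit values, and role name are illustrative; see Using Resource Groups for the full syntax):

CREATE RESOURCE GROUP rg_reporting WITH (CPU_RATE_LIMIT=20, MEMORY_LIMIT=20);
ALTER ROLE report_user RESOURCE GROUP rg_reporting;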

150906510 | Backup and Restore | Greenplum Database 4.3.15.0 and later backups contain the following line in the backup files:
SET gp_strict_xml_parse = false;

However, Greenplum Database 5.0.0 does not have a parameter named gp_strict_xml_parse. When you restore the 4.3 backup set to the 5.0.0 cluster, you may see the warning:

[WARNING]:-gpdbrestore finished but ERRORS were found, please check the restore report file for details

Also, the report file may contain the error:

ERROR:  unrecognized configuration parameter "gp_strict_xml_parse"

These warnings and errors do not affect the restoration procedure, and can be ignored.