Pivotal Greenplum 5.10.0 Release Notes


Welcome to Pivotal Greenplum 5.10.0

Pivotal Greenplum Database is a massively parallel processing (MPP) database server that supports next-generation data warehousing and large-scale analytics processing. By automatically partitioning data and running parallel queries, it allows a cluster of servers to operate as a single database supercomputer that performs tens or hundreds of times faster than a traditional database. It supports SQL, MapReduce parallel processing, and data volumes ranging from hundreds of gigabytes to hundreds of terabytes.

This document contains pertinent release information about Pivotal Greenplum Database 5.10.0. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see the Pivotal Support Lifecycle Policy.

Pivotal Greenplum 5.x software is available for download from the Pivotal Greenplum page on Pivotal Network.

Pivotal Greenplum 5.x is based on the open source Greenplum Database project code.

Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Pivotal Greenplum 5.10.0 is a minor release that adds and changes several features and resolves some issues.

New Features

gpcopy Enhancements

In Greenplum 5.10.0, the gpcopy utility supports these options.
  • --dry-run - When you specify this option, gpcopy generates a list of the migration operations that would have been performed with the gpcopy options you specify. The data is not migrated.

    The information is displayed at the command line and written to the log file.

  • --no-distribution-check - Specify this option to disable table data distribution checking. In Greenplum 5.10.0, gpcopy performs data distribution checking to ensure data is distributed to segment instances correctly.
    Note: The utility does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table.
    Warning: Before you perform a gpcopy operation with the --no-distribution-check option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.

The gpcopy utility copies objects from databases in a source Greenplum Database system to databases in a destination Greenplum Database system. For information about gpcopy, see gpcopy.
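
For example, the following command previews a full copy of a single database without migrating any data. This is a hedged sketch: the host names are placeholders, and the connection options shown are typical gpcopy flags that you should verify against the gpcopy reference for your release.

  $ gpcopy --source-host smdw --source-port 5432 --source-user gpadmin \
      --dest-host dmdw --dest-port 5432 --dest-user gpadmin \
      --dbname sales --dry-run

Dropping --dry-run performs the actual copy; adding --no-distribution-check skips the distribution check, subject to the warning above.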

Bypass Resource Group Concurrent Transaction Limits

The Greenplum Database server configuration parameter gp_resource_group_bypass enables or disables the enforcement of resource group concurrent transaction limits on Greenplum Database resources. The default value is false, which enforces resource group transaction limits. Resource groups manage resources such as CPU, memory, and the number of concurrent transactions that are used by queries and external components such as PL/Container.

Note: This server configuration parameter is enforced only when resource group-based resource management is active.

You can set this parameter to true to bypass resource group concurrent transaction limitations so that a query can run immediately. For example, you can set the parameter to true for a session to run a system catalog query or a similar query that requires a minimal amount of resources.

When you set this parameter to true and run a query, the query runs in this environment:
  • The query runs inside a resource group. The resource group assignment for the query does not change.
  • The query memory quota is approximately 10 MB per query. The memory is allocated from resource group shared memory or global shared memory. The query fails if there is not enough shared memory available to fulfill the memory allocation request.

This parameter can be set for a session. The parameter cannot be set within a transaction or a function.

Value Range Default Set Classifications
Boolean false session
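
For example, a session that only needs to run a lightweight catalog query can bypass the limits and then restore the default. This is a minimal sketch; the catalog query is illustrative.

  -- Bypass resource group concurrent transaction limits for this session
  SET gp_resource_group_bypass = true;

  -- Run a lightweight catalog query without waiting on the resource group
  SELECT relname, relkind FROM pg_class WHERE relname LIKE 'gp_%' LIMIT 10;

  -- Restore the default behavior
  SET gp_resource_group_bypass = false;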

gpload Performance Enhancement

The Greenplum Database gpload data loading utility supports the configuration file parameter FAST_MATCH, which can improve gpload performance. When FAST_MATCH is set to true and the utility reuses external tables, gpload only searches Greenplum Database for matching external tables to reuse; the utility does not check the external table column names and column types to ensure that the table can be used for a gpload operation. This can improve gpload performance when the utility reuses external tables and the database catalog table pg_attribute contains a large number of rows.

To reuse external table objects and staging table objects, REUSE_TABLES: true must also be specified in the gpload configuration file. If REUSE_TABLES is false or not specified and FAST_MATCH: true is specified, gpload returns a warning message.

The FAST_MATCH default value is false: the utility checks the external table definition column names and column types, and it returns an error and quits if the column definitions are not compatible.

Note: If FAST_MATCH: true is specified in the gpload configuration file, the utility ignores the value of SCHEMA in the EXTERNAL section if a SCHEMA value is specified in the file. The utility uses the Greenplum Database default schema. The SCHEMA value specifies the schema of the external table database objects created by gpload.
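
The following gpload control file excerpt shows the relevant settings. It is a sketch only: the connection, input, and output values are placeholders, and it assumes that REUSE_TABLES and FAST_MATCH appear in the PRELOAD section of the control file.

  VERSION: 1.0.0.1
  DATABASE: ops
  USER: gpadmin
  HOST: mdw
  PORT: 5432
  GPLOAD:
     INPUT:
      - SOURCE:
           FILE:
             - /var/load/expenses.txt
      - FORMAT: text
      - DELIMITER: '|'
     OUTPUT:
      - TABLE: payables.expenses
      - MODE: INSERT
     PRELOAD:
      - REUSE_TABLES: true
      - FAST_MATCH: true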

For information about gpload, see gpload.

gpbackup S3 Plugin Enhancements

The gpbackup S3 plugin (an experimental feature) supports these new configuration options for S3 compatible data sources:
  • endpoint - Required if connecting to an S3 compatible service. Specify this option to connect to an S3 compatible service such as ECS. The plugin connects to the specified S3 endpoint (hostname or IP address) to access the S3 compatible data store.

    If this option is specified, the plugin ignores the region option and does not use AWS to resolve the endpoint. When this option is not specified, the plugin uses the region to determine the AWS S3 endpoint.

  • encryption - Optional. Enable or disable the use of Secure Sockets Layer (SSL) when connecting to an S3 location. The default value is on, which uses connections secured with SSL. Set this option to off to connect to an S3 compatible service that is not configured to use SSL.

    Any value other than off is treated as on.

For information about the options, see Using the S3 Storage Plugin with gpbackup and gprestore.
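
For example, a plugin configuration file for an S3 compatible data store such as ECS might look like the following. This is a sketch that assumes the standard S3 storage plugin configuration layout; only endpoint and encryption are new in this release, and the remaining field names and values are placeholders to verify against the S3 storage plugin documentation.

  executablepath: $GPHOME/bin/gpbackup_s3_plugin
  options:
    endpoint: ecs.example.com
    aws_access_key_id: <access key>
    aws_secret_access_key: <secret key>
    bucket: gpdb-backups
    folder: backups/prod
    encryption: off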

Storage Plugin API Execution Scope

The Storage plugin framework API (an experimental feature) provides new arguments with a plugin setup or cleanup command that specify the execution scope (master host, segment host, or segment instance). The scope can be one of these values.

  • master - Execute the plugin command once on the master host.
  • segment_host - Execute the plugin command once on each of the segment hosts.
  • segment - Execute the plugin command once for each active segment instance on the host running the segment instance.

The Greenplum Database hosts and segment instances are based on the Greenplum Database configuration when the backup operation started.
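
As a rough illustration, a plugin's setup command might branch on the scope value to prepare a staging directory once on the master host, once per segment host, or once per segment instance. This is a hypothetical sketch only; it assumes the scope (and, for segment scope, a content ID) is passed as an additional command-line argument, so confirm the exact argument positions in the plugin API reference.

  #!/bin/bash
  # Hypothetical setup_plugin_for_backup handler (argument positions are assumed)
  config_file=$1      # plugin configuration file
  backup_dir=$2       # local backup directory
  scope=$3            # master, segment_host, or segment
  content_id=$4       # segment content ID, meaningful only for segment scope

  case "$scope" in
    master)       mkdir -p "$backup_dir" ;;                  # runs once on the master host
    segment_host) mkdir -p "$backup_dir" ;;                  # runs once on each segment host
    segment)      mkdir -p "$backup_dir/seg${content_id}" ;; # runs once per active segment instance
  esac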

For information about the API, see Backup/Restore Storage Plugin API.

Filter Pushdown for External Table Protocols

Greenplum Database 5.10.0 introduces the gp_external_enable_filter_pushdown server configuration parameter to enable or disable predicate filter pushdown for external table protocols, such as pxf. Filter pushdown can improve query performance by reducing the amount of data transferred between the external data source and Greenplum Database. This parameter is set to off by default; set it to on to enable filter pushdown.

Keep in mind that some data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter pushdown for the query constraints, the query is instead executed without filter pushdown (the data is first transferred to Greenplum Database and then filtered).

Pivotal Greenplum Database supports filter pushdown only with the PXF Hive connector.
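
For example, the parameter can be enabled for a session before querying a PXF external table; the table and predicate below are placeholders.

  -- Enable predicate filter pushdown for this session
  SET gp_external_enable_filter_pushdown = on;

  -- Assuming sales_pxf is an external table that uses the pxf protocol against Hive,
  -- the WHERE predicate can now be pushed down to the external data source
  SELECT * FROM sales_pxf WHERE sale_year = 2018;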

For more information about filter pushdown, see Using PXF to Read and Write External Data.

Pivotal Greenplum-Kafka Connector (Experimental)

Greenplum Database 5.10 provides integration with the new Pivotal Greenplum-Kafka Connector (experimental). The Pivotal Greenplum-Kafka Connector provides high speed, parallel data transfer from a Kafka cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. Refer to the Pivotal Greenplum-Kafka Connector (Experimental) documentation for more information about this feature.

Changed Features

Greenplum Database 5.10.0 includes these changed features.
  • By default, the gpcopy utility performs data distribution checking to ensure data is distributed to segment instances correctly when copying data from a source Greenplum Database system to a destination Greenplum Database system. If distribution checking fails, the table copy fails. The gpcopy option --no-distribution-check disables distribution checking. See New Features.
  • The Greenplum Database gphdfs protocol was tested to confirm support for Parquet versions 1.7.0 and later. Previous versions of the documentation incorrectly stated that only Parquet version 1.7.0 was supported.
  • The Pivotal Greenplum-Informatica Connector is no longer an experimental feature as of version 1.0.3. The connector supports high speed data transfer from an Informatica PowerCenter cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. See the Pivotal Greenplum-Informatica Connector documentation.
  • The PgBouncer connection pooler utility that ships with Greenplum Database 5.10.0 has been updated to resolve a known issue. See resolved issue 29347. PgBouncer 1.8.1 provides native TLS and PAM support and pg_hba.conf-compatible configuration. For information about PgBouncer, see Using the PgBouncer Connection Pooler.
  • Concurrent VACUUM operations on the same append-optimized table are blocked. In previous 5.x releases, concurrent VACUUM operations on the same append-optimized table were allowed.

Experimental Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several experimental features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Experimental features are not recommended or supported for production deployments. These features may change in or be removed from future versions of the product based on further testing and feedback. Moreover, any features that may be visible in the open source code but that are not described in the product documentation should be considered experimental and unsupported for production use.
Greenplum Database 5.10.0 includes these experimental features:
  • Storage plugins for gpbackup and gprestore.
    • The DD Boost storage plugin. You can specify the --plugin-config option to store a backup on a Dell EMC Data Domain storage appliance, and restore the data from the appliance. You can also replicate a backup on a separate, remote Data Domain system for disaster recovery.
    • The S3 storage plugin. You can specify the --plugin-config option to store a backup on an Amazon Web Services S3 location, and restore the data from the S3 location. For Greenplum Database 5.10.0 the S3 plugin also supports S3 compatible data stores. See New Features.
    • Storage plugin framework API. Partners, customers, and OSS developers can develop plugins to use in conjunction with gpbackup and gprestore. For Greenplum Database 5.10.0 the API supports a new argument for execution scope. See New Features.
    For information about storage plugins and the storage plugin API, see Using gpbackup Storage Plugins and Backup/Restore Storage Plugin API.
  • Recursive WITH Queries (Common Table Expressions). See WITH Queries (Common Table Expressions).
  • Resource groups remain an experimental feature only on the SuSE 11 platform, due to limited cgroups functionality in the kernel.

    SuSE 12 resolves the Linux cgroup issues that caused the performance degradation when Greenplum Database resource groups are enabled.

  • Integration with the Pivotal Greenplum-Kafka Connector (experimental). The Connector provides high speed, parallel data transfer from a Kafka cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. Refer to the Pivotal Greenplum-Kafka Connector (Experimental) documentation for more information about this feature.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 5.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for QuickLZ compression. QuickLZ compression is not provided in the open source version of Greenplum Database due to licensing restrictions.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for full text search and text analysis using Pivotal GPText.
  • Support for data connectors:
    • Greenplum-Spark Connector
    • Greenplum-Informatica Connector
    • Greenplum-Kafka Connector
    • Gemfire-Greenplum Connector
  • Data Direct ODBC/JDBC Drivers
  • gpcopy utility for copying or migrating objects between Greenplum systems.
Pivotal Greenplum 5.x does not support the following community-contributed features of open source Greenplum Database:
  • The PXF JDBC connector.
  • The PXF Apache Ignite connector.

Supported Platforms

Pivotal Greenplum 5.10.0 runs on the following platforms:

  • Red Hat Enterprise Linux 64-bit 7.x (See the following Note)
  • Red Hat Enterprise Linux 64-bit 6.x
  • SuSE Linux Enterprise Server 64-bit 12 SP2 and SP3 with kernel version greater than 4.4.73-5. (See the following Note)
  • SuSE Linux Enterprise Server 64-bit 11 SP4 (See the following Note)
  • CentOS 64-bit 7.x
  • CentOS 64-bit 6.x
Note: For the supported Linux operating systems, Pivotal Greenplum Database is supported on system hosts using either AMD or Intel CPUs based on the x86-64 architecture. Pivotal recommends using a homogeneous set of hardware (system hosts) in a Greenplum Database system.
Important: Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management on Red Hat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

If you use Red Hat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue; resource groups are still considered to be an experimental feature on this platform. Resource groups are not supported on SuSE 11 for production use. See known issue 149789783.

Pivotal Greenplum on SuSE 12 supports resource groups for production use. SuSE 12 resolves the Linux cgroup kernel issues that caused the performance degradation when Greenplum Database resource groups are enabled.

Note: For Greenplum Database installed on Red Hat Enterprise Linux 7.x or CentOS 7.x prior to 7.3, an operating system issue might cause Greenplum Database to hang while running large workloads. The Greenplum Database issue is caused by Linux kernel bugs.

RHEL 7.3 and CentOS 7.3 resolve the issue.

Note: Greenplum Database on SuSE Linux Enterprise systems does not support these features.
  • The PL/Perl procedural language
  • The gpmapreduce tool
  • The PL/Container language extension
  • The Greenplum Platform Extension Framework (PXF)
Greenplum Database support on Dell EMC DCA:
  • Pivotal Greenplum Database 5.10.0 is supported on DCA systems that are running DCA software version 3.4 or greater.
  • Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of Greenplum Database are not supported.
  • FIPS is supported on DCA software version 3.4 and greater with Pivotal Greenplum Database 5.2.0 and greater.
Note: These Greenplum Database releases are not certified on DCA because of an incompatibility in configuring timezone information.

5.5.0, 5.6.0, 5.6.1, 5.7.0, 5.8.0

These Greenplum Database releases are certified on DCA.

5.7.1, 5.8.1, 5.10.0 and later releases, and 5.x releases prior to 5.5.0.

Pivotal Greenplum 5.10.0 supports these Java versions:
  • 8.xxx
  • 7.xxx

Greenplum Database 5.10.0 software that runs on Linux systems uses OpenSSL 1.0.2l (with FIPS 2.0.16), cURL 7.54, OpenLDAP 2.4.44, and Python 2.7.12.

Greenplum Database client software that runs on Windows and AIX systems uses OpenSSL 0.9.8zg.

The Greenplum Database s3 external table protocol supports these data sources:
  • Amazon Simple Storage Service (Amazon S3)
  • Dell EMC Elastic Cloud Storage (ECS), an Amazon S3 compatible service

Pivotal Greenplum 5.10.0 supports Data Domain Boost on Red Hat Enterprise Linux.

This table lists the versions of Data Domain Boost SDK and DDOS supported by Pivotal Greenplum 5.x.

Table 1. Data Domain Boost Compatibility
Pivotal Greenplum: 5.10.0, 5.9.0, 5.8.1, 5.8.0, 5.7.1, 5.7.0, 5.4.1, 5.4.0, 5.2.0, 5.1.0, 5.0.0
Data Domain Boost2: 3.3, 3.0.0.31
DDOS: 6.1 (all versions), 6.0 (all versions)

Note: In addition to the DDOS versions listed in the previous table, Pivotal Greenplum 5.0.0 and later supports all minor patch releases (fourth digit releases) later than the certified version.

1Support for Data Domain Boost 3.0.0.3 is deprecated. The DELL EMC end of Primary Support date is December 31, 2017.

2The Greenplum Database utilities gpbackup and gprestore support Data Domain DD Boost File System Plug-In (BoostFS) v1.1 with DDOS 6.0 or greater. Data Domain Boost is not supported.

Note: Pivotal Greenplum 5.10.0 does not support the ODBC driver for Cognos Analytics V11.

Connecting to IBM Cognos software with an ODBC driver is not supported. Greenplum Database supports connecting to IBM Cognos software with the DataDirect JDBC driver for Pivotal Greenplum. This driver is available as a download from Pivotal Network.

Veritas NetBackup

Pivotal Greenplum 5.10.0 supports backup with Veritas NetBackup version 7.7.3. See Backing Up Databases with Veritas NetBackup.

Supported Platform Notes

The following notes describe platform support for Pivotal Greenplum. Please send any questions or comments to Pivotal Support at https://support.pivotal.io.

  • The only file system supported for running Greenplum Database is the XFS file system. All other file systems are explicitly not supported by Pivotal.
  • Greenplum Database is supported on all 1U and 2U commodity servers with local storage. Special purpose hardware that is not commodity may be supported at the full discretion of Pivotal Product Management based on the general similarity of the hardware to commodity servers.
  • Greenplum Database is supported on network or shared storage if the shared storage is presented as a block device to the servers running Greenplum Database and the XFS file system is mounted on the block device. Network file systems are not supported. When using network or shared storage, Greenplum Database mirroring must be used in the same way as with local storage, and no modifications may be made to the mirroring scheme or the recovery scheme of the segments. Other features of the shared storage such as de-duplication and/or replication are not directly supported by Pivotal Greenplum Database, but may be used with support of the storage vendor as long as they do not interfere with the expected operation of Greenplum Database at the discretion of Pivotal.
  • Greenplum Database is supported when running on virtualized systems, as long as the storage is presented as block devices and the XFS file system is mounted for the storage of the segment directories.
  • A minimum 10-gigabit network is required for a system configuration to be supported by Pivotal.
  • Greenplum Database is supported on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Compute (GCP).
    • AWS - For production workloads, r4.8xlarge and r4.16xlarge instance types with four 12TB ST1 EBS volumes for each segment host, or d2.8xlarge with ephemeral storage configured with 4 RAID 0 volumes, are supported. EBS storage is recommended. EBS storage is more reliable and provides more features than ephemeral storage. Note that Amazon has no provisions to replace a bad ephemeral drive; when a disk failure occurs, you must replace the node with the bad disk.

      Pivotal recommends using an Auto Scaling Group (ASG) to provision nodes in AWS. An ASG automatically replaces bad nodes, and you can add further automation to recover the Greenplum processes on the new nodes automatically.

      Deployments should be in a Placement Group within a single Availability Zone. Because Amazon recommends using the same instance type in a Placement Group, use a single instance type for all nodes, including the masters.

    • Azure - For production workloads, Pivotal recommends the Standard_H8 instance type with 4 2TB disks and 2 segments per host, or the Standard_H16 instance type with 8 2TB disks and 4 segments per host. This means software RAID 0 is required so that the number of volumes does not exceed the number of segments.
      For Azure deployments, you must also configure the Greenplum Database system to not use port 65330. Add the following line to the sysctl.conf file on all Greenplum Database hosts.
      net.ipv4.ip_local_reserved_ports=65330
    • GCP - For all workloads, the n1-standard-8 and n1-highmem-8 instance types are supported; these are relatively small instance types because disk performance in GCP forces the configuration to just 2 segments per host, with many hosts added to scale. Use pd-standard disks with a recommended disk size of 6 TB. From a performance perspective, use a factor of 8 when determining how many nodes to deploy in GCP, so a cluster with 16 segment hosts in AWS would require 128 nodes in GCP.

  • For Red Hat Enterprise Linux 7.2 or CentOS 7.2, the default systemd setting RemoveIPC=yes removes IPC connections when non-system users log out. This causes the Greenplum Database utility gpinitsystem to fail with semaphore errors. To avoid this issue, see "Setting the Greenplum Recommended OS Parameters" in the Greenplum Database Installation Guide.
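
The Installation Guide describes the fix in detail. In outline, it sets RemoveIPC=no for the systemd login service in /etc/systemd/logind.conf on each Greenplum Database host:

  RemoveIPC=no

Then, as root, restart the login service so the change takes effect (the restart command can vary by distribution). For example:

  # service systemd-logind restart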

Pivotal Greenplum Tools and Extensions Compatibility

Client Tools

Greenplum releases a number of client tool packages on various platforms that can be used to connect to Greenplum Database and the Greenplum Command Center management tool. The following table describes the compatibility of these packages with this Greenplum Database release.

Tool packages are available from Pivotal Network.

Table 2. Pivotal Greenplum 5.10.0 Tools Compatibility
Tool Description of Contents Tool Version(s) Server Version(s)
Pivotal Greenplum Clients Greenplum Database Command-Line Interface (psql) 5.8 5.x
Pivotal Greenplum Loaders Greenplum Database Parallel Data Loading Tools (gpfdist, gpload) 5.8 5.x
Pivotal Greenplum Command Center Greenplum Database management tool 4.0.0 5.7.0 and later
3.3.2 5.0.0 and later
3.2.2 5.0.0 - 5.2.x
Pivotal Greenplum Workload Manager1 Greenplum Database query monitoring and management tool 1.8.0 5.0.0

The Greenplum Database Client Tools and Load Tools are supported on the following platforms:

  • AIX 7.2 (64-bit) (Client and Load Tools only)2
  • Red Hat Enterprise Linux x86_64 6.x (RHEL 6)
  • SuSE Linux Enterprise Server x86_64 SLES 11 SP4, or SLES 12 SP2/SP3
  • Windows 10 (32-bit and 64-bit)
  • Windows 8 (32-bit and 64-bit)
  • Windows Server 2012 (32-bit and 64-bit)
  • Windows Server 2012 R2 (32-bit and 64-bit)
  • Windows Server 2008 R2 (32-bit and 64-bit)
Note: 1For Pivotal Greenplum Command Center 4.0.0 and later, workload management is an integrated Command Center feature rather than the separate tool Pivotal Greenplum Workload Manager.

2For Greenplum Database 5.4.1 and earlier 5.x releases, download the AIX Client and Load Tools package either from the Greenplum Database 5.10.0 file collection or the Greenplum Database 5.0.0 file collection on Pivotal Network.

Extensions

Table 3. Pivotal Greenplum 5.10.0 Extensions Compatibility
Pivotal Greenplum Extension Versions
MADlib machine learning for Greenplum Database 5.x1, 2 MADlib 1.14
PL/Java for Greenplum Database 5.x PL/Java 1.4.2, 1.4.0
PL/R for Greenplum Database 5.6.x 2.3.2
PostGIS Spatial and Geographic Objects for Greenplum Database 5.x 2.1.5+pivotal.15
Python Data Science Module Package for Greenplum Database 5.x3 1.0.0, 1.1.0, 1.1.1
R Data Science Library Package for Greenplum Database 5.x4 1.0.0, 1.0.1
PL/Container for Greenplum Database 5.x 1.16, 1.27
Note: 1For information about MADlib support and upgrade information, see the MADlib FAQ. For information on installing the MADlib extension in Greenplum Database, see Greenplum MADlib Extension for Analytics.

2Before upgrading to MADlib 1.13, you must remove some leftover knn functions. For information on upgrading MADlib, see Greenplum MADlib Extension for Analytics.

3For information about the Python package, including the modules provided, see the Python Data Science Module Package.

4For information about the R package, including the libraries provided, see the R Data Science Library Package.

5The PostGIS extension package version 2.1.5+pivotal.1 is compatible only with Greenplum Database 5.5.0 and later.

6To upgrade from PL/Container 1.0 to PL/Container 1.1 and later, you must drop the PL/Container 1.0 language before registering the new version of PL/Container. For information on upgrading the PL/Container extension in Greenplum Database, see PL/Container Language Extension.

7PL/Container version 1.2 can utilize the resource group capabilities that were introduced in Greenplum Database 5.8.0. If you downgrade to a Greenplum Database system that uses PL/Container 1.1 or earlier, you must use plcontainer runtime-edit to remove any resource_group_id settings from the PL/Container runtime configuration file. See Upgrading from PL/Container 1.1.

These Greenplum Database extensions are installed with Pivotal Greenplum Database:
  • Fuzzy String Match Extension
  • PL/Python Extension
  • pgcrypto Extension

Pivotal Greenplum Data Connectors

  • Greenplum Platform Extension Framework (PXF) - PXF, integrated with Greenplum Database 5.10.0, provides access to HDFS, Hive, and HBase external data stores. Refer to Accessing External Data with PXF in the Greenplum Database Administrator Guide for PXF configuration and usage information.
    Note: PXF is available only for supported Red Hat and CentOS platforms. PXF is not available for supported SuSE platforms.
  • Greenplum-Spark Connector - The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer from Greenplum Database to an Apache Spark cluster. The Greenplum-Spark Connector is available as a separate download from Pivotal Network. Refer to the Greenplum-Spark Connector documentation for compatibility and usage information.
  • Greenplum-Informatica Connector - The Pivotal Greenplum-Informatica connector supports high speed data transfer from an Informatica PowerCenter cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. See the Pivotal Greenplum-Informatica Connector Documentation.
  • Greenplum-Kafka Connector - The Pivotal Greenplum-Kafka connector provides high speed, parallel data transfer from a Kafka cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. Refer to the Pivotal Greenplum-Kafka Connector (Experimental) Documentation for more information about this feature.
  • Gemfire-Greenplum Connector - The Pivotal Gemfire-Greenplum Connector supports the transfer of data between a GemFire region and a Greenplum Database cluster. The Gemfire-Greenplum Connector is available as a separate download from Pivotal Network. Refer to the Gemfire-Greenplum Connector documentation for compatibility and usage information.

Pivotal GPText Compatibility

Pivotal Greenplum Database 5.10.0 is compatible with Pivotal GPText version 2.1.3 and later.

Pivotal Greenplum Command Center

For GPCC and Greenplum Workload Manager compatibility information, see the Pivotal Greenplum Command Center 3.x and 2.x Release Notes in the Greenplum Command Center documentation.
Note: For Pivotal Greenplum Command Center 4.0.0 and later, workload management is an integrated Command Center feature rather than the separate tool Pivotal Greenplum Workload Manager.

Hadoop Distribution Compatibility

Greenplum Database provides access to HDFS with gphdfs and the Greenplum Platform Extension Framework (PXF).

PXF Hadoop Distribution Compatibility

PXF supports Cloudera, Hortonworks Data Platform, and generic Apache Hadoop distributions.

If you plan to access JSON format data stored in a Cloudera Hadoop cluster, PXF requires a Cloudera version 5.8 or later Hadoop distribution.

gphdfs Hadoop Distribution Compatibility

The supported Hadoop distributions for gphdfs are listed below:

Table 4. Supported gphdfs Hadoop Distributions
Hadoop Distribution Version gp_hadoop_target_version
Cloudera CDH 5.x cdh
Hortonworks Data Platform HDP 2.x hdp
MapR MapR 4.x, MapR 5.x mpr
Apache Hadoop 2.x hadoop
Note: MapR requires the MapR client.
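
For example, to point gphdfs at a Hortonworks cluster, the gp_hadoop_target_version server configuration parameter can be set with gpconfig. This is a sketch; confirm whether the parameter requires a configuration reload or a restart in your environment.

  $ gpconfig -c gp_hadoop_target_version -v 'hdp'
  $ gpstop -u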

Upgrading to Greenplum Database 5.10.0

The upgrade path supported for this release is Greenplum Database 5.x to Greenplum Database 5.10.0. Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.x is not supported. See Migrating Data to Pivotal Greenplum 5.x.

Note: If you are upgrading Greenplum Database on a DCA system, see Pivotal Greenplum on DCA Systems.
Important: Pivotal recommends that customers set the Greenplum Database timezone to a value that is compatible with their host systems. Setting the Greenplum Database timezone prevents Greenplum Database from selecting a timezone each time the cluster is restarted and sets the timezone for the Greenplum Database master and segment instances. After you upgrade to this release and if you have not set a Greenplum Database timezone value, verify that the selected Greenplum Database timezone is acceptable for your deployment. See Configuring Timezone and Localization Settings for more information.

Prerequisites

Before starting the upgrade process, Pivotal recommends performing the following checks.

  • Verify the health of the Greenplum Database host hardware, and verify that the hosts meet the requirements for running Greenplum Database. The Greenplum Database gpcheckperf utility can assist you in confirming the host requirements.
    Note: If you need to run the gpcheckcat utility, Pivotal recommends running it a few weeks before the upgrade and that you run gpcheckcat during a maintenance period. If necessary, you can resolve any issues found by the utility before the scheduled upgrade.

    The utility is in $GPHOME/bin. Pivotal recommends that Greenplum Database be in restricted mode when you run the gpcheckcat utility. See the Greenplum Database Utility Guide for information about the gpcheckcat utility.

    If gpcheckcat reports catalog inconsistencies, you can run gpcheckcat with the -g option to generate SQL scripts to fix the inconsistencies (see the example after this list).

    After you run the SQL scripts, run gpcheckcat again. You might need to repeat the process of running gpcheckcat and creating SQL scripts to ensure that there are no inconsistencies. Pivotal recommends that the SQL scripts generated by gpcheckcat be run on a quiescent system. The utility might report false alerts if there is activity on the system.

    Important: If the gpcheckcat utility reports errors, but does not generate a SQL script to fix the errors, contact Pivotal support. Information for contacting Pivotal Support is at https://support.pivotal.io.
  • During the migration process from Greenplum Database 5.0.0, a backup is made of some files and directories in $MASTER_DATA_DIRECTORY. Pivotal recommends that files and directories that are not used by Greenplum Database be backed up, if necessary, and removed from the $MASTER_DATA_DIRECTORY before migration. For information about the Greenplum Database migration utilities, see the Greenplum Database Documentation.
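
For example, the gpcheckcat check-and-fix cycle described above might look like the following. The database name and output directory are placeholders; confirm the gpcheckcat options in the Greenplum Database Utility Guide.

  $ gpcheckcat sales
  $ gpcheckcat -g /home/gpadmin/gpcheckcat_fixes sales
  $ gpcheckcat sales

The second command writes repair SQL scripts to the specified directory; run the scripts, then run gpcheckcat again to confirm that the inconsistencies are resolved.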

For information about supported versions of Greenplum Database extensions, see Pivotal Greenplum Tools and Extensions Compatibility.

If you are utilizing Data Domain Boost, you have to re-enter your DD Boost credentials after upgrading to Greenplum Database 5.10.0 as follows:

gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
  --ddboost-backupdir backup_directory
Note: If you do not re-enter your login credentials after an upgrade, your backup will never start because Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

If you have configured the Greenplum Platform Extension Framework (PXF) in your previous Greenplum Database installation, you must stop the PXF service and back up PXF configuration files before upgrading to a new version of Greenplum Database. Refer to PXF Pre-Upgrade Actions for instructions.

If you do not plan to use PXF, or you have not yet configured PXF, no action is necessary.

Upgrading from 5.x to 5.10.0

An upgrade from 5.x to 5.10.0 involves stopping Greenplum Database, updating the Greenplum Database software binaries, and restarting Greenplum Database. If you are using Greenplum Database extension packages, there are additional requirements. See Prerequisites in the previous section.

Note: If you have databases that were created with Greenplum Database 5.3.0 or an earlier 5.x release, upgrade the gp_bloat_diag function and view in the gp_toolkit schema. For information about the issue and how to check a database for the issue, see Update for gp_toolkit.gp_bloat_diag Issue.
Note: If the Greenplum Command Center database gpperfmon is installed in your Greenplum Database system, the migration process changes the distribution key of the Greenplum Database log_alert_* tables to the logtime column. The redistribution of the table data might take some time the first time you start Greenplum Database after migration. The change occurs only the first time you start Greenplum Database after a migration.
  1. Log in to your Greenplum Database master host as the Greenplum administrative user:
    $ su - gpadmin
  2. Perform a smart shutdown of your current Greenplum Database 5.x system (there can be no active connections to the database). This example uses the -a option to disable confirmation prompts:
    $ gpstop -a
  3. Run the binary installer for 5.10.0 on the Greenplum Database master host.
    When prompted, choose an installation location in the same base directory as your current installation. For example:
    /usr/local/greenplum-db-5.10.0

    If you install Greenplum Database with the rpm (as root), the installation directory is /usr/local/greenplum-db-5.10.0.

    For the rpm installation, update the permissions for the new installation. For example, run this command as root to change user and group of the installed files to gpadmin.

    # chown -R gpadmin:gpadmin /usr/local/greenplum*
  4. If your Greenplum Database deployment uses LDAP authentication, manually edit the /usr/local/greenplum-db/greenplum_path.sh file to add the line:
    export LDAPCONF=/etc/openldap/ldap.conf
  5. Edit the environment of the Greenplum Database superuser (gpadmin) and make sure you are sourcing the greenplum_path.sh file for the new installation. For example change the following line in .bashrc or your chosen profile file:
    source /usr/local/greenplum-db-5.0.0/greenplum_path.sh

    to:

    source /usr/local/greenplum-db-5.10.0/greenplum_path.sh

    Or if you are sourcing a symbolic link (/usr/local/greenplum-db) in your profile files, update the link to point to the newly installed version. For example:

    $ rm /usr/local/greenplum-db
    $ ln -s /usr/local/greenplum-db-5.10.0 /usr/local/greenplum-db
  6. Source the environment file you just edited. For example:
    $ source ~/.bashrc
  7. Run the gpseginstall utility to install the 5.10.0 binaries on all the segment hosts specified in the hostfile. For example:
    $ gpseginstall -f hostfile
  8. Use the Greenplum Database gppkg utility to install Greenplum Database extensions. If you were previously using any Greenplum Database extensions such as pgcrypto, PL/R, PL/Java, PL/Perl, and PostGIS, download the corresponding packages from Pivotal Network, and install using this utility. See the Greenplum Database Documentation for gppkg usage details.
  9. If you configured PgBouncer in your previous Greenplum Database installation, you must migrate to the new PgBouncer when you upgrade Greenplum Database. Refer to Migrating PgBouncer for specific migration instructions.
  10. After all segment hosts have been upgraded, you can log in as the gpadmin user and restart your Greenplum Database system:
    # su - gpadmin
    $ gpstart
  11. If you are utilizing Data Domain Boost, you have to re-enter your DD Boost credentials after upgrading to Greenplum Database 5.10.0 as follows:
    gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
      --ddboost-backupdir backup_directory
    Note: If you do not re-enter your login credentials after an upgrade, your backup will never start because Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.
  12. If you configured PXF in your previous Greenplum Database installation, you must re-initialize the PXF service after you upgrade Greenplum Database. Refer to Upgrading PXF for instructions.

Troubleshooting a Failed Upgrade

If you experience issues during the migration process and have active entitlements for Greenplum Database that were purchased through Pivotal, contact Pivotal Support. Information for contacting Pivotal Support is at https://support.pivotal.io.

Be prepared to provide the following information:

  • A completed Upgrade Procedure.
  • Log output from gpcheckcat (located in ~/gpAdminLogs)

Migrating Data to Pivotal Greenplum 5.x

Upgrading a Pivotal Greenplum Database 4.x system directly to Pivotal Greenplum Database 5.x is not supported.

You can migrate existing data to Greenplum Database 5.x using standard backup and restore procedures (gpcrondump and gpdbrestore) or by using gptransfer. The gpcopy utility can be used to migrate data from Greenplum Database 4.3.26 or later to 5.9 or later if both the source and destination clusters have the same number of segments.

Follow these general guidelines for migrating data:
  • Make sure that you have a complete backup of all data in the Greenplum Database 4.3.x cluster, and that you can successfully restore the Greenplum Database 4.3.x cluster if necessary.
  • You must install and initialize a new Greenplum Database 5.x cluster using the version 5.x gpinitsystem utility.
    Note: Unless you modify file locations manually, gpdbrestore only supports restoring data to a cluster that has an identical number of hosts and an identical number of segments per host, with each segment having the same content_id as the segment in the original cluster. If you initialize the Greenplum Database 5.x cluster using a configuration that is different from the version 4.3 cluster, then follow the steps outlined in Restoring to a Different Greenplum System Configuration to manually update the file locations.

    You cannot use gpcopy to migrate data between Greenplum Database clusters that have different numbers of segments.

    Important: For Greenplum Database 5.x, Pivotal recommends that customers set the Greenplum Database timezone to a value that is compatible with the host systems. Setting the Greenplum Database timezone prevents Greenplum Database from selecting a timezone each time the cluster is restarted and sets the timezone for the Greenplum Database master and segment instances. See Configuring Timezone and Localization Settings for more information.
  • If you intend to install Greenplum Database 5.x on the same hardware as your 4.3.x system, you will need enough disk space to accommodate over 5 times the original data set (2 full copies of the primary and mirror data sets, plus the original backup data in ASCII format) in order to migrate data with gpcrondump and gpdbrestore. Keep in mind that the ASCII backup data will require more disk space than the original data, which may be stored in compressed binary format. Offline backup solutions such as Dell EMC Data Domain or Veritas NetBackup can reduce the required disk space on each host.

    If you attempt to migrate your data on the same hardware but run out of free space, gpcopy provides the --truncate-source-after option to truncate each source table after copying the table to the destination cluster and validating that the copy succeeded. This reduces the amount of free space needed to migrate clusters that reside on the same hardware. See Migrating Data with gpcopy for more information.

  • Use the version 5.x gpdbrestore utility to load the 4.3.x backup data into the new cluster.
  • If the Greenplum Database 5.x cluster resides on separate hardware from the 4.3.x cluster, and the clusters have different numbers of segments, you can optionally use the version 5.x gptransfer utility to migrate the 4.3.x data. You must initiate the gptransfer operation from the version 5.x cluster, pulling the older data into the newer system.

    On a Greenplum Database system with FIPS enabled, validating table data with MD5 (specifying the gptransfer option --validate=md5) is not available. Use the option sha256 to validate table data (see the example command after this list).

    Validating table data with SHA-256 (specifying the option --validate=sha256) requires the Greenplum Database pgcrypto extension. The extension is included with Pivotal Greenplum 5.x. The extension package must be installed on supported Pivotal Greenplum 4.3.x systems. Support for pgcrypto functions in a Greenplum 4.3.x database is not required.

  • Greenplum Database 5.x removes automatic implicit casts between the text type and other data types. After you migrate from Greenplum Database version 4.3.x to version 5.x, this change in behavior may impact existing applications and queries. Refer to About Implicit Text Casting in Greenplum Database in the Greenplum Database Installation Guide for information, including a discussion about supported and unsupported workarounds.
  • After migrating data you may need to modify SQL scripts, administration scripts, and user-defined functions as necessary to account for changes in Greenplum Database version 5.x. Look for Upgrade Action Required entries in the Pivotal Greenplum 5.0.0 Release Notes for features that may necessitate post-migration tasks.
  • If you configured PgBouncer in your previous Greenplum Database installation, you must migrate to the new PgBouncer when you upgrade Greenplum Database. Refer to Migrating PgBouncer for specific migration instructions.
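
For the FIPS case noted above, a gptransfer invocation with SHA-256 validation might look like the following. The source connection options are placeholders; check them against the gptransfer reference.

  $ gptransfer --full --source-host=mdw43 --source-port=5432 --source-user=gpadmin \
      --validate=sha256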

Pivotal Greenplum on DCA Systems

On supported Dell EMC DCA systems, you can install Pivotal Greenplum 5.10.0, or you can upgrade from Pivotal Greenplum 5.x to 5.10.0.

Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of Greenplum Database are not supported.

Important: Upgrading Pivotal Greenplum Database 4.3.x to Pivotal Greenplum 5.10.0 is not supported. See Migrating Data to Pivotal Greenplum 5.x.
Note: These Greenplum Database releases are not certified on DCA because of an incompatibility in configuring timezone information.

5.5.0, 5.6.0, 5.6.1, 5.7.0, 5.8.0

These Greenplum Database releases are certified on DCA.

5.7.1, 5.8.1, 5.9.0 and later releases, and 5.x releases prior to 5.5.0.

Installing the Pivotal Greenplum 5.10.0 Software Binaries on DCA Systems

Important: This section is for installing Pivotal Greenplum 5.10.0 only on DCA systems. Also, see the information on the DELL EMC support site (requires login).

For information about installing Pivotal Greenplum on non-DCA systems, see the Greenplum Database Installation Guide.

Prerequisites

  • Ensure your DCA system supports Pivotal Greenplum 5.10.0. See Supported Platforms.
  • Ensure Greenplum Database 4.3.x is not installed on your system.

    Installing Pivotal Greenplum 5.10.0 on a DCA system with an existing Greenplum Database 4.3.x installation is not supported. For information about uninstalling Greenplum Database software, see your Dell EMC DCA documentation.

Installing Pivotal Greenplum 5.10.0

  1. Download or copy the Greenplum Database DCA installer file greenplum-db-appliance-5.10.0-RHEL6-x86_64.bin to the Greenplum Database master host.
  2. As root, run the DCA installer for 5.10.0 on the Greenplum Database master host and specify the file hostfile that lists all hosts in the cluster, one host name per line. If necessary, copy hostfile to the directory containing the installer before running the installer.

    This example command runs the installer for Greenplum Database 5.10.0.

    # ./greenplum-db-appliance-5.10.0-RHEL6-x86_64.bin hostfile

Upgrading from 5.x to 5.10.0 on DCA Systems

Upgrading Pivotal Greenplum from 5.x to 5.10.0 on a Dell EMC DCA system involves stopping Greenplum Database, updating the Greenplum Database software binaries, and restarting Greenplum Database.

Important: This section is only for upgrading to Pivotal Greenplum 5.10.0 on DCA systems. For information about upgrading on non-DCA systems, see Upgrading to Greenplum Database 5.10.0.
Note: If you have databases that were created with Greenplum Database 5.3.0 or an earlier 5.x release, upgrade the gp_bloat_diag function and view in the gp_toolkit schema. For information about the issue and how to check a database for the issue, see Update for gp_toolkit.gp_bloat_diag Issue.
  1. Log in to your Greenplum Database master host as the Greenplum administrative user (gpadmin):
    # su - gpadmin
  2. Download or copy the installer file greenplum-db-appliance-5.10.0-RHEL6-x86_64.bin to the Greenplum Database master host.
  3. Perform a smart shutdown of your current Greenplum Database 5.x system (there can be no active connections to the database). This example uses the -a option to disable confirmation prompts:
    $ gpstop -a
  4. As root, run the Greenplum Database DCA installer for 5.10.0 on the Greenplum Database master host and specify the file hostfile that lists all hosts in the cluster. If necessary, copy hostfile to the directory containing the installer before running the installer.

    This example command runs the installer for Greenplum Database 5.10.0 for Red Hat Enterprise Linux 6.x.

    # ./greenplum-db-appliance-5.10.0-RHEL6-x86_64.bin hostfile

    The file hostfile is a text file that lists all hosts in the cluster, one host name per line.

  5. Install Greenplum Database extension packages. For information about installing a Greenplum Database extension package, see gppkg in the Greenplum Database Utility Guide.
  6. After all segment hosts have been upgraded, you can log in as the gpadmin user and restart your Greenplum Database system:
    # su - gpadmin
    $ gpstart
  7. If you are utilizing Data Domain Boost, you have to re-enter your DD Boost credentials after upgrading to Greenplum Database 5.10.0 as follows:
    gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
      --ddboost-backupdir backup_directory
Note: If you do not re-enter your login credentials after an upgrade, your backup will never start because Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

Resolved Issues

The listed issues are resolved in Pivotal Greenplum Database 5.10.0.

For issues resolved in prior 5.x releases, refer to the corresponding release notes. Release notes are available from the Pivotal Greenplum page on Pivotal Network or on the Pivotal Greenplum Database documentation site at Release Notes.

29464 - gpbackup/ gprestore
The gprestore utility failed restoring GRANT information when the ROLE names associated with the information contained upper and lower case characters or special characters. The failure occurred because the gpbackup utility did not correctly handle ROLE names that contained the specified types of characters during a backup operation.
This issue has been resolved. Now gpbackup correctly handles the specified type of ROLE names.
29426 - gpcopy
In some cases, after copying table data from a Greenplum Database 4.3.x system to a 5.x system with the gpcopy utility, deleting table data from the 5.x system caused a Greenplum Database PANIC. The issue occurred because, according to the table distribution policy, the table data copied by gpcopy was distributed incorrectly to segment instances.
This issue has been resolved. Now the gpcopy utility performs data distribution checking to ensure that data is distributed to segment instances correctly.
29435 - gpbackup/ gprestore
The gpbackup utility failed when a backup operation was performed on a Greenplum Database system that was initialized with --locale=C, the database was configured with encoding=WIN874, and the name of a table that was being backed up contained a Thai character.
This issue has been resolved. Now the backup operation completes in the specified situation.
29436 - gpbackup/ gprestore
When backing up a view, the gpbackup utility did not include the owner of the view.
This issue has been resolved. Now the utility includes the owner information when backing up a view.
29419 - DDL
Creating a partitioned table with a large number of partitions (more than 200 partitions) failed and returned an error stating that Greenplum Database could not form array type name. The error was caused by array type objects that Greenplum Database created when creating heap tables. In some cases, the array type naming algorithm caused naming conflicts when a large number of heap tables were created with similar names. One such situation occurred when a partitioned table with a large number of heap partitions was created.

This issue has been resolved. When creating partitioned tables, the associated array type is not required for child partitions and is no longer created.

29418 - External Tables
When inserting data from an external table, the INSERT command did not recognize the Greenplum Database server configuration parameter gp_max_csv_line_length. This caused INSERT commands to fail with a data line too long error when the row from the external table was shorter than the value specified by the configuration parameter gp_max_csv_line_length.
This issue has been resolved. Now the INSERT command recognizes the parameter and successfully inserts data from an external table in the specified situation.
29418 - gpperfmon
When copying data from an external CSV format table, if a text column contains line breaks and one of the lines is longer than the value of the gp_max_csv_line_length server configuration parameter, the copy fails with an invalid CSV data error. The error prevents gpperfmon from loading query data from the queries_tail external table into the queries_history table when query text contains line breaks and very long lines.
The gpperfmon code has been updated to prevent the error by setting the gp_max_csv_line_length configuration parameter to its maximum value (4MB) in the session. If any line in the query exceeds this length, all line breaks in the query are replaced with spaces. Although this alters the formatting of the query text for some queries, it prevents the COPY command from failing to load the queries_history table.
29415 - gpbackup/ gprestore
In some cases, the gpbackup utility did not correctly release locks on some tables when the --include-schema option was specified and the backup included partitioned tables. In some cases, the incorrect locking caused table access issues.
This issue has been resolved. Table locking during backups performed by gpbackup has been improved.
29410 - Query Optimizer
For some queries that contain an EXCEPT clause that includes a partitioned table and that involved a bitmap index, GPORCA generated a PANIC. GPORCA did not correctly generate some alternative plans during optimization. The incorrect plan generation caused the PANIC.
This issue has been resolved. GPORCA plan generation has been improved for the specified type of queries.
29395 - DDL
The gpdbrestore or gprestore utility failed when the utility attempted to restore a table from a backup and the table was incorrectly defined with duplicate columns as distribution keys. The issue occurred when a CREATE TABLE AS command created a table with a distribution policy that incorrectly contained duplicate columns as distribution keys, and the table was backed up with the gpcrondump or gpbackup utility.
The CREATE TABLE issue has been resolved. Now the CREATE TABLE AS command does not create the specified type of table. The command returns an error.
Note: Restore operations will continue to fail if you try to restore the incorrectly defined tables from a backup. See 29395
29374 - gpinitstandby/ gpactivatestandby
In some cases, long running Greenplum Database utilities that used SSH failed to complete. To fulfill operating system hardening requirements, some systems require SSH connections be dropped if they have not been used for a period of time. This caused some long running Greenplum Database utilities to fail when the operating system dropped the SSH connection.
This issue has been resolved. The Greenplum Database utilities have been enhanced to keep the SSH connection active until the utility completes.
29372 - gpcrondump/ gpdbrestore
When the gpdbrestore utility attempted to restore an external table and the --truncate option was specified, the utility failed when it attempted to truncate the external table.
This issue has been resolved. Now the utility does not attempt to truncate external tables in the specified situation.
29369 - Query Planner
For some queries where the predicate contained a subquery that contained a comparison such as BETWEEN '2' AND '1', the Greenplum Database legacy optimizer incorrectly returned an error stating that no parameter was found for an initplan subquery. The legacy optimizer did not correctly handle the comparison when generating a query plan.
This issue has been resolved. The legacy optimizer's handling of the comparisons in subqueries has been improved.
29362 - gpcrondump/ gpdbrestore
In some cases, backup performance was poor when performing a backup with the gpcrondump utility and specifying the --table-file option. The method that the utility used to check for tables to be backed up was inefficient.
This issue has been resolved. The method used to check for tables to be backed up has been improved.
29364 - Query Optimizer
GPORCA performance was poor for some queries that required a large number of joins.
GPORCA has improved its use of join cardinality estimates during join order processing when creating a query plan.
29357 - Storage: Access Methods
In some cases, deleting data from an append-optimized table returned an error stating that Greenplum Database could not find a segment file to use. The error was caused when the cached state of an append-optimized table was not handled correctly during an ALTER TABLE...SET DISTRIBUTED BY or TRUNCATE operation.
This issue has been resolved. The handling of the cached state of an append-optimized table row has been improved.
29347 - PgBouncer
When Greenplum Database was configured to use the PgBouncer connection pool manager, PgBouncer occasionally returned an invalid server parameter error. The error was caused when PgBouncer did not handle SET commands correctly.
This issue has been resolved. Now PgBouncer handles SET commands correctly. See Changed Features.
29302 - Query Optimizer
GPORCA incorrectly handled the operators ANY, ALL, EXISTS, and NOT EXISTS when they occurred in subqueries that were nested inside another scalar expression. This resulted in an incorrect query plan that produced incorrect results.
This issue has been resolved. Now GPORCA generates a correct plan for the specified type of query.
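For illustration, a query of the affected shape might look like the following sketch; the table and column names are hypothetical.
-- An EXISTS subquery nested inside a scalar (CASE) expression.
SELECT c.id,
       CASE
           WHEN EXISTS (SELECT 1 FROM orders o WHERE o.cust_id = c.id)
           THEN 'has orders'
           ELSE 'no orders'
       END AS order_flag
FROM customers c;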
159211957 - gpbackup/ gprestore
In some cases, the gprestore utility restored table data that caused a Greenplum Database PANIC. The issue occurred when the gpbackup utility version 1.0.x backed up table data with the incorrect distribution policy and then a gprestore version earlier than 1.6.1 restored the table data without performing data distribution checking to ensure that data was distributed to segment instances correctly.
This issue has been resolved. gpbackup version 1.1 and later backs up table data with the correct distribution policy. Also, gprestore version 1.6.1 and later performs data distribution checking.
158731322 - gpbackup/ gprestore
When backing up user-defined data types, the gpbackup utility did not include precision information for numeric types.
This issue has been resolved. Now the backup includes the precision information.
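For example, the precision and scale in a hypothetical composite type such as the following are now preserved in the backup DDL.
-- Hypothetical user-defined composite type; numeric(12,2) is backed up with its precision and scale.
CREATE TYPE price_t AS (amount numeric(12,2), currency char(3));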
158574185 - gpbackup/ gprestore
A backup operation failed when the gpbackup utility was run with the --single-data-file option and attempted to back up an external table.
This issue has been resolved. Now the backup operation completes and the backup includes the external table DDL.
158574172 - gpbackup/ gprestore
The gprestore utility restored external table definitions before restoring the required protocols. This caused a restore failure.
This issue has been resolved. Now the utility restores protocols before the external table definitions.
158468023 - gpbackup/ gprestore
When performing a backup operation, the gpbackup utility did not exclude all database objects that were created when a Greenplum Database extension was installed. This caused errors when restoring the backup with the gprestore utility.
This issue has been resolved. Now the gpbackup utility excludes all objects created by extensions. The objects are created when the extension is restored.
158265682 - gpbackup/ gprestore
When backing up indexes, the gpbackup utility did not include the schema when backing up some index information. This caused errors when restoring the index with the gprestore utility.
This issue has been resolved. Now the gpbackup utility schema-qualifies the index information when backing it up.
158244514 - Dispatch
In some cases, Greenplum Database incorrectly returned an error stating that a host name could not be translated to an address. An error should not have been returned when the host name could not be resolved to an IP address. The issue occurred when a host system IP address was changed and the Greenplum Database system was not restarted.
This issue has been resolved. Greenplum Database handling of DNS changes has been improved, and Greenplum Database now returns a warning in the specified situation.
157103126 - VACUUM
When concurrent VACUUM operations were performed on different append-optimized tables, the dropping of disk files would be skipped. A disk file can be dropped after a VACUUM operation moves all current, visible data from the file into another file.
This issue has been resolved. Now concurrent VACUUM operations do not skip the dropping of empty disk files in the specified situation.

Known Issues and Limitations

Pivotal Greenplum 5.x has these limitations:

  • Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.x is not supported. See Migrating Data to Pivotal Greenplum 5.x.
  • Some features are works-in-progress and are considered to be experimental features. Pivotal does not support using experimental features in a production environment. See Experimental Features.
  • Greenplum Database 4.3.x packages are not compatible with Pivotal Greenplum 5.x.

The following table lists key known issues in Pivotal Greenplum 5.x.

Table 5. Key Known Issues in Pivotal Greenplum 5.x
Issue Category Description
159034782 gpload If fast_match: true is specified in the gpload configuration file, the utility ignores the value of SCHEMA in the EXTERNAL section if the SCHEMA value is specified in the file. The utility uses the Greenplum Database default schema. The SCHEMA value specifies the schema of the external table database objects created by gpload.
29395 DDL The gpdbrestore or gprestore utility fails when the utility attempts to restore a table from a backup and the table is incorrectly defined with duplicate columns as distribution keys. The issue is caused when the gpcrondump or gpbackup utility backed up a table that is incorrectly defined. The CREATE TABLE AS command could create a table that is incorrectly defined with a distribution policy that contains duplicate columns as distribution keys.

The CREATE TABLE issue has been resolved. Now the CREATE TABLE AS command does not create the specified type of table; instead, the command returns an error. However, restore operations will continue to fail if you try to restore the incorrectly defined tables from a backup.

29351 gptransfer The gptransfer utility can copy a data row with a maximum length of 256 MB.
158011506 Catalog and Metadata In some cases, the timezone used by Greenplum Database might be different from the host system timezone, or from the Greenplum Database timezone set by a user. In some rare cases, times used and displayed by Greenplum Database might be slightly different from the host system time.

The timezone used by Greenplum Database is selected from a set of internally stored PostgreSQL timezones. Greenplum Database selects the timezone by matching a PostgreSQL timezone with the user-specified time zone or the host system time zone. For example, when selecting a default timezone, Greenplum Database uses an algorithm to select a PostgreSQL timezone based on the host system timezone. If the system timezone includes leap second information, Greenplum Database cannot match the system timezone with a PostgreSQL timezone. Greenplum Database calculates a best match with a PostgreSQL timezone based on information from the host system.

Workaround: Set the Greenplum Database and host system timezones to a timezone that is supported by both Greenplum Database and the host system. For example, you can show and set the Greenplum Database timezone with the gpconfig utility. These commands show the Greenplum Database timezone and set the timezone to US/Pacific.

# gpconfig -s TimeZone
# gpconfig -c TimeZone -v 'US/Pacific'

You must restart Greenplum Database after changing the timezone. The command gpstop -ra restarts Greenplum Database.

The Greenplum Database catalog view pg_timezone_names provides Greenplum Database timezone information.
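For example, a typical query against the view (a usage sketch, with an assumed name filter) lists candidate timezones before you set one:
SELECT name, utc_offset
FROM pg_timezone_names
WHERE name LIKE 'US/%';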

26589 Storage: Catalog and Metadata Greenplum Database does not acquire a lock on a schema when creating a table inside the schema. A concurrent DROP of the schema and CREATE TABLE operation will result in a leaked object (an orphan table or orphan relation columns) in the system catalog and possibly on disk. For example, a relation created as
CREATE TABLE foobar (col1 int)

during a concurrent DROP may leak either the table itself or the col1 attribute object.

These leaked objects do not affect any future queries. The catalog inconsistencies can be detected with the gpcheckcat utility.

Workaround: To block a concurrent DROP from occurring, acquire and hold a ROW SHARE lock on the row of the pg_namespace catalog table that corresponds to the schema. For example, this transaction acquires a ROW SHARE lock on the pg_namespace catalog table row for the schema my_schema.

begin;
SELECT nspname FROM pg_namespace WHERE nspname = 'my_schema' FOR SHARE;
  ...

end;

During the transaction, the lock prevents the schema from being dropped.

N/A PXF PXF is available only for supported Red Hat and CentOS platforms. PXF is not available for supported SuSE platforms.
151135629 COPY command When the ON SEGMENT clause is specified, the COPY command does not support specifying a SELECT statement in the COPY TO clause. For example, this command is not supported.
COPY (SELECT * FROM testtbl) TO '/tmp/mytst<SEGID>' ON SEGMENT
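For comparison, copying the table itself with the ON SEGMENT clause is a supported form (a sketch using the same hypothetical table):
COPY testtbl TO '/tmp/mytst<SEGID>' ON SEGMENT;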
29064 Storage: DDL The money data type accepts out-of-range values as negative values, and no error message is displayed.

Workaround: Use only in-range values for the money data type (32-bit for Greenplum Database 4.x, or 64-bit for Greenplum Database 5.x). Or, use an alternative data type such as numeric or decimal.
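A minimal sketch of the workaround, using a hypothetical table with a numeric column in place of money:
-- numeric(16,2) is used instead of the money type.
CREATE TABLE invoice_amounts (
    invoice_id int,
    amount numeric(16,2)
) DISTRIBUTED BY (invoice_id);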

3290 JSON The to_json() function is not implemented as a callable function. Attempting to call the function results in an error. For example:
tutorial=# select to_json('Fred said "Hi."'::text); 
ERROR: function to_json(text) does not exist
LINE 1: select to_json('Fred said "Hi."'::text);
^
HINT: No function matches the given name and argument types. 
You might need to add explicit type casts.

Workaround: Greenplum Database invokes to_json() internally when casting to the json data type, so perform a cast instead. For example: SELECT '{"foo":"bar"}'::json; Greenplum Database also provides the array_to_json() and row_to_json() functions.

148119917 Resource Groups Testing of the resource groups feature has found that a kernel panic can occur when using the default kernel in a RHEL/CentOS system. The panic is caused by a bug in the kernel cgroups implementation, and results in a kernel panic backtrace similar to:
[81375.325947] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [81375.325986] IP: [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326014] PGD 0 [81375.326025]
      Oops: 0000 [#1] SMP [81375.326041] Modules linked in: veth ipt_MASQUERADE
      nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype
      iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc intel_powerclamp coretemp
      intel_rapl dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio kvm_intel kvm crc32_pclmul
      ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt
      iTCO_vendor_support ses enclosure ipmi_ssif pcspkr lpc_ich sg sb_edac mfd_core edac_core
      mei_me ipmi_si mei wmi ipmi_msghandler shpchp acpi_power_meter acpi_pad ip_tables xfs
      libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect crct10dif_pclmul
      sysimgblt crct10dif_common crc32c_intel drm_kms_helper ixgbe ttm mdio ahci igb libahci drm ptp
      pps_core libata dca i2c_algo_bit [81375.326369]  i2c_core megaraid_sas dm_mirror
      dm_region_hash dm_log dm_mod [81375.326396] CPU: 17 PID: 0 Comm: swapper/17 Not tainted
      3.10.0-327.el7.x86_64 #1 [81375.326422] Hardware name: Cisco Systems Inc
      UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.8b.0.080620151546 08/06/2015 [81375.326459] task:
      ffff88140ecec500 ti: ffff88140ed10000 task.ti: ffff88140ed10000 [81375.326485] RIP:
      0010:[<ffffffff812f94b1>]  [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326514] RSP:
      0018:ffff88140ed13e10  EFLAGS: 00010046 [81375.326534] RAX: 0000000000000000 RBX:
      0000000000000000 RCX: 0000000000000000 [81375.326559] RDX: ffff88282f1d4800 RSI:
      ffff88280bc0f140 RDI: 0000000000000010 [81375.326584] RBP: ffff88140ed13e58 R08:
      0000000000000000 R09: 0000000000000001 [81375.326609] R10: 0000000000000000 R11:
      0000000000000001 R12: ffff88280b0e7000 [81375.326634] R13: 0000000000000000 R14:
      0000000000000000 R15: 0000000000b6f979 [81375.326659] FS:  0000000000000000(0000)
      GS:ffff88282f1c0000(0000) knlGS:0000000000000000 [81375.326688] CS:  0010 DS: 0000 ES: 0000
      CR0: 0000000080050033 [81375.326708] CR2: 0000000000000010 CR3: 000000000194a000 CR4:
      00000000001407e0 [81375.326733] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000 [81375.326758] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
      0000000000000400 [81375.326783] Stack: [81375.326792]  ffff88140ed13e58 ffffffff810bf539
      ffff88282f1d4780 ffff88282f1d4780 [81375.326826]  ffff88140ececae8 ffff88282f1d4780
      0000000000000011 ffff88140ed10000 [81375.326861]  0000000000000000 ffff88140ed13eb8
      ffffffff8163a10a ffff88140ecec500 [81375.326895] Call Trace: [81375.326912]
      [<ffffffff810bf539>] ? pick_next_task_fair+0x129/0x1d0 [81375.326940]  [<ffffffff8163a10a>]
      __schedule+0x12a/0x900 [81375.326961]  [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
      [81375.326987]  [<ffffffff810d6244>] cpu_startup_entry+0x184/0x290 [81375.327011]
      [<ffffffff810475fa>] start_secondary+0x1ba/0x230 [81375.327032] Code: e5 48 85 c0 75 07 eb 19 66
      90 48 89 d0 48 8b 50 10 48 85 d2 75 f4 48 8b 50 08 48 85 d2 75 eb 5d c3 31 c0 5d c3 0f 1f 44
      00 00 55 <48> 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb [81375.327157] RIP
      [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.327179]  RSP <ffff88140ed13e10> [81375.327192] CR2:
      0000000000000010

Workaround: Upgrade to the latest-available kernel for your Red Hat or CentOS release to avoid the above system panic.

149789783 Resource Groups Significant Pivotal Greenplum performance degradation has been observed when enabling resource group-based workload management on Red Hat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

When resource groups are enabled on systems with an affected kernel, there can be a delay of 1 second or longer when starting a transaction or a query. The delay is caused by a Linux cgroup kernel bug where a synchronization mechanism called synchronize_sched is abused when a process is attached to a cgroup. See http://www.spinics.net/lists/cgroups/msg05708.html and https://lkml.org/lkml/2013/1/14/97 for more information.

The issue causes single attachment operations to take longer and also causes all concurrent attachments to be executed in sequence. For example, one process attachment could take about 0.01 second. When concurrently attaching 100 processes, the fastest process attachment takes 0.01 second and the slowest takes about 1 second. Pivotal Greenplum performs process attachments when a transaction or query is started, so the performance degradation depends on concurrently started transactions or queries, not on concurrently running queries. Also, Pivotal Greenplum has optimizations to bypass the rewriting when a QE is reused by multiple queries in the same session.

Workaround: This bug does not affect CentOS 7.x and Red Hat 7.x systems.

If you use Red Hat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue; resource groups are still considered to be an experimental feature on this platform. Resource groups are not supported on SuSE 11 for production use.

150906510 Backup and Restore Greenplum Database 4.3.15.0 and later backups contain the following line in the backup files:
SET gp_strict_xml_parse = false;

However, Greenplum Database 5.0.0 does not have a parameter named gp_strict_xml_parse. When you restore the 4.3 backup set to the 5.0.0 cluster, you may see the warning:

[WARNING]:-gpdbrestore finished but ERRORS were found, please check the restore report file for details

Also, the report file may contain the error:

ERROR:  unrecognized configuration parameter "gp_strict_xml_parse"

These warnings and errors do not affect the restoration procedure, and can be ignored.

Update for gp_toolkit.gp_bloat_diag Issue

In Greenplum Database 5.3.0 or an earlier 5.x release, Greenplum Database returned an integer out of range error in some cases when performing a query against the gp_toolkit.gp_bloat_diag view. The issue was resolved in Greenplum Database 5.4.0 (resolved issue 26518).

When updating Greenplum Database, the gp_toolkit.gp_bloat_diag function and view must be updated in databases created with a Greenplum Database 5.3.0 or an earlier 5.x release. This issue has been fixed in databases created with Greenplum Database 5.4.0 and later.

To check whether the gp_toolkit.gp_bloat_diag function and view in a database requires an update, run the psql command \df to display information about the gp_toolkit.gp_bloat_diag function.

\df gp_toolkit.gp_bloat_diag

If the data type for btdexppages is integer, an update is required. If the data type is numeric, an update is not required. In this example, the btdexppages data type is integer and requires an update.

List of functions
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------------------
Schema              | gp_toolkit
Name                | gp_bloat_diag
Result data type    | record
Argument data types | btdrelpages integer, btdexppages integer, aotable boolean, OUT bltidx integer, OUT bltdiag text
Type                | normal

Run the following script to update the function and view to fix the issue on each database that was created with Greenplum Database 5.3.0 or an earlier 5.x release.

As the gpadmin user, follow these steps.

  1. Copy the script into a text file on the Greenplum Database master.
  2. Run the script on each database that requires the update.
    This example updates the gp_toolkit.gp_bloat_diag function and view in the database mytest and assumes that the script is in the file update_bloat_diag.sql in the gpadmin home directory.
    psql -f /home/gpadmin/update_bloat_diag.sql -d mytest

Run the script during a low activity period. Running the script during a high activity period does not affect database functionality but might affect performance.

Script to Update gp_toolkit.gp_bloat_diag Function and View
BEGIN;
CREATE OR REPLACE FUNCTION gp_toolkit.gp_bloat_diag(btdrelpages int, btdexppages numeric, aotable bool,
    OUT bltidx int, OUT bltdiag text)
AS
$$
    SELECT
        bloatidx,
        CASE
            WHEN bloatidx = 0
                THEN 'no bloat detected'::text
            WHEN bloatidx = 1
                THEN 'moderate amount of bloat suspected'::text
            WHEN bloatidx = 2
                THEN 'significant amount of bloat suspected'::text
            WHEN bloatidx = -1
                THEN 'diagnosis inconclusive or no bloat suspected'::text
        END AS bloatdiag
    FROM
    (
        SELECT
            CASE
                WHEN $3 = 't' THEN 0
                WHEN $1 < 10 AND $2 = 0 THEN -1
                WHEN $2 = 0 THEN 2
                WHEN $1 < $2 THEN 0
                WHEN ($1/$2)::numeric > 10 THEN 2
                WHEN ($1/$2)::numeric > 3 THEN 1
                ELSE -1
            END AS bloatidx
    ) AS bloatmapping

$$
LANGUAGE SQL READS SQL DATA;

GRANT EXECUTE ON FUNCTION gp_toolkit.gp_bloat_diag(int, numeric, bool, OUT int, OUT text) TO public;

CREATE OR REPLACE VIEW gp_toolkit.gp_bloat_diag
AS
    SELECT
        btdrelid AS bdirelid,
        fnnspname AS bdinspname,
        fnrelname AS bdirelname,
        btdrelpages AS bdirelpages,
        btdexppages AS bdiexppages,
        bltdiag(bd) AS bdidiag
    FROM
    (
        SELECT
            fn.*, beg.*,
            gp_toolkit.gp_bloat_diag(btdrelpages::int, btdexppages::numeric, iao.iaotype::bool) AS bd
        FROM
            gp_toolkit.gp_bloat_expected_pages beg,
            pg_catalog.pg_class pgc,
            gp_toolkit.__gp_fullname fn,
            gp_toolkit.__gp_is_append_only iao

        WHERE beg.btdrelid = pgc.oid
            AND pgc.oid = fn.fnoid
            AND iao.iaooid = pgc.oid
    ) as bloatsummary
    WHERE bltidx(bd) > 0;

GRANT SELECT ON TABLE gp_toolkit.gp_bloat_diag TO public;
COMMIT;