Pivotal Greenplum 5.19.0 Release Notes

Welcome to Pivotal Greenplum 5.19.0

Pivotal Greenplum Database is a massively parallel processing (MPP) database server that supports next generation data warehousing and large-scale analytics processing. By automatically partitioning data and running parallel queries, it allows a cluster of servers to operate as a single database supercomputer performing tens or hundreds of times faster than a traditional database. It supports SQL, MapReduce parallel processing, and data volumes ranging from hundreds of gigabytes to hundreds of terabytes.

This document contains pertinent release information about Pivotal Greenplum Database 5.19.0. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see the Pivotal Support Lifecycle Policy.

Pivotal Greenplum 5.x software is available for download from the Pivotal Greenplum page on Pivotal Network.

Pivotal Greenplum 5.x is based on the open source Greenplum Database project code.

Important: The Greenplum gpbackup and gprestore utilities are now distributed separately from Pivotal Greenplum Database, and are updated independently of the core Greenplum Database server. These utilities will not be updated in future Greenplum Database 5.x releases. You can upgrade to the latest gpbackup and gprestore versions by downloading and installing the latest Pivotal Greenplum Backup and Restore release from Pivotal Network.
Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Pivotal Greenplum 5.19.0 is a minor release that includes new and changed features and resolves some issues.

New Features

Greenplum Database 5.19.0 includes the following new features:

  • PXF version 5.4.0 is included, which introduces several new and changed features and bug fixes. See PXF Version 5.4.0 below.
  • Greenplum 5.19 is compatible with Greenplum Command Center 4.7, which includes new features and bug fixes. See the Greenplum Command Center documentation for more information.
  • The PL/R and PL/Java packages were updated to version 2.3.4 and version 1.4.4, respectively. Both procedural languages can now be registered using the CREATE EXTENSION command as described in the documentation (see the sketch after this list).
  • A new version of the GPORCA optimizer is included, with performance improvements and bug fixes. See Resolved Issues for more information.
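
As a quick illustration (a minimal sketch, assuming the extension names plr and pljava; run the commands in each database that needs the language), the procedural languages can be registered with CREATE EXTENSION:

CREATE EXTENSION plr;
CREATE EXTENSION pljava;

Consult the PL/R and PL/Java documentation for any package-specific installation prerequisites.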

PXF Version 5.4.0

New Features

PXF includes the following new features:

  • PXF now supports column projection. This enables connectors to project only the specific columns requested by the user. The JDBC, Parquet, and HiveORC profiles leverage this PXF feature to support column projection. See About Column Projection in PXF for information about PXF column projection support.
  • PXF now supports file-based server configuration for the JDBC Connector. When you initialize PXF, it copies a jdbc-site.xml template file to the $PXF_CONF/templates directory. You can use this template as a guide for PXF JDBC server configuration. Refer to Configuring the JDBC Connector for more information.
  • The PXF JDBC Connector now provides the QUOTE_COLUMNS option to control whether PXF should quote column names (see the sketch after this list).
  • PXF exposes new configuration options that you can specify when you write Parquet data to HDFS or to an object store. These options allow you to set the row group size, page size, dictionary page size, and the Parquet version. Refer to the Parquet Custom Options topic for syntax and usage information.
  • You can specify the host name and/or port number of the PXF agent. The configuration procedure is described in Configuring the PXF Agent Host and Port.
  • The PXF JDBC Connector now supports specifying per-connection, per-session, and per-statement properties via the JDBC server configuration. See JDBC Server Configuration for detailed information about setting and using these properties.
  • The PXF JDBC Connector also supports specifying the connection transaction isolation mode via the JDBC server configuration. Refer to Connection Transaction Isolation Property for information about this property.
  • The pxf cluster init and sync subcommands now also perform the operation on the Greenplum Database standby master host. With this new feature, PXF continues to function after a failover from the Greenplum Database master host to the standby master host.
  • PXF now supports the pxf cluster status subcommand. This command displays the status of the PXF service instance on each segment host in the Greenplum Database cluster. Refer to the pxf cluster reference page for more information.
  • PXF has improved memory usage by caching external file metadata per query within each PXF JVM. This caching is on by default. PXF Fragment Metadata Caching further describes this feature and its configuration procedure.
  • The new PXF JDBC Connector named query feature enables you to specify a statically-defined query to run against the remote SQL database. Refer to JDBC Named Query Configuration and About Using Named Queries for information about this feature.
  • PXF now supports reading a multi-line text or JSON file stored in HDFS or an object store as a single row. This feature enables you to read a directory of files into a single external table. Refer to Reading a Multi-Line Text File into a Single Table Row for more information about this feature.
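
For illustration only, the following sketch shows how the QUOTE_COLUMNS option might appear in a readable PXF external table definition. The table name pxf_sales, remote table sales_report, and server name mydb are hypothetical; verify the exact option syntax against the PXF JDBC Connector documentation.

CREATE EXTERNAL TABLE pxf_sales (id int, "Region" text, total numeric)
LOCATION ('pxf://sales_report?PROFILE=Jdbc&SERVER=mydb&QUOTE_COLUMNS=true')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

Because the JDBC profile supports column projection, a query such as SELECT id, total FROM pxf_sales requests only those two columns from the remote database.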

Changed Features

PXF includes the following changes:

  • PXF now bundles Hadoop 2.9.2 libraries.
  • PXF has increased the date precision to microseconds when reading Parquet files from an external system.
  • PXF now maps date and timestamp data types between Greenplum Database and Parquet as follows:
    • PXF previously serialized a repeated Greenplum Database date type to/from JSON as a numeric timestamp. PXF now serializes a repeated date type to/from JSON as a string.
    • When writing a Greenplum Database timestamp type field to Parquet, PXF previously converted the timestamp directly to an int96. PXF now localizes the timestamp to the current system timezone and converts it to universal time (UT) before finally converting to int96.
  • The PXF Hive Connector hive-site.xml template file now includes the hive.metastore.integral.jdo.pushdown property. This property is required to enable partition filter pushdown for Hive integral types. Refer to Partition Filter Pushdown for more information about the PXF Hive Connector's support for this feature.
  • The pxf cluster sync command no longer verifies that it is run on the Greenplum Database master host.
  • PXF has improved the pxf cluster sync command. The command now uses rsync to synchronize configuration from the master to the segments. Previously, the segment hosts were initiating the rsync operation.

Resolved Issues

PXF includes the following resolved issues:

29832
The PXF JDBC Connector in some cases did not propagate an INSERT error to the user. As a result, the operation appeared to succeed when it actually had not.
This issue has been resolved. The PXF JDBC Connector now correctly propagates the INSERT error to the user.
164473961
In some cases, a SELECT query on a PXF external table that referenced a Hive table incorrectly generated a Hive MetaStore connection error when the Hive service was running in a Kerberos-secured Hadoop cluster. This situation occurred because PXF did not instantiate the connect request with the Hive configuration settings.
This issue has been resolved. The PXF Hive Connector now properly includes the Hive configuration settings.
29896
In some situations, the PXF JDBC Connector returned an out of memory error when you queried a PXF external table that referenced a large table in the external SQL database.
This issue has been resolved. The PXF JDBC Connector now supports a configurable read fetch size property. The availability and default value of this property mitigates potential memory issues.
165265354
In some cases, when you performed a query with an OR or NOT predicate on a PXF external table that referenced a partitioned Hive table, the PXF Hive Connector may have incorrectly pruned partitions and returned an incomplete data set.
This issue has been resolved. The PXF Hive Connector now correctly handles OR predicates, and does not push down a predicate in which it encounters a relational or logical filter operator that it does not support. Note that the Connector does not currently optimize pushdown of NOT predicates.
29916
The PXF JDBC Connector returned an error when you accessed a date field in an Oracle database, you specified a filter with a timestamp type cast, and the filter value included an hour outside of the range 0-11 inclusive.
This issue has been resolved. PXF now uses 24 hour timestamp format when accessing date fields in an Oracle database.

Beta Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several Beta features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Beta features are not supported for production deployments.
Greenplum Database 5.19.0 includes these Beta features:
  • Storage plugin API for gpbackup and gprestore. Partners, customers, and OSS developers can develop plugins to use in conjunction with gpbackup and gprestore.

    For information about the storage plugin API, see Backup/Restore Storage Plugin API.

  • Recursive WITH Queries (Common Table Expressions). See WITH Queries (Common Table Expressions). A brief example appears after this list.
  • Resource groups remain a Beta feature only on the SuSE 11 platform, due to limited cgroups functionality in the kernel.

    SuSE 12 resolves the Linux cgroup issues that caused the performance degradation when Greenplum Database resource groups are enabled.

  • Using the Greenplum Platform Extension (PXF) connectors to write Parquet data is a Beta feature.
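
The following is a minimal sketch of a recursive CTE, shown only to illustrate the Beta syntax; the CTE name t is hypothetical. Depending on your configuration, the Beta capability may need to be enabled first; check the WITH Queries documentation for the relevant server configuration parameter.

WITH RECURSIVE t(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM t WHERE n < 5
)
SELECT sum(n) FROM t;

The recursive term references the CTE itself, and evaluation repeats until no new rows are produced.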

Deprecated Features

Deprecated features will be removed in a future major release of Greenplum Database. Pivotal Greenplum 5.x deprecates:

  • The --skip_root_stats option to analyzedb (deprecated since 5.18).

    The default behavior of analyzedb is to skip analyzing root partitions (it behaves as if the flag were always used).

  • The gptransfer utility (deprecated since 5.17).

    The utility copies objects between Greenplum Database systems. The gpcopy utility provides gptransfer functionality.

  • The gphdfs external table protocol (deprecated since 5.17).

    Consider using the Greenplum Platform Extension Framework (PXF) pxf external table protocol to access data stored in an external Hadoop file system. Refer to Accessing External Data with PXF for more information.

  • The Greenplum Platform Extension Framework (PXF) HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since 5.16).

    Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.

  • The server configuration parameter gp_max_csv_line_length (deprecated since 5.11).

    For data in a CSV formatted file, the parameter controls the maximum allowed line length that can be imported into the system.

  • The server configuration parameter gp_unix_socket_directory (deprecated since 5.9).
    Note: Do not change the value of this parameter. The default location is required for Greenplum Database utilities.
  • Support for Data Domain Boost 3.0.0.3 (deprecated since 5.2).

    The DELL EMC end of Primary Support date is December 31, 2017.

  • These unused catalog tables (deprecated since 5.1):
    • gp_configuration
    • gp_db_interfaces
    • gp_interfaces
  • The gpcrondump and gpdbrestore utilities (deprecated since 5.0).

Resolved Issues

The listed issues are resolved in Pivotal Greenplum Database 5.19.0.

For issues resolved in prior 5.x releases, refer to the corresponding release notes. Release notes are available from the Pivotal Greenplum page on Pivotal Network.

165755867 - Query Optimizer
Fixed a problem where GPORCA added unnecessary redistribute motions in the Append portion of a query plan, for UNION ALL queries. This fix improves the performance of such queries by ensuring that no redistribute motions occur for any children of the Append.
165298331 - Query Optimizer
GPORCA was enhanced to convert a FULL OUTER JOIN into a LEFT JOIN for queries having predicates that suggest one side cannot be NULL. This change improves performance for the affected queries.
165026785 - Query Optimizer
Fixed a problem where the GPORCA optimizer could choose a slower plan that performed an IndexedNestedLoopJoin with broadcast motions. GPORCA no longer adds the unnecessary broadcast motions.
165032693 - Query Optimizer
The GPORCA optimizer now performs column elimination with UNION queries against column-oriented tables. This change improves performance for the specified queries.
29842 - Server Parser
If a user-defined function that truncated a partitioned table was canceled and then re-started, a panic condition could occur during the planning stage. This problem occurred because Greenplum modified the function's cached version of the plan, in order to provide segments with the partitions that need to be truncated. The problem was resolved by making a copy of the original TRUNCATE statement to modify and dispatch to segments, while leaving the original cached statement unchanged.
29621 - Query Optimizer
The GPORCA optimizer did not properly recognize Binary Coercible Casts (BCCs) when inferring predicates. For example, if a predicate filtered a varchar column against a text constant value, GPORCA did not properly handle the implicit varchar to text cast required to infer the predicate. This could result in queries that performed poorly in GPORCA compared to the Postgres Planner. This problem has been resolved.
29850 - Query Optimizer
In some cases, GPORCA generated a suboptimal plan for queries that specified a FULL JOIN condition. This could result in queries that performed poorly in GPORCA compared to the Postgres Planner. This problem has been resolved. Greenplum Database now falls back to the Postgres Planner when it encounters a query that specifies a FULL JOIN condition.
29847 - Query Optimizer
In some cases, GPORCA did not eliminate partitions properly due to incorrectly inferred predicates. This caused a query to run much slower with GPORCA than when run with the legacy planner. This problem has been resolved. GPORCA now pushes predicates down as far as possible before inferring.
29823 - Query Optimizer
Fixed a problem where an UPDATE command (delete and insert operations) on a partitioned table failed because the delete used the project list for the insert to select the partition to delete from. The delete operation now projects the columns of the table that is being updated for partition selection.
29767 - Storage: Catalog and Metadata
When a table has both a PRIMARY KEY and a DISTRIBUTED BY clause, the DISTRIBUTED BY columns must be equal to or a left-subset of the PRIMARY KEY. However, this rule could be broken by altering the DISTRIBUTED BY columns after the table is created. It is also possible to DROP or ADD a primary key constraint to break this rule. This issue is resolved. The rule is now enforced when altering the distribution key or creating a unique index for the primary key.
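
As an illustration of the enforced rule (a hypothetical table, not taken from the issue report), the distribution key below is a left-subset of the primary key, so the definition is valid:

CREATE TABLE orders (
    customer_id int,
    order_id    int,
    amount      numeric,
    PRIMARY KEY (customer_id, order_id)
) DISTRIBUTED BY (customer_id);

-- Altering the distribution key to a column outside the primary key is now rejected:
-- ALTER TABLE orders SET DISTRIBUTED BY (amount);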
29702 - Storage: Transaction Management
Fixed a problem where an INSERT statement executed during a VACUUM operation could fail with a message ERROR: cannot insert into segno ....
29856 - Legacy Optimizer
When using the legacy optimizer, a query on a partitioned table where the partition key is used in a >= elimination, and where the partition key is also used in a partition_key = other_field clause, caused the database to enter crash recovery. This has been resolved.
29718 - Legacy Optimizer
The legacy optimizer produced an incorrect Window plan with a set returning function (SRF) in the target list. This has been resolved.
29866 - SSL
WAL replication processes restarted on the master and standby during long-lived processes such as gprestore and gpcheckcat due to a problem with OpenSSL connection renegotiation. This is resolved. The ssl_renegotiation_limit parameter is set to 0 by default to disable connection renegotiation.
29597 - Distributed Transaction Manager
During session initialization, the cdb transaction manager chooses a superuser to perform distributed transaction recovery. The code assumed that superusers have a rolvaliduntil value set to NULL, which means the role never expires. A superuser could be chosen that was not permitted to log in, which could cause a SIGSEGV. This is resolved. Now the rolvaliduntil value is validated to ensure the superuser can log in.
If all superusers have an expired rolvaliduntil value, a non-superuser will get an error, instead of a SIGSEGV, when attempting to log in after restarting the master.
29853 - gpbackup/gprestore
When a role was assigned the permission CREATEEXTTABLE(protocol='gpfdist', type='readable'), the gpbackup utility produced a CREATE ROLE command with an incorrect protocol and a missing type argument. This issue has been resolved.
29580 - Logger
During log rotation, the next log rotation time could be set multiple times, which caused an extra zero-length log file to be created. This has been fixed.
26225 - gpcheckcat
The gpcheckcat utility summary report failed to generate with the message Execution error: ERROR: column "none" does not exist. The error occurred when there was an orphan toast table entry in a segment. This has been fixed. The summary report now generates the list of tables and their inconsistencies.
29867 - MPP: Dispatch
A complicated query that includes a join and subquery could hang because of a data motion deadlock between segments. The issue is fixed by detecting the deadlock risk and generating a better plan to avoid the deadlock.
29854 - ANALYZE
If a table has significant bloat, it may have many empty pages. Due to the sampling method ANALYZE uses, this may result in an estimate of 0 rows when in fact many rows exist in a table, causing poor query performance. ANALYZE now emits a NOTICE if it happens to sample only empty pages so that the user can take appropriate action:
NOTICE: ANALYZE detected all empty sample pages for table pg_namespace, please run VACUUM FULL for accurate estimation.

Known Issues and Limitations

Pivotal Greenplum 5.x has these limitations:

  • Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.x is not supported. See Migrating Data to Pivotal Greenplum 5.x.
  • Some features are works-in-progress and are considered to be Beta features. Pivotal does not support using Beta features in a production environment. See Beta Features.
  • Greenplum Database 4.3.x packages are not compatible with Pivotal Greenplum 5.x.

The following table lists key known issues in Pivotal Greenplum 5.x.

Table 1. Key Known Issues in Pivotal Greenplum 5.x
Issue Category Description
MPP-29861 gpstart If a segment's data directory becomes inaccessible or the contents of the data directory are deleted, the segment goes offline and is marked as down in gp_segment_configuration. However, if you have temporary or transaction files in a separate filespace or if you stop and start the database, gpstart can fail with the error:
20190404:17:26:44:025089 gpstart:mdw:gpadmin-[ERROR]:-Multiple OIDs found in flat files
20190404:17:26:44:025089 gpstart:mdw:gpadmin-[ERROR]:-Filespaces are inconsistent. Abort Greenplum Database start.

The error indicates that two files are missing from a segment directory, gp_temporary_files_filespace and gp_transaction_files_filespace. gpstart reports the error even if the segments have already been marked down.

Workaround: To resolve this problem you must manually create the missing files in the segment directory using the segment's corresponding dbid, transaction file segment directory, and filespace oid. See the detailed instructions for creating these files in Article Number: 6784 gpstart fails with error "Multiple OIDs found in flat files" on the Pivotal Support site.

164671144 gpssh-exkeys

The gpssh-exkeys utility uses the Paramiko SSH library for Python, which has a dependency on the Python Cryptography Toolkit (PyCrypto) library. The following security vulnerabilities have been identified in some versions of PyCrypto.

  • https://nvd.nist.gov/vuln/detail/CVE-2018-6594 - lib/Crypto/PublicKey/ElGamal.py in PyCrypto through 2.6.1 generates weak ElGamal key parameters, which allows attackers to obtain sensitive information by reading ciphertext data (i.e., it does not have semantic security in face of a ciphertext-only attack). The Decisional Diffie-Hellman (DDH) assumption does not hold for PyCrypto's ElGamal implementation.

    Paramiko does not import this algorithm, so Greenplum Database is unaffected by this vulnerability.

  • https://nvd.nist.gov/vuln/detail/CVE-2013-7459 - Heap-based buffer overflow in the ALGnew function in block_templace.c in Python Cryptography Toolkit (aka pycrypto) allows remote attackers to execute arbitrary code as demonstrated by a crafted iv parameter to cryptmsg.py.

    This bug was introduced in PyCrypto 2.6. Greenplum Database has PyCrypto 2.0.1, and is unaffected by the vulnerability.

  • https://nvd.nist.gov/vuln/detail/CVE-2013-1445 - The Crypto.Random.atfork function in PyCrypto before 2.6.1 does not properly reseed the pseudo-random number generator (PRNG) before allowing a child process to access it, which makes it easier for context-dependent attackers to obtain sensitive information by leveraging a race condition in which a child process is created and accesses the PRNG within the same rate-limit period as another process.

    Paramiko version 1.7.6-9, used in Greenplum Database 4.x and 5.x, introduced a workaround to this bug. Paramiko version 1.18.4 in Greenplum Database 6 solves it in a more permanent way.

  • https://nvd.nist.gov/vuln/detail/CVE-2012-2417 - PyCrypto before 2.6 does not produce appropriate prime numbers when using an ElGamal scheme to generate a key, which reduces the signature space or public key space and makes it easier for attackers to conduct brute force attacks to obtain the private key.

    Paramiko does not import this algorithm, so Greenplum Database is unaffected by the vulnerability.

Through testing and investigation, Pivotal has determined that these vulnerabilities do not affect Greenplum Database, and no actions are required for existing Greenplum Database 4.3 or 5.x releases. However, there may be additional unidentified vulnerabilities in the PyCrypto library, and users who install a later version of PyCrypto could be exposed to other vulnerabilities.

The PyCrypto library will be removed from Greenplum Database 6.0.

Workaround: Administrators can set up passwordless SSH between hosts in the Greenplum Database cluster without using the gpssh-exkeys utility. This must be done before initializing the Greenplum Database system.

  1. Create an SSH key for the gpadmin user on each host. For example, log in to each host as gpadmin and use the ssh-keygen command to generate an SSH key. Do not enter a passphrase.
  2. Add the public key for each host in the cluster to every other host in the cluster. For example, use the ssh-copy-id command from each host to copy the public key to every other host in the cluster.

When adding new hosts to the Greenplum Database system, you must create a new SSH key for each new host and exchange keys between the existing hosts and new hosts.

165162549 GPORCA

In Greenplum Database 5.18.0, the analyzedb utility was modified to not run ANALYZE ROOTPARTITION on partitioned tables. This change was made because the ANALYZE command merges the statistics for the root partition after all the leaf partitions of a partitioned table have been analyzed. However, since analyzedb runs ANALYZE in parallel on the leaf partitions, the statistics on the root partition do not get updated. This causes GPORCA to pick a bad plan.

The workaround is to run analyzedb with a parallelism of 1. For example:
analyzedb -p 1 dbname
N/A Greenplum Stream Server

The Pivotal Greenplum Stream Server (GPSS) does not support loading data from multiple Kafka topics to the same Greenplum Database table. All jobs will hang if GPSS encounters this situation.

29766 VACUUM A long-running catalog query can block VACUUM operations on the system catalog until the query completes or is canceled. This type of blocking cannot be observed using pg_locks, and the VACUUM operation itself cannot be canceled until the long-running query completes or is canceled.
29699 ANALYZE In Greenplum Database 5.15.1 and earlier 5.x releases, an ANALYZE command might return an error stating that target lists can have at most 1664 entries when performing an ANALYZE operation on a table with a large number of columns (more than 800 columns). The error occurs because the in-memory sample table created by ANALYZE requires an additional column, for each variable length column being analyzed (such as varchar, text, bpchar, numeric, array, and geometric type columns), to indicate whether the column is NULL or is a truncated column. The error is returned when ANALYZE attempts to create a sample table and the number of columns (table columns and indicator columns) exceeds the maximum number of columns allowed.

In Greenplum Database 5.16.0 and later 5.x releases, the ANALYZE sample table does not require an additional column for variable length text columns such as varchar, text, and bpchar columns.

WORKAROUND: To collect statistics on the table, perform ANALYZE operations on 2 or more sets of table columns.
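
For example (a hypothetical table named wide_table; split the column list however is convenient), statistics can be collected in batches of columns:

ANALYZE wide_table (col1, col2, col3);
ANALYZE wide_table (col4, col5, col6);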

162317340 Client Tools

On Pivotal Network in the file listings for Greenplum Database releases between 5.7.1 and 5.14.0, the Greenplum Database AIX Client Tools download file is incorrectly labeled as Loaders for AIX 7. The file you download is the correct AIX 7 Client Tools file.

29674 VACUUM Performing parallel VACUUM operations on a catalog table such as pg_class, gp_relation_node, or pg_type and another table causes a deadlock and blocks connections to the database.

Workaround: Avoid performing parallel VACUUM operations on catalog tables and user tables.

29139 DML In some cases for an append-optimized partitioned table, Greenplum Database acquires a ROW EXCLUSIVE lock on all leaf partitions of the table when inserting data directly into one of the leaf partitions of the table. The locks are acquired the first time Greenplum Database performs validation on the leaf partitions. When inserting data into one leaf partition, the locks are not acquired on the other leaf partitions as long as the validation information remains in memory.

The issue does not occur for heap-storage partitioned tables.

29523 gptoolkit An upgrade between minor releases does not update the template0 database, and in some cases, using the following views in the gp_toolkit schema might cause issues if you create a database using template0 as the template database after you upgrade to Greenplum Database 5.11.0 or later.
  • The gp_toolkit.gp_bloat_expected_pages view might incorrectly report that a root partition table is bloated even though root partition tables do not contain data if you upgrade from Greenplum Database 5.10.x or earlier.
  • The gp_toolkit.gp_bloat_diag view might return an integer out of range error in some cases if you upgrade from Greenplum Database 5.3.0 or earlier.

For example, the issues might occur if you upgrade a Greenplum Database system from 5.3.0 or an earlier 5.x release and then run a gprestore operation with the --redirect-db option to create a new database. The utility creates a new database using template0 as the template database.

Workaround: You can update the views in the gp_toolkit schema in the new database. For information about checking and updating gp_toolkit, see Update for gp_toolkit.gp_bloat_expected_pages Issue and Update for gp_toolkit.gp_bloat_diag Issue.

29485 Catalog and Metadata When a session creates temporary objects in a database, Greenplum Database might not drop the temporary objects when the session ends if the session terminates abnormally or is terminated by an administrator command.
29496 gpconfig For a small number of server configuration parameters such as log_min_messages, the command gpconfig -s <config_param> does not display the correct value of the parameter for the segment hosts when the value of the parameter on master is different than the value on the segments.

For parameters with the set classification master, the utility displays the value set on the master for both master and segments (for information about set classifications, see Setting Parameters). For those parameters, the value on the master is passed as part of queries to segment instances. The SQL query that gpconfig runs to display the master and segment parameter values returns the master host value that is passed to the segment as part of the query.

For a few parameters such as log_min_messages, segment instances use the segment host value specified in the postgresql.conf file at start up. The segment value can be overridden for the scope of a query.

Workaround: To display the parameter value specified in the postgresql.conf file on the master host and segment hosts, you can specify the gpconfig option --file.

29395 DDL The gpdbrestore or gprestore utility fails when the utility attempts to restore a table from a backup and the table is incorrectly defined with duplicate columns as distribution keys. The issue occurs when the gpcrondump or gpbackup utility backs up a table that is incorrectly defined. The CREATE TABLE AS command could create a table that is incorrectly defined with a distribution policy that contains duplicate columns as distribution keys.

The CREATE TABLE issue has been resolved. The CREATE TABLE AS command no longer creates the specified type of table; the command returns an error. However, restore operations will continue to fail if you try to restore the incorrectly defined tables from a backup.

29351 gptransfer The gptransfer utility can copy a data row with a maximum length of 256 MB.
158011506 Catalog and Metadata In some cases, the timezone used by Greenplum Database might be different than the host system timezone, or the Greenplum Database timezone set by a user. In some rare cases, times used and displayed by Greenplum Database might be slightly different than the host system time.

The timezone used by Greenplum Database is selected from a set of internally stored PostgreSQL timezones. Greenplum Database selects the timezone by matching a PostgreSQL timezone with the user specified time zone, or the host system time zone. For example, when selecting a default timezone, Greenplum Database uses an algorithm to select a PostgreSQL timezone based on the host system timezone. If the system timezone includes leap second information, Greenplum Database cannot match the system timezone with a PostgreSQL timezone. Greenplum Database calculates a best match with a PostgreSQL timezone based on information from the host system.

Workaround: Set the Greenplum Database and host system timezones to a timezone that is supported by both Greenplum Database and the host system. For example, you can show and set the Greenplum Database timezone with the gpconfig utility. These commands show the Greenplum Database timezone and set the timezone to US/Pacific.

# gpconfig -s TimeZone
# gpconfig -c TimeZone -v 'US/Pacific'

You must restart Greenplum Database after changing the timezone. The command gpstop -ra restarts Greenplum Database.

The Greenplum Database catalog view pg_timezone_names provides Greenplum Database timezone information.

N/A PXF PXF is available only for supported Red Hat and CentOS platforms. PXF is not available for supported SuSE platforms.
151135629 COPY command When the ON SEGMENT clause is specified, the COPY command does not support specifying a SELECT statement in the COPY TO clause. For example, this command is not supported.
COPY (SELECT * FROM testtbl) TO '/tmp/mytst<SEGID>' ON SEGMENT
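
One possible approach (a sketch, not an official workaround; testtbl is the table from the example above) is to materialize the query result in a table and then copy that table on the segments:

CREATE TEMP TABLE tmp_testtbl AS SELECT * FROM testtbl;
COPY tmp_testtbl TO '/tmp/mytst<SEGID>' ON SEGMENT;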
29064 Storage: DDL The money data type accepts out-of-range values as negative values, and no error message is displayed.

Workaround: Use only in-range values for the money data type (32-bit for Greenplum Database 4.x, or 64-bit for Greenplum Database 5.x). Or, use an alternative data type such as numeric or decimal.
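
For example (a hypothetical table), declaring the amount column as numeric avoids the out-of-range behavior:

CREATE TABLE sales (id int, amount numeric(18,2)) DISTRIBUTED BY (id);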

3290 JSON The to_json() function is not implemented as a callable function. Attempting to call the function results in an error. For example:
tutorial=# select to_json('Fred said "Hi."'::text); 
ERROR: function to_json(text) does not exist
LINE 1: select to_json('Fred said "Hi."'::text);
^
HINT: No function matches the given name and argument types. 
You might need to add explicit type casts.

Workaround: Greenplum Database invokes to_json() internally when casting to the json data type, so perform a cast instead. For example: SELECT '{"foo":"bar"}'::json; Greenplum Database also provides the array_to_json() and row_to_json() functions.

148119917 Resource Groups Testing of the resource groups feature has found that a kernel panic can occur when using the default kernel in a RHEL/CentOS system. The panic is caused by a bug in the kernel cgroups implementation and results in a backtrace similar to:
[81375.325947] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [81375.325986] IP: [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326014] PGD 0 [81375.326025]
      Oops: 0000 [#1] SMP [81375.326041] Modules linked in: veth ipt_MASQUERADE
      nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype
      iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc intel_powerclamp coretemp
      intel_rapl dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio kvm_intel kvm crc32_pclmul
      ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt
      iTCO_vendor_support ses enclosure ipmi_ssif pcspkr lpc_ich sg sb_edac mfd_core edac_core
      mei_me ipmi_si mei wmi ipmi_msghandler shpchp acpi_power_meter acpi_pad ip_tables xfs
      libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect crct10dif_pclmul
      sysimgblt crct10dif_common crc32c_intel drm_kms_helper ixgbe ttm mdio ahci igb libahci drm ptp
      pps_core libata dca i2c_algo_bit [81375.326369]  i2c_core megaraid_sas dm_mirror
      dm_region_hash dm_log dm_mod [81375.326396] CPU: 17 PID: 0 Comm: swapper/17 Not tainted
      3.10.0-327.el7.x86_64 #1 [81375.326422] Hardware name: Cisco Systems Inc
      UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.8b.0.080620151546 08/06/2015 [81375.326459] task:
      ffff88140ecec500 ti: ffff88140ed10000 task.ti: ffff88140ed10000 [81375.326485] RIP:
      0010:[<ffffffff812f94b1>]  [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.326514] RSP:
      0018:ffff88140ed13e10  EFLAGS: 00010046 [81375.326534] RAX: 0000000000000000 RBX:
      0000000000000000 RCX: 0000000000000000 [81375.326559] RDX: ffff88282f1d4800 RSI:
      ffff88280bc0f140 RDI: 0000000000000010 [81375.326584] RBP: ffff88140ed13e58 R08:
      0000000000000000 R09: 0000000000000001 [81375.326609] R10: 0000000000000000 R11:
      0000000000000001 R12: ffff88280b0e7000 [81375.326634] R13: 0000000000000000 R14:
      0000000000000000 R15: 0000000000b6f979 [81375.326659] FS:  0000000000000000(0000)
      GS:ffff88282f1c0000(0000) knlGS:0000000000000000 [81375.326688] CS:  0010 DS: 0000 ES: 0000
      CR0: 0000000080050033 [81375.326708] CR2: 0000000000000010 CR3: 000000000194a000 CR4:
      00000000001407e0 [81375.326733] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000 [81375.326758] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
      0000000000000400 [81375.326783] Stack: [81375.326792]  ffff88140ed13e58 ffffffff810bf539
      ffff88282f1d4780 ffff88282f1d4780 [81375.326826]  ffff88140ececae8 ffff88282f1d4780
      0000000000000011 ffff88140ed10000 [81375.326861]  0000000000000000 ffff88140ed13eb8
      ffffffff8163a10a ffff88140ecec500 [81375.326895] Call Trace: [81375.326912]
      [<ffffffff810bf539>] ? pick_next_task_fair+0x129/0x1d0 [81375.326940]  [<ffffffff8163a10a>]
      __schedule+0x12a/0x900 [81375.326961]  [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
      [81375.326987]  [<ffffffff810d6244>] cpu_startup_entry+0x184/0x290 [81375.327011]
      [<ffffffff810475fa>] start_secondary+0x1ba/0x230 [81375.327032] Code: e5 48 85 c0 75 07 eb 19 66
      90 48 89 d0 48 8b 50 10 48 85 d2 75 f4 48 8b 50 08 48 85 d2 75 eb 5d c3 31 c0 5d c3 0f 1f 44
      00 00 55 <48> 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb [81375.327157] RIP
      [<ffffffff812f94b1>] rb_next+0x1/0x50 [81375.327179]  RSP <ffff88140ed13e10> [81375.327192] CR2:
      0000000000000010

Workaround: Upgrade to the latest-available kernel for your Red Hat or CentOS release to avoid the above system panic.

149789783 Resource Groups Significant Pivotal Greenplum performance degradation has been observed when enabling resource group-based workload management on Red Hat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

When resource groups are enabled on systems with an affected kernel, there can be a delay of 1 second or longer when starting a transaction or a query. The delay is caused by a Linux cgroup kernel bug where a synchronization mechanism called synchronize_sched is abused when a process is attached to a cgroup. See http://www.spinics.net/lists/cgroups/msg05708.html and https://lkml.org/lkml/2013/1/14/97 for more information.

The issue causes single attachment operations to take longer and also causes all concurrent attachments to be executed in sequence. For example, one process attachment might take about 0.01 second. When concurrently attaching 100 processes, the fastest process attachment takes 0.01 second and the slowest takes about 1 second. Pivotal Greenplum performs process attachments when a transaction or query is started, so the performance degradation depends on the number of concurrently started transactions or queries, not on the number of concurrently running queries. Also, Pivotal Greenplum has optimizations to bypass the rewriting when a QE is reused by multiple queries in the same session.

Workaround: This bug does not affect CentOS 7.x and Red Hat 7.x systems.

If you use Red Hat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue; resource groups are still considered to be a Beta feature on this platform. Resource groups are not supported on SuSE 11 for production use.

163807792 gpbackup/ gprestore When the % sign was specified as the delimiter in an external table text format, gpbackup escaped the % sign incorrectly in the CREATE EXTERNAL TABLE command. This has been resolved. The % sign is correctly escaped.
150906510 Backup and Restore Greenplum Database 4.3.15.0 and later backups contain the following line in the backup files:
SET gp_strict_xml_parse = false;

However, Greenplum Database 5.0.0 does not have a parameter named gp_strict_xml_parse. When you restore the 4.3 backup set to the 5.0.0 cluster, you may see the warning:

[WARNING]:-gpdbrestore finished but ERRORS were found, please check the restore report file for details

Also, the report file may contain the error:

ERROR:  unrecognized configuration parameter "gp_strict_xml_parse"

These warnings and errors do not affect the restoration procedure, and can be ignored.

168142530 Backup and Restore Backups created on Greenplum Database versions before 4.3.33.0 or 5.1.19.0 may fail to restore to Greenplum Database versions 4.3.33.0 or 5.1.19.0 or later.

In Greenplum Database 4.3.33.0 and 5.1.19.0, a check was introduced to ensure that the distribution key for a table is equal to the primary key or is a left-subset of the primary key. If you add a primary key to a table that contains no data, Greenplum Database automatically updates the distribution key to match the primary key.

The index key for any unique index on a table must also match or be a left-subset of the distribution key.

Earlier Greenplum Database versions did not enforce these policies.

Restoring a table from an older backup that has a different distribution key causes errors because the backup data file on each segment contains data that was distributed using the original distribution key.

Restoring a unique index with a key that does not match the distribution key will fail with an error when attempting to create the index.

This issue affects the gprestore, gpdbrestore, and pg_restore utilities.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 5.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for QuickLZ compression. QuickLZ compression is not provided in the open source version of Greenplum Database due to licensing restrictions.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for full text search and text analysis using Pivotal GPText.
  • Support for data connectors:
    • Greenplum-Spark Connector
    • Greenplum-Informatica Connector
    • Greenplum-Kafka Integration
    • Gemfire-Greenplum Connector
    • Greenplum Stream Server
  • Data Direct ODBC/JDBC Drivers
  • gpcopy utility for copying or migrating objects between Greenplum systems.

Supported Platforms

Pivotal Greenplum 5.19.0 runs on the following platforms:

  • Red Hat Enterprise Linux 64-bit 7.x (See the following Note)
  • Red Hat Enterprise Linux 64-bit 6.x (See the following Note)
  • SuSE Linux Enterprise Server 64-bit 12 SP2 and SP3 with kernel version greater than 4.4.73-5. (See the following Note)
  • SuSE Linux Enterprise Server 64-bit 11 SP4 (See the following Note)
  • CentOS 64-bit 7.x
  • CentOS 64-bit 6.x (See the following Note)
  • Oracle Linux 64-bit 7.4, using the Red Hat Compatible Kernel (RHCK)
Note: For the supported Linux operating systems, Pivotal Greenplum Database is supported on system hosts using either AMD or Intel CPUs based on the x86-64 architecture. Pivotal recommends using a homogeneous set of hardware (system hosts) in a Greenplum Database system.
Important: Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management on Red Hat 6.x, CentOS 6.x, and SuSE 11 systems. This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x systems.

If you use Red Hat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation.

SuSE 11 does not have a kernel version that resolves this issue; resource groups are still considered to be a Beta feature on this platform. Resource groups are not supported on SuSE 11 for production use. See known issue 149789783.

Pivotal Greenplum on SuSE 12 supports resource groups for production use. SuSE 12 resolves the Linux cgroup kernel issues that caused the performance degradation when Greenplum Database resource groups are enabled.

Note: For Greenplum Database installed on Red Hat Enterprise Linux 7.x or CentOS 7.x prior to 7.3, an operating system issue might cause Greenplum Database to hang when running large workloads. The Greenplum Database issue is caused by Linux kernel bugs.

RHEL 7.3 and CentOS 7.3 resolve the issue.

Note: Greenplum Database on SuSE Linux Enterprise systems does not support these features.
  • The PL/Perl procedural language
  • The gpmapreduce tool
  • The PL/Container language extension
  • The Greenplum Platform Extension Framework (PXF)
Note: PL/Container is not supported on RHEL/CentOS 6.x systems, because those platforms do not officially support Docker.
Greenplum Database support on Dell EMC DCA:
  • Pivotal Greenplum Database 5.19.0 is supported on DCA systems that are running DCA software version 3.4 or greater.
  • Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of Greenplum Database are not supported.
  • FIPS is supported on DCA software version 3.4 and greater with Pivotal Greenplum Database 5.2.0 and greater.
Note: Because of an incompatibility in configuring timezone information, these Greenplum Database releases are not certified on DCA: 5.5.0, 5.6.0, 5.6.1, 5.7.0, and 5.8.0.

These Greenplum Database releases are certified on DCA: 5.7.1, 5.8.1, 5.9.0 and later releases, and 5.x releases prior to 5.5.0.

Pivotal Greenplum 5.19.0 supports these Java versions:
  • 8.xxx
  • 7.xxx

Greenplum Database 5.19.0 software that runs on Linux systems uses OpenSSL 1.0.2l (with FIPS 2.0.16), cURL 7.54, OpenLDAP 2.4.44, and Python 2.7.12.

Greenplum Database client software that runs on Windows and AIX systems uses OpenSSL 0.9.8zg.

The Greenplum Database s3 external table protocol supports these data sources:

  • Amazon Simple Storage Service (Amazon S3)
  • Dell EMC Elastic Cloud Storage (ECS), an Amazon S3 compatible service

Pivotal Greenplum 5.19.0 supports Data Domain Boost on Red Hat Enterprise Linux.

This table lists the versions of Data Domain Boost SDK and DDOS supported by Pivotal Greenplum 5.x.

Table 2. Data Domain Boost Compatibility
Pivotal Greenplum: 5.19.0, 5.18.0, 5.17.0, 5.16.0, 5.14.0, 5.13.0, 5.12.0, 5.11.1, 5.11.0, 5.10.2, 5.9.0, 5.8.1, 5.8.0, 5.7.1, 5.7.0, 5.4.1, 5.4.0, 5.2.0, 5.1.0, 5.0.0
Data Domain Boost (see note 2): 3.3, 3.0.0.3 (see note 1)
DDOS: 6.1 (all versions), 6.0 (all versions)

Note: In addition to the DDOS versions listed in the previous table, Pivotal Greenplum 5.0.0 and later supports all minor patch releases (fourth digit releases) later than the certified version.

1Support for Data Domain Boost 3.0.0.3 is deprecated. The DELL EMC end of Primary Support date is December 31, 2017.

2The Greenplum Database utilities gpbackup and gprestore support Data Domain DD Boost File System Plugin (BoostFS) v1.1 with DDOS 6.0 or greater.

The gpbackup and gprestore utilities support using Dell EMC Data Domain Boost software with the DD Boost Storage Plugin.

Note: Pivotal Greenplum 5.19.0 does not support the ODBC driver for Cognos Analytics V11.

Connecting to IBM Cognos software with an ODBC driver is not supported. Greenplum Database supports connecting to IBM Cognos software with the DataDirect JDBC driver for Pivotal Greenplum. This driver is available as a download from Pivotal Network.

Veritas NetBackup

Pivotal Greenplum 5.19.0 supports backup with Veritas NetBackup version 7.7.3. See Backing Up Databases with Veritas NetBackup.

Supported Platform Notes

The following notes describe platform support for Pivotal Greenplum. Please send any questions or comments to Pivotal Support at https://support.pivotal.io.

  • Pivotal Greenplum is supported using either IPV4 or IPV6 protocols.
  • The only file system supported for running Greenplum Database is the XFS file system. All other file systems are explicitly not supported by Pivotal.
  • Greenplum Database is supported on network or shared storage if the shared storage is presented as a block device to the servers running Greenplum Database and the XFS file system is mounted on the block device. Network file systems are not supported. When using network or shared storage, Greenplum Database mirroring must be used in the same way as with local storage, and no modifications may be made to the mirroring scheme or the recovery scheme of the segments. Other features of the shared storage such as de-duplication and/or replication are not directly supported by Pivotal Greenplum Database, but may be used with support of the storage vendor as long as they do not interfere with the expected operation of Greenplum Database at the discretion of Pivotal.
  • Greenplum Database is supported when running on virtualized systems, as long as the storage is presented as block devices and the XFS file system is mounted for the storage of the segment directories.
    Warning: Running Pivotal Greenplum on hyper-converged infrastructure (HCI) has known issues with performance, scalability, and stability, is not recommended as a scalable solution for Pivotal Greenplum, and may not be supported by Pivotal if stability problems appear to be related to the infrastructure. HCI virtualizes all of the elements of conventional hardware systems and includes, at a minimum, virtualized computing, a virtualized SAN, and virtualized networking.
  • A minimum of 10-gigabit network is required for a system configuration to be supported by Pivotal.
  • Greenplum Database is supported on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Compute (GCP).
    • AWS - For production workloads, r4.8xlarge and r4.16xlarge instance types with four 12TB ST1 EBS volumes for each segment host, or d2.8xlarge with ephemeral storage configured with 4 RAID 0 volumes, are supported. EBS storage is recommended. EBS storage is more reliable and provides more features than ephemeral storage. Note that Amazon has no provisions to replace a bad ephemeral drive; when a disk failure occurs, you must replace the node with the bad disk.

      Pivotal recommends using an Auto Scaling Group (ASG) to provision nodes in AWS. An ASG automatically replaces bad nodes, and you can add further automation to recover the Greenplum processes on the new nodes automatically.

      Deployments should be in a Placement Group within a single Availability Zone. Because Amazon recommends using the same instance type in a Placement Group, use a single instance type for all nodes, including the masters.

    • Azure - For production workloads, Pivotal recommends the Standard_H8 instance type with 4 2TB disks and 2 segments per host, or the Standard_H16 instance type with 8 2TB disks and 4 segments per host. In both configurations, software RAID 0 is required so that the number of volumes does not exceed the number of segments.
      For Azure deployments, you must also configure the Greenplum Database system to not use port 65330. Add the following line to the sysctl.conf file on all Greenplum Database hosts.
      net.ipv4.ip_local_reserved_ports=65330
    • GCP - For all workloads, the n1-standard-8 and n1-highmem-8 instance types are supported. These are relatively small instance types because disk performance in GCP forces the configuration to just 2 segments per host, scaling out with many hosts instead. Use pd-standard disks with a recommended disk size of 6 TB. From a performance perspective, use a factor of 8 when determining how many nodes to deploy in GCP; for example, a 16-segment-host cluster in AWS would require 128 nodes in GCP.

  • For Red Hat Enterprise Linux 7.2 or CentOS 7.2, the default systemd setting RemoveIPC=yes removes IPC connections when non-system users logout. This causes the Greenplum Database utility gpinitsystem to fail with semaphore errors. To avoid this issue, see "Setting the Greenplum Recommended OS Parameters" in the Greenplum Database Installation Guide.

Pivotal Greenplum Tools and Extensions Compatibility

Client Tools

Greenplum releases a number of client tool packages on various platforms that can be used to connect to Greenplum Database and the Greenplum Command Center management tool. The following table describes the compatibility of these packages with this Greenplum Database release.

Tool packages are available from Pivotal Network.

Table 3. Pivotal Greenplum 5.19.0 Tools Compatibility
Tool Description of Contents Tool Version(s) Server Version(s)
Pivotal Greenplum Clients Greenplum Database Command-Line Interface (psql) 5.8 5.x
Pivotal Greenplum Loaders Greenplum Database Parallel Data Loading Tools (gpfdist, gpload) 5.8 5.x
Pivotal Greenplum Command Center Greenplum Database management tool 4.7 5.19 and later
4.0.0 5.7.0 and later
3.3.2 5.0.0 and later
3.2.2 5.0.0 - 5.2.x
Pivotal Greenplum Workload Manager1 Greenplum Database query monitoring and management tool 1.8.0 5.0.0

The Greenplum Database Client Tools and Load Tools are supported on the following platforms:

  • AIX 7.2 (64-bit) (Client and Load Tools only)2
  • Red Hat Enterprise Linux x86_64 7.x (RHEL 7)
  • Red Hat Enterprise Linux x86_64 6.x (RHEL 6)
  • SuSE Linux Enterprise Server x86_64 SLES 11 SP4, or SLES 12 SP2/SP3
  • Windows 10 (32-bit and 64-bit)
  • Windows 8 (32-bit and 64-bit)
  • Windows Server 2012 (32-bit and 64-bit)
  • Windows Server 2012 R2 (32-bit and 64-bit)
  • Windows Server 2008 R2 (32-bit and 64-bit)
Note: 1For Pivotal Greenplum Command Center 4.0.0 and later, workload management is an integrated Command Center feature rather than the separate tool Pivotal Greenplum Workload Manager.

2For Greenplum Database 5.4.1 and earlier 5.x releases, download the AIX Client and Load Tools package either from the Greenplum Database 5.11.1 file collection or the Greenplum Database 5.0.0 file collection on Pivotal Network.

Extensions

Table 4. Pivotal Greenplum 5.19.0 Extensions Compatibility
Pivotal Greenplum Extension Versions
MADlib machine learning for Greenplum Database 5.x (see note 1) 1.15.1, 1.15, 1.14
PL/Java for Greenplum Database 5.x 1.4.4, 1.4.3 (see note 2)
PL/R for Greenplum Database 5.x 2.3.4, 2.3.3
PostGIS Spatial and Geographic Objects for Greenplum Database 5.x 2.1.5+pivotal.2
Python Data Science Module Package for Greenplum Database 5.x (see note 3) 1.1.1, 1.1.0, 1.0.0
R Data Science Library Package for Greenplum Database 5.x (see note 4) 1.0.1, 1.0.0
PL/Container for Greenplum Database 5.x 1.1 (see note 5), 1.2 (see note 6), 1.3, 1.4
Note: 1Pivotal recommends that you upgrade to the most recent version of MADlib. For information about MADlib support and upgrade information, see the MADlib FAQ. For information on installing the MADlib extension in Greenplum Database, see Greenplum MADlib Extension for Analytics.

2The PL/Java extension package version 1.4.3 is compatible only with Greenplum Database 5.11.0 and later; it is not compatible with 5.10.x or earlier. If you are upgrading from Greenplum Database 5.10.x or earlier and have installed PL/Java 1.4.2, you must upgrade the PL/Java extension to version 1.4.3.

3For information about the Python package, including the modules provided, see the Python Data Science Module Package.

4For information about the R package, including the libraries provided, see the R Data Science Library Package.

5To upgrade from PL/Container 1.0 to PL/Container 1.1 and later, you must drop the PL/Container 1.0 language before registering the new version of PL/Container. For information on upgrading the PL/Container extension in Greenplum Database, see PL/Container Language Extension.

6PL/Container version 1.2 can utilize the resource group capabilities that were introduced in Greenplum Database 5.8.0. If you downgrade to a Greenplum Database system that uses PL/Container 1.1 or earlier, you must use plcontainer runtime-edit to remove any resource_group_id settings from the PL/Container runtime configuration file. See Upgrading from PL/Container 1.1.

These Greenplum Database extensions are installed with Pivotal Greenplum Database:
  • Fuzzy String Match Extension
  • PL/Python Extension
  • pgcrypto Extension

Pivotal Greenplum Data Connectors

  • Greenplum Platform Extension Framework (PXF) - PXF, integrated with Greenplum Database, provides access to HDFS, Hive, HBase, and SQL external data stores. Refer to Accessing External Data with PXF in the Greenplum Database Administrator Guide for PXF configuration and usage information.
    Note: PXF is available only for supported Red Hat and CentOS platforms. PXF is not available for supported SuSE platforms.
  • Greenplum-Spark Connector - The Pivotal Greenplum-Spark Connector supports high speed, parallel data transfer from Greenplum Database to an Apache Spark cluster. The Greenplum-Spark Connector is available as a separate download from Pivotal Network. Refer to the Greenplum-Spark Connector documentation for compatibility and usage information.
  • Greenplum-Informatica Connector - The Pivotal Greenplum-Informatica connector supports high speed data transfer from an Informatica PowerCenter cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. See the Pivotal Greenplum-Informatica Connector Documentation.
  • Greenplum-Kafka Integration - The Pivotal Greenplum-Kafka Integration provides high speed, parallel data transfer from a Kafka cluster to a Pivotal Greenplum Database cluster for batch and streaming ETL operations. Refer to the Pivotal Greenplum-Kafka Integration Documentation for more information about this feature.
  • Greenplum Stream Server - The Pivotal Greenplum Stream Server is an ETL tool that provides high speed, parallel data transfer from Informatica, Kafka, and custom client data sources to a Pivotal Greenplum Database cluster. Refer to the Performing ETL Operations with the Pivotal Greenplum Stream Server Documentation for more information about this feature.
  • Gemfire-Greenplum Connector - The Pivotal Gemfire-Greenplum Connector supports the transfer of data between a GemFire region and a Greenplum Database cluster. The Gemfire-Greenplum Connector is available as a separate download from Pivotal Network. Refer to the Gemfire-Greenplum Connector documentation for compatibility and usage information.

Pivotal GPText Compatibility

Pivotal Greenplum Database 5.19.0 is compatible with Pivotal GPText version 2.1.3 and later.

Pivotal Greenplum Command Center

For GPCC and Greenplum Workload Manager compatibility information, see the Greenplum Command Center documentation and the Pivotal Greenplum Command Center 3.x and 2.x Release Notes.
Note: For Pivotal Greenplum Command Center 4.0.0 and later, workload management is an integrated Command Center feature rather than the separate tool Pivotal Greenplum Workload Manager.

Hadoop Distribution Compatibility

Greenplum Database provides access to HDFS with gphdfs and the Greenplum Platform Extension Framework (PXF).

PXF Hadoop Distribution Compatibility

PXF supports the Cloudera, Hortonworks Data Platform, MapR, and generic Apache Hadoop distributions.

If you plan to access JSON format data stored in a Cloudera Hadoop cluster, PXF requires a Cloudera version 5.8 or later Hadoop distribution.

gphdfs Hadoop Distribution Compatibility

The supported Hadoop distributions for gphdfs are listed below:

Table 5. Supported gphdfs Hadoop Distributions
Hadoop Distribution         Version               gp_hadoop_target_version
Cloudera                    CDH 5.x               cdh
Hortonworks Data Platform   HDP 2.x               hdp
MapR                        MapR 4.x, MapR 5.x    mpr
Apache Hadoop               2.x                   hadoop
Note: MapR requires the MapR client.
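
For example, the following is a minimal sketch of configuring the gphdfs target distribution for a Hortonworks cluster; the parameter gp_hadoop_target_version is set with gpconfig as the gpadmin user, and the value hdp comes from the table above:

    $ gpconfig -c gp_hadoop_target_version -v 'hdp'   # set the target Hadoop distribution for gphdfs
    $ gpstop -u                                       # reload the server configuration files

gphdfs also requires Hadoop client software and environment settings (such as JAVA_HOME and HADOOP_HOME) on the Greenplum Database hosts; see the Greenplum Database documentation for the complete gphdfs configuration steps.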

Upgrading to Greenplum Database 5.19.0

The upgrade path supported for this release is Greenplum Database 5.x to Greenplum Database 5.19.0. Upgrading a Greenplum Database 4.3.x release to Pivotal Greenplum 5.x is not supported. See Migrating Data to Pivotal Greenplum 5.x.

Note: If you are upgrading Greenplum Database on a DCA system, see Pivotal Greenplum on DCA Systems.
Important: Pivotal recommends that customers set the Greenplum Database timezone to a value that is compatible with their host systems. Setting the Greenplum Database timezone prevents Greenplum Database from selecting a timezone each time the cluster is restarted and sets the timezone for the Greenplum Database master and segment instances. After you upgrade to this release and if you have not set a Greenplum Database timezone value, verify that the selected Greenplum Database timezone is acceptable for your deployment. See Configuring Timezone and Localization Settings for more information.

Prerequisites

Before starting the upgrade process, Pivotal recommends performing the following checks.

  • Verify the health of the Greenplum Database host hardware, and verify that the hosts meet the requirements for running Greenplum Database. The Greenplum Database gpcheckperf utility can assist you in confirming the host requirements.
    Note: If you need to run the gpcheckcat utility, Pivotal recommends that you run it a few weeks before the upgrade, during a maintenance period. If necessary, you can resolve any issues found by the utility before the scheduled upgrade.

    The utility is in $GPHOME/bin. Pivotal recommends that Greenplum Database be in restricted mode when you run the gpcheckcat utility. See the Greenplum Database Utility Guide for information about the gpcheckcat utility.

    If gpcheckcat reports catalog inconsistencies, you can run gpcheckcat with the -g option to generate SQL scripts to fix the inconsistencies (see the sketch after this list).

    After you run the SQL scripts, run gpcheckcat again. You might need to repeat the process of running gpcheckcat and creating SQL scripts to ensure that there are no inconsistencies. Pivotal recommends that the SQL scripts generated by gpcheckcat be run on a quiescent system. The utility might report false alerts if there is activity on the system.

    Important: If the gpcheckcat utility reports errors, but does not generate a SQL script to fix the errors, contact Pivotal support. Information for contacting Pivotal Support is at https://support.pivotal.io.
  • During the migration process from Greenplum Database 5.0.0, a backup is made of some files and directories in $MASTER_DATA_DIRECTORY. Pivotal recommends that files and directories that are not used by Greenplum Database be backed up, if necessary, and removed from the $MASTER_DATA_DIRECTORY before migration. For information about the Greenplum Database migration utilities, see the Greenplum Database Documentation.
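
The following is a minimal sketch of the gpcheckcat check-and-repair cycle described above; the database name mytest and the output directory are illustrative, and you substitute the name of each SQL script that gpcheckcat generates:

    $ gpstop -a                                             # stop the cluster
    $ gpstart -R                                            # start Greenplum Database in restricted mode
    $ gpcheckcat -g /home/gpadmin/gpcheckcat_fixes mytest   # write repair SQL scripts, if any, to this directory
    $ psql -d mytest -f /home/gpadmin/gpcheckcat_fixes/<generated_script>.sql
    $ gpcheckcat mytest                                     # re-run to confirm no inconsistencies remain
    $ gpstop -ar                                            # restart in normal mode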

For information about supported versions of Greenplum Database extensions, see Pivotal Greenplum Tools and Extensions Compatibility.

If you are using Data Domain Boost, you must re-enter your DD Boost credentials after upgrading to Greenplum Database 5.19.0 as follows:

gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
  --ddboost-backupdir backup_directory
Note: If you do not re-enter your login credentials after an upgrade, your backup will never start because Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

If you have configured the Greenplum Platform Extension Framework (PXF) in your previous Greenplum Database installation, you must stop the PXF service, and you might need to back up PXF configuration files before upgrading to a new version of Greenplum Database. Refer to PXF Pre-Upgrade Actions for instructions.

If you do not plan to use PXF, or you have not yet configured PXF, no action is necessary.
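
The following is a minimal sketch of these pre-upgrade actions, assuming the pxf command-line utility and the $PXF_CONF user configuration directory of your existing PXF installation; the backup location is illustrative:

    $ pxf cluster stop                                # stop the PXF service on all segment hosts
    $ cp -r $PXF_CONF /home/gpadmin/pxf-conf-backup   # back up the PXF user configuration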

Upgrading from 5.x to 5.19.0

An upgrade from 5.x to 5.19.0 involves stopping Greenplum Database, updating the Greenplum Database software binaries, and restarting Greenplum Database. If you are using Greenplum Database extension packages, there are additional requirements. See Prerequisites in the previous section.

Note: If you are upgrading from Greenplum Database 5.10.x or earlier and have installed the PL/Java extension, you must upgrade the PL/Java extension to extension package version 1.4.3. Previous releases of the PL/Java extension are not compatible with Greenplum Database 5.11.0 and later. For information about the PL/Java extension package, see Pivotal Greenplum Tools and Extensions Compatibility.
Note: If you have databases that were created with Greenplum Database 5.10.x or an earlier 5.x release, upgrade the gp_bloat_expected_pages view in the gp_toolkit schema. For information about the issue and how to check a database for the issue, see Update for gp_toolkit.gp_bloat_expected_pages Issue.
Note: If you are upgrading from Greenplum Database 5.7.0 or an earlier 5.x release and have configured PgBouncer in your Greenplum Database installation, you must migrate to the new PgBouncer when you upgrade Greenplum Database. Refer to Migrating PgBouncer for specific migration instructions.
Note: If you have databases that were created with Greenplum Database 5.3.0 or an earlier 5.x release, upgrade the gp_bloat_diag function and view in the gp_toolkit schema. For information about the issue and how to check a database for the issue, see Update for gp_toolkit.gp_bloat_diag Issue.
Note: If the Greenplum Command Center database gpperfmon is installed in your Greenplum Database system, the migration process changes the distribution key of the Greenplum Database log_alert_* tables to the logtime column. The redistribution of the table data might take some time; the change occurs only the first time you start Greenplum Database after the migration.
  1. Log in to your Greenplum Database master host as the Greenplum administrative user:
    $ su - gpadmin
  2. Perform a smart shutdown of your current Greenplum Database 5.x system (there can be no active connections to the database). This example uses the -a option to disable confirmation prompts:
    $ gpstop -a
  3. Run the binary installer for 5.19.0 on the Greenplum Database master host.
    When prompted, choose an installation location in the same base directory as your current installation. For example:
    /usr/local/greenplum-db-5.19.0

    If you install Greenplum Database with the rpm (as root), the installation directory is /usr/local/greenplum-db-5.19.0.

    For the rpm installation, update the permissions for the new installation. For example, run this command as root to change the owner and group of the installed files to gpadmin.

    # chown -R gpadmin:gpadmin /usr/local/greenplum*
  4. If needed, update the greenplum_path.sh file for use with your specific installation. These are some examples.
    • If Greenplum Database uses LDAP authentication, edit the greenplum_path.sh file to add the line:
      export LDAPCONF=/etc/openldap/ldap.conf
    • If Greenplum Database uses PL/Java, you might need to set or update the environment variables JAVA_HOME and LD_LIBRARY_PATH in greenplum_path.sh.
    Note: When comparing the previous and new greenplum_path.sh files, be aware that installing some Greenplum Database extensions also updates the greenplum_path.sh file. The greenplum_path.sh from the previous release might contain updates that were the result of those extensions. See step 9 for installing Greenplum Database extensions.
  5. Edit the environment of the Greenplum Database superuser (gpadmin) and make sure you are sourcing the greenplum_path.sh file for the new installation. For example, change the following line in .bashrc or your chosen profile file:
    source /usr/local/greenplum-db-5.0.0/greenplum_path.sh

    to:

    source /usr/local/greenplum-db-5.19.0/greenplum_path.sh

    Or if you are sourcing a symbolic link (/usr/local/greenplum-db) in your profile files, update the link to point to the newly installed version. For example:

    $ rm /usr/local/greenplum-db
    $ ln -s /usr/local/greenplum-db-5.19.0 /usr/local/greenplum-db
  6. Source the environment file you just edited. For example:
    $ source ~/.bashrc
  7. Run the gpseginstall utility to install the 5.19.0 binaries on all the segment hosts specified in the hostfile. For example:
    $ gpseginstall -f hostfile
    Note: The gpseginstall utility copies the installed files from the current host to the remote hosts. It does not use rpm to install Greenplum Database on the remote hosts, even if you used rpm to install Greenplum Database on the current host.
  8. Use the Greenplum Database gppkg utility to install Greenplum Database extensions. If you were previously using any Greenplum Database extensions such as pgcrypto, PL/R, PL/Java, PL/Perl, and PostGIS, download the corresponding packages from Pivotal Network, and install using this utility. See the Greenplum Database Documentation for gppkg usage details.

    Also copy any additional files that are used by the extensions (such as JAR files, shared object files, and libraries) from the previous version installation directory to the new version installation directory on the master and segment host systems.

  9. If you are upgrading from Greenplum Database 5.7 or an earlier 5.x release and have configured PgBouncer in your Greenplum Database installation, you must migrate to the new PgBouncer when you upgrade Greenplum Database. Refer to Migrating PgBouncer for specific migration instructions.
  10. After all segment hosts have been upgraded, you can log in as the gpadmin user and restart your Greenplum Database system:
    # su - gpadmin
    $ gpstart
  11. If you are using Data Domain Boost, you must re-enter your DD Boost credentials after upgrading to Greenplum Database 5.19.0 as follows:
    gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
      --ddboost-backupdir backup_directory 
    Note: If you do not re-enter your login credentials after an upgrade, your backup will never start because Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.
  12. If you configured PXF in your previous Greenplum Database installation, you must re-initialize the PXF service after you upgrade Greenplum Database. Refer to Upgrading PXF for instructions.
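    A minimal sketch, assuming the pxf cluster management commands provided with this PXF release:
    $ pxf cluster init                             # re-initialize PXF for the new Greenplum Database installation
    $ pxf cluster start                            # start the PXF service on all segment hosts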

After upgrading Greenplum Database, verify that features work as expected. For example, test that backup and restore operate correctly, and that Greenplum Database features such as user-defined functions and extensions such as MADlib and PostGIS function properly.
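
The following is a minimal smoke-test sketch; the database name mytest is illustrative, and you run only the checks that apply to the extensions you have installed:

    $ gpstate -s                                          # confirm all segments are up and synchronized
    $ psql -d mytest -c 'SELECT version();'               # confirm the new server version
    $ psql -d mytest -c 'SELECT madlib.version();'        # if MADlib is installed
    $ psql -d mytest -c 'SELECT postgis_full_version();'  # if PostGIS is installed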

Troubleshooting a Failed Upgrade

If you experience issues during the migration process and have active entitlements for Greenplum Database that were purchased through Pivotal, contact Pivotal Support. Information for contacting Pivotal Support is at https://support.pivotal.io.

Be prepared to provide the following information:

  • A completed Upgrade Procedure.
  • Log output from gpcheckcat (located in ~/gpAdminLogs)

Migrating Data to Pivotal Greenplum 5.x

Upgrading a Pivotal Greenplum Database 4.x system directly to Pivotal Greenplum Database 5.x is not supported.

You can migrate existing data to Greenplum Database 5.x using standard backup and restore procedures (gpcrondump and gpdbrestore) or by using gptransfer. The gpcopy utility can be used to migrate data from Greenplum Database 4.3.26 or later to 5.9 or later.

Follow these general guidelines for migrating data:
  • Make sure that you have a complete backup of all data in the Greenplum Database 4.3.x cluster, and that you can successfully restore the Greenplum Database 4.3.x cluster if necessary.
  • You must install and initialize a new Greenplum Database 5.x cluster using the version 5.x gpinitsystem utility.
    Note: Unless you modify file locations manually, gpdbrestore only supports restoring data to a cluster that has an identical number of hosts and an identical number of segments per host, with each segment having the same content_id as the segment in the original cluster. If you initialize the Greenplum Database 5.x cluster using a configuration that is different from the version 4.3 cluster, then follow the steps outlined in Restoring to a Different Greenplum System Configuration to manually update the file locations.
    Important: For Greenplum Database 5.x, Pivotal recommends that customers set the Greenplum Database timezone to a value that is compatible with the host systems. Setting the Greenplum Database timezone prevents Greenplum Database from selecting a timezone each time the cluster is restarted and sets the timezone for the Greenplum Database master and segment instances. See Configuring Timezone and Localization Settings for more information.
  • If you intend to install Greenplum Database 5.x on the same hardware as your 4.3.x system, you will need enough disk space to accommodate over 5 times the original data set (2 full copies of the primary and mirror data sets, plus the original backup data in ASCII format) in order to migrate data with gpcrondump and gpdbrestore. Keep in mind that the ASCII backup data will require more disk space than the original data, which may be stored in compressed binary format. Offline backup solutions such as Dell EMC Data Domain or Veritas NetBackup can reduce the required disk space on each host.

    If you attempt to migrate your data on the same hardware but run out of free space, gpcopy provides the --truncate-source-after option to truncate each source table after copying the table to the destination cluster and validating the copy succeeded. This reduces the amount of free space needed to migrate clusters that reside on the same hardware. See Migrating Data with gpcopy for more information.

  • Use the version 5.x gpdbrestore utility to load the 4.3.x backup data into the new cluster.
  • If the Greenplum Database 5.x cluster resides on separate hardware from the 4.3.x cluster, and the clusters have different numbers of segments, you can optionally use the version 5.x gptransfer utility to migrate the 4.3.x data. You must initiate the gptransfer operation from the version 5.x cluster, pulling the older data into the newer system (see the sketch after this list).

    On a Greenplum Database system with FIPS enabled, validating table data with MD5 (specifying the gptransfer option --validate=md5) is not available. Use the option sha256 to validate table data.

    Validating table data with SHA-256 (specifying the option --validate=sha256) requires the Greenplum Database pgcrypto extension. The extension is included with Pivotal Greenplum 5.x. The extension package must be installed on supported Pivotal Greenplum 4.3.x systems. Support for pgcrypto functions in a Greenplum 4.3.x database is not required.

  • Greenplum Database 5.x removes automatic implicit casts between the text type and other data types. After you migrate from Greenplum Database version 4.3.x to version 5.x, this change in behavior may impact existing applications and queries. Refer to About Implicit Text Casting in Greenplum Database in the Greenplum Database Installation Guide for information, including a discussion about supported and unsupported workarounds.
  • After migrating data you may need to modify SQL scripts, administration scripts, and user-defined functions as necessary to account for changes in Greenplum Database version 5.x. Look for Upgrade Action Required entries in the Pivotal Greenplum 5.0.0 Release Notes for features that may necessitate post-migration tasks.
  • If you are migrating from Greenplum Database 4.3.27 or an earlier 4.3.x release and have configured PgBouncer in your Greenplum Database installation, you must migrate to the new PgBouncer when you upgrade Greenplum Database. Refer to Migrating PgBouncer for specific migration instructions.
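
The following is a minimal gptransfer invocation sketch, run from the Greenplum Database 5.x master as described above. The host name mdw-43 is illustrative, and depending on your cluster layout you might need additional options (such as a source map file) that are described in the gptransfer documentation:

    $ gptransfer --full --source-host=mdw-43 --source-port=5432 \
        --source-user=gpadmin --validate=sha256

As noted above, the --validate=sha256 option requires the pgcrypto extension package on the Greenplum Database 4.3.x system.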

Pivotal Greenplum on DCA Systems

On supported Dell EMC DCA systems, you can install Pivotal Greenplum 5.19.0, or you can upgrade from Pivotal Greenplum 5.x to 5.19.0.

Only Pivotal Greenplum Database is supported on DCA systems. Open source versions of Greenplum Database are not supported.

Important: Upgrading Pivotal Greenplum Database 4.3.x to Pivotal Greenplum 5.19.0 is not supported. See Migrating Data to Pivotal Greenplum 5.x.
Note: These Greenplum Database releases are not certified on DCA because of an incompatibility in configuring timezone information: 5.5.0, 5.6.0, 5.6.1, 5.7.0, and 5.8.0.

These Greenplum Database releases are certified on DCA: 5.7.1, 5.8.1, 5.9.0 and later releases, and 5.x releases prior to 5.5.0.

Installing the Pivotal Greenplum 5.19.0 Software Binaries on DCA Systems

Important: This section is only for installing Pivotal Greenplum 5.19.0 on DCA systems. Also, see the information on the Dell EMC support site (requires login).

For information about installing Pivotal Greenplum on non-DCA systems, see the Greenplum Database Installation Guide.

Prerequisites

  • Ensure your DCA system supports Pivotal Greenplum 5.19.0. See Supported Platforms.
  • Ensure Greenplum Database 4.3.x is not installed on your system.

    Installing Pivotal Greenplum 5.19.0 on a DCA system with an existing Greenplum Database 4.3.x installation is not supported. For information about uninstalling Greenplum Database software, see your Dell EMC DCA documentation.

Installing Pivotal Greenplum 5.19.0

  1. Download or copy the Greenplum Database DCA installer file greenplum-db-appliance-5.19.0-RHEL6-x86_64.bin to the Greenplum Database master host.
  2. As root, run the DCA installer for 5.19.0 on the Greenplum Database master host and specify the file hostfile that lists all hosts in the cluster, one host name per line. If necessary, copy hostfile to the directory containing the installer before running the installer.

    This example command runs the installer for Greenplum Database 5.19.0.

    # ./greenplum-db-appliance-5.19.0-RHEL6-x86_64.bin hostfile

Upgrading from 5.x to 5.19.0 on DCA Systems

Upgrading Pivotal Greenplum from 5.x to 5.19.0 on a Dell EMC DCA system involves stopping Greenplum Database, updating the Greenplum Database software binaries, and restarting Greenplum Database.

Important: This section is only for upgrading to Pivotal Greenplum 5.19.0 on DCA systems. For information about upgrading on non-DCA systems, see Upgrading to Greenplum Database 5.19.0.
Note: If you are upgrading from Greenplum Database 5.10.x or earlier and have installed the PL/Java extension, you must upgrade the PL/Java extension to extension package version 1.4.3. Previous releases of the PL/Java extension are not compatible with Greenplum Database 5.11.0 and later. For information about the PL/Java extension package, see Pivotal Greenplum Tools and Extensions Compatibility.
Note: If you have databases that were created with Greenplum Database 5.10.x or an earlier 5.x release, upgrade the gp_bloat_expected_pages view in the gp_toolkit schema. For information about the issue and how to check a database for the issue, see Update for gp_toolkit.gp_bloat_expected_pages Issue.
Note: If you are upgrading from Greenplum Database 5.7.0 or an earlier 5.x release and have configured PgBouncer in your Greenplum Database installation, you must migrate to the new PgBouncer when you upgrade Greenplum Database. Refer to Migrating PgBouncer for specific migration instructions.
Note: If you have databases that were created with Greenplum Database 5.3.0 or an earlier 5.x release, upgrade the gp_bloat_diag function and view in the gp_toolkit schema. For information about the issue and how to check a database for the issue, see Update for gp_toolkit.gp_bloat_diag Issue.
  1. Log in to your Greenplum Database master host as the Greenplum administrative user (gpadmin):
    # su - gpadmin
  2. Download or copy the installer file greenplum-db-appliance-5.19.0-RHEL6-x86_64.bin to the Greenplum Database master host.
  3. Perform a smart shutdown of your current Greenplum Database 5.x system (there can be no active connections to the database). This example uses the -a option to disable confirmation prompts:
    $ gpstop -a
  4. As root, run the Greenplum Database DCA installer for 5.19.0 on the Greenplum Database master host and specify the file hostfile that lists all hosts in the cluster. If necessary, copy hostfile to the directory containing the installer before running the installer.

    This example command runs the installer for Greenplum Database 5.19.0 for Red Hat Enterprise Linux 6.x.

    # ./greenplum-db-appliance-5.19.0-RHEL6-x86_64.bin hostfile

    The file hostfile is a text file that lists all hosts in the cluster, one host name per line.

  5. If needed, update the greenplum_path.sh file for use with your specific installation. These are some examples.
    • If Greenplum Database uses LDAP authentication, edit the greenplum_path.sh file to add the line:
      export LDAPCONF=/etc/openldap/ldap.conf
    • If Greenplum Database uses PL/Java, you might need to set or update the environment variables JAVA_HOME and LD_LIBRARY_PATH in greenplum_path.sh.
    Note: When comparing the previous and new greenplum_path.sh files, be aware that installing some Greenplum Database extensions also updates the greenplum_path.sh file. The greenplum_path.sh from the previous release might contain updates that were the result of those extensions. See step 6 for installing Greenplum Database extensions.
  6. Install Greenplum Database extension packages. For information about installing a Greenplum Database extension package, see gppkg in the Greenplum Database Utility Guide.

    Also migrate any additional files that are used by the extensions (such as JAR files, shared object files, and libraries) from the previous version installation directory to the new version installation directory.

  7. After all segment hosts have been upgraded, you can log in as the gpadmin user and restart your Greenplum Database system:
    # su - gpadmin
    $ gpstart
  8. If you are using Data Domain Boost, you must re-enter your DD Boost credentials after upgrading to Greenplum Database 5.19.0 as follows:
    gpcrondump --ddboost-host ddboost_hostname --ddboost-user ddboost_user
      --ddboost-backupdir backup_directory
    Note: If you do not re-enter your login credentials after an upgrade, your backup will never start because Greenplum Database cannot connect to the Data Domain system. You will receive an error advising you to check your login credentials.

After upgrading Greenplum Database, verify that features work as expected. For example, test that backup and restore operate correctly, and that Greenplum Database features such as user-defined functions and extensions such as MADlib and PostGIS function properly.

Update for gp_toolkit.gp_bloat_expected_pages Issue

In Greenplum Database 5.10.x and earlier 5.x releases, the gp_toolkit.gp_bloat_expected_pages view might incorrectly report that a root partition table is bloated even though root partition tables do not contain data. This information could cause a user to run a VACUUM FULL operation on the partitioned table when the operation was not required. The issue was resolved in Greenplum Database 5.11.0 (resolved issue 29523).

When updating Greenplum Database, the gp_toolkit.gp_bloat_expected_pages view must be updated in databases created with a Greenplum Database 5.10.x or an earlier 5.x release. This issue has been fixed in databases created with Greenplum Database 5.11.0 and later. For information about using template0 as the template database after upgrading from Greenplum Database 5.10.x or an earlier 5.x release, see known issue 29523.

To check whether the gp_toolkit.gp_bloat_expected_pages view in a database requires an update, run the psql command \d+ to display the view definition.

\d+ gp_toolkit.gp_bloat_expected_pages
The updated view definition contains this predicate.
AND NOT EXISTS
( SELECT parrelid
     FROM pg_partition
     WHERE parrelid = pgc.oid )
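
For example, you can run a quick check from the shell; the database name mytest is illustrative, and if the command prints nothing, the view does not contain the predicate and requires the update:

    $ psql -d mytest -c '\d+ gp_toolkit.gp_bloat_expected_pages' | grep pg_partition   # look for the updated predicate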

Perform the following steps as the gpadmin user to update the view on each database that was created with Greenplum Database 5.10.x or an earlier 5.x release.

  1. Copy the script into a text file on the Greenplum Database master.
  2. Run the script on each database that requires the update.
    This example updates the gp_toolkit.gp_bloat_expected_pages view in the database mytest and assumes that the script is in the file gp_bloat_expected_pages.sql in the gpadmin home directory.
    psql -f /home/gpadmin/gp_bloat_expected_pages.sql -d mytest

Run the script during a low activity period. Running the script during a high activity period does not affect database functionality but might affect performance.

Script to Update gp_toolkit.gp_bloat_expected_pages View

BEGIN;
CREATE OR REPLACE VIEW gp_toolkit.gp_bloat_expected_pages
AS
    SELECT
      btdrelid,
      btdrelpages,
       CASE WHEN btdexppages < numsegments
          THEN numsegments
          ELSE btdexppages
        END as btdexppages
    FROM
    ( SELECT
        oid as btdrelid,
        pgc.relpages as btdrelpages,
        CEIL((pgc.reltuples * (25 + width))::numeric / current_setting('block_size')::numeric) AS btdexppages,
        (SELECT numsegments FROM gp_toolkit.__gp_number_of_segments) AS numsegments
        FROM
        ( SELECT pgc.oid, pgc.reltuples, pgc.relpages
            FROM pg_class pgc
            WHERE NOT EXISTS
            ( SELECT iaooid
                FROM gp_toolkit.__gp_is_append_only
                WHERE iaooid = pgc.oid AND iaotype = 't' )
            AND NOT EXISTS
            ( SELECT parrelid
                FROM pg_partition
                WHERE parrelid = pgc.oid )) AS pgc
        LEFT OUTER JOIN
          ( SELECT  starelid, SUM(stawidth * (1.0 - stanullfrac)) AS width
              FROM pg_statistic pgs
              GROUP BY 1) AS btwcols
        ON pgc.oid = btwcols.starelid
        WHERE starelid IS NOT NULL) AS subq;

GRANT SELECT ON TABLE gp_toolkit.gp_bloat_expected_pages TO public;
COMMIT;

Update for gp_toolkit.gp_bloat_diag Issue

In Greenplum Database 5.3.0 and earlier 5.x releases, Greenplum Database returned an integer out of range error in some cases when performing a query against the gp_toolkit.gp_bloat_diag view. The issue was resolved in Greenplum Database 5.4.0 (resolved issue 26518).

When updating Greenplum Database, the gp_toolkit.gp_bloat_diag function and view must be updated in databases created with a Greenplum Database 5.3.0 or an earlier 5.x release. This issue has been fixed in databases created with Greenplum Database 5.4.0 and later. For information about upgrading from Greenplum Database 5.3.0 or an earlier 5.x release and then using template0 as the template database, see known issue 29523.

To check whether the gp_toolkit.gp_bloat_diag function and view in a database requires an update, run the psql command \df to display information about the gp_toolkit.gp_bloat_diag function.

\df gp_toolkit.gp_bloat_diag

If the data type for btdexppages is integer, an update is required. If the data type is numeric, an update is not required. In this example, the btdexppages data type is integer, so the function and view require an update.

List of functions
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------------------
Schema              | gp_toolkit
Name                | gp_bloat_diag
Result data type    | record
Argument data types | btdrelpages integer, btdexppages integer, aotable boolean, OUT bltidx integer, OUT bltdiag text
Type                | normal

Perform the following steps as the gpadmin user to update the function and view to fix the issue on each database that was created with Greenplum Database 5.3.0 or an earlier 5.x release.

  1. Copy the script into a text file on the Greenplum Database master.
  2. Run the script on each database that requires the update.
    This example updates the gp_toolkit.gp_bloat_diag function and view in the database mytest and assumes that the script is in the file update_bloat_diag.sql in the gpadmin home directory.
    psql -f /home/gpadmin/update_bloat_diag.sql -d mytest

Run the script during a low activity period. Running the script during a high activity period does not affect database functionality but might affect performance.

Script to Update gp_toolkit.gp_bloat_diag Function and View
BEGIN;
CREATE OR REPLACE FUNCTION gp_toolkit.gp_bloat_diag(btdrelpages int, btdexppages numeric, aotable bool,
    OUT bltidx int, OUT bltdiag text)
AS
$$
    SELECT
        bloatidx,
        CASE
            WHEN bloatidx = 0
                THEN 'no bloat detected'::text
            WHEN bloatidx = 1
                THEN 'moderate amount of bloat suspected'::text
            WHEN bloatidx = 2
                THEN 'significant amount of bloat suspected'::text
            WHEN bloatidx = -1
                THEN 'diagnosis inconclusive or no bloat suspected'::text
        END AS bloatdiag
    FROM
    (
        SELECT
            CASE
                WHEN $3 = 't' THEN 0
                WHEN $1 < 10 AND $2 = 0 THEN -1
                WHEN $2 = 0 THEN 2
                WHEN $1 < $2 THEN 0
                WHEN ($1/$2)::numeric > 10 THEN 2
                WHEN ($1/$2)::numeric > 3 THEN 1
                ELSE -1
            END AS bloatidx
    ) AS bloatmapping

$$
LANGUAGE SQL READS SQL DATA;

GRANT EXECUTE ON FUNCTION gp_toolkit.gp_bloat_diag(int, numeric, bool, OUT int, OUT text) TO public;

CREATE OR REPLACE VIEW gp_toolkit.gp_bloat_diag
AS
    SELECT
        btdrelid AS bdirelid,
        fnnspname AS bdinspname,
        fnrelname AS bdirelname,
        btdrelpages AS bdirelpages,
        btdexppages AS bdiexppages,
        bltdiag(bd) AS bdidiag
    FROM
    (
        SELECT
            fn.*, beg.*,
            gp_toolkit.gp_bloat_diag(btdrelpages::int, btdexppages::numeric, iao.iaotype::bool) AS bd
        FROM
            gp_toolkit.gp_bloat_expected_pages beg,
            pg_catalog.pg_class pgc,
            gp_toolkit.__gp_fullname fn,
            gp_toolkit.__gp_is_append_only iao

        WHERE beg.btdrelid = pgc.oid
            AND pgc.oid = fn.fnoid
            AND iao.iaooid = pgc.oid
    ) as bloatsummary
    WHERE bltidx(bd) > 0;

GRANT SELECT ON TABLE gp_toolkit.gp_bloat_diag TO public;
COMMIT;