Release Notes

The Pivotal Greenplum Streaming Server (GPSS) is included in the Pivotal Greenplum Database distribution. Starting with version 1.3.1, GPSS for Red Hat/CentOS 6 and 7 is also updated and distributed independently of Greenplum Database. You may need to download and install the GPSS distribution to obtain the most recent version of this component.

Supported Platforms

Pivotal Greenplum Streaming Server is compatible with these Greenplum Database versions:

  • Pivotal Greenplum Database 5.17.0 and later
  • Pivotal Greenplum Database 6.0.0 and later

Release 1.3.3

Release Date: February 14, 2020

Greenplum Streaming Server 1.3.3 adds a new feature and resolves an issue.

Note: If you are upgrading from GPSS version 1.3.1 or older, you are required to perform upgrade actions for this release. Be sure to review Upgrading the Streaming Server to plan your upgrade to GPSS 1.3.3.

New Feature

The gpkafka load --debug-port option, removed in GPSS 1.3.2, is reinstated.

Resolved Issue

Greenplum Streaming Server 1.3.3 resolves this issue:

N/A
GPSS did not immediately dispatch Kafka data to Greenplum Database, but rather buffered and sent data in batches to Greenplum. This issue is resolved; GPSS loads data to Greenplum as it receives the data, and commits a batch as specified by the job COMMIT configuration property settings.

Release 1.3.2

Release Date: February 4, 2020

Greenplum Streaming Server 1.3.2 adds new features and resolves several issues.

Note: You are required to perform upgrade actions for this release. Be sure to review Upgrading the Streaming Server to plan your upgrade to GPSS 1.3.2.

New Features

Greenplum Streaming Server 1.3.2 includes these new features:

  • GPSS supports the gpload-style MAPPING syntax target_column_name : { source_column_name | expression }.
  • gpsscli subcommands support SSL-encrypted communications with a gpss server instance.
  • gpsscli subcommands support the following new flags and options:
    • --config gpsscliconfig.json identifies the GPSS service instance to which to direct the command. When SSL encryption is enabled between the GPSS client and server, you also use this file to identify the file system location of the client SSL certificates.
    • --no-check-ca disables certificate verification when SSL is enabled between the GPSS client and server.
  • gpsscli submit and gpsscli load support the -f | --force flag. When specified, it forces GPSS to reload the job configuration.
  • gpkafka subcommands support these flags and options:
    • --logdir dir specifies the log file directory.
    • --verbose outputs more detailed processing information.
  • gpkafka load supports the following options and flags:
    • --name job_name specifies the job name.
    • --partition specifies that the command display job progress by partition rather than by batch (the default).
    • --config gpfdistconfig.json specifies the gpfdist protocol configuration via a file. When SSL encryption is enabled on the data channel between GPSS and Greenplum Database, you also use this file to identify the file system location of the GPSS server SSL certificates.
    • --gpfdist-host hostaddr and --gpfdist-port portnum specify the gpfdist protocol host and port number.
    • -f | --force forces gpkafka to reload the job configuration.
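The gpload-style MAPPING syntax listed above pairs each target column with either a source column name or an expression. The following Python sketch models that pairing as a dictionary and converts it to a list of NAME/EXPRESSION entries. It is illustrative only: the helper name and the column names are hypothetical, and the NAME/EXPRESSION list form is assumed here to be the longer-standing MAPPING layout rather than taken from these notes.

```python
from typing import Dict, List


def to_name_expression_list(mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {target_column: source_column_or_expression} pairs into a
    list of NAME/EXPRESSION entries (assumed list-style layout)."""
    return [
        {"NAME": target, "EXPRESSION": expression}
        for target, expression in mapping.items()
    ]


# Hypothetical columns: one maps directly to a source column,
# the other is computed by an expression.
gpload_style = {
    "customer_id": "cust_id",
    "tax_due": "expenses * 0.0725",
}

for entry in to_name_expression_list(gpload_style):
    print(entry)
```

Both forms carry the same information; the gpload-style form is simply more compact when a load maps many columns.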

Changed Features

Greenplum Streaming Server 1.3.2 includes these changes:

  • GPSS provides more detailed information about file updates when a .tar.gz install package is extracted in a Greenplum installation.
  • The format of the GPSS server configuration file gpss.json is changed:
    • You can now specify unique encryption certificates for GPSS (via ListenAddress) and for Gpfdist.
    • The ListenAddress:SSL property is removed from the file.

    Refer to gpss.json for more information about the new file format.

  • GPSS prints the contents of the server configuration file (gpss.json) to stdout during gpss startup.
  • GPSS includes more detailed information in the messages displayed when it encounters errors during data deserialization.
  • The GPSS extension version is updated to 1.0, and GPSS now checks for version compatibility.
  • GPSS automatically registers its extension (CREATE EXTENSION gpss) in a Greenplum database the first time a Greenplum superuser or the database owner uses GPSS to load data and the extension was not previously registered.
  • All GPSS install packages now include the kafkacat debug utility and its dependent libserdes.so library. Refer to https://github.com/edenhill/kafkacat for detailed information about this utility.
  • gpkafka load no longer launches a gpss server instance, but rather calls the backend server code directly.
  • gpkafka load removes the --debug-port option.
  • GPSS removes the ENCRYPTION property from the gpkafka.yaml load configuration file. You now specify Kafka encryption properties via the PROPERTY block in the file.
  • GPSS removes the LOCAL_HOSTNAME and LOCAL_PORT properties from the gpkafka.yaml load configuration file. You now provide this information via the new --config gpfdistconfig.json or --gpfdist-host hostaddr and --gpfdist-port portnum options to gpkafka load.
  • GPSS removes the --no-reuse option from the gpsscli load and gpsscli start commands, no longer providing per-job configuration of external table reuse. Continue to configure external table reuse on a per-gpss instance basis via the ReuseTables configuration property.
  • GPSS's handling of the ERROR_LIMIT property is changed when it is run against Greenplum Database 6.x. GPSS now tolerates one more error than the setting. For example, GPSS tolerates 3 bad data rows when the ERROR_LIMIT is set to 2.
  • The gpkafka check command is renamed to gpkafka history.
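The ERROR_LIMIT change described in the list above amounts to simple arithmetic. The sketch below restates it in Python for an integer ERROR_LIMIT when GPSS runs against Greenplum Database 6.x. The function names are hypothetical and this is not GPSS code; it only encodes the "one more error than the setting" rule stated in these notes.

```python
def max_tolerated_bad_rows(error_limit: int) -> int:
    """Bad rows tolerated against Greenplum Database 6.x, per these notes:
    one more than the configured ERROR_LIMIT."""
    if error_limit < 0:
        raise ValueError("error_limit must be non-negative")
    return error_limit + 1


def exceeds_error_limit(error_limit: int, bad_rows: int) -> bool:
    """True when the bad-row count goes past what is tolerated."""
    return bad_rows > max_tolerated_bad_rows(error_limit)


print(max_tolerated_bad_rows(2))   # 3
print(exceeds_error_limit(2, 3))   # False
print(exceeds_error_limit(2, 4))   # True
```

If a job must stop after exactly N bad rows on Greenplum Database 6.x, this rule suggests accounting for the extra tolerated row when choosing the setting.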

Resolved Issues

Greenplum Streaming Server 1.3.2 resolves these issues:

30224
In some cases, GPSS did not correctly handle messages containing a zero length key or value when reading Avro data from Kafka. This issue is resolved.
30202
In some cases, a data load operation from Kafka could time out if a Kafka broker became unavailable. This issue is resolved.
170274895
GPSS wrote a password in plain text to a log file. This issue is resolved; GPSS now replaces the password text with *.
170202002
GPSS re-read messages from a Kafka topic when the METADATA:SCHEMA property was changed in a load configuration file and the associated job was restarted. This could cause GPSS to load duplicate messages into Greenplum Database. This issue is resolved.
169588268
Specifying more than one schema registry service address in the load configuration file could cause GPSS to provoke a segment crash. This issue is resolved.
169309023
gpkafka load did not terminate its backend gpss server process when the command was interrupted with a Control-c. This issue is resolved.
169200795
When loading Kafka data into Greenplum Database in UPDATE and MERGE modes, GPSS required that a MAPPING exist for each column identified in the MATCH_COLUMNS and UPDATE_COLUMNS lists. This issue is resolved; GPSS now automatically adds match and update columns into the insert columns list.

Release 1.3.1

Release Date: December 19, 2019

Greenplum Streaming Server version 1.3.1 is the first standalone release of GPSS. GPSS 1.3.1 is also included in the Greenplum Database version 5.24 and 6.2 distributions.

Greenplum Streaming Server 1.3.1 is a maintenance release that resolves several issues.

Resolved Issues

Greenplum Streaming Server 1.3.1 resolves these issues:

169806983
In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) caused GPSS to consume a large amount of CPU resources, even when no new messages existed in the Kafka topic. This issue is resolved.
169807372, 169831558
GPSS 1.3.0 did not recognize internal history tables that were created with GPSS 1.2.6 and earlier. In some cases, this caused GPSS to load duplicate messages into Greenplum Database. This issue is resolved.

Release 1.3.0

Release Date: November 1, 2019

Greenplum Streaming Server version 1.3.0 is included in the Greenplum Database version 5.23 and 6.1 distributions.

Greenplum Streaming Server 1.3.0 is a minor release that includes new and changed features and resolves several issues.

New and Changed Features

Greenplum Streaming Server 1.3.0 includes these new and changed features:

  • GPSS now supports log rotation, utilizing a mechanism that you can easily integrate with the Linux logrotate system. See Managing GPSS Log Files for more information.
  • GPSS has added the new INPUT:FILTER load configuration property. This property enables you to specify a filter that GPSS applies to Kafka input data before loading it into Greenplum Database.
  • GPSS displays job progress by partition when you provide the --partition flag to the gpsscli progress command.
  • GPSS enables you to load Kafka data that was emitted since a specific timestamp into Greenplum Database. To use this feature, you provide the --force-reset-timestamp flag when you run gpsscli load, gpsscli start, or gpkafka load.
  • GPSS now supports update and merge operations on data stored in a Greenplum Database table. The load configuration file accepts MODE, MATCH_COLUMNS, UPDATE_COLUMNS, and UPDATE_CONDITION property values to direct these operations. Example: Merging Data from Kafka into Greenplum Using the Streaming Server provides an example merge scenario.
  • GPSS supports Kerberos authentication to both Kafka and Greenplum Database.
  • GPSS supports SSL encryption between GPSS and Kafka.
  • GPSS supports SSL encryption on the data channel between GPSS and Greenplum Database.

Resolved Issues

Greenplum Streaming Server 1.3.0 resolves these issues:

168130147
In some situations, specifying the --force-reset-earliest flag when loading data failed to read from the correct offset. This problem has been fixed. (Using the --force-reset-xxx flags outside of an offset mismatch scenario is discouraged.)
167997441
GPSS did not save error data to the external table error log when it encountered an incorrectly-formatted JSON or Avro message. This issue has been fixed; invoking gp_read_error_log() on the external table now displays the offending data.
164823612
GPSS incorrectly treated Kafka jobs that specified the same Kafka topic and Greenplum output schema name and output table name, but different database names, as the same job. This issue has been resolved. GPSS now includes the Greenplum database name when constructing a job definition.
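The fix for issue 164823612 can be pictured as adding the database name to whatever identifies a job. The Python sketch below shows why leaving it out makes two otherwise identical jobs collide. The JobKey type, its field names, and the sample values are hypothetical; how GPSS actually constructs a job definition is not described in these notes.

```python
from typing import NamedTuple, Tuple


class JobKey(NamedTuple):
    """Hypothetical job identity that includes the database name."""
    database: str
    schema: str
    table: str
    topic: str


def key_without_database(key: JobKey) -> Tuple[str, str, str]:
    """Identity as it would look if the database name were ignored."""
    return (key.schema, key.table, key.topic)


ops_job = JobKey("ops", "payables", "expenses", "customer_expenses")
test_job = JobKey("testdb", "payables", "expenses", "customer_expenses")

# Ignoring the database name, the two jobs are indistinguishable.
print(key_without_database(ops_job) == key_without_database(test_job))  # True
# Including it keeps them separate.
print(ops_job == test_job)  # False
```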

Known Issues

Greenplum Streaming Server 1.3.x has these known issues:

N/A
Due to a regression in GPSS 1.3.0, GPSS no longer immediately dispatches Kafka data to Greenplum Database as it receives the data. GPSS now buffers and sends a batch of data to Greenplum as specified by the job COMMIT configuration, or when an application invokes the Close service.

Resolved in GPSS 1.3.3.

170202002
Updating the METADATA:SCHEMA property and restarting a previously-run load job could cause gpkafka to re-read Kafka messages published to the topic, and load duplicate messages into Greenplum Database.

Resolved in GPSS 1.3.2.

169200795
When loading Kafka data into Greenplum Database in UPDATE and MERGE modes, GPSS requires that a MAPPING exist for each column name identified in the MATCH_COLUMNS and UPDATE_COLUMNS lists.

Resolved in GPSS 1.3.2.

169807372
GPSS version 1.3.0 does not recognize internal history tables that were created with GPSS 1.2.6 and earlier. If you resubmit a load job that was originally started with the GPSS version shipped in a Greenplum Database 6.0.x distribution, a Greenplum Database 5.22 or earlier distribution, or the Greenplum 6.0.x Clients Package, GPSS reads Kafka messages starting from the earliest available offset in the topic. This may cause GPSS to load duplicate messages into Greenplum Database.

Workaround: Do not upgrade to Greenplum Database 6.1 or 5.23; wait for a Greenplum or GPSS release that includes GPSS v1.3.1 or later.

Resolved in GPSS 1.3.1.

169806983
In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) causes GPSS to consume a large amount of CPU resources, even when no new messages exist in the Kafka topic.

Workaround: Specify a MINIMAL_INTERVAL in the load configuration YAML file when you submit the job; for example, specify a value of 2000 (2 seconds) or 10000 (10 seconds).

Resolved in GPSS 1.3.1.
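The MINIMAL_INTERVAL workaround for issue 169806983 gives its example values in milliseconds (2000 for 2 seconds, 10000 for 10 seconds). The Python sketch below converts seconds to that unit and prints a small configuration fragment. The helper names are hypothetical, and placing MINIMAL_INTERVAL under a COMMIT block is an assumption based on the COMMIT property mentioned elsewhere in these notes; check the load configuration file reference for the exact layout.

```python
def minimal_interval_ms(seconds: float) -> int:
    """Convert seconds to the millisecond value used for MINIMAL_INTERVAL."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return int(round(seconds * 1000))


def commit_fragment(seconds: float) -> str:
    """Render a YAML-style fragment (assumed nesting under COMMIT)."""
    return f"COMMIT:\n  MINIMAL_INTERVAL: {minimal_interval_ms(seconds)}\n"


print(minimal_interval_ms(2))    # 2000
print(minimal_interval_ms(10))   # 10000
print(commit_fragment(2), end="")
```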