Upgrading the Streaming Server

If you are using the Greenplum Streaming Server (GPSS) in your current Greenplum Database installation, you must perform the GPSS upgrade procedure when:
  • You upgrade to a newer version of Greenplum Database, or
  • You install a new standalone GPSS package on your ETL host or in your Greenplum Database installation.

The GPSS upgrade procedure describes how to upgrade GPSS in your Greenplum Database installation or on your ETL host. The procedure uses GPSS.from to refer to your currently installed GPSS, and GPSS.new to refer to the GPSS installed when you upgrade to a new version of Greenplum Database or install a new GPSS package.

The GPSS upgrade procedure has two parts. You perform one procedure before, and one procedure after, you upgrade to a new version of Greenplum Database or GPSS:

Step 1: GPSS Pre-Upgrade Actions

Perform this procedure in your GPSS.from installation before you upgrade to a new version of Greenplum Database or GPSS:

  1. Log in to the Greenplum Database master host or the ETL host and set up your environment. For example:
    $ ssh gpadmin@<gpmaster>
    gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh

    Or:

    $ ssh etluser@<etlhost>
    etluser@etlhost$ . /usr/local/gpss/gpss_path.sh
  2. Identify and note the current version (GPSS.from) of GPSS. For example:
    $ gpss --version
  3. Stop all gpss jobs that are in the Running state.
  4. Stop all running gpss instances.
  5. Upgrade to the new version of Greenplum Database or install a new version of GPSS, and then continue your GPSS upgrade with Step 2: Upgrading GPSS.
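
As an illustrative sketch of steps 3 and 4 (the job name kafka_job1 is a placeholder, and command options may vary by GPSS version), stopping the running jobs and instances might look like this:

    gpadmin@gpmaster$ gpsscli list --all
    gpadmin@gpmaster$ gpsscli stop kafka_job1
    gpadmin@gpmaster$ ps -ef | grep gpss
    gpadmin@gpmaster$ kill <gpss_pid>

gpsscli stop transitions a Running job to the Stopped state; you can then stop each gpss server instance by identifying its process ID and terminating the process.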

Step 2: Upgrading GPSS

After you upgrade to the new version of Greenplum Database or install the new version of GPSS in your Greenplum installation, perform the following procedure to complete the upgrade to the GPSS.new software:

  1. Log in to the Greenplum Database master host or the ETL host and set up your environment. For example, on the master:
    $ ssh gpadmin@<gpmaster>
    gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
  2. Identify and note the new version (GPSS.new) of GPSS. For example:
    gpadmin@gpmaster$ gpss --version
  3. If you are upgrading from GPSS version 1.3.0 or older:

    GPSS 1.3.0 introduced a regression that caused it to no longer recognize history tables (internal tables that GPSS creates for each job) that were created with GPSS 1.2.6. This regression could cause GPSS to load duplicate Kafka messages into Greenplum. This issue is resolved in GPSS 1.3.1.

    You are not required to perform any upgrade steps related to this issue; GPSS automatically performs the required actions when you resubmit and restart a load job that you initiated with GPSS 1.3.0. GPSS's upgrade actions depend on the GPSS version(s) from which you are upgrading, as described below:

    • If you are upgrading directly from GPSS 1.2.6 or older, GPSS performs no special upgrade actions.
    • If you are upgrading from GPSS 1.3.0 and you previously submitted load jobs with both GPSS 1.2.6 or older and 1.3.0, GPSS copies the internal history table for each submitted job to a table with the correct name format, and uses those tables. GPSS also retains and renames the internal history table for each GPSS 1.3.0 job, adding the prefix deprecated_.
    • If GPSS 1.3.0 is the only version that you have used and you are upgrading from it, GPSS renames the internal history table for each restarted job.
  4. If you are upgrading to GPSS version 1.3.2:
    • GPSS 1.3.2 includes a new version of the gpss extension. If you installed a new version of Greenplum Database, or you installed the GPSS gppkg or .tar.gz packages in your Greenplum installation, you must drop and re-create the extension in any Greenplum database in which you are using GPSS or gpkafka to load data. A database superuser or the database owner must run these SQL commands:
      DROP EXTENSION gpss;
      CREATE EXTENSION gpss;
      (If the extension does not already exist, GPSS automatically creates it in a database the first time a Greenplum superuser or the database owner submits a load job to any table in that database.)
    • GPSS 1.3.2 changes the gpss.json configuration file:
      • The new file format allows you to specify unique SSL Certificates for GPSS and gpfdist. If you are using SSL to encrypt communication between GPSS and Kafka, Greenplum, or the GPSS client, you must update the gpss.json server configuration file to configure the correct Certificate block.
      • The ListenAddress:SSL property is removed. Ensure that you remove this property from all GPSS server configuration files.
    • GPSS 1.3.2 renames gpkafka check to gpkafka history. If you have any scripts or programs that reference gpkafka check, you must replace these references with gpkafka history.
    • GPSS 1.3.2 removes the ENCRYPTION property from the gpkafka.yaml job configuration file. Ensure that you remove this property from all job configuration files, and that you provide Kafka SSL configuration properties via the PROPERTY block in the file.
    • GPSS 1.3.2 removes the LOCAL_HOSTNAME and LOCAL_PORT properties from the gpkafka.yaml job configuration file. You must remove these properties from all job configurations, and specify the gpfdist configuration for each job in one of the following ways:
      • If you are loading data with gpkafka load, provide the --config gpfdistconfig.json or --gpfdist-host hostaddr and --gpfdist-port portnum options when you run the command.
      • If you are loading data with the gpsscli job management commands, ensure that the gpss.json configuration file for the gpss server instance servicing the request specifies the desired Gpfdist:Host and Gpfdist:Port settings.
    • GPSS 1.3.2 removes the --no-reuse flag from the gpsscli load and gpsscli start commands. If you have any scripts or programs that reference this flag, you must remove the references.
  5. Restart your gpss instances.
  6. Resubmit and restart your GPSS jobs.

    For any Kafka job that you resubmit and restart, GPSS will consume Kafka messages from the offset associated with the latest timestamp recorded in the history table for the job.
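
    As an illustrative sketch of steps 5 and 6 (gpsscfg.json, loadcfg.yaml, and kafka_job1 are placeholder file and job names; invocations may vary by GPSS version), restarting an instance and a job managed with gpsscli might look like this:

      gpadmin@gpmaster$ gpss gpsscfg.json --log-dir ./gpsslogs &
      gpadmin@gpmaster$ gpsscli submit --name kafka_job1 loadcfg.yaml
      gpadmin@gpmaster$ gpsscli start kafka_job1

    If you load data with gpkafka directly, re-running gpkafka load with the original job configuration file similarly resumes loading from the last offset recorded in the job's history table.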