Installing, Configuring, and Managing the Greenplum Stream Server

The Greenplum Stream Server (GPSS) manages communication and data transfer between a client (for example, the Pivotal Greenplum-Informatica Connector) and Greenplum Database. You must configure and start a GPSS instance before you use the service to load data into Greenplum Database.

Prerequisites

The Greenplum Stream Server gpss and gpsscli command-line utilities are automatically installed with Greenplum Database version 5.16 and later.

Before you start a GPSS server instance, ensure that you:

  • Install and start a compatible Greenplum Database version.
  • Can identify the hostname of your master node.
  • Can identify the port on which your Greenplum Database master server process is running, if it is not running on the default port (5432).
  • Select one or more GPSS host machines that have connectivity to:
    • The GPSS client host systems.
    • The Greenplum Database master and all segment hosts.

If you are using the gpsscli client utility, ensure that you run the command on a host that has connectivity to:

  • The client data source host systems. For example, for a Kafka data source, you must have connectivity to each broker host in the Kafka cluster.
  • The Greenplum Database master and all segment hosts.
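A quick TCP reachability check, run from the candidate GPSS or gpsscli host, is one way to confirm the connectivity prerequisites above. The following Python sketch is illustrative only: the function names are invented for this example, and a successful connection shows only that the port accepts connections, not that the service behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

For example, `check_endpoints([("gpmaster", 5432), ("kafka-broker1", 9092)])` would test the master on its default port and one Kafka broker. The host names and the broker port 9092 are placeholders; substitute the values for your own cluster.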

Registering the GPSS Extension

You must explicitly register the Greenplum Stream Server extension in each database in which you will use GPSS to write data to Greenplum tables. To register the extension, you must have Greenplum Database SUPERUSER privileges on the database, or you must be the database owner.

Perform the following procedure to register the GPSS extension:

  1. Open a new terminal window, log in to the Greenplum Database master host as the gpadmin administrative user, and set up the Greenplum environment. For example:
    $ ssh gpadmin@gpmaster
    gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
  2. Start the psql subsystem, connecting to a database in which you want to register the GPSS extension. For example:
    gpadmin@gpmaster$ psql -d testdb
  3. Enter the following command to register the extension:
    testdb=# CREATE EXTENSION gpss;
  4. Perform steps 2 and 3 for each database in which the Greenplum Stream Server will write client data.
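When several databases need the extension, steps 2 through 4 can be scripted. The following Python sketch is one possible approach, not part of GPSS. It assumes that it runs as gpadmin on the master host with the Greenplum environment already set up, so that psql is on the PATH. The function names are invented for this example.

```python
import subprocess


def create_extension_command(database):
    """Build the psql argument list that registers GPSS in one database."""
    return [
        "psql",
        "-v", "ON_ERROR_STOP=1",   # ask psql to stop on the first SQL error
        "-d", database,
        "-c", "CREATE EXTENSION gpss;",
    ]


def register_gpss(databases, run=subprocess.run):
    """Run CREATE EXTENSION gpss in each database, stopping at the first failure."""
    for database in databases:
        # check=True raises CalledProcessError if psql exits with a
        # non-zero status.
        run(create_extension_command(database), check=True)
```

For example, `register_gpss(["testdb", "salesdb"])` would register the extension in two databases; the name salesdb is a placeholder. The `run` parameter exists only so that the command list can be inspected without invoking psql.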

Configuring the Greenplum Stream Server

You configure an invocation of the Greenplum Stream Server via a JSON-formatted configuration file. This configuration file includes properties that identify the listen address of the GPSS service as well as the gpfdist host and port number. You can also specify encryption options in the file.

The contents of a sample GPSS JSON configuration file named gpsscfg1.json follow:

{
    "ListenAddress": {
        "Host": "",
        "Port": 50007,
        "SSL": true
    },
    "Gpfdist": {
        "Host": "",
        "Port": 8319
    },
    "Certificate": {
        "CertFile": "/home/gpadmin/gpdb_bin/gpdb/server.crt",
        "KeyFile": "/home/gpadmin/gpdb_bin/gpdb/server.key",
        "CAFile": "/home/gpadmin/gpdb_bin/gpdb/rootCA.pem"
    }
}

Refer to the gpss.json reference page for detailed information about the GPSS configuration file format and the configuration properties that the utility supports.
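Checking a configuration file before you start the server can catch simple mistakes early. The following Python sketch parses a GPSS configuration string and reports obvious problems. The property names come from the sample file above, but the checks themselves are assumptions made for this example (in particular, that all three certificate paths are needed when SSL is true). The gpss.json reference page remains the authority on what the utility accepts.

```python
import json


def _section(config, name):
    """Return the named section if it is a JSON object, else an empty dict."""
    value = config.get(name)
    return value if isinstance(value, dict) else {}


def check_gpss_config(text):
    """Return a list of problems found in a GPSS JSON configuration string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for name in ("ListenAddress", "Gpfdist"):
        port = _section(config, name).get("Port")
        # Validate a port only when it is present. bool is excluded because
        # Python treats True and False as integers.
        if port is not None and (
            isinstance(port, bool)
            or not isinstance(port, int)
            or not 1 <= port <= 65535
        ):
            problems.append(f"{name}.Port should be an integer from 1 to 65535")

    # Assumption of this sketch: SSL requires all three certificate paths.
    if _section(config, "ListenAddress").get("SSL") is True:
        certificate = _section(config, "Certificate")
        for key in ("CertFile", "KeyFile", "CAFile"):
            if not certificate.get(key):
                problems.append(f"Certificate.{key} is missing although SSL is true")
    return problems
```

An empty list means that the sketch found nothing to report. It does not guarantee that gpss will accept the file.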

Running the Greenplum Stream Server

You use the gpss utility to start an instance of the Greenplum Stream Server on the local host. When you run the command, you provide the name of the configuration file that defines the properties of the GPSS and gpfdist service instances. You can also specify the name of a directory to which gpss writes log files. For example, to start a GPSS instance specifying a log directory named gpsslogs relative to the current working directory:

$ gpss gpsscfg1.json --log-dir ./gpsslogs

The default mode of operation for gpss is to wait for, and then consume, job requests and data from a client. When run in this mode, gpss waits indefinitely. You can interrupt and exit the command with Control-c. You may also choose to run gpss in the background (&). In both cases, gpss writes log and status messages to stdout.
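Because gpss writes log and status messages to stdout, it can help to capture that output in a file when you run the server in the background. The following Python sketch launches gpss with the same arguments as the example above and appends its output to a file. The function names and the gpss.out file name are invented for this example, and the sketch assumes that gpss is on the PATH.

```python
import subprocess


def gpss_command(config_file, log_dir=None):
    """Build the gpss argument list used in the example command."""
    argv = ["gpss", config_file]
    if log_dir is not None:
        argv += ["--log-dir", log_dir]
    return argv


def start_gpss(config_file, log_dir=None, stdout_path="gpss.out"):
    """Launch gpss in the background, appending stdout and stderr to a file."""
    with open(stdout_path, "ab") as out:
        # The child process receives its own copy of the file descriptor,
        # so closing `out` here does not interrupt gpss output.
        return subprocess.Popen(
            gpss_command(config_file, log_dir),
            stdout=out,
            stderr=subprocess.STDOUT,
        )
```

For example, `proc = start_gpss("gpsscfg1.json", "./gpsslogs")` returns a `subprocess.Popen` handle. Calling `proc.send_signal(signal.SIGINT)` (after `import signal`) delivers the same signal as Control-c.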

Note: gpss keeps track of the loading progress of client jobs in memory. When you stop a GPSS server instance, you lose all registered jobs. After you restart the GPSS instance, you must re-submit any previously submitted jobs that you still require. gpss resumes each re-submitted job from its last load offset.

Refer to the gpss reference page for additional information about this command.