Overview of the Greenplum Streaming Server

The Greenplum Streaming Server (GPSS) is an ETL (extract, transform, load) tool. A GPSS instance ingests streaming data from one or more clients, using Greenplum Database readable external tables to transform the data and insert it into a target Greenplum table. The data source and the format of the data are specific to the client.

The Greenplum Streaming Server includes the gpss command-line utility. When you run gpss, you start an instance of GPSS; this instance waits indefinitely for client data.

The Greenplum Streaming Server also includes the gpsscli command-line utility, a client tool for submitting data load jobs to a GPSS instance and managing those jobs.

Note: The Greenplum Streaming Server gpsscli client utility currently supports Kafka and file data sources.
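
As a rough illustration of what "submitting and managing" a job looks like from a script, the following Python sketch builds gpsscli command lines without running them. The subcommand names (submit, start, stop, remove) and the --name option reflect the gpsscli reference as understood here and should be checked against the version you have installed; the job name and configuration file name are hypothetical.

```python
import shlex
from typing import List, Optional


def gpsscli_command(action: str, job_name: str,
                    config_path: Optional[str] = None) -> List[str]:
    """Build (but do not run) the argument list for one gpsscli call."""
    if action == "submit":
        if config_path is None:
            raise ValueError("submit needs a load configuration file")
        # Assumed form: gpsscli submit --name <job> <load config file>
        return ["gpsscli", "submit", "--name", job_name, config_path]
    if action in ("start", "stop", "remove"):
        # Assumed form: gpsscli <action> <job>
        return ["gpsscli", action, job_name]
    raise ValueError(f"unsupported action: {action}")


if __name__ == "__main__":
    # Hypothetical job name and configuration file, for illustration only.
    steps = [
        gpsscli_command("submit", "orders_load", "orders_load.yaml"),
        gpsscli_command("start", "orders_load"),
        gpsscli_command("stop", "orders_load"),
    ]
    for argv in steps:
        # Print a shell-safe rendering of each command line.
        print(shlex.join(argv))
```

Each list can be passed to subprocess.run on a host where gpsscli is installed and a GPSS instance is reachable. Building the arguments separately from running them keeps the sketch usable without a Greenplum environment.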


The Greenplum Streaming Server is a gRPC server. The GPSS gRPC service definition includes the operations and messages necessary to connect to Greenplum Database and examine Greenplum metadata. The service definition also includes the operations and messages necessary to write data from a client into a Greenplum Database table. For more information about gRPC, refer to the gRPC documentation.

The gpsscli utility is a Greenplum Streaming Server gRPC client, as are the Tanzu Greenplum Connector for Informatica and the Tanzu Greenplum Connector for Apache NiFi. You can develop your own GPSS gRPC client using the GPSS Data or Streaming Job APIs.

Figure 1. Greenplum Streaming Server Architecture

A typical sequence of events for performing an ETL task using the Greenplum Streaming Server follows:

  1. A user initiates one or more ETL load jobs via a client application.
  2. The client application uses the gRPC protocol to submit data load job(s) to a running GPSS service instance and to start them.
  3. The GPSS service instance submits each load request as a transaction to the Greenplum Database cluster master instance. GPSS creates, or reuses, external tables that read the client data through the gpfdist protocol.
  4. The GPSS service instance writes the data delivered from the client directly into the segments of the Greenplum Database cluster.
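
The sequence above can be pictured with a small in-memory model. This is only a toy sketch: ToyServer, ToyJob, and every method name below are hypothetical and are not part of the GPSS API. Python lists stand in for the external table and the target table, and no gRPC or gpfdist traffic is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ToyJob:
    """A load job as tracked by the toy server (hypothetical)."""
    name: str
    transform: Callable[[Row], Row]
    state: str = "submitted"


class ToyServer:
    """In-memory stand-in for a GPSS instance; not the real service."""

    def __init__(self) -> None:
        self.jobs: Dict[str, ToyJob] = {}
        self.target_table: List[Row] = []

    def submit(self, name: str, transform: Callable[[Row], Row]) -> None:
        # Step 2: the client submits a load job.
        self.jobs[name] = ToyJob(name, transform)

    def start(self, name: str) -> None:
        # Step 2: the client starts the submitted job.
        self.jobs[name].state = "running"

    def deliver(self, name: str, batch: List[Row]) -> int:
        job = self.jobs[name]
        if job.state != "running":
            raise RuntimeError(f"job {name!r} is not running")
        # Step 3: stage the batch where the database can read it
        # (a list standing in for an external table).
        external_table = list(batch)
        # Step 4: transform each staged row and insert it into the target.
        loaded = [job.transform(row) for row in external_table]
        self.target_table.extend(loaded)
        return len(loaded)


if __name__ == "__main__":
    server = ToyServer()
    # Hypothetical transform: cast the "amount" field from text to float.
    server.submit("demo", lambda row: {**row, "amount": float(row["amount"])})
    server.start("demo")
    count = server.deliver("demo", [{"id": 1, "amount": "9.50"},
                                    {"id": 2, "amount": "3"}])
    print(count, server.target_table)
```

The model keeps staging (the external table) separate from the final insert so that the transform step has a clear place in the flow, mirroring the role the overview gives to readable external tables.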