gpsscli load

Load data with the Greenplum Stream Server.

Synopsis

gpsscli load config.yaml [--name job_name]
     [--quit-at-eof] [--no-reuse] [{--force-reset-earliest | --force-reset-latest}]
     [--gpss-host host] [--gpss-port port]
     [-l | --log-dir directory] [-v | --verbose]

gpsscli load {-h | --help}

Description

The gpsscli load command initiates a load job to a specific Greenplum Stream Server (GPSS) instance. When you run gpsscli load, the command submits, starts, and displays the progress of a GPSS job.

You provide a YAML-formatted configuration file that defines the job parameters when you run the command. You may also choose a name to identify the job. If you do not provide a name, the command returns a unique job identifier.

By default, gpsscli load loads all available data and then waits indefinitely for new messages to load. In the case of user interrupt or exit, the GPSS job remains in the Running state. You must explicitly stop the job with gpsscli stop when running in this mode.

When you provide the --quit-at-eof option to the command, the utility exits after it reads all published data, writes the data to Greenplum Database, and stops the job. The GPSS job is in the Stopped state when the command returns.
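
The batch-style behaviour described above lends itself to scripting. The following is a minimal sketch, not a definitive implementation: the batch_load function name and the GPSSCLI override variable are hypothetical helpers introduced for illustration, and only the --name and --quit-at-eof options come from this reference page.

```shell
#!/bin/sh
# Hypothetical wrapper for a one-shot load job.
# GPSSCLI is an illustrative override (for example GPSSCLI=echo for a dry run);
# it is not a variable that GPSS itself reads.
GPSSCLI="${GPSSCLI:-gpsscli}"

batch_load() {
    # $1: job name, $2: path to the YAML configuration file
    # --quit-at-eof makes the command exit after all published data is loaded.
    "$GPSSCLI" load --name "$1" --quit-at-eof "$2"
}
```

For example, batch_load from_topic1 loadcfg.yaml would run gpsscli load --name from_topic1 --quit-at-eof loadcfg.yaml and return the exit status of that command.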

If gpsscli load detects an offset mismatch, you can choose to resume the load operation from the earliest available data (--force-reset-earliest) or to load only new data (--force-reset-latest).
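
Because the two recovery choices are mutually exclusive, a script can select one explicitly. The sketch below assumes a hypothetical load_with_reset helper and a hypothetical GPSSCLI override variable; only the --force-reset-earliest and --force-reset-latest options are taken from this page.

```shell
#!/bin/sh
# Hypothetical helper that maps a recovery policy to the matching option.
# GPSSCLI is an illustrative override, not something GPSS itself reads.
GPSSCLI="${GPSSCLI:-gpsscli}"

load_with_reset() {
    # $1: "earliest" or "latest", $2: path to the YAML configuration file
    case "$1" in
        earliest) flag="--force-reset-earliest" ;;  # resume from the earliest available data
        latest)   flag="--force-reset-latest" ;;    # load only new data
        *)
            echo "usage: load_with_reset {earliest|latest} config.yaml" >&2
            return 2
            ;;
    esac
    "$GPSSCLI" load "$flag" "$2"
}
```

Calling load_with_reset latest loadcfg.yaml would run gpsscli load --force-reset-latest loadcfg.yaml. Any policy other than earliest or latest is rejected before the command runs.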

If the GPSS instance to which you want to send the request is not running on the default host (127.0.0.1) or the default port number (5000), you can specify the GPSS host and/or port via command line options.
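
As an illustration of the host and port options, the sketch below reads the target instance from two shell variables and falls back to the documented defaults. The variable names GPSS_HOST, GPSS_PORT, and GPSSCLI and the remote_load function are assumptions made for this example; they are not read by GPSS itself.

```shell
#!/bin/sh
# Hypothetical variables; the fallbacks are the defaults documented on this page.
GPSSCLI="${GPSSCLI:-gpsscli}"
GPSS_HOST="${GPSS_HOST:-127.0.0.1}"
GPSS_PORT="${GPSS_PORT:-5000}"

remote_load() {
    # $1: path to the YAML configuration file
    "$GPSSCLI" load --gpss-host "$GPSS_HOST" --gpss-port "$GPSS_PORT" "$1"
}
```

With GPSS_HOST and GPSS_PORT set to the address of a non-default instance, remote_load loadcfg.yaml submits the job to that instance.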

Options

config.yaml
The YAML-formatted configuration file that defines the job parameters. If the filename provided is not an absolute path, Greenplum Database assumes the file system location is relative to the current working directory.
Note: GPSS uses the YAML configuration file to uniquely identify a load operation, so submit each configuration file only once. If you submit the same configuration file more than once, GPSS creates the job, but the job eventually fails with an error.
--name job_name
Use job_name to identify the job. If you do not provide a name, the command returns a unique job identifier.
--quit-at-eof
When you specify this option, gpsscli load exits after it reads all of the Kafka messages published to the topic. The default behaviour of gpsscli load is to wait indefinitely for, and then consume, new Kafka messages published to the topic.
--no-reuse
The default behaviour of gpsscli load is to reuse an external table that it may have previously created for this job. When you specify the --no-reuse option, gpsscli load drops the external table currently associated with the job and creates a new external table for the job.
--force-reset-earliest
gpsscli load returns an error if its recorded offset is behind the current earliest data offset for the source. Specify the --force-reset-earliest option to resume the load operation from the earliest available data.
--force-reset-latest
gpsscli load returns an error if its recorded offset is behind the current earliest data offset for the source. Specify the --force-reset-latest option to load only new data emitted by the source.
--gpss-host host
Specify the GPSS host. The default host address is 127.0.0.1.
--gpss-port port
Specify the GPSS port number. The default port number is 5000.
-l | --log-dir directory
Specify the directory to which GPSS writes client command log files. GPSS must have write permission to the directory. GPSS creates the log directory if it does not exist.
If you do not provide this option, GPSS writes gpsscli client log files to the $HOME/gpAdminLogs directory.
-v | --verbose
The default behaviour of the command utility is to display information and error messages to stdout. When you specify the --verbose option, GPSS also outputs debug-level messages about the operation.
-h | --help
Show command utility help, and then exit.

Examples

Submit a GPSS load job named from_topic1 whose load parameters are defined by the configuration file named loadcfg.yaml:

$ gpsscli load --name from_topic1 loadcfg.yaml