gpsscli load

Load data with the Greenplum Streaming Server.

Synopsis

gpsscli load jobconfig.yaml [--name job_name]
     [-f | --force] [--quit-at-eof]
     [{--force-reset-earliest | --force-reset-latest | --force-reset-timestamp tstamp}]
     [--partition]
     [--config gpsscliconfig.json]
     [--gpss-host host] [--gpss-port port]
     [--no-check-ca] [-l | --log-dir directory] [--verbose]

gpsscli load {-h | --help}

Description

The gpsscli load command initiates a load job to a specific Greenplum Streaming Server (GPSS) instance. When you run gpsscli load, the command submits, starts, and displays the progress of a GPSS job.

You provide a YAML-formatted configuration file that defines the job parameters when you run the command. You may also choose a name to identify the job. If you do not provide a name, the command returns a unique job identifier.

By default, gpsscli load loads all available data and then waits indefinitely for new messages to load. If you interrupt the command or it exits, the GPSS job remains in the Running state. When running in this mode, you must explicitly stop the job with gpsscli stop.

When you provide the --quit-at-eof option to the command, the utility exits after it reads all published data, writes the data to Greenplum Database, and stops the job. The GPSS job is in the Stopped state when the command returns.
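The two modes described above can be sketched as thin shell wrappers. This is a minimal illustration only: the function names are hypothetical, and the job name and configuration file in the usage comments are borrowed from the Examples section.

```shell
# Batch-style run: the command returns after all published data is read and
# written, and the job is left in the Stopped state.
load_batch() {
    gpsscli load --name "$1" --quit-at-eof "$2"
}

# Default (streaming) run: the command waits indefinitely for new messages.
load_streaming() {
    gpsscli load --name "$1" "$2"
}

# A job started in the default mode stays Running until it is stopped explicitly.
stop_job() {
    gpsscli stop "$1"
}

# Usage:
#   load_batch from_topic1 loadcfg.yaml
#   load_streaming from_topic1 loadcfg.yaml   # interrupt, then: stop_job from_topic1
```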

If gpsscli load detects an offset mismatch when loading from a Kafka data source, you can re-run the command and choose to resume the load operation from the earliest available data, to load only new data, or to load data emitted since a specific time.
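Because the timestamp form of the reset takes epoch time in milliseconds, a small helper can make the value easier to produce. The sketch below assumes a date command that supports the %s format (GNU and BSD date both do); the function names epoch_ms_ago and load_since are hypothetical.

```shell
# Print the epoch time, in milliseconds, for a point $1 seconds in the past.
epoch_ms_ago() {
    echo $(( ( $(date +%s) - $1 ) * 1000 ))
}

# Re-run the load, consuming Kafka messages published since $1 seconds ago.
# $2 is the job configuration file.
load_since() {
    gpsscli load --force-reset-timestamp "$(epoch_ms_ago "$1")" "$2"
}

# Usage: load messages published during the last hour.
#   load_since 3600 loadcfg.yaml
```

The computed value is always in the past, which keeps it under the current-time bound; it must still be no earlier than the earliest message time for the reset to be accepted.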

If the GPSS instance to which you want to send the request is not running on the default host (127.0.0.1) or the default port number (5000), you can specify the GPSS host and/or port via command line options.
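As an illustration of the host and port overrides, the wrapper below reads the target instance from two environment variables and falls back to the documented defaults. GPSS_HOST and GPSS_PORT are hypothetical names used only by this sketch; gpsscli itself is not documented here as reading them.

```shell
# Submit a job to the GPSS instance named by GPSS_HOST and GPSS_PORT
# (hypothetical variables), defaulting to 127.0.0.1 and 5000.
# $1 is the job configuration file.
load_remote() {
    gpsscli load \
        --gpss-host "${GPSS_HOST:-127.0.0.1}" \
        --gpss-port "${GPSS_PORT:-5000}" \
        "$1"
}

# Usage (host name and port are placeholders):
#   GPSS_HOST=gpsshost.example.com GPSS_PORT=5019 load_remote loadcfg.yaml
```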

Options

jobconfig.yaml
The YAML-formatted configuration file that defines the job parameters. If the filename provided is not an absolute path, Greenplum Database assumes the file system location is relative to the current working directory.
Note: GPSS uses a YAML configuration file to uniquely identify a load operation. Submit a configuration file only once. If you submit the same configuration file more than once, GPSS creates the job, but the job eventually fails with an error.
--name job_name
Use job_name to identify the job. If you do not provide a name, the command returns a unique job identifier.
-f | --force
Force GPSS to reload the configuration of a running job. GPSS stops the job, updates the job with the configuration specified in jobconfig.yaml, and then restarts the job. If you previously named the job, you must provide --name job_name when you force a job configuration reload with this option.
Note: Do not attempt to update a configuration property that GPSS uses to uniquely identify a job. If you change any such configuration property, GPSS creates a new internal job and loads all available messages.
--quit-at-eof
When you specify this option, gpsscli load exits after it reads all of the source data. The default behaviour of gpsscli load is to wait indefinitely for, and then consume, new data from the source.
gpsscli load ignores job retry SCHEDULE configuration settings when it is invoked with the --quit-at-eof flag.
--force-reset-earliest
gpsscli load returns an error if its recorded offset does not match that of the Kafka data source. Re-run gpsscli load and specify the --force-reset-earliest option to resume the load operation from the earliest available data offset known to the data source.
Note: gpsscli load supports this option only when loading from a Kafka data source.
--force-reset-latest
gpsscli load returns an error if its recorded offset does not match that of the Kafka data source. Re-run gpsscli load and specify the --force-reset-latest option to load only new data emitted from the data source.
Note: gpsscli load supports this option only when loading from a Kafka data source.
--force-reset-timestamp tstamp
Specify the --force-reset-timestamp option to load Kafka messages published since the specified time. tstamp must specify epoch time in milliseconds, and is bounded by the earliest message time and the current time.
Note: gpsscli load supports this option only when loading from a Kafka data source.
--partition
By default, GPSS outputs the job progress by batch, and displays the start and end times, the message number and size, the number of inserted and rejected rows, and the transfer speed per batch. When you specify the --partition option, GPSS outputs the job progress by partition, and displays the partition identifier, the start and end times, the beginning and ending offsets, the message size, and the transfer speed per partition.
--config gpsscliconfig.json
The GPSS configuration file. This file includes properties that identify the gpss instance that services the command. When SSL encryption is enabled between the GPSS client and server, you also use this file to identify the file system location of the client SSL certificates. Refer to gpss.json for detailed information about the format of this file and the configuration properties supported.
Note: gpsscli subcommands read the configuration specified in the ListenAddress block of the gpsscliconfig.json file, and ignore the gpfdist configuration specified in the Gpfdist block of the file.
--gpss-host host
The GPSS host. The default host address is 127.0.0.1. If specified, this option overrides a ListenAddress:Host value provided in gpsscliconfig.json.
--gpss-port port
The GPSS port number. The default port number is 5000. If specified, this option overrides a ListenAddress:Port value provided in gpsscliconfig.json.
--no-check-ca
Disable certificate verification when SSL is enabled between the GPSS client and server. By default, GPSS checks the certificate authority (CA) each time that you invoke a gpsscli subcommand.
-l | --log-dir directory
The directory to which GPSS writes client command log files. GPSS must have write permission to the directory. GPSS creates the log directory if it does not exist.
If you do not provide this option, GPSS writes gpsscli client log files to the $HOME/gpAdminLogs directory.
--verbose
The default behaviour of the command utility is to display information and error messages to stdout. When you specify the --verbose option, GPSS also outputs debug-level messages about the operation.
-h | --help
Show command utility help, and then exit.

Examples

Submit a GPSS job named from_topic1 that loads from Kafka, using the load parameters defined in the configuration file named loadcfg.yaml:

$ gpsscli load --name from_topic1 loadcfg.yaml
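
Reload the configuration of a running, named job after editing its configuration file. The wrapper below is a minimal sketch with a hypothetical function name; it passes --name together with --force because the job was named when it was first submitted:

```shell
# Force GPSS to stop the job, apply the updated configuration, and restart it.
# $1 is the job name given at submission; $2 is the edited configuration file.
reload_job() {
    gpsscli load --name "$1" --force "$2"
}

# Usage:
#   reload_job from_topic1 loadcfg.yaml
```

Avoid changing any property that GPSS uses to uniquely identify the job when reloading this way, since that creates a new internal job and loads all available messages.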