gpsscli.yaml
gpsscli.yaml
gpsscli configuration file.
Synopsis
DATABASE: db_name USER: user_name PASSWORD: password HOST: master_host PORT: greenplum_port VERSION: version_number DATASOURCE DATASOURCE_specific_properties [SCHEDULE: RETRY_INTERVAL: retry_time MAX_RETRIES: num_retries]
PROPERTY: {{template_var}}
Description
You specify the configuration parameters for a Greenplum Streaming Server (GPSS) job in a YAML-formatted configuration file that you provide to the gpsscli submit command. There are two types of configuration parameters in this file - Greenplum Database connection parameters, and parameters specific to the data source from which you will load data into Greenplum.
This reference page uses the name gpsscli.yaml to refer to this file; you may choose your own name for the file.
The gpsscli utility processes the YAML configuration file in order, using indentation (spaces) to determine the document hierarchy and the relationships between the sections. The use of white space in the file is significant, and keywords are case-sensitive.
Keywords and Values
- DATABASE: db_name
- The name of the Greenplum database.
- USER: user_name
- The name of the Greenplum Database user/role. This user_name must have permissions as described in Configuring Greenplum Database Role Privileges.
- PASSWORD: password
- The password for the Greenplum Database user/role. By default, the GPSS client passes the password to the GPSS server in clear text. When the password has a SHADOW: prefix, it represents a shadowed password string, and GPSS uses the Shadow:Key specified in its gpss.json configuration file, or a default key, to decode the password.
- HOST: master_host
- The host name or IP address of the Greenplum Database master host.
- PORT: greenplum_port
- The port number of the Greenplum Database server on the master host.
- VERSION: version_number
- The version of the gpsscli configuration file. GPSS supports versions 1 and 2 of this format.
- DATASOURCE
- The data source. GPSS currently supports KAFKA and
FILE data sources;
refer to gpkafka-v2.yaml
and filesource.yaml
for configuration file format and parameters.
- DATASOURCE_specific_parameters
- Parameters specific to the datasource.
- SCHEDULE:
- Controls the frequency and interval of restarting failed jobs.
- RETRY_INTERVAL: retry_time
- The period of time that GPSS waits before retrying the job. You can specify the time interval in day (d), hour (h), minute (m), second (s), or millisecond (ms) integer units; do not mix units. The default retry interval is 5m (5 minutes).
- MAX_RETRIES: num_retries
- The maximum number of times that GPSS attempts to retry the job. The default is 0, do not retry. If you specify a negative value, GPSS retries the job indefinitely.
Template Variables
GPSS supports using template variables to specify property values in the load configuration file.
PROPERTY: {{template_var}}
MAX_RETRIES: {{numretries}}
GPSS substitutes the template variable with a value that you specify via the -p | --property template_var=value option to the gpsscli submit, gpsscli load, or gpkafka load command.
--property numretries=10GPSS substitutes occurrences of {{numretries}} in the load configuration file with the value 10 before submitting the job, and uses that value during job execution.
Examples
Submit a job to load data into Greenplum Database as defined in the load configuration file named loadit.yaml:
$ gpsscli submit loadit.yaml
Example Greenplum Database configuration parameters in loadit.yaml:
DATABASE: ops USER: gpadmin PASSWORD: changeme HOST: mdw-1 PORT: 15432 DATASOURCE_block ...
See Also
gpsscli load, gpsscli submit, gpkafka load, filesource.yaml, gpkafka-v2.yaml