gpkafka load

A newer version of this documentation is available in the most up-to-date release of the Greenplum 5.x documentation.

Load data from Kafka into Greenplum Database.

Synopsis

gpkafka load [--quit-at-eof] [--debug-port portnum] config.yaml

gpkafka load -h | --help 

Description

The gpkafka load utility loads data from a Kafka topic into a Greenplum Database table. When you run the command, you provide a YAML-formatted configuration file that defines load parameters such as the Greenplum Database connection options, the Kafka broker and topic, and the target Greenplum Database table.

By default, gpkafka load loads all Kafka messages published to the topic, and then waits indefinitely for new messages to load. When you provide the --quit-at-eof option to the command, the utility exits after it reads all published messages and writes the data to Greenplum Database.
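A one-shot load, for example one launched periodically by a scheduler such as cron, is a natural fit for --quit-at-eof. The following is a minimal sketch of such a wrapper, not part of gpkafka itself. The function name run_batch_load is hypothetical, and the sketch assumes that gpkafka is on the PATH and, as is conventional for command-line tools, returns a non-zero exit status when the load fails.

```shell
#!/usr/bin/env bash
# Minimal sketch of a batch-mode wrapper around gpkafka load.
# run_batch_load is a hypothetical helper name. Assumes gpkafka is on
# PATH and exits non-zero on failure (a convention, not confirmed here).

run_batch_load() {
  local cfg="$1"

  # Fail early with a distinct status if the configuration file is missing.
  if [ ! -f "$cfg" ]; then
    echo "run_batch_load: configuration file not found: $cfg" >&2
    return 2
  fi

  # --quit-at-eof: exit after all messages already published to the topic
  # are loaded, instead of waiting indefinitely for new messages.
  gpkafka load --quit-at-eof "$cfg"
}
```

For example, run_batch_load loadcfg.yaml loads the messages currently published to the topic configured in loadcfg.yaml, then returns the exit status of gpkafka load.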

If you provide the --debug-port option, gpkafka load writes debug information to stdout during the load operation and starts a debug server from which you can obtain additional debug information.
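While a load started with --debug-port is running, you may want to save the debug server output for later inspection. The following sketch wraps the curl request described under --debug-port. The helper name save_debug_snapshot and the timestamped output file name are hypothetical choices; the host and port you pass must match the host running gpkafka load and the portnum you gave it.

```shell
#!/usr/bin/env bash
# Sketch: save one snapshot of the gpkafka load debug server output.
# save_debug_snapshot and the output file naming are hypothetical; the
# URL form http://<gpkafkahost>:<portnum> follows the --debug-port option.

save_debug_snapshot() {
  local host="$1" port="$2"
  local out
  out="gpkafka-debug-${host}-${port}-$(date +%Y%m%d%H%M%S).txt"

  # --fail makes curl return non-zero on HTTP errors instead of saving
  # an error page; -o writes the response body to the snapshot file.
  curl --silent --show-error --fail "http://${host}:${port}" -o "$out" || return 1

  # Print the snapshot file name so callers can capture it.
  echo "$out"
}
```

For example, after starting gpkafka load --debug-port 8080 loadcfg.yaml on a host named gpkafkahost (port 8080 is only an illustration), running save_debug_snapshot gpkafkahost 8080 from another shell saves the debug server output, including the call stack and performance statistics, to a file in the current directory.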

If a gpkafka load operation is interrupted by the user or exits, running gpkafka load again with the same Kafka topic and Greenplum Database table names resumes the load from the last recorded offset.
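Because a restarted operation that names the same Kafka topic and Greenplum Database table picks up from the last recorded offset, one way to recover from transient failures is to rerun gpkafka load with the same configuration file. The sketch below does this a bounded number of times. The helper name load_with_retry, its default attempt count and delay, and the assumption that gpkafka load exits non-zero on failure are all illustrative rather than documented behavior.

```shell
#!/usr/bin/env bash
# Sketch: rerun gpkafka load with the same configuration file until it
# succeeds or a retry limit is reached. Each rerun resumes from the last
# recorded offset for the same topic and table. load_with_retry, its
# defaults, and the non-zero-exit-on-failure assumption are hypothetical.

load_with_retry() {
  local cfg="$1"
  local max_attempts="${2:-5}"   # total attempts, including the first
  local delay="${3:-10}"         # seconds to wait between attempts
  local attempt=1

  while :; do
    # --quit-at-eof lets a successful run finish, which ends the loop.
    if gpkafka load --quit-at-eof "$cfg"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "load_with_retry: gpkafka load failed after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, load_with_retry loadcfg.yaml 3 30 makes up to three attempts, 30 seconds apart. Keeping the configuration file unchanged between attempts matters, because the resume behavior depends on the topic and table names staying the same.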

Options

config.yaml
The Version 1 or Version 2 YAML-formatted configuration file that defines the load operation parameters. If the file name that you provide is not an absolute path, gpkafka load assumes that the file is located relative to the current working directory. Refer to gpkafka.yaml and gpkafka-v2.yaml for the format and content of the parameters that you specify in Version 1 and Version 2 of this file, respectively.
--quit-at-eof
When you specify this option, gpkafka load exits after it reads all of the Kafka messages published to the topic. By default, gpkafka load waits indefinitely for new Kafka messages published to the topic and consumes them as they arrive.
--debug-port portnum
When you specify this option, gpkafka load includes debug information, such as the source code file and line number, in the messages that it writes to stdout. The utility also starts a debug server on the port identified by portnum. Additional debug information, including the call stack and performance statistics, is available from this server, for example via curl http://gpkafkahost:portnum, where gpkafkahost is the host on which gpkafka load is running.
-h | --help
Show command help, and then exit.

Examples

Stream Kafka data into Greenplum Database using the load parameters defined in a configuration file named loadcfg.yaml located in the current directory:

gpkafka load loadcfg.yaml

Load Kafka data into Greenplum Database using a configuration file named loadcfg.yaml located in the current directory, and exit the load operation after reading all Kafka messages published to the topic:

gpkafka load --quit-at-eof loadcfg.yaml