gpkafka load

Load data from Kafka into Greenplum Database.

Synopsis

gpkafka load [--quit-at-eof] config.yaml 

gpkafka load --help

Description

The gpkafka load utility loads data from a Kafka topic into a Greenplum Database table. When you run the command, you provide a YAML-formatted configuration file that defines load parameters such as the Greenplum Database connection options, the Kafka broker and topic, and the target Greenplum Database table.

By default, gpkafka load loads all Kafka messages published to the topic, and then waits indefinitely for new messages to load. When you provide the --quit-at-eof option to the command, the utility exits after it reads all published messages and writes the data to Greenplum Database.

If gpkafka load is interrupted by the user or exits, a subsequent gpkafka load operation that specifies the same Kafka topic and Greenplum Database table names resumes from the last recorded offset.
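Because a load that names the same topic and table picks up at the last recorded offset, a stopped load can simply be started again with the same configuration file. The following is a minimal sketch of a restart loop built on that behavior, not part of the utility itself. The function name supervise_load and its restart-count and delay arguments are hypothetical, and the sketch assumes that gpkafka load returns a non-zero exit status when it stops abnormally, which this page does not document.

```shell
#!/bin/sh
# Sketch: restart a streaming load a bounded number of times.
# Assumption (not documented here): gpkafka load exits with a non-zero
# status when it stops abnormally.

# supervise_load CONFIG [MAX_RESTARTS] [DELAY_SECONDS]  (hypothetical helper)
supervise_load() {
    cfg="$1"
    max_restarts="${2:-3}"
    delay="${3:-5}"
    restarts=0

    while :; do
        # Each run uses the same configuration file, and therefore the same
        # topic and table, so it resumes from the last recorded offset.
        status=0
        gpkafka load "$cfg" || status=$?

        if [ "$status" -eq 0 ]; then
            return 0
        fi

        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restarts (last exit status $status)" >&2
            return "$status"
        fi

        restarts=$((restarts + 1))
        echo "gpkafka load exited with status $status; restart $restarts of $max_restarts in ${delay}s" >&2
        sleep "$delay"
    done
}
```

For example, supervise_load loadcfg.yaml 5 10 would allow up to five restarts with a ten-second pause between them.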

Options

config.yaml
The YAML-formatted configuration file that defines the load operation parameters. If the filename provided is not an absolute path, the path is resolved relative to the current working directory. Refer to gpkafka.yaml for the format and content of the parameters that you specify in this file.
--quit-at-eof
When you specify this option, gpkafka load exits after it reads all of the Kafka messages published to the topic. The default behavior of gpkafka load is to wait indefinitely for, and then consume, new Kafka messages published to the topic.
--help
Show command help, and then exit.

Examples

Stream Kafka data into Greenplum Database using the load parameters defined in a configuration file named loadcfg.yaml located in the current directory:

gpkafka load loadcfg.yaml

Load Kafka data into Greenplum Database using a configuration file named loadcfg.yaml located in the current directory; exit the load operation after reading all Kafka messages published to the topic:

gpkafka load --quit-at-eof loadcfg.yaml
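
Since --quit-at-eof makes the command exit once the published messages have been read, it suits scheduled or scripted batch loads. The following is a minimal sketch of such a wrapper. The function name run_batch_load is hypothetical, and the sketch assumes that gpkafka load returns a non-zero exit status on failure, which this page does not document.

```shell
#!/bin/sh
# Sketch: run a one-shot (batch) load and report the outcome.
# Assumption (not documented here): gpkafka load returns a non-zero
# exit status on failure.

# run_batch_load CONFIG  (hypothetical helper)
run_batch_load() {
    cfg="$1"

    # Fail early if the configuration file is missing or unreadable.
    if [ ! -r "$cfg" ]; then
        echo "config file not readable: $cfg" >&2
        return 2
    fi

    # --quit-at-eof: exit after reading all messages published to the topic.
    if gpkafka load --quit-at-eof "$cfg"; then
        echo "load finished: $cfg"
    else
        status=$?
        echo "load failed (exit status $status): $cfg" >&2
        return "$status"
    fi
}
```

For example, run_batch_load loadcfg.yaml runs the second command above and prints a one-line result.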