gpfdist



gpfdist serves external data files from a directory on the file host to all Greenplum Database segments in parallel. gpfdist uncompresses gzip (.gz) and bzip2 (.bz2) files automatically. Run gpfdist on the host on which the external data files reside.
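As a minimal sketch of starting gpfdist on the file host, the commands below serve one directory on one port. The directory, port, and log path are hypothetical values chosen for illustration, and the guard around the call is only there so the snippet degrades gracefully on a machine without Greenplum installed.

```shell
# Hypothetical values: adjust the directory, port, and log path for your file host.
DATA_DIR=/var/load_files
GPFDIST_PORT=8081
GPFDIST_LOG=/tmp/gpfdist.log

# Start gpfdist in the background only if the binary is on the PATH
# (for example after sourcing $GPHOME/greenplum_path.sh).
if command -v gpfdist >/dev/null 2>&1; then
  # -d: directory to serve, -p: HTTP port, -l: log file
  gpfdist -d "$DATA_DIR" -p "$GPFDIST_PORT" -l "$GPFDIST_LOG" &
  GPFDIST_STATUS="started (pid $!)"
else
  GPFDIST_STATUS="gpfdist not found on PATH"
fi
echo "$GPFDIST_STATUS"
```

Running one gpfdist instance per directory and port keeps each data source independently addressable from a gpfdist:// location.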

All primary segments access the external files in parallel, up to the number of segments set in the gp_external_max_segments server configuration parameter. To scale an external table's scan performance, list multiple gpfdist data sources in the CREATE EXTERNAL TABLE statement. For more information about configuring segment parallelism, see Controlling Segment Parallelism.
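The sketch below shows one way an external table might list two gpfdist data sources. The host names (etlhost-1, etlhost-2), port, table name, columns, and delimiter are all assumptions made for illustration. The shell function only prints the statement so that it can be reviewed before it is sent to the database.

```shell
# Print a CREATE EXTERNAL TABLE statement that reads from two gpfdist instances.
# Hosts, port, table name, and columns are hypothetical.
ext_table_sql() {
cat <<'EOF'
CREATE EXTERNAL TABLE ext_expenses (
    name     text,
    txn_date date,
    amount   float4,
    category text
)
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
          'gpfdist://etlhost-2:8081/*.txt')
FORMAT 'TEXT' (DELIMITER '|');
EOF
}

# Review the statement.
ext_table_sql

# To execute it, pipe it to psql against your own database, for example:
#   ext_table_sql | psql -d mydb
```

Each additional gpfdist:// entry in LOCATION gives the segments another server to pull from, which is the scaling approach described above.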

You can use the wildcard character (*) or other C-style pattern matching to specify multiple files to read. File paths are interpreted relative to the directory that you specified when you started gpfdist.
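To illustrate how a wildcard path resolves relative to the served directory, the sketch below builds a scratch directory and lists the files that a pattern selects. It uses the shell's own globbing as an approximation of gpfdist's matching, which is an assumption. The directory layout, file names, and the gpfdist:// URL in the comment are hypothetical.

```shell
# A scratch directory stands in for the directory passed to gpfdist with -d.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/expenses"
touch "$DATA_DIR/expenses/jan.txt" "$DATA_DIR/expenses/feb.txt" "$DATA_DIR/expenses/notes.md"

# Path portion of a hypothetical location such as
#   gpfdist://etlhost-1:8081/expenses/*.txt
PATTERN='expenses/*.txt'

# Resolve the pattern relative to the served directory and list the matches.
# $PATTERN is left unquoted on purpose so the shell expands the wildcard.
MATCHES=$(cd "$DATA_DIR" && ls $PATTERN)
echo "$MATCHES"

# Remove the scratch directory.
rm -rf "$DATA_DIR"
```

Only the two .txt files under expenses/ are listed; notes.md does not match the pattern and is skipped.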

gpfdist is located in the $GPHOME/bin directory on your Greenplum Database master host and on each segment host. See the gpfdist reference documentation for more information about using gpfdist with external tables.