gp_dump

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 4.x documentation.

gp_dump

Writes out a database to SQL script files, which can then be used to restore the database using gp_restore.

Note: The gp_dump utility is deprecated and will be removed in a future release. Use gpcrondump and gpdbrestore to backup and restore Greenplum databases.

Synopsis

gp_dump [-a | -s] [-c] [-d] [-D] [-n schema] [-o] [-O] 
    [-t table_name] [-T table_name] [-x] [-h hostname] [-p port] 
    [-U username] [-W] [-i] [-v] [--gp-c] [--gp-d=backup_directory] 
    [--gp-r=reportfile] [--gp-s=dbid [, ...]] database_name

gp_dump -? | --help 

gp_dump --version

Description

The gp_dump utility dumps the contents of a database into SQL script files, which can then be used to restore the database schema and user data at a later time using gp_restore. During a dump operation, users will still have full access to the database.

The functionality of gp_dump is analogous to PostgreSQL’s pg_dump utility, which writes out (or dumps) the content of a database into a script file. The script file contains SQL commands that can be used to restore the databases, data, and global objects such as users, groups, and access permissions.

The functionality of gp_dump is modified to accommodate the distributed nature of a Greenplum database. Keep in mind that a database in Greenplum Database is actually comprised of several PostgreSQL instances (the master and all segments), each of which must be dumped individually. The gp_dump utility takes care of dumping all of the individual instances across the system.

The gp_dump utility performs the following actions and produces the following dump files:

  • On the master host
    • Dumps CREATE DATABASE SQL statements into a file in the master data directory. The default naming convention of this file is gp_cdatabase_1_dbid_timestamp. This statement can be run on the master instance to recreate the user database(s).
    • Dumps the user database schema(s) into a SQL file in the master data directory. The default naming convention of this file is gp_dump_1_dbid_timestamp. This file is used by gp_restore to recreate the database schema(s).
    • Creates a dump file in the master data directory named gp_dump_1_dbid_timestamp_post_data that contains commands to rebuild objects associated with the tables.

      When the database is restored with gp_restore, first, the schema and data are restored, and then, the dump file is used to rebuilt the other objects associated with the tables.

    • Creates a log file in the master data directory named gp_dump_status_1_dbid_timestamp.
    • gp_dump launches a gp_dump_agent for each segment instance to be backed up. gp_dump_agent processes run on the segment hosts and report status back to the gp_dump process running on the master host.
  • On the segment hosts
    • Dumps the user data for each segment instance into a SQL file in the segment instance’s data directory. By default, only primary (or active) segment instances are backed up. The default naming convention of this file is gp_dump_0_dbid_timestamp. This file is used by gp_restore to recreate that particular segment of user data.
    • Creates a log file in each segment instance’s data directory named gp_dump_status_0_dbid_timestamp.

Note that the 14 digit timestamp is the number that uniquely identifies the backup job, and is part of the filename for each dump file created by a gp_dump operation. This timestamp must be passed to the gp_restore utility when restoring a Greenplum database.

Options

-a | --data-only
Dump only the data, not the schema (data definitions).
-s | --schema-only
Dump only the object definitions (schema), not data.
-c | --clean
Output commands to clean (drop) database objects prior to (the commands for) creating them.
-d | --inserts
Dump data as INSERT commands (rather than COPY). This will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL based databases. Note that the restore may fail altogether if you have rearranged column order. The -D option is safer, though even slower.
-D | --column-inserts
Dump data as INSERT commands with explicit column names (INSERT INTO table (column, ...) VALUES ...). This will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL based databases.
-n schema | --schema=schema
Dumps the contents of the named schema only. If this option is not specified, all non-system schemas in the target database will be dumped. You cannot backup system catalog schemas (such as pg_catalog) with gp_dump.
Caution: In this mode, gp_dump makes no attempt to dump any other database objects that objects in the selected schema may depend upon. Therefore, there is no guarantee that the results of a single-schema dump can be successfully restored by themselves into a clean database.
-o | --oids
Dump object identifiers (OIDs) as part of the data for every table. Use of OIDs is not recommended in Greenplum Database, so this option should not be used if restoring data to another Greenplum Database installation.
-O | --no-owner
Do not output commands to set ownership of objects to match the original database. By default, gp_dump issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created database objects. These statements will fail when the script is run unless it is started by a superuser (or the same user that owns all of the objects in the script). To make a script that can be restored by any user, but will give that user ownership of all the objects, specify -O.
-t table | --table=table
Dump only tables (or views or sequences) matching the table pattern. Multiple tables can be selected by writing multiple -t switches. Also, the table parameter is interpreted as a pattern according to the same rules used by psql’s \d commands, so multiple tables can also be selected by writing wildcard characters in the pattern. When using wildcards, be careful to quote the pattern if needed to prevent the shell from expanding the wildcards. The -n and -N switches have no effect when -t is used, because tables selected by -t will be dumped regardless of those switches, and non-table objects will not be dumped.
Note: When -t is specified, gp_dump makes no attempt to dump any other database objects that the selected table(s) may depend upon. Therefore, there is no guarantee that the results of a specific-table dump can be successfully restored by themselves into a clean database.
Note: -t cannot be used to specify a child table partition. To dump a partitioned table, you must specify the parent table name.
-T table | --exclude-table=table
Do not dump any tables matching the table pattern. The pattern is interpreted according to the same rules as for -t. -T can be given more than once to exclude tables matching any of several patterns. When both -t and -T are given, the behavior is to dump just the tables that match at least one -t switch but no -T switches. If -T appears without -t, then tables matching -T are excluded from what is otherwise a normal dump.
-x | --no-privileges | --no-acl
Prevents the dumping of access privileges (GRANT/REVOKE commands).
-h hostname | --host=hostname
The host name of the Greenplum Database master host. If not provided, the value of $PGHOST or the local host is used.
-p port | --port=port
The Greenplum Database master port. If not provided, the value of $PGPORT or the port number provided at compile time is used.
-U username | --username=user
The database superuser account name, for example gpadmin. If not provided, the value of $PGUSER or the current OS user name is used.
-W (force password prompt)
Forces a password prompt. This will happen automatically if the server requires password authentication.
-i | --ignore-version
Ignores a version mismatch between gp_dump and the database server.
-v | --verbose
Specifies verbose mode. This will cause gp_dump to output detailed object comments and start/stop times to the dump file, and progress messages to standard error.
--gp-c (use gzip)
Use gzip for inline compression.
--gp-d=directoryname
Specifies the relative or absolute path where the backup files will be placed on each host. If this is a relative path, it is considered to be relative to the data directory. If the path does not exist, it will be created, if possible. If not specified, defaults to the data directory of each instance to be backed up. Using this option may be desirable if each segment host has multiple segment instances — it will create the dump files in a centralized location.
--gp-r=reportfile
Specifies the full path name where the backup job log file will be placed on the master host. If not specified, defaults to the master data directory or if running remotely, the current working directory.
--gp-s=dbid (backup certain segments)
Specifies the set of active segment instances to back up with a comma-separated list of the segments dbid. The default is to backup all active segment instances.
database_name
Required. The name of the database you want to dump. If not specified, the value of $PGDATABASE will be used. The database name must be stated last after all other options have been specified.
-? | --help (help)
Displays the online help.
--version (show utility version)
Displays the version of this utility.

Examples

Back up a database:

gp_dump gpdb

Back up a database, and create dump files in a centralized location on all hosts:

gp_dump --gp-d=/home/gpadmin/backups gpdb

Back up a particular schema only:

gp_dump -n myschema mydatabase

Back up a single segment instance only (by noting the dbid of the segment instance):

gp_dump --gp-s=5 gpdb