gp_restore

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 4.x documentation.

gp_restore

Restores Greenplum databases that were backed up using gp_dump.

Note: The gp_restore utility is deprecated and will be removed in a future release. Use gpcrondump and gpdbrestore to backup and restore Greenplum databases.

Synopsis

gp_restore --gp-k=timestamp_key -d database_name [-i] [-v] [-a| -s] 
            [-h hostname] [-p port] [-U username] [-W] [--gp-c] 
            [--gp-i] [--gp-d=directoryname] [--gp-r=reportfile]
            [--gp-l=dbid [, ...]]

gp_restore -? | -h | --help 

gp_restore --version

Description

The gp_restore utility recreates the data definitions (schema) and user data in a Greenplum database using the script files created by a gp_dump operation. The use of this utility assumes:

  1. You have backup files created by a gp_dump operation.
  2. Your Greenplum Database system up and running.
  3. Your Greenplum Database system has the exact same number of segment instances (primary and mirror) as the system that was backed up using gp_dump.
  4. (Optional) The gp_restore utility uses the information in the Greenplum system catalog tables to determine the hosts, ports, and data directories for the segment instances it is restoring. If you want to change any of this information (for example, move the system to a different array of hosts) you must use the gprebuildsystem and gprebuildseg scripts to reconfigure your array before restoring.
  5. The databases you are restoring have been created in the system.
  6. If you used the options -s (schema only) , -a (data only), --gp-c (compressed), --gp-d (alternate dump file location) when performing the gp_dump operation, you must specify these options when doing the gp_restore as well.

The functionality of gp_restore is analogous to PostgreSQL’s pg_restore utility, which restores a database from files created by the database backup process. It issues the commands necessary to reconstruct the database to the state it was in at the time it was saved.

The functionality of gp_restore is modified to accommodate the distributed nature of a Greenplum database, and to use files created by a gp_dump operation. Keep in mind that a database in Greenplum is actually comprised of several PostgreSQL database instances (the master and all segments), each of which must be restored individually. The gp_restore utility takes care of populating each segment in the system with its own distinct portion of data.

Note:

The gp_dump utility creates a dump file in the master data directory named gp_dump_1_<dbid>_<timestamp>_post_data that contains commands to rebuild objects associated with the tables. When the database is restored with gp_restore, first, the schema and data are restored, and then, the dump file is used to rebuilt the other objects associated with the tables.

The gp_restore utility performs the following actions:

On the master host

  • Creates the user database schema(s) using the gp_dump_1_<dbid>_<timestamp> SQL file created by gp_dump.
  • Creates a log file in the master data directory named gp_restore_status_1_<dbid>_<timestamp>.
  • gp_restore launches a gp_restore_agent for each segment instance to be restored. gp_restore_agent processes run on the segment hosts and report status back to the gp_restore process running on the master host.

On the segment hosts

  • Restores the user data for each segment instance using the gp_dump_0_<dbid>_<timestamp> files created by gp_dump. Each segment instance on a host (primary and mirror instances) are restored.
  • Creates a log file for each segment instance named gp_restore_status_0_<dbid>_<timestamp>.

The 14 digit timestamp is the number that uniquely identifies the backup job to be restored, and is part of the filename for each dump file created by a gp_dump operation. This timestamp must be passed to the gp_restore utility when restoring a Greenplum Database.

Note:

The restore status files are stored under the db_dumps/<date> directory.

After the data in the tables is restored, check the report status files to verify that there no errors.

Options

--gp-k=timestamp_key
Required. The 14 digit timestamp key that uniquely identifies the backup set of data to restore. This timestamp can be found in the gp_dump log file output, as well as at the end of the dump files created by a gp_dump operation. It is of the form YYYYMMDDHHMMSS.
-d database_name | --dbname=dbname
Required. The name of the database to connect to in order to restore the user data. The database(s) you are restoring must exist, gp_restore does not create the database.
-i | --ignore-version
Ignores a version mismatch between gp_restore and the database server.
-v | --verbose
Specifies verbose mode.
-a | --data-only
Restore only the data, not the schema (data definitions).
-s | --schema-only
Restores only the schema (data definitions), no user data is restored.
-h hostname | --host=hostname
The host name of the Greenplum master host. If not provided, the value of PGHOST or the local host is used.
-p port | --port=port
The Greenplum master port. If not provided, the value of PGPORT or the port number provided at compile time is used.
-U username | --username=username
The database superuser account name, for example gpadmin. If not provided, the value of PGUSER or the current OS user name is used.
-W (force password prompt)
Forces a password prompt. This will happen automatically if the server requires password authentication.
--gp-c (use gunzip)
Use gunzip for inline decompression.
--gp-i (ignore errors)
Specifies that processing should ignore any errors that occur. Use this option to continue restore processing on errors.
--gp-d=directoryname
Specifies the relative or absolute path to backup files on the hosts. If this is a relative path, it is considered to be relative to the data directory. If not specified, defaults to the data directory of each instance being restored. Use this option if you created your backup files in an alternate location when running gp_dump.
--gp-r=reportfile
Specifies the full path name where the restore job report file will be placed on the master host. If not specified, defaults to the master data directory.
--gp-l=dbid [, ...] (restore certain segments)
Specifies whether to check for backup files on only the specified active segment instances (followed by a comma-separated list of the segments dbid). The default is to check for backup files on all active segments, restore the active segments, and then syncronize the mirrors.
-? | -h | --help (help)
Displays the online help.
--version (show utility version)
Displays the version of this utility.

Examples

Restore a Greenplum database using backup files created by gp_dump:

gp_restore --gp-k=2005103112453 -d gpdb

Restore a single segment instance only (by noting the dbid of the segment instance):

gp_restore --gp-k=2005103112453 -d gpdb --gp-s=5