Backing Up and Restoring Databases

Taking regular backups ensures that you can restore your data or rebuild your Greenplum Database system if data corruption or system failure occurs. You can also use backups to migrate data from one Greenplum Database system to another.

Backup and Restore Operations

Greenplum Database supports parallel and non-parallel backup and restore. Parallel backup and restore operations scale regardless of the number of segments in your system. Greenplum Database also supports non-parallel backup and restore utilities to enable migration from PostgreSQL to Greenplum. See Non-Parallel Backup.

Parallel Backup Overview

The Greenplum Database parallel dump utility gpcrondump backs up the Greenplum master instance and each active segment instance at the same time.

By default, gpcrondump creates dump files in the db_dumps subdirectory.

Several dump files are created for the master, containing database information such as DDL statements and the Greenplum system catalog tables, along with backup metadata. gpcrondump creates one dump file for each segment, which contains commands to recreate the data on that segment.

You can perform full or incremental backups. To restore a database to the state it was in when an incremental backup was made, you must restore the preceding full backup and all subsequent incremental backups up to and including that one.

Each file created for a backup begins with a 14-digit timestamp key that identifies the backup set the file belongs to.

Figure 1. Parallel Backups in Greenplum Database

gpcrondump can be run directly in a terminal on the master host, or you can add it to the crontab on the master host to schedule regular backups. See Automating Parallel Backups with gpcrondump.

Non-Parallel Backup

You can use the PostgreSQL non-parallel backup utilities pg_dump and pg_dumpall to migrate PostgreSQL databases to Greenplum Database. These utilities create a single dump file on the master host that contains all data from all active segments. In most cases, the master host does not have enough disk space for a single backup file of a distributed database.

Another way to back up Greenplum Database data is to use the COPY TO SQL command to copy all or a portion of a table out of the database to a delimited text file on the master host.
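
For example, this SQL command copies a hypothetical table named sales to a pipe-delimited text file on the master host (a minimal sketch; the table and file path are illustrative):

COPY sales TO '/home/gpadmin/sales_dump.txt' WITH DELIMITER '|';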

Parallel Restores

The Greenplum Database parallel restore utility gpdbrestore takes the timestamp key generated by gpcrondump, validates the backup set, and restores the database objects and data into a distributed database in parallel. Parallel restore operations require a complete backup set created by gpcrondump: a full backup and any required incremental backups.

Figure 2. Parallel Restores in Greenplum Database

The Greenplum Database gpdbrestore utility provides flexibility and verification options for use with the automated backup files produced by gpcrondump or with backup files moved from the Greenplum array to an alternate location. See Restoring a Database Using gpdbrestore.

Note:

gpdbrestore can also be used to copy files to the alternate location.

Non-Parallel Restores

Greenplum supports the non-parallel PostgreSQL restore utility pg_restore to enable:

  • Migration from PostgreSQL to Greenplum Database.
  • Migration between Greenplum Database systems with different configurations, such as a source system with four segments and a target system with five segments. Neither gpdbrestore nor gp_restore can distribute the source system’s backup files evenly across an expanded system.

pg_restore requires compressed dump files created by pg_dump or pg_dumpall. Before starting the restore, modify the CREATE TABLE statements in the dump files to include the Greenplum DISTRIBUTED clause.

Note:

If you do not include the DISTRIBUTED clause, Greenplum Database assigns a default value. For details, see CREATE TABLE in the Greenplum Database Reference Guide.
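
For example, a CREATE TABLE statement in a dump file could be edited to add a distribution clause similar to the following (a sketch; the sales table and its distribution column are hypothetical):

CREATE TABLE sales (
    id integer,
    txn_date date,
    amount numeric(10,2)
) DISTRIBUTED BY (id);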

To perform a non-parallel restore using parallel backup files, collect each backup file from the segment hosts, copy them to the master host, and load them through the master. See Restoring to a Different Greenplum System Configuration.

Figure 3. Non-parallel Restore Using Parallel Backup Files

Backing Up a Database

The options for database backup are as follows.

  • Schedule or run routine dumps with gpcrondump. gpcrondump allows you to schedule routine backups, including incremental backups, using the UNIX scheduling utility cron. Schedule cron jobs that call gpcrondump on the Greenplum master host. gpcrondump backs up databases, data, and objects such as database roles and server configuration files.

    Full backup jobs scheduled or run with gpcrondump can use Data Domain Boost. See Using Data Domain Boost.

  • Create a dump file for each segment with gp_dump. Use this option to back up a database or to migrate your data to a system with the same segment configuration. You must use the gp_restore utility to restore the database. You can use dump files created by gp_dump to restore to a different Greenplum system configuration.
  • Create a single dump file with pg_dump or pg_dumpall. Use this option to migrate your data to another database vendor’s system. If restoring to a PostgreSQL or Greenplum database and the dump file is in archive format, you can use pg_restore. If the dump file is in plain text format, you can use a client such as psql. To restore to another Greenplum Database system, do a parallel dump using gpcrondump or gp_dump, then do a non-parallel restore.
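
For example, these commands sketch both dump formats, assuming a database named mydatabase (flags and behavior vary by version; consult the pg_dump reference):

pg_dump -Fc mydatabase > mydatabase.dump
pg_dump mydatabase > mydatabase.sql

The first command writes a custom-format archive for use with pg_restore; the second writes plain-text SQL that a client such as psql can load.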

Incremental Backup Support

For Greenplum Database Version 4.2.5 and later, the utilities gpcrondump and gpdbrestore support incremental backups and restores of append-optimized tables, including column-oriented tables. Use the gpcrondump option --incremental to create an incremental backup.

An incremental backup is similar to a full backup: it creates backup files for the master and the segments, and a unique 14-digit timestamp key identifies the files that comprise the backup set. As with a full backup, an incremental backup backs up regular, heap-storage tables. The difference is that append-optimized tables are backed up only if one of the following operations was performed on the table after the last backup:

  • ALTER TABLE

  • DELETE

  • INSERT

  • TRUNCATE

  • UPDATE

  • DROP and then re-create the table

For partitioned append-optimized tables, only the changed table partitions are backed up.

Important:

Incremental backups are not supported with Data Domain Boost. You cannot use Data Domain Boost with a full backup if you plan to create incremental backups that use the full backup. You can use a Data Domain server as an NFS file system to perform incremental backups.

To create an incremental backup or restore data from an incremental backup, you need the complete backup set. A complete backup set consists of a full backup and any incremental backups that were created after the last full backup. See Example using full and incremental backups.

Because incremental backups are table-based for append-optimized tables, they are efficient when changes are confined to a small set of tables and those tables hold little data compared to the tables that are unchanged. Incremental backups are also efficient with append-optimized partitioned tables if only a few table partitions are being updated; an incremental backup backs up only the partitions that have changed.

When you archive incremental backups, all incremental backups between the last full backup and the target incremental backup must be archived. You must archive all the files that are created to back up the master and all segments.

Changes to the Greenplum Database segment configuration invalidate incremental backups. After you change the segment configuration you must create a full backup before you can create an incremental backup.

Example using full and incremental backups

Each backup set has a key, which is a timestamp taken when the backup is created. For example, a backup created on May 14, 2012 has file names that contain 20120514hhmmss, where hhmmss represents the hour, minute, and second.

Assume you created both full and incremental backups of the database mytest. You used the following command to create full backups:

gpcrondump -x mytest -u /backupdir

You used this command to create incremental backups.

gpcrondump -x mytest -u /backupdir --incremental

When you specify the -u option, the backups are created in the directory /backupdir on the Greenplum Database hosts with the following timestamp keys. The full backups have the timestamp keys 20120514054532 and 20121114064330. The other backups are incremental backups.

  • 20120514054532 (full backup)
  • 20120714095512
  • 20120914081205
  • 20121114064330 (full backup)
  • 20130114051246

If you create an incremental backup that starts on March 14, 2013, the timestamp for the backup would be 20130314hhmmss. To create the incremental backup, you need both the incremental backup 20130114051246 and the full backup 20121114064330. Also, you must specify the same -u option for any incremental backups that are part of the backup set.

To restore a database with the incremental backup 20120914081205, you need the incremental backups 20120914081205 and 20120714095512, and the full backup 20120514054532.

To restore the mytest database with the incremental backup 20130114051246, you need only the incremental backup and the full backup 20121114064330. The restore command would be similar to this command.

gpdbrestore -t 20130114051246 -u /backupdir

Backing Up a Set of Tables

You can perform an incremental backup on a set of database tables. When you create the full backup and specify the tables to include or exclude, use the gpcrondump option --prefix to identify the backup set.

First, create a full backup of a set of tables. When you create the full backup, specify the --prefix option to identify the backup set. To include a set of tables, use the gpcrondump option -t or --table-file. To exclude a set of tables, use the gpcrondump option -T or --exclude-table-file.

To create an incremental backup based on the full backup of the set of tables, specify the gpcrondump option --incremental and the --prefix option with the string specified when creating the full backup. The incremental backup is limited to only the tables in the full backup.

The following example uses the --table-file option to create a full backup for the set of tables listed in the file user-tables. The prefix user_backup identifies the backup set.

gpcrondump -x mydatabase --table-file=user-tables --prefix user_backup

To create an incremental backup for the backup set identified by the prefix user_backup, specify the --incremental option and the option --prefix user_backup. This example creates an incremental backup.

gpcrondump -x mydatabase --incremental --prefix user_backup

This command lists the tables that were included or excluded for the full backup.

gpcrondump -x mydatabase --incremental --prefix user_backup --list-filter-tables

Restoring From an Incremental Backup

When restoring a backup with gpdbrestore, command line output displays whether the backup is an incremental or a full backup. If the gpdbrestore option -q is specified, the backup type information is written to the log file.

With the gpdbrestore option --noplan, you can restore only the data contained in an incremental backup.

With the --list-backup option you can display the full and incremental backup sets required to perform a restore.
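
For example, these sketches use the hypothetical timestamp key 20130114051246 from the earlier example, with backups in /backupdir:

gpdbrestore -t 20130114051246 -u /backupdir --list-backup
gpdbrestore -t 20130114051246 -u /backupdir --noplan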

Using Direct I/O

Direct I/O allows you to bypass the buffering of memory within the file system cache. When Direct I/O is used for a file, data is transferred directly from the disk to the application buffer, without the use of the file buffer cache. Direct I/O benefits applications by reducing CPU consumption and eliminating the overhead of copying data twice: first from the disk to the file buffer cache, and then from the file buffer cache to the application buffer.

Note: Direct I/O is supported only on RHEL, CentOS and SUSE.

Turn on Direct I/O

$ gpconfig -c gp_backup_directIO -v on

Decrease network data chunks sent to dump when the database is busy

$ gpconfig -c gp_backup_directIO_read_chunk_mb -v 10

The above command sets the chunk size to 10MB; the default chunk size is 20MB. The default value has been tested to be the optimal setting. Decreasing it will increase the backup time and increasing it will result in little change to backup time.

Verify the current data chunk size

$ gpconfig -s gp_backup_directIO_read_chunk_mb

Verify whether Direct I/O is turned on

$ gpconfig -s gp_backup_directIO

Using Data Domain Boost

Data Domain Boost is a gpcrondump and gpdbrestore option that provides faster backups after the initial backup operation, and provides deduplication at the source to decrease network traffic. When you restore files from the Data Domain system with Data Domain Boost, some files are copied to the master local disk and are restored from there, and others are restored directly.

With Data Domain Boost managed file replication, you can replicate Greenplum Database backup images that are stored on a Data Domain system for disaster recovery purposes. The gpmfr utility manages the Greenplum Database backup sets that are on the primary and a remote Data Domain system. For information about gpmfr, see the Greenplum Database Utility Guide.

Managed file replication requires network configuration when a replication network is being used between two Data Domain systems:

  • The Greenplum Database system requires the Data Domain login credentials to be configured with gpcrondump. These credentials are created for the local and remote Data Domain systems.
  • When the non-management network interface is used for replication on the Data Domain systems, static routes must be configured on the systems to pass the replication data traffic to the correct interfaces.

Do not use Data Domain Boost with gp_dump, pg_dump, or pg_dumpall.

Refer to Data Domain Boost documentation for detailed information.

Important: Incremental backups are not supported with Data Domain Boost. You cannot use Data Domain Boost with a full backup if you plan to create incremental backups that use the full backup. You can use a Data Domain server as an NFS file system to perform incremental backups.

Data Domain Boost Requirements

Using Data Domain Boost requires the following.

  • Purchase and install a Data Domain Boost license on the Data Domain.
  • Obtain sizing recommendations for Data Domain Boost.

Contact your EMC Data Domain account representative for assistance.

One-Time Data Domain Boost Credential Setup

There is a one-time process to set up credentials to use Data Domain Boost. Credential setup connects one Greenplum Database instance to one Data Domain instance. If you are using Data Domain Boost managed file replication capabilities for disaster recovery purposes, you must set up credentials for both the primary and remote Data Domain systems.

To perform the credential setup, run gpcrondump with the following options:

--ddboost-host ddboost_hostname --ddboost-user ddboost_user 
--ddboost-backupdir backup_directory

To remove credentials, run gpcrondump with the --ddboost-config-remove option.

To manage credentials for the remote Data Domain system that is used for backup replication, specify the --ddboost-remote option with the other gpcrondump options. For example, these options set up credentials for a Data Domain system that is used for backup replication. The system IP address is 192.0.2.230, the user ID is ddboostmyuser, and the location for the backups on the system is GPDB/gp_production:

--ddboost-host 192.0.2.230 --ddboost-user ddboostmyuser 
--ddboost-backupdir gp_production --ddboost-remote

For details, see gpcrondump in the Greenplum Database Utility Guide.

If you use two or more network connections to connect to the Data Domain system, use gpcrondump to set up the login credentials for the Data Domain hostnames associated with the network connections. To perform this setup for two network connections, run gpcrondump with the following options:

--ddboost-host ddboost_hostname1 
--ddboost-host ddboost_hostname2 --ddboost-user ddboost_user 
--ddboost-backupdir backup_directory

Configuring Data Domain Boost for the Greenplum Database

After you set up credentials for Data Domain Boost on the Greenplum Database, perform the following tasks in Data Domain to allow Data Domain Boost to work with the Greenplum Database:

Configuring Distributed Segment Processing in Data Domain

Configure the distributed segment processing option on the Data Domain system. The configuration applies to all the DCA servers and the Data Domain Boost plug-in installed on them. This option is enabled by default, but verify that it is enabled before using Data Domain Boost backups:

# ddboost option show

To enable distributed segment processing:

# ddboost option set distributed-segment-processing {enabled | disabled}

Configuring Advanced Load Balancing and Link Failover in Data Domain

If you have multiple network connections on a network subnet, you can create an interface group to provide load balancing and higher network throughput on your Data Domain system. When a Data Domain system on an interface group receives data from the media server clients, the data transfer is load balanced and distributed as separate jobs on the private network. You can achieve optimal throughput with multiple 1 GbE connections.

Note:

To ensure that interface groups function properly, use interface groups only when using multiple network connections on the same networking subnet.

To create an interface group on the Data Domain system, create interfaces with the net command (if interfaces do not already exist), add the interfaces to the group, and register the Data Domain system with the backup application.

  1. Add the interfaces to the group:
    # ddboost ifgroup add interface 192.0.2.1
    # ddboost ifgroup add interface 192.0.2.2
    # ddboost ifgroup add interface 192.0.2.3
    # ddboost ifgroup add interface 192.0.2.4
    
    Note: You can create only one interface group and this group cannot be named.
  2. Select one interface on the Data Domain system to register with the backup application. Create a failover aggregated interface and register that interface with the backup application.
    Note: You do not have to register one of the ifgroup interfaces with the backup application. You can use an interface that is not part of the ifgroup to register with the backup application.
  3. Enable the interface group on the Data Domain system:
    # ddboost ifgroup enable
  4. Verify the Data Domain system configuration as follows:
    # ddboost ifgroup show config

    Results similar to the following appear.

    Interface
    -------------
    192.0.2.1
    192.0.2.2
    192.0.2.3
    192.0.2.4
    -------------
    

You can add or delete interfaces from the group at any time.

Note: Manage Advanced Load Balancing and Link Failover (an interface group) using the ddboost ifgroup command or from the Enterprise Manager Data Management > DD Boost view.

Export the Data Domain Path to the DCA Network

The commands and options in this topic apply to DDOS 5.0.x and 5.1.x. See the Data Domain documentation for details.

Use the following Data Domain command to export the /backup/ost directory to the DCA for Data Domain Boost backups.

# nfs add /backup/ost 192.0.2.0/24, 198.51.100.0/24 (insecure)
Note: The IP addresses refer to the Greenplum system working with the Data Domain Boost system.

Create the Data Domain Login Credentials for the DCA

Create a username and password for the DCA to access the DD Boost Storage Unit (SU) at the time of backup and restore:

# user add <user> [password <password>] [priv {admin | security | user}]
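
For example, this command creates a hypothetical user named ddba with user-level privilege:

# user add ddba password changeme priv user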

Backup Options for Data Domain Boost

Specify gpcrondump options that match your Data Domain Boost setup.

Data Domain Boost backs up files to the Data Domain system. Status and report files remain on the local disk.

To configure Data Domain Boost to remove old backup directories before starting a backup operation, specify a gpcrondump backup expiration option.

  • The -c option clears all backup directories.
  • The -o option clears the oldest backup directory.

To remove the oldest dump directory, specify gpcrondump --ddboost with the -o option. For example, if your retention period is 30 days, use gpcrondump --ddboost with the -o option on day 31.

Use gpcrondump --ddboost with the -c option to clear out all the old dump directories in db_dumps. The -c option deletes all dump directories that are at least one day old.
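
For example, assuming a database named mydatabase (a sketch; verify the option combinations against the gpcrondump reference):

gpcrondump -x mydatabase --ddboost -o
gpcrondump -x mydatabase --ddboost -c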

Using Named Pipes

Greenplum Database supports using named pipes with gpcrondump and gpdbrestore to back up and restore a Greenplum database. When backing up a database with regular files, the files that contain the backup information are placed in directories on the Greenplum Database segments. When you use named pipes, you can configure named pipes on Greenplum Database segments to connect to another process, such as an input process to a backup device. With named pipes you can back up data without the need for regular files to temporarily store the backup files.

Backing up with named pipes is not supported if the option --ddboost is specified.

To back up a Greenplum database using named pipes:

  1. Generate the names of the named pipes with the gpcrondump options -K timestamp and --list-backup-files.

    The file names use the timestamp specified by the -K timestamp option and have the suffixes _pipes and _regular_files. For example:

    gp_dump_20130514093000_pipes 
    
    gp_dump_20130514093000_regular_files
    

    The file names listed in the _pipes file can be created as named pipes. The file names in the _regular_files file cannot be created as named pipes. gpcrondump and gpdbrestore use the information in these files during backup and restore operations.

  2. Create the named pipes as writeable on all Greenplum Database segments.
  3. Back up the database using the named pipes.

    To back up a complete set of Greenplum Database backup files, the files listed in the _regular_files file must also be backed up.

To restore a Greenplum database using named pipes:

  1. Configure the named pipes as readable.
  2. Restore the database using the named pipes and the backup files.

Example

This gpcrondump command creates two text files that contain the file names that will be used to back up the database testdb. The files are created in the directory /backups.

gpcrondump -x testdb -K 20130530090000 --list-backup-files -u /backups
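
The named pipes can then be created from the paths listed in the _pipes file. A minimal sketch, assuming the file is in /backups as described above and that each host creates only the pipe paths local to it:

while read p; do mkfifo "$p"; chmod u+w "$p"; done < /backups/gp_dump_20130530090000_pipes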

After you create the writeable named pipes on all the Greenplum Database segments, you run gpcrondump to back up the database.

gpcrondump -x testdb -K 20130530090000 -u /backups

To restore the database with gpdbrestore, you configure the named pipes as readable and run this command:

gpdbrestore -x testdb -t 20130530090000 -u /backups

Backing Up a Database with gp_dump

The gp_dump utility dumps the contents of a Greenplum Database system into SQL files that contain SQL commands for recreating the system configuration and database and restoring the data. Users can access the database during dump operations.

Use gp_restore to restore databases from gp_dump files. gp_restore restores the schema and data first, then rebuilds objects associated with the database tables.

Important:

The gp_dump and gp_restore utilities do not support incremental backups and are deprecated. gp_dump and gp_restore will be removed in a future release. Use gpcrondump and gpdbrestore to back up and restore Greenplum databases.

gp_dump creates the system-level backup files and files for rebuilding database objects in the master host’s master data directory. The file names include the database id <dbid> and a 14-digit timestamp. The timestamp uniquely identifies the backup job and must be passed to gp_restore when you restore a Greenplum database.

  • Greenplum system catalog tables: gp_catalog_1_<dbid>_<timestamp>.
  • The CREATE DATABASE SQL statement: gp_cdatabase_1_<dbid>_<timestamp>. Run this statement on the master instance to recreate the database.
  • Create user database schema(s): gp_dump_1_<dbid>_<timestamp>. To recreate the database schema(s), specify this file to gp_restore.
  • Log file containing status information about the dump process: gp_dump_status_1_<dbid>_<timestamp>.
  • SQL commands for rebuilding table objects: gp_dump_1_<dbid>_<timestamp>_post_data.

gp_dump launches a gp_dump_agent for each segment instance to back up. gp_dump_agent processes run on the segment hosts and send their status to the gp_dump process running on the master host.

  • gp_dump dumps the user data for each segment instance into a SQL file in the segment instance’s data directory. By default, only primary (active) segment instances are backed up. The default naming convention of this file is gp_dump_0_<dbid>_<timestamp>. gp_restore uses these files to recreate particular user data segments.
  • gp_dump creates a log file in each segment instance’s data directory named gp_dump_status_0_<dbid>_<timestamp>.

To back up a Greenplum database using gp_dump

  1. From the master, run the gp_dump utility. For example (where mydatabase is the name of the database you are backing up):
    $ gp_dump mydatabase
    
    Note: gp_dump resolves the master and segment hostnames from the machine where it is invoked, not from the machine where the master is running, before connecting to them. This can cause failures if gp_dump is not run on the master host. To avoid this problem, use gpcrondump.

Automating Parallel Backups with gpcrondump

You can call gpcrondump directly or from a crontab entry. Use gpcrondump to back up databases, data, and objects such as database roles and server configuration files.

By default, gpcrondump creates the dump files in the master and each segment instance’s data directory in <data_directory>/db_dumps/YYYYMMDD. The segment data dump files are compressed using gzip.
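
For example, a hypothetical listing of a master dump directory (exact file names vary with the Greenplum Database version and the gpcrondump options used):

$ ls $MASTER_DATA_DIRECTORY/db_dumps/20130514
gp_cdatabase_1_1_20130514093000
gp_dump_1_1_20130514093000.gz
gp_dump_status_1_1_20130514093000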

To schedule a dump operation using CRON

  1. On the master, log in as the Greenplum superuser (gpadmin).
  2. Define a crontab entry that calls gpcrondump. For example, if your shell is /bin/bash and the PATH includes the location of the Greenplum Database management utilities, schedule a nightly dump of the sales database at one minute past midnight as follows:

    Linux Example:

    SHELL=/bin/bash
    GPHOME=/usr/local/greenplum-db-4.3.0.0
    MASTER_DATA_DIRECTORY=/data/gpdb_p1/gp-1
    01 0 * * * gpadmin source $GPHOME/greenplum_path.sh; gpcrondump -x sales -c -g -G -a -q >> gp_salesdump.log
    
  3. Create a file named mail_contacts in either the Greenplum superuser’s home directory or in $GPHOME/bin. For example:
    $ vi /home/gpadmin/mail_contacts
    $ vi /export/home/gpadmin/mail_contacts
    
  4. In this file, type one email address per line. For example:
    dba@example.com
    jjones@example.com
    
  5. Save and close the mail_contacts file. gpcrondump will send email notifications to the email addresses listed in this file.

To schedule a dump operation using CRON with Data Domain Boost

  1. Ensure the One-Time Data Domain Boost Credential Setup is complete.
  2. Add the option --ddboost to the gpcrondump options:
    gpcrondump -x mydatabase -z -v --ddboost 
    
Important:

Do not use compression with Data Domain Boost backups. The -z option turns backup compression off.

Some of the options available in gpcrondump have different implications when using Data Domain Boost. For details, see gpcrondump in the Greenplum Database Utility Guide.

Restoring From Parallel Backup Files

How you restore a database from parallel backup files depends on how you answer the following questions.

  1. Where are your backup files? If your backup files are on the segment hosts where gpcrondump or gp_dump created them, you can restore the database with gpdbrestore or gp_restore, respectively. If you moved your backup files off the Greenplum array, for example to an archive server with gpcrondump, use gpdbrestore.
  2. Are you recreating the Greenplum Database system, or just restoring your data? If Greenplum Database is running and you are restoring your data, use gpdbrestore or gp_restore. If you lost your entire array and need to rebuild the entire system from backup, use gpinitsystem.
  3. Are you restoring to a system with the same number of segment instances as your backup set? If you are restoring to an array with the same number of segment hosts and segment instances per host, use gpdbrestore or gp_restore. If you are migrating to a different array configuration, you must do a non-parallel restore. See Restoring to a Different Greenplum System Configuration.

Restoring a Database with gp_restore

gp_restore recreates the data definitions (schema) and user data in a database using backup files created by gp_dump.

Important: The gp_dump and gp_restore utilities are deprecated and will be removed in a future release. Use gpcrondump and gpdbrestore to back up and restore Greenplum databases.

The requirements for using gp_restore are:

  • Backup files created by gp_dump.
  • Backup files reside on the segment hosts where gp_dump created them.
  • A running Greenplum Database system with the same number of primary segment instances as the system backed up with gp_dump.
  • The restore target database is created in the system.
  • Use the same options for restore that you used for backup, for example, -s (schema only), -a (data only), --gp-c (compressed), --gp-d (alternate dump file location).

The gp_restore utility performs the following actions.

On the master host

  • Runs the SQL DDL commands in the gp_dump_1_<dbid>_<timestamp> file created by gp_dump to recreate the database schema and objects.
  • Creates a log file in the master data directory named gp_restore_status_1_<dbid>_<timestamp>.
  • gp_restore launches a gp_restore_agent for each segment instance to be restored. gp_restore_agent processes run on the segment hosts and report status back to the gp_restore process running on the master host.

On the segment hosts

  • Restores the user data for each segment instance using the gp_dump_0_<dbid>_<timestamp> files created by gp_dump. Each primary and mirror segment instance on a host is restored.
  • Creates a log file for each segment instance named gp_restore_status_0_<dbid>_<timestamp>.

The 14-digit timestamp uniquely identifies the backup job to restore. You specify the timestamp when you invoke gp_restore with the --gp-k option.

To restore from a backup created by gp_dump

  1. Ensure that the backup files created by gp_dump reside on the master host and segment hosts for the Greenplum Database system you are restoring.
  2. Ensure that the restore’s target database exists. For example:
    $ createdb database_name
  3. From the master, run the gp_restore utility. --gp-k specifies the timestamp key of the backup job and -d specifies the database to connect to:
    $ gp_restore --gp-k=20121031124530 -d database_name

Restoring a Database Using gpdbrestore

The gpdbrestore utility provides convenience and flexibility in restoring from a set of backup files created by gpcrondump. To restore using gpdbrestore, ensure that you have:

  • A complete set of backup files created by a gpcrondump operation: a full backup and any required incremental backups.
  • A running Greenplum Database system.
  • A Greenplum Database system with the same number of primary segment instances as the system that was backed up.
  • The database you are restoring to is created in the system.

To restore from an archive host using gpdbrestore

This procedure assumes that the backup set was moved off the Greenplum array to another host in the network.

  1. Ensure that the archive host is reachable from the Greenplum master host:
    $ ping archive_host
  2. Ensure that the restore’s target database exists. For example:
    $ createdb database_name
  3. From the master, run the gpdbrestore utility. -R specifies the host name and path to a complete backup set:
    $ gpdbrestore -R archive_host:/gpdb/backups/archive/20120714

To restore from a Data Domain system using gpdbrestore and Data Domain Boost

  1. Ensure the One-Time Data Domain Boost Credential Setup is complete.
  2. Add the option --ddboost to the gpdbrestore options:
    $ gpdbrestore -t backup_timestamp -v --ddboost
Note: Some of the options available in gpdbrestore have different implications when using Data Domain. For details, see gpdbrestore.

Restoring to a Different Greenplum System Configuration

To perform a parallel restore operation using gpdbrestore or gp_restore, the system you are restoring to must have the same configuration as the system that was backed up. To restore your database objects and data into a different system configuration, for example, to expand into a system with more segments, restore your parallel backup files by loading them through the Greenplum master. To perform a non-parallel restore, you must have:

  • A full backup set created by a gpcrondump or gp_dump operation. The backup file of the master contains the DDL to recreate your database objects. The backup files of the segments contain the data.
  • A running Greenplum Database system.
  • The database you are restoring to exists in the system.

Segment dump files contain a COPY command for each table followed by the data in delimited text format. Collect all of the dump files for all of the segment instances and run them through the master to restore your data and redistribute it across the new system configuration.
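
For example, each segment dump file contains entries similar to the following sketch (a hypothetical sales table; the text-format delimiter is a tab by default):

COPY sales (id, txn_date, amount) FROM stdin;
1	2012-07-14	100.00
2	2012-07-15	250.50
\.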

To restore a database to a different system configuration

  1. Ensure that you have a complete backup set, including dump files of the master (gp_dump_1_1_<timestamp>, gp_dump_1_1_<timestamp>_post_data) and one for each segment instance (gp_dump_0_2_<timestamp>, gp_dump_0_3_<timestamp>, gp_dump_0_4_<timestamp>, and so on). Each dump file must have the same timestamp key. gp_dump creates the dump files in each segment instance’s data directory. You must collect all the dump files and move them to one location on the master host. You can copy each segment dump file to the master, load it, and then delete it after it loads successfully.
  2. Ensure that the database you are restoring to is created in the system. For example:
    $ createdb database_name
  3. Load the master dump file to restore the database objects. For example:
    $ psql database_name -f /gpdb/backups/gp_dump_1_1_20120714
  4. Load each segment dump file to restore the data. For example:
    $ psql database_name -f /gpdb/backups/gp_dump_0_2_20120714
    $ psql database_name -f /gpdb/backups/gp_dump_0_3_20120714
    $ psql database_name -f /gpdb/backups/gp_dump_0_4_20120714
    $ psql database_name -f /gpdb/backups/gp_dump_0_5_20120714
    ...
  5. Load the post data file to restore database objects such as indexes, triggers, primary key constraints, etc.
    $ psql database_name -f /gpdb/backups/gp_dump_1_1_20120714_post_data