gpdbrestore

gpdbrestore

Restores a database from a set of dump files generated by gpcrondump.

Synopsis

gpdbrestore { -t timestamp_key { [-L] |
   [--netbackup-service-host netbackup_server 
   [--netbackup-block-size size] ] }
   -b YYYYMMDD | -R hostname:path_to_dumpset | -s database_name } 
   [--noplan] [--noanalyze] [-u backup_directory] [--list-backup]
   [--prefix prefix_string] [--report-status-dir report_directory]
   [-T schema.table [,...]] [--table-file file_name] [--truncate] [-e]
   [-m]
   [-restore-stats [include | only]]
   [-G [include | only]] [--change-schema schema_name] 
   [-B  parallel_processes] [-d master_data_directory] [-a] [-q] 
   [-l logfile_directory] [-v] [--ddboost] 
   [--redirect database_name ]

gpdbrestore -? 

gpdbrestore --version

Description

The gpdbrestore utility recreates the data definitions (schema) and user data in a Greenplum database using the script files created by gpcrondump operations.

When you restore from an incremental backup, the gpdbrestore utility assumes the complete backup set is available. The complete backup set includes the following backup files:

  • The last full backup before the specified incremental backup
  • All incremental backups created between the time of the full backup the specified incremental backup

The gpdbrestore utility provides the following functionality:

  • Automatically reconfigures for compression.
  • Validates the number of dump files are correct (for primary only, mirror only, primary and mirror, or a subset consisting of some mirror and primary segment dump files).
  • If a failed segment is detected, restores to active segment instances.
  • Except when restoring data from a NetBackup server, you do not need to know the complete timestamp key (-t) of the backup set to restore. Additional options are provided to instead give just a date (-b), backup set directory location (-R), or database name (-s) to restore.
  • The -R option allows the ability to restore from a backup set located on a host outside of the Greenplum Database array (archive host). Ensures that the correct dump file goes to the correct segment instance.
  • Identifies the database name automatically from the backup set.
  • Allows you to restore particular tables only (-T option) instead of the entire database. Note that single tables are not automatically dropped or truncated prior to restore.

    Performs an ANALYZE operation on the tables that are restored. You can disable the ANALYZE operation by specifying the option --noanalyze.

  • Can restore global objects such as roles and tablespaces (-G option).
  • Detects if the backup set is primary segments only or primary and mirror segments and performs the appropriate restore operation.
  • Allows you to drop the target database before a restore in a single operation.
Important: When restoring table data to an existing table, the utility assumes that the database table definition is the same as the table that was backed up. The utility does not check the table definitions.
Restoring a Database from NetBackup

Greenplum Database must be configured to communicate with the Symantec NetBackup master server that is used to restore database data. See the Greenplum Database System Administrator Guide for information about configuring Greenplum Database and NetBackup.

When restoring from NetBackup server, you must specify the timestamp of the backup with the -t option.

NetBackup is not compatible with DDBoost. Both NetBackup and DDBoost cannot be used in a single back up operation.

Restoring a Database with Named Pipes

If you used named pipes when you backed up a database with gpcrondump, named pipes with the backup data must be available when restoring the database from the backup.

Error Reporting

After a restore operation completes, the utility checks the gpdbrestore status file for SQL execution errors and displays a warning if an error is found. The default location of the restore status files are in the db_dumps/date/ directory.

Options

-a (do not prompt)
Do not prompt the user for confirmation.
-b YYYYMMDD
Looks for dump files in the segment data directories on the Greenplum Database array of hosts in db_dumps/YYYYMMDD. If --ddboost is specified, the systems looks for dump files on the Data Domain Boost host.
-B parallel_processes
The number of segments to check in parallel for pre/post-restore validation. If not specified, the utility will start up to 60 parallel processes depending on how many segment instances it needs to restore.
--change-schema=schema_name
Optional. Restores tables from a backup created with gpcrondump to a different schema. The schema_name must exist in the database. If the schema does not exist, the utility returns an error. System catalog schemas are not supported.
You must specify tables to restore with the -T and --table-file options. If a table that is being restored exists in schema-name, the utility returns a warning and attempts to append the data to the table from the backup. You can specify the --trunctate option to truncate table data before restoring data to the table from the backup.
-d master_data_directory
Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.
--ddboost
Use Data Domain Boost for this restore, if the --ddboost option was passed when the data was dumped. Before using Data Domain Boost, make sure the one-time Data Domain Boost credential setup is complete. See "Backing Up and Restoring Databases" in the Greenplum Database Administrator Guide for details.
If you backed up Greenplum Database configuration files with the gpcrondump option -g and specified the --ddboost option, you must manually restore the backup from the Data Domain system. The configuration files must be restored for the Greenplum Database master and all the hosts and segments. The backup location on the Data Domain system is the directory GPDB/backup_directory/date. The backup_directory is set when you specify the Data Domain credentials with gpcrondump.
This option is not supported if --netbackup-service-host is specified.
-e (drop target database before restore)
Drops the target database before doing the restore and then recreates it.
-G [include | only] (restore global objects)
Restores database metadata information that is not associated with a specific schema or table, such as roles and tablespaces, if the global object dump file db_dumps/date/prefix_string_gp_global_1_1_timestamp is found in the master data directory. The global object file is created with the gpcrondump option -G.
  • The keyword include restores global objects in addition to performing a restore. This is the default if no keyword is specified.
  • The keyword only restores only global objects. No other database objects or database table data are restored.
The -m option restores metadata associated with schemas or tables.
-l logfile_directory
The directory to write the log file. Defaults to ~/gpAdminLogs.
--list-backup
Lists the set of full and incremental backup sets required to perform a restore based on the timestamp_key specified with the -t option and the location of the backup set.
This option is supported only if the timestamp_key is for an incremental backup.
-L (list tablenames in backup set)
When used with the -t option, lists the table names that exist in the named backup set and exits. Does not perform a restore.
-m (restore metadata only)

Restores database metadata information such schema and table definitions and information created by SET statements. This option does not restore database table data. All table and schema metadata is restored unless options are specified that include or exclude tables or schemas. If table or schema filters are specified, the utility restores the schema and table metadata only for the schemas and tables that are specified to be restored.

Database information that is not associated with a specific schema or table, such as roles and tablespaces, is not restored. You can specify the -G option with this option to restore global metadata that was backed up with the gpcrondump utility.

Database statistics are not restored. You can specify the --restore-stats option to restore statistics that were backed up with the gpcrondump utility.
Not supported with the --noplan or --noanalyze options.
--netbackup-block-size size
Specify the block size, in bytes, of data being transferred from the Symantec NetBackup server. The default is 512 bytes.
NetBackup options are not supported if DDBoost backup options are specified.
--netbackup-service-host netbackup_server
The NetBackup master server that Greenplum Database connects to when backing up to NetBackup. If you specify this option, you must specify the timestamp of the backup with the -t option.
This option is not supported with any of the these options: -R, -s, -b, -L, or --ddboost.
NetBackup options are not supported if DDBoost backup options are specified.
--noanalyze
The ANALYZE command is not run after a successful restore. The default is to run the ANALYZE command on restored tables. This option is useful if running ANALYZE on tables in your database requires a significant amount of time.
If this option is specified, you should run ANALYZE manually on restored tables. Failure to run ANALYZE following a restore might result in poor database performance.
Not supported with the -m option.
--noplan
Restores only the data backed up during the incremental backup specified by the timestamp_key. No other data from the complete backup set are restored. The full backup set containing the incremental backup must be available.
If the timestamp_key specified with the -t option does not reference an incremental backup, an error is returned.
Not supported with the -m option.
--prefix prefix_string
If you specified the gpcrondump option --prefix prefix_string to create the backup, you must specify this option with the prefix_string when restoring the backup.
If you created a full backup of a set of tables with gpcrondump and specified a prefix, you can use gpcrondump with the options --list-filter-tables and --prefix prefix_string to list the tables that were included or excluded for the backup.
-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.
-R hostname:path_to_dumpset
Allows you to provide a hostname and full path to a set of dump files. The host does not have to be in the Greenplum Database array of hosts, but must be accessible from the Greenplum master.
--redirect database_name
Specify the name of the database where the data is restored. Specify this option to restore data to a database that is different than the database specified during back up. If database_name does not exist, it is created.
--report-status-dir report_directory
Specifies the absolute path to the directory on the each Greenplum Database host (master and segment hosts) where gpdbrestore writes report status files for a restore operation. If report_directory does not exist or is not writable, gpdbrestore returns an error and stops.
If this option is not specified and the -u option is specified, report status files are written to the location specified by the -u option if the -u location is writable. If the location specified by -u option is not writable, the report status files are written to segment data directories.
--restore-stats [include | only]
Specify this option to restore database statistics that were backed up with the gpcrondump utility option --dump-stats.
  • The keyword include restores the statistics that were backed up in addition to performing a restore. This is the default if no keyword is specified.
  • The keyword only restores only the statistics that were backed up. No other database objects or database data are restored.
If this option is specified with other options that include or exclude tables or schemas to restore, the utility restores statistics only for the tables specified to be restored.
If statistics would be restored for a table that does not exist in the database, the utility displays a warning. The statistics are not restored.
-s database_name
Looks for latest set of dump files for the given database name in the segment data directories db_dumps directory on the Greenplum Database array of hosts.
-t timestamp_key
The 14 digit timestamp key that uniquely identifies a backup set of data to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files matching this timestamp key in the segment data directories db_dumps directory on the Greenplum Database array of hosts.
-T schema.table_name
A comma-separated list of specific table names to restore. The named table(s) must exist in the backup set of the database being restored. Existing tables are not automatically truncated before data is restored from backup. If your intention is to replace existing data in the table from backup, truncate the table prior to running gpdbrestore -T.
To restore the tables for a specific schema, you can specify the schema name with the * wildcard character. For example -T mytest.* restores the tables for the schema mytest. System catalog schemas are not supported.
If you specify the --truncate option with the -t schema.* option, all existing tables within the database schema are truncated before tables are restored from the backup.
--table-file file_name
Specify a file file_name that contains a list of table names to restore. The file contains any number of table names, listed one per line. See the -T option for information about restoring specific tables.
--truncate
Truncate table data before restoring data to the table from the backup. If this option is not specified, existing table data is not removed before data is restored to the table.
This option is supported only when restoring a set of tables with the option -T or --table-file. If a table to be restored does not exist in the database, the table is restored and the utility returns a warning message stating that the table did not exist in the database.
This option is not supported with the -e option.
-u backup_directory
Specifies the absolute path to the directory containing the db_dumps directory on each host. If not specified, defaults to the data directory of each instance to be backed up. Specify this option if you specified a backup directory with the gpcrondump option -u when creating a backup set.
If backup_directory is not writable, backup operation report status files are written to segment data directories. You can specify a different location where report status files are written with the --report-status-dir option.
Note: This option is not supported if --ddboost is specified.
-v | --verbose
Specifies verbose mode.
--version (show utility version)
Displays the version of this utility.
-? (help)
Displays the online help.

Examples

Restore the sales database from the latest backup files generated by gpcrondump (assumes backup files are in the segment data directories in db_dumps):

gpdbrestore -s sales

Restore a database from backup files that reside on an archive host outside the Greenplum Database array (command issued on the Greenplum master host):

gpdbrestore -R archivehostname:/data_p1/db_dumps/20080214

Restore global objects only (roles and tablespaces):

gpdbrestore -G
Note: The -R option is not supported when restoring a backup set that includes incremental backups.

If you restore from a backup set that contains an incremental backup, all the files in the backup set must be available to gpdbrestore. For example, the following timestamp keys specify a backup set. 20120514054532 is the full backup and the others are incremental.

20120514054532 
20120714095512 
20120914081205 
20121114064330 
20130114051246

The following gbdbrestore command specifies the timestamp key 20121114064330. The incremental backup with the timestamps 20120714095512 and 20120914081205 and the full backup must be available to perform a restore.

gpdbrestore -t 20121114064330

The following gbdbrestore command uses the --noplan option to restore only the data that was backed up during the incremental backup with the timestamp key 20121114064330. Data in the previous incremental backups and the data in the full backup are not restored.

gpdbrestore -t 20121114064330 --noplan

This gpdbrestore command restores Greenplum Database data from the data managed by NetBackup master server nbu_server1. The option -t 20130530090000 specifies the timestamp generated by gpcrondump when the backup was created. The -e option specifies that the target database is dropped before it is restored.

gpdbrestore -t 20130530090000 -e --netbackup-service-host=nbu_server1

See Also

gpcrondump