Parallel Backup with gpbackup and gprestore
gpbackup and gprestore are new utilities that provide an improved way of creating and restoring backup sets for Greenplum Database. By default, gpbackup stores only the object metadata files and DDL files for a backup in the Greenplum Database master data directory. Greenplum Database segments use the COPY .. ON SEGMENT command to store their data for backed-up tables in compressed CSV data files, located in each segment's backups directory.
The backup metadata files contain all of the information that gprestore needs to restore a full backup set in parallel. Backup metadata also provides the framework for restoring only individual objects in the data set, along with any dependent objects, in future versions of gprestore. (See Understanding Backup Files for more information.) Storing the table data in CSV files also provides opportunities for using other restore utilities, such as gpload, to load the data either in the same cluster or another cluster. By default, one file is created for each table on the segment. You can specify the -leaf-partition-data option with gpbackup to create one data file per leaf partition of a partitioned table, instead of a single file. This option also enables you to filter backup sets by leaf partitions.
Each gpbackup task uses a single transaction in Greenplum Database. During this transaction, metadata is backed up on the master host, and data for each table on each segment host is written to CSV backup files using COPY .. ON SEGMENT commands in parallel. The backup process acquires an ACCESS SHARE lock on each table that is backed up.
Requirements and Limitations
You can use gpbackup and gprestore on Greenplum Database systems that support the COPY .. ON SEGMENT command (Greenplum Database 5.1.0 and later, or 4.3.17.0 and later).
- If you create an index on a parent partitioned table, gpbackup does not back up that same index on child partitioned tables of the parent, as creating the same index on a child would cause an error. However, if you exchange a partition, gpbackup does not detect that the index on the exchanged partition is inherited from the new parent table. In this case, gpbackup backs up conflicting CREATE INDEX statements, which causes an error when you restore the backup set.
- You can execute multiple instances of gpbackup, but each execution requires a distinct timestamp.
- Database object filtering is currently limited to schemas and tables.
- Filtering objects with gprestore is not yet supported.
- You cannot use the -exclude-table-file option together with -leaf-partition-data. Although you can specify leaf partition names in a file specified with -exclude-table-file, gpbackup ignores the partition names.
- Incremental backups are not supported.
Objects Included in a Backup or Restore
Database (for database specified with -dbname) | Global (requires the -globals option to restore) |
---|---|
|
|
See also Understanding Backup Files.
Performing Basic Backup and Restore Operations
$ gpbackup -dbname <database_name>
$ gpbackup -dbname demo
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Starting backup of database demo
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Backup Timestamp = 20171103152558
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Backup Database = demo
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Backup Type = Unfiltered Compressed Full Backup
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Acquiring ACCESS SHARE locks on tables
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Locks acquired
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Gathering table metadata
20171103:15:25:58 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Writing global database metadata to /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_global.sql
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Global database metadata backup complete
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Writing pre-data metadata to /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_predata.sql
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Pre-data metadata backup complete
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Writing post-data metadata to /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_postdata.sql
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Post-data metadata backup complete
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Writing data to file
Tables backed up: 2 / 2 [==============================================================] 100.00% 0s
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Data backup complete
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[WARNING]:-Found neither /usr/local/greenplum-db/./bin/mail_contacts nor /home/gpadmin/mail_contacts
20171103:15:25:59 gpbackup:gpadmin:0ee2f5fb02c9:017527-[WARNING]:-Unable to send backup email notification
20171103:15:26:00 gpbackup:gpadmin:0ee2f5fb02c9:017527-[INFO]:-Backup completed successfully
$ ls /gpmaster/gpsne-1/backups/20171103/20171103152558/
gpbackup_20171103152558_config.yaml  gpbackup_20171103152558_postdata.sql  gpbackup_20171103152558_report
gpbackup_20171103152558_global.sql   gpbackup_20171103152558_predata.sql   gpbackup_20171103152558_toc.yaml

$ ls /gpdata1/gpsne0/backups/20171103/20171103152558/
gpbackup_0_20171103152558_16524.gz  gpbackup_0_20171103152558_16543.gz
$ gpbackup -dbname demo -backupdir /home/gpadmin/backups
20171103:15:31:56 gpbackup:gpadmin:0ee2f5fb02c9:017586-[INFO]:-Starting backup of database demo
...
20171103:15:31:58 gpbackup:gpadmin:0ee2f5fb02c9:017586-[INFO]:-Backup completed successfully

$ find /home/gpadmin/backups/ -type f
/home/gpadmin/backups/gpseg0/backups/20171103/20171103153156/gpbackup_0_20171103153156_16543.gz
/home/gpadmin/backups/gpseg0/backups/20171103/20171103153156/gpbackup_0_20171103153156_16524.gz
/home/gpadmin/backups/gpseg1/backups/20171103/20171103153156/gpbackup_1_20171103153156_16543.gz
/home/gpadmin/backups/gpseg1/backups/20171103/20171103153156/gpbackup_1_20171103153156_16524.gz
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_config.yaml
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_predata.sql
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_global.sql
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_postdata.sql
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_report
/home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_toc.yaml
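The directory layout visible in the listings above can be captured in a small helper, which is handy when a script needs to locate a backup set from its timestamp. This is a minimal sketch: the function names backup_set_dir and custom_backup_set_dir are hypothetical and are not part of gpbackup, and the layout is inferred only from the example paths shown here.

```shell
# Build the directory that holds one backup set, following the layout in the
# listings above: <base>/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>.
# backup_set_dir is a hypothetical helper, not part of gpbackup.
backup_set_dir() {
    base=$1        # master or segment data directory, e.g. /gpmaster/gpsne-1
    timestamp=$2   # 14-digit backup timestamp, e.g. 20171103152558
    day=$(printf '%s' "$timestamp" | cut -c1-8)
    printf '%s/backups/%s/%s\n' "${base%/}" "$day" "$timestamp"
}

# With -backupdir, the find output above shows a gpseg<content_id>
# subdirectory per instance (gpseg-1 holds the master metadata files).
custom_backup_set_dir() {
    backupdir=$1
    content_id=$2
    timestamp=$3
    backup_set_dir "${backupdir%/}/gpseg${content_id}" "$timestamp"
}
```

For example, `ls "$(custom_backup_set_dir /home/gpadmin/backups -1 20171103153156)"` would list the metadata files from the second backup above.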
Restoring from Backup
$ dropdb demo
$ gprestore -timestamp 20171103152558 -createdb
20171103:15:45:30 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restore Key = 20171103152558
20171103:15:45:31 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Creating database
20171103:15:45:44 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Database creation complete
20171103:15:45:44 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring pre-data metadata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_predata.sql
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Pre-data metadata restore complete
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring data
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Data restore complete
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring post-data metadata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_postdata.sql
20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Post-data metadata restore complete
$ dropdb demo
$ gprestore -backupdir /home/gpadmin/backups/ -timestamp 20171103153156 -createdb
20171103:15:51:02 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Restore Key = 20171103153156
...
20171103:15:51:17 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Post-data metadata restore complete
gprestore does not attempt to restore global metadata for the Greenplum System by default. If this is required, include the -globals argument.
$ gprestore -backupdir /home/gpadmin/backups/ -timestamp 20171103153156 -createdb -jobs 8
Test the number of parallel connections with your backup set to determine the ideal number for fast data recovery.
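One way to approach this testing is to time the same restore with several -jobs values. The sketch below only prints the commands for each trial so they can be reviewed before anything is dropped or restored. The helper name print_restore_trials is hypothetical; the gprestore options are the ones used in the examples above, and `time` is the shell's own timing command.

```shell
# Print one timed restore trial per job count, without running anything.
# print_restore_trials is a hypothetical helper; the gprestore options are
# the ones shown in the examples above.
print_restore_trials() {
    backupdir=$1   # directory passed to -backupdir
    timestamp=$2   # timestamp of the backup set to restore
    dbname=$3      # database that is dropped before each trial
    shift 3        # the remaining arguments are the job counts to try
    for jobs in "$@"; do
        printf 'dropdb %s && time gprestore -backupdir %s -timestamp %s -createdb -jobs %s\n' \
            "$dbname" "$backupdir" "$timestamp" "$jobs"
    done
}
```

Calling `print_restore_trials /home/gpadmin/backups/ 20171103153156 demo 2 4 8` prints three command lines. Because each trial drops the database first, such trials are best run against a test system.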
Filtering the Contents of a Backup
$ gpbackup -dbname demo -include-schema wikipedia
$ gpbackup -dbname demo -exclude-schema twitter
$ gpbackup -dbname demo -include-schema wikipedia -include-schema twitter
wikipedia.articles
twitter.message

beer."IPA"
"Wine".riesling
"Wine"."sauvignon blanc"
water.tonic
$ gpbackup -dbname demo -include-table-file /home/gpadmin/table-list.txt
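A table list file like the ones shown above can be generated from a script. The following is a minimal sketch, assuming one schema-qualified name per line as in the examples; write_table_list is a hypothetical helper, and its check for a dot is only a rough sanity test, not the validation that gpbackup itself performs.

```shell
# Write one schema-qualified table name per line to a list file suitable for
# -include-table-file. write_table_list is a hypothetical helper.
write_table_list() {
    list_file=$1
    shift
    : > "$list_file"                      # start from an empty file
    for entry in "$@"; do
        case $entry in
            *.*) printf '%s\n' "$entry" >> "$list_file" ;;
            *)   printf 'not schema-qualified: %s\n' "$entry" >&2
                 return 1 ;;
        esac
    done
}
```

Single quotes on the shell command line preserve the double quotes that the list file needs for mixed-case names or names with spaces, for example `write_table_list /home/gpadmin/table-list.txt wikipedia.articles '"Wine"."sauvignon blanc"'`.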
Filtering by Leaf Partition
demo=# CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan17 START (date '2017-01-01') INCLUSIVE ,
PARTITION Feb17 START (date '2017-02-01') INCLUSIVE ,
PARTITION Mar17 START (date '2017-03-01') INCLUSIVE ,
PARTITION Apr17 START (date '2017-04-01') INCLUSIVE ,
PARTITION May17 START (date '2017-05-01') INCLUSIVE ,
PARTITION Jun17 START (date '2017-06-01') INCLUSIVE ,
PARTITION Jul17 START (date '2017-07-01') INCLUSIVE ,
PARTITION Aug17 START (date '2017-08-01') INCLUSIVE ,
PARTITION Sep17 START (date '2017-09-01') INCLUSIVE ,
PARTITION Oct17 START (date '2017-10-01') INCLUSIVE ,
PARTITION Nov17 START (date '2017-11-01') INCLUSIVE ,
PARTITION Dec17 START (date '2017-12-01') INCLUSIVE
END (date '2018-01-01') EXCLUSIVE );
NOTICE: CREATE TABLE will create partition "sales_1_prt_jan17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_feb17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_mar17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_apr17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_may17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_jun17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_jul17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_aug17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_sep17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_oct17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_nov17" for table "sales"
NOTICE: CREATE TABLE will create partition "sales_1_prt_dec17" for table "sales"
CREATE TABLE
public.sales_1_prt_oct17
public.sales_1_prt_nov17
public.sales_1_prt_dec17
$ gpbackup -dbname demo -include-table-file last-quarter.txt -leaf-partition-data
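The leaf partition names in such a list follow the pattern shown in the NOTICE output of the CREATE TABLE example: the table name, `_1_prt_`, then the partition name in lowercase. Under the assumption of a single-level partitioned table that follows this pattern, the list could be generated rather than typed. The helper leaf_partition_names is hypothetical.

```shell
# Print schema-qualified leaf partition names, one per line, for a
# single-level partitioned table. The <table>_1_prt_<partition> pattern is
# taken from the NOTICE output above; leaf_partition_names is a hypothetical
# helper, and deeper partition hierarchies are not handled.
leaf_partition_names() {
    schema=$1
    table=$2
    shift 2
    for part in "$@"; do
        lower=$(printf '%s' "$part" | tr '[:upper:]' '[:lower:]')
        printf '%s.%s_1_prt_%s\n' "$schema" "$table" "$lower"
    done
}
```

For example, `leaf_partition_names public sales Oct17 Nov17 Dec17 > last-quarter.txt` would produce a file with the three partition names used above.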
Configuring Email Notifications
gpbackup sends a status email notification after a backup operation completes if you place a file named mail_contacts in the home directory of the Greenplum Database superuser (gpadmin) or in the same directory as the gpbackup utility ($GPHOME/bin).
This file must contain one email address per line. gpbackup issues a warning if it cannot locate a mail_contacts file in either location. If both locations have a mail_contacts file, then the one in $HOME takes precedence.
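To check which mail_contacts file would apply on a given host, the documented precedence can be mirrored in a few lines of shell. This is a sketch of the lookup order described above ($HOME first, then $GPHOME/bin), not a description of how gpbackup is implemented; find_mail_contacts is a hypothetical helper.

```shell
# Report which mail_contacts file would be used, following the precedence
# described above: $HOME first, then $GPHOME/bin.
# find_mail_contacts is a hypothetical helper that mirrors the documented
# lookup order.
find_mail_contacts() {
    for dir in "$HOME" "$GPHOME/bin"; do
        if [ -f "$dir/mail_contacts" ]; then
            printf '%s\n' "$dir/mail_contacts"
            return 0
        fi
    done
    printf 'no mail_contacts file found\n' >&2
    return 1
}
```

A file with one address per line can be created with, for example, `printf '%s\n' dba@example.com > "$HOME/mail_contacts"`, where the address is a placeholder.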
Understanding Backup Files
A complete backup set for gpbackup includes multiple metadata files, supporting files, and CSV data files, each designated with the timestamp at which the backup was created.
File name | Description |
---|---|
gpbackup_<YYYYMMDDHHMMSS>_global.sql | Contains DDL for objects that are global to the Greenplum Cluster, and not
owned by a specific database within the cluster. These objects include:
Note that global metadata is not restored by default. You must include the -globals option with gprestore to restore global metadata. |
gpbackup_<YYYYMMDDHHMMSS>_predata.sql | Contains DDL for objects in the backed-up database (specified with
-dbname) that must be created prior to restoring the
actual data. These objects include:
|
gpbackup_<YYYYMMDDHHMMSS>_postdata.sql | Contains DDL for objects in the backed-up database (specified with
-dbname) that must be created after restoring the data.
These objects include:
|
gpbackup_<YYYYMMDDHHMMSS>_toc.yaml | Contains metadata for locating object DDL in the _predata.sql and _postdata.sql files. This file also contains the table names and OIDs used for locating the corresponding table data in CSV data files that are created on each segment. See Segment Data Files. |
gpbackup_<YYYYMMDDHHMMSS>_report | Contains information about the backup operation that is used to populate the
email notice (if configured) that is sent after the backup completes. This file
contains information such as:
|
gpbackup_<YYYYMMDDHHMMSS>_config.yaml | Contains metadata about the execution of the particular backup task,
including:
|
Segment Data Files
By default, each segment creates one compressed CSV file for each table that is backed up on the segment. The files are stored in <seg_dir>/backups/YYYYMMDD/YYYYMMDDHHMMSS/. If you specify a custom backup directory, segment data files are copied to this same file path as a subdirectory of the backup directory. If you include the -leaf-partition-data option, gpbackup creates one data file for each leaf partition of a partitioned table, instead of one file for the whole table.
Each data file is named gpbackup_<content_id>_<YYYYMMDDHHMMSS>_<oid>.gz, where:
- <content_id> is the content ID of the segment.
- <YYYYMMDDHHMMSS> is the timestamp of the gpbackup operation.
- <oid> is the object ID of the table. The metadata file gpbackup_<YYYYMMDDHHMMSS>_toc.yaml references this <oid> to locate the data for a specific table in a schema.
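The three components can be split back out of a data file name with plain shell parameter expansion. This is a minimal sketch that assumes the gpbackup_<content_id>_<YYYYMMDDHHMMSS>_<oid>.gz pattern seen in the example listings; parse_data_file_name is a hypothetical helper, and it rejects names that do not end in .gz or whose timestamp is not 14 characters long.

```shell
# Split a segment data file name into content ID, timestamp, and table OID.
# parse_data_file_name is a hypothetical helper based on the file names in
# the example listings, e.g. gpbackup_0_20171103152558_16524.gz.
parse_data_file_name() {
    case $1 in
        *.gz) ;;
        *) return 1 ;;                    # metadata files are not data files
    esac
    name=$(basename "$1" .gz)             # drop directory and .gz suffix
    rest=${name#gpbackup_}
    [ "$rest" != "$name" ] || return 1    # prefix must be present
    content_id=${rest%%_*}                # text before the first underscore
    rest=${rest#*_}
    timestamp=${rest%%_*}
    oid=${rest#*_}
    [ "${#timestamp}" -eq 14 ] || return 1
    printf 'content_id=%s timestamp=%s oid=%s\n' "$content_id" "$timestamp" "$oid"
}
```

The OID printed here is the value that gpbackup_<YYYYMMDDHHMMSS>_toc.yaml refers to when it maps a table to its data files.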