gpbackup

Create a Greenplum Database backup for use with the gprestore utility.

Synopsis

gpbackup --dbname database_name
   [--backup-dir directory]
   [--compression-level level]
   [--data-only]
   [--debug]
   [--exclude-schema schema_name]
   [--exclude-table schema.table]
   [--exclude-table-file file_name]
   [--include-schema schema_name]
   [--include-table schema.table]
   [--include-table-file file_name]
   [--incremental [--from-timestamp backup-timestamp]]
   [--jobs int]
   [--leaf-partition-data]
   [--metadata-only]
   [--no-compression]
   [--plugin-config config_file_location]
   [--quiet]
   [--single-data-file]
   [--verbose]
   [--version]
   [--with-stats]

gpbackup --help 

Description

The gpbackup utility backs up the contents of a database into a collection of metadata files and data files that can be used to restore the database at a later time using gprestore. When you back up a database, you can specify table-level and schema-level filter options to back up specific tables. For example, you can combine schema-level and table-level options to back up all the tables in a schema except for a single table.

By default, gpbackup backs up objects in the specified database as well as global Greenplum Database system objects. You can optionally supply the --with-globals option with gprestore to restore global objects. See Objects Included in a Backup or Restore for additional information.

gpbackup stores the object metadata files and DDL files for a backup in the Greenplum Database master data directory by default. Greenplum Database segments use the COPY ... ON SEGMENT command to store their data for backed-up tables in compressed CSV data files, located in each segment's data directory. See Understanding Backup Files for additional information.

You can add the --backup-dir option to copy all backup files from the Greenplum Database master and segment hosts to an absolute path for later use. Additional options are provided to filter the backup set in order to include or exclude specific tables.

You can create an incremental backup with the --incremental option. Incremental backups are efficient when the total amount of data in append-optimized tables or table partitions that changed is small compared to the data that has not changed. See Creating Incremental Backups with gpbackup and gprestore for information about incremental backups.
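
For example, this pair of commands (the database name is illustrative) first creates a full backup and then adds an incremental backup to the backup set:
$ gpbackup --dbname demo --leaf-partition-data
$ gpbackup --dbname demo --leaf-partition-data --incremental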

With the default --jobs option (1 job), each gpbackup operation uses a single transaction on the Greenplum Database master host. The COPY ... ON SEGMENT command performs the backup task in parallel on each segment host. The backup process acquires an ACCESS SHARE lock on each table that is backed up. During the table locking process, the database should be in a quiescent state.

When a backup operation completes, gpbackup returns a status code. See Return Codes.

The gpbackup utility cannot be run while gpexpand is initializing new segments. Backups created before the expansion cannot be restored with gprestore after the cluster expansion is completed.

gpbackup can send status email notifications after a backup operation completes. You specify when the utility sends the mail and the email recipients in a configuration file. See Configuring Email Notifications.

Note: This utility uses secure shell (SSH) connections between systems to perform its tasks. In large Greenplum Database deployments, cloud deployments, or deployments with a large number of segments per host, this utility may exceed the host's maximum threshold for unauthenticated connections. Consider updating the SSH MaxStartups configuration parameter to increase this threshold. For more information about SSH configuration options, refer to the SSH documentation for your Linux distribution.

Options

--dbname database_name
Required. Specifies the database to back up.
--backup-dir directory
Optional. Copies all required backup files (metadata files and data files) to the specified directory. You must specify directory as an absolute path (not relative). If you do not supply this option, metadata files are created on the Greenplum Database master host in the $MASTER_DATA_DIRECTORY/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory. Segment hosts create CSV data files in the <seg_dir>/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory. When you specify a custom backup directory, files are copied to these paths in subdirectories of the backup directory.
You cannot combine this option with the option --plugin-config.
--compression-level level
Optional. Specifies the gzip compression level (from 1 to 9) used to compress data files. The default is 1. Note that gpbackup uses compression by default.
--data-only
Optional. Backs up only the table data into CSV files, but does not back up the metadata files needed to recreate the tables and other database objects.
--debug
Optional. Displays verbose debug messages during operation.
--exclude-schema schema_name
Optional. Specifies a database schema to exclude from the backup. You can specify this option multiple times to exclude multiple schemas. You cannot combine this option with the option --include-schema, or a table filtering option such as --include-table. See Filtering the Contents of a Backup or Restore for more information.
--exclude-table schema.table
Optional. Specifies a table to exclude from the backup. The table must be in the format <schema-name>.<table-name>. If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you must include that name in double quotes. You can specify this option multiple times. You cannot combine this option with the option --exclude-schema, or another table filtering option such as --include-table.
You cannot use this option in combination with --leaf-partition-data. Although you can specify leaf partition names, gpbackup ignores the partition names.
See Filtering the Contents of a Backup or Restore for more information.
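For example, this command (the table name is illustrative) backs up all tables in the "demo" database except twitter.mentions:
$ gpbackup --dbname demo --exclude-table twitter.mentions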
--exclude-table-file file_name
Optional. Specifies a text file containing a list of tables to exclude from the backup. Each line in the text file must define a single table using the format <schema-name>.<table-name>. The file must not include trailing lines. If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you must include that name in double quotes. You cannot combine this option with the option --exclude-schema, or another table filtering option such as --include-table.
You cannot use this option in combination with --leaf-partition-data. Although you can specify leaf partition names in a file specified with --exclude-table-file, gpbackup ignores the partition names.
See Filtering the Contents of a Backup or Restore for more information.
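For example, if a text file /home/gpadmin/exclude-tables.txt (an illustrative path) contains the lines twitter.mentions and twitter.followers, this command backs up the "demo" database without those two tables:
$ gpbackup --dbname demo --exclude-table-file /home/gpadmin/exclude-tables.txt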
--include-schema schema_name
Optional. Specifies a database schema to include in the backup. You can specify this option multiple times to include multiple schemas. If you specify this option, any schemas that you do not name in an --include-schema option are omitted from the backup set. You cannot combine this option with the options --exclude-schema, --include-table, or --include-table-file. See Filtering the Contents of a Backup or Restore for more information.
--include-table schema.table
Optional. Specifies a table to include in the backup. The table must be in the format <schema-name>.<table-name>. If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you must include that name in single quotes. See Schema and Table Names for information about characters that are supported in schema and table names.
You can specify this option multiple times. You cannot combine this option with a schema filtering option such as --include-schema, or another table filtering option such as --exclude-table-file.
You can also specify the qualified name of a sequence or a view.
If you specify this option, the utility does not automatically back up dependent objects. You must also explicitly specify dependent objects that are required. For example, if you back up a view, you must also back up the tables that the view uses. If you back up a table that uses a sequence, you must also back up the sequence.
You can optionally specify a table leaf partition name in place of the table name, to include only specific leaf partitions in a backup with the --leaf-partition-data option. When a leaf partition is backed up, the leaf partition data is backed up along with the metadata for the partitioned table.
See Filtering the Contents of a Backup or Restore for more information.
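For example, this command (the table and leaf partition names are illustrative) backs up only the data for a single leaf partition, along with the metadata for the partitioned table:
$ gpbackup --dbname demo --leaf-partition-data --include-table public.sales_1_prt_jan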
--include-table-file file_name
Optional. Specifies a text file containing a list of tables to include in the backup. Each line in the text file must define a single table using the format <schema-name>.<table-name>. The file must not include trailing lines. See Schema and Table Names for information about characters that are supported in schema and table names.
Any tables not listed in this file are omitted from the backup set. You cannot combine this option with a schema filtering option such as --include-schema, or another table filtering option such as --exclude-table-file.
You can also specify the qualified name of a sequence or a view.
If you specify this option, the utility does not automatically back up dependent objects. You must also explicitly specify dependent objects that are required. For example, if you back up a view, you must also specify the tables that the view uses. If you specify a table that uses a sequence, you must also specify the sequence.
You can optionally specify a table leaf partition name in place of the table name, to include only specific leaf partitions in a backup with the --leaf-partition-data option. When a leaf partition is backed up, the leaf partition data is backed up along with the metadata for the partitioned table.
See Filtering the Contents of a Backup or Restore for more information.
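For example, if a text file /home/gpadmin/table-list.txt (an illustrative path) lists the tables public.accounts and public.orders, this command backs up only those two tables:
$ gpbackup --dbname demo --include-table-file /home/gpadmin/table-list.txt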
--incremental
Specify this option to add an incremental backup to an incremental backup set. A backup set is a full backup and one or more incremental backups. The backups in the set must be created with a consistent set of backup options to ensure that the backup set can be used in a restore operation.
By default, gpbackup attempts to find the most recent existing backup with a consistent set of options. If the backup is a full backup, the utility creates a backup set. If the backup is an incremental backup, the utility adds the backup to the existing backup set. The incremental backup is added as the latest backup in the backup set. You can specify --from-timestamp to override the default behavior.
--from-timestamp backup-timestamp
Optional. Specifies the timestamp of a backup. The specified backup must have backup options that are consistent with the incremental backup that is being created. If the specified backup is a full backup, the utility creates a backup set. If the specified backup is an incremental backup, the utility adds the incremental backup to the existing backup set.
You must specify --leaf-partition-data with this option. You cannot combine this option with --data-only or --metadata-only.
A backup is not created and the utility returns an error if the utility cannot add the backup to an existing incremental backup set or cannot use the specified backup to create a backup set.
For information about creating and using incremental backups, see Creating Incremental Backups with gpbackup and gprestore.
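For example, this command (the timestamp value is illustrative) adds an incremental backup to the backup set that contains the backup created at the specified timestamp:
$ gpbackup --dbname demo --leaf-partition-data --incremental --from-timestamp 20200321140857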
--jobs int
Optional. Specifies the number of jobs to run in parallel when backing up tables. By default, gpbackup uses 1 job (database connection). Increasing this number can improve the speed of backing up data. When running multiple jobs, each job backs up tables in a separate transaction. For example, if you specify --jobs 2, the utility creates two processes, each process starts a single transaction, and the utility backs up the tables in parallel using the two processes.
Important: If you specify a value higher than 1, the database must be in a quiescent state while the utility acquires a lock on the tables that are being backed up. If database operations are being performed on tables that are being backed up during the table locking process, consistency between tables that are backed up in different transactions cannot be guaranteed.
You cannot use this option in combination with the options --metadata-only, --single-data-file, or --plugin-config.
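For example, this command backs up the tables in the "demo" database using four parallel jobs (database connections):
$ gpbackup --dbname demo --jobs 4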
--leaf-partition-data
Optional. For partitioned tables, creates one data file per leaf partition instead of one data file for the entire table (the default). Using this option also enables you to specify individual leaf partitions to include in a backup, with the --include-table-file option. You cannot use this option in combination with --exclude-table-file or --exclude-table.
--metadata-only
Optional. Creates only the metadata files (DDL) needed to recreate the database objects, but does not back up the actual table data.
--no-compression
Optional. Do not compress the table data CSV files.
--plugin-config config_file_location
Specifies the location of a storage plugin configuration file, a YAML-formatted text file. The file contains configuration information for the plugin application that gpbackup uses during the backup operation.
If you specify the --plugin-config option when you back up a database, you must also specify this option when you restore the database from the backup.
You cannot combine this option with the option --backup-dir.
See Using Backup Storage Plugins for information about available backup storage plugins and plugin configuration file specifications.
--quiet
Optional. Suppress all non-warning, non-error log messages.
--single-data-file
Optional. Create a single data file on each segment host for all tables backed up on that segment. By default, gpbackup creates one compressed CSV file for each table that is backed up on a segment.
Note: If you use the --single-data-file option to combine table backups into a single file per segment, you cannot set the gprestore option --jobs to a value higher than 1 to perform a parallel restore operation.
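For example, this command writes all tables backed up on a segment to a single data file per segment:
$ gpbackup --dbname demo --single-data-file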
--verbose
Optional. Print verbose log messages.
--version
Optional. Print the version number and exit.
--with-stats
Optional. Include query plan statistics in the backup set.
--help
Displays the online help.

Return Codes

One of these codes is returned after gpbackup completes.
  • 0 – Backup completed with no problems.
  • 1 – Backup completed with non-fatal errors. See log file for more information.
  • 2 – Backup failed with a fatal error. See log file for more information.

Schema and Table Names

When specifying the table filtering option --include-table or --include-table-file to list tables to be backed up, the gpbackup utility supports backing up schemas or tables when the name contains upper-case characters or these special characters.

~ # $ % ^ & * ( ) _ - + [ ] { } > < \ | ; : / ? ! ,

If a name contains an upper-case or special character and is specified on the command line with --include-table, the name must be enclosed in single quotes.
gpbackup --dbname test --include-table 'my#1schema'.'my_$42_Table'
When the table is listed in a file for use with --include-table-file, single quotes are not required. For example, this is the contents of a text file that is used with --include-table-file to back up two tables.
my#1schema.my_$42_Table
my#1schema.my_$590_Table
Note: The --include-table and --include-table-file options do not support schema or table names that contain a double quote ("), period (.), newline (\n), or space ( ).

Using Backup Storage Plugins

You can configure the Greenplum Database gpbackup and gprestore utilities to use a storage plugin to process backup files during a backup or restore operation. For example, during a backup operation, the plugin sends the backup files to a remote location. During a restore operation, the plugin retrieves the files from the remote location.

Greenplum Database includes a plugin to store and retrieve backups to Amazon Simple Storage Service (Amazon S3), or to S3-compatible services such as Dell EMC Elastic Cloud Storage (ECS) and Minio.

The commercial release of Pivotal Greenplum Database includes a DD Boost storage plugin to back up to a Dell EMC Data Domain storage appliance. You can also replicate a backup on a separate, remote Data Domain system for disaster recovery.

You can create a custom plugin to save backups to other storage services using the Backup/Restore Storage Plugin API. See Backup/Restore Storage Plugin API (Beta) for information about developing a custom storage plugin.

To use a storage plugin application, you specify the location of the plugin, the login credentials, and the backup location in a configuration file. When you run gpbackup or gprestore, you specify the configuration file with the option --plugin-config.

If you perform a backup operation with the gpbackup option --plugin-config, you must also specify the --plugin-config option when you restore the backup with gprestore.

Plugin configuration files use the YAML 1.1 document format and implement their own schema for specifying the location of the Greenplum Database storage plugin, connection credentials, storage locations, and other required parameters.

The configuration file must be a valid YAML document. The gpbackup and gprestore utilities process the configuration file document in order and use indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. The use of white space is significant. White space should not be used simply for formatting purposes, and tabs should not be used at all.

S3 Storage Plugin Configuration File Format

This is the structure of an S3 storage plugin configuration file.

executablepath: <absolute-path-to-gpbackup_s3_plugin> 
options: 
  region: <aws-region> 
  endpoint: <S3-endpoint> 
  aws_access_key_id: <aws-user-id> 
  aws_secret_access_key: <aws-user-id-key> 
  bucket: <s3-bucket> 
  folder: <s3-location>
  encryption: [on|off]
executablepath
Required. Absolute path to the plugin executable. For example, the Pivotal Greenplum Database installation location is $GPHOME/bin/gpbackup_s3_plugin. The plugin must be in the same location on every Greenplum Database host.
options
Required. Begins the S3 storage plugin options section.
region
Required for AWS S3. This option is not required when connecting to an S3-compatible service.
endpoint
Required for an S3-compatible service. Specify this option to connect to an S3-compatible service such as ECS. The plugin connects to the specified S3 endpoint (hostname or IP address) to access the S3-compatible data store.
If this option is specified, the plugin ignores the region option and does not use AWS to resolve the endpoint. When this option is not specified, the plugin uses the region option to determine the AWS S3 endpoint.
aws_access_key_id
Optional. The S3 ID to access the S3 bucket location that stores backup files.
If this parameter is not specified, S3 authentication information from the session environment is used. See S3 Storage Plugin Notes.
aws_secret_access_key
Required only if you specify aws_access_key_id. The S3 passcode for the S3 ID to access the S3 bucket location.
bucket
Required. The name of the S3 bucket in the AWS region or S3-compatible data store. The bucket must exist.
folder
Required. The S3 location for backups. During a backup operation, the plugin creates the S3 location if it does not exist in the S3 bucket.
encryption
Optional. Enables or disables the use of Secure Sockets Layer (SSL) when connecting to an S3 location. The default value is on; connections are secured with SSL. Set this option to off to connect to an S3-compatible service that is not configured to use SSL.
Any value other than off is treated as on.
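
This is a minimal sketch of a configuration file for an S3-compatible service; the endpoint, credentials, bucket, and folder values are illustrative. Because endpoint is specified, the region option is omitted.
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  endpoint: object-store.example.com  # illustrative S3-compatible endpoint
  aws_access_key_id: test-s3-user
  aws_secret_access_key: asdf1234asdf
  bucket: gpdb-backup
  folder: test/backup
  encryption: off  # the service in this sketch is not configured to use SSL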

S3 Storage Plugin Notes

The S3 storage plugin application must be in the same location on every Greenplum Database host. The configuration file is required only on the master host.

When you perform a backup with the S3 storage plugin, the plugin stores the backup files in this location in the S3 bucket.

<folder>/backups/<datestamp>/<timestamp>

Where <folder> is the location you specified in the S3 configuration file, and <datestamp> and <timestamp> are the backup date and time stamps.

Using Amazon S3 to back up and restore data requires an Amazon AWS account with access to the Amazon S3 bucket. These are the Amazon S3 bucket permissions required for backing up and restoring data.
  • Upload/Delete for the S3 user ID that uploads the files
  • Open/Download and View for the S3 user ID that accesses the files

If aws_access_key_id and aws_secret_access_key are not specified in the configuration file, the S3 plugin uses S3 authentication information from the system environment of the session running the backup operation. The S3 plugin searches for the information in these sources, using the first available source.

  1. The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
  2. The authentication information set with the AWS CLI command aws configure.
  3. The credentials of the Amazon EC2 IAM role if the backup is run from an EC2 instance.
For information about Amazon S3, see Amazon S3.
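
For example, these commands (the values and file path are illustrative) supply S3 credentials through environment variables instead of the configuration file; the configuration file omits aws_access_key_id and aws_secret_access_key:
$ export AWS_ACCESS_KEY_ID=test-s3-user
$ export AWS_SECRET_ACCESS_KEY=asdf1234asdf
$ gpbackup --dbname demo --plugin-config /home/gpadmin/s3-env-config.yaml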

DD Boost Storage Plugin Configuration File Format

This is the structure of a DD Boost storage plugin configuration file.

executablepath: <absolute-path-to-gpbackup_ddboost_plugin>
options: 
  hostname: "<data-domain-host>"
  username: "<ddboost-ID>"
  password_encryption: "on" | "off"
  password: "<ddboost-pwd>"
  storage_unit: "<data-domain-id>"
  directory: "<data-domain-dir>"
  replication: "on" | "off"
  remote_hostname: "<remote-dd-host>"
  remote_username: "<remote-ddboost-ID>"
  remote_password_encryption: "on" | "off"
  remote_password: "<remote-dd-pwd>"
  remote_storage_unit: "<remote-dd-ID>"
  remote_directory: "<remote-dd-dir>"
executablepath
Required. Absolute path to the plugin executable. For example, the Pivotal Greenplum Database installation location is $GPHOME/bin/gpbackup_ddboost_plugin. The plugin must be in the same location on every Greenplum Database host.
options
Required. Begins the DD Boost storage plugin options section.
hostname
Required. The IP address or hostname of the Data Domain system. There is a 30-character limit.
username
Required. The Data Domain Boost user name. There is a 30-character limit.
password_encryption
Optional. Specifies whether the password option value is encrypted. Default value is off. Use the gpbackup_manager encrypt-password command to encrypt the plain-text password for the DD Boost user. If the replication option is on, gpbackup_manager also encrypts the remote Data Domain user's password. Copy the encrypted password(s) from the gpbackup_manager output to the password options in the configuration file.
password
Required. The password for the DD Boost user to access the Data Domain storage unit. If the password_encryption option is on, this is an encrypted password.
storage_unit
Required. A valid storage unit name for the Data Domain system that is used for backup and restore operations.
directory
Required. The location for the backup files, configuration files, and global objects on the Data Domain system. The location on the system is /<data-domain-dir> in the storage unit of the system.
During a backup operation, the plugin creates the directory location if it does not exist in the storage unit and stores the backup in the directory /<data-domain-dir>/YYYYMMDD/YYYYMMDDHHMMSS/.
replication
Optional. Enables or disables backup replication with DD Boost managed file replication when gpbackup performs a backup operation. The value is either on or off. The default value is off; backup replication is disabled. When the value is on, the DD Boost plugin replicates the backup on the Data Domain system that you specify with the remote_* options.
The replication option and remote_* options are ignored when performing a restore operation with gprestore. The remote_* options are ignored if replication is off.
remote_hostname
Required if replication is on. The IP address or hostname of the Data Domain system that is used for remote backup storage. There is a 30-character limit.
remote_username
Required if replication is on. The Data Domain Boost user name that accesses the remote Data Domain system. There is a 30-character limit.
remote_password_encryption
Optional. Specifies whether the remote_password option value is encrypted. The default value is off. To set up password encryption, use the gpbackup_manager encrypt-password command to encrypt the plain-text password for the DD Boost user. If the replication option is on, gpbackup_manager also encrypts the remote Data Domain user's password. Copy the encrypted passwords from the gpbackup_manager output to the password options in the configuration file.
remote_password
Required if replication is on. The password for the DD Boost user to access the Data Domain storage unit on the remote system. If the remote_password_encryption option is on, this is an encrypted password.
remote_storage_unit
Required if replication is on. A valid storage unit name for the remote Data Domain system that is used for backup replication.
remote_directory
Required if replication is on. The location for the replicated backup files, configuration files, and global objects on the remote Data Domain system. The location on the system is /<remote-dd-dir> in the storage unit of the remote system.
During a backup operation, the plugin creates the directory location if it does not exist in the storage unit of the remote Data Domain system and stores the replicated backup in the directory /<remote-dd-dir>/YYYYMMDD/YYYYMMDDHHMMSS/.

DD Boost Storage Plugin Notes

Dell EMC DD Boost is integrated with Pivotal Greenplum Database and requires a DD Boost license. Open source Greenplum Database cannot use the DD Boost software, but can back up to a Dell EMC Data Domain system mounted as an NFS share on the Greenplum master and segment hosts.

The DD Boost storage plugin application must be in the same location on every Greenplum Database host. The configuration file is required only on the master host.

When you perform a backup with the DD Boost storage plugin, the plugin stores the backup files in this location in the Data Domain storage unit.

<directory>/backups/<datestamp>/<timestamp>

Where <directory> is the location you specified in the DD Boost configuration file, and <datestamp> and <timestamp> are the backup date and time stamps.

When performing a backup operation with replication, the Data Domain system where the backup is stored must have access to the remote Data Domain system where the replicated backup is stored.

Performing a backup operation with replication increases the time required to perform a backup. The backup set is copied to the local Data Domain system, and then replicated on the remote Data Domain system using DD Boost managed file replication. The backup operation completes after the backup set is replicated on the remote system.

Examples

Back up all schemas and tables in the "demo" database, including global Greenplum Database system objects:
$ gpbackup --dbname demo
Back up all schemas and tables in the "demo" database except for the "twitter" schema:
$ gpbackup --dbname demo --exclude-schema twitter
Back up only the "twitter" schema in the "demo" database:
$ gpbackup --dbname demo --include-schema twitter
Back up all schemas and tables in the "demo" database, including global Greenplum Database system objects and query statistics, and copy all backup files to the /home/gpadmin/backup directory:
$ gpbackup --dbname demo --with-stats --backup-dir /home/gpadmin/backup

This example uses --include-schema with --exclude-table to back up a schema except for a single table.

$ gpbackup --dbname demo --include-schema mydata --exclude-table mydata.addresses

You cannot use the option --exclude-schema with a table filtering option such as --include-table.

This is an example S3 storage plugin configuration file that is used in the next gpbackup example command. The name of the file is s3-test-config.yaml.

executablepath: $GPHOME/bin/gpbackup_s3_plugin
options: 
  region: us-west-2
  aws_access_key_id: test-s3-user
  aws_secret_access_key: asdf1234asdf
  bucket: gpdb-backup
  folder: test/backup3
This gpbackup example backs up the database demo using the S3 storage plugin. The absolute path to the S3 storage plugin configuration file is /home/gpadmin/s3-test-config.yaml.
$ gpbackup --dbname demo --plugin-config /home/gpadmin/s3-test-config.yaml

The S3 storage plugin writes the backup files to this S3 location in the AWS region us-west-2.

gpdb-backup/test/backup3/backups/YYYYMMDD/YYYYMMDDHHMMSS/

This is an example DD Boost storage plugin configuration file that is used in the next gpbackup example command. The name of the file is ddboost-test-config.yaml.

executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options: 
  hostname: "192.0.2.230"
  username: "test-ddb-user"
  password: "asdf1234asdf"
  storage_unit: "gpdb-backup"
  directory: "test/backup"
This gpbackup example backs up the database demo using the DD Boost storage plugin. The absolute path to the DD Boost storage plugin configuration file is /home/gpadmin/ddboost-test-config.yaml.
$ gpbackup --dbname demo --single-data-file --plugin-config /home/gpadmin/ddboost-test-config.yaml

The DD Boost storage plugin writes the backup files to this directory of the Data Domain storage unit gpdb-backup.

/test/backup/YYYYMMDD/YYYYMMDDHHMMSS/
This is an example DD Boost storage plugin configuration file that enables replication.
executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.2.230"
  username: "test-ddb-user"
  password: "asdf1234asdf"
  storage_unit: "gpdb-backup"
  directory: "test/backup"
  replication: "on"
  remote_hostname: "192.0.3.20"
  remote_username: "test-dd-remote"
  remote_password: "qwer2345erty"
  remote_storage_unit: "gpdb-remote"
  remote_directory: "test/replication"
To restore from the replicated backup in the previous example, you can run gprestore with the DD Boost storage plugin and specify a configuration file with this information.
executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.3.20"
  username: "test-dd-remote"
  password: "qwer2345erty"
  storage_unit: "gpdb-remote"
  directory: "test/replication"