Using the DD Boost Storage Plugin with gpbackup and gprestore

Note: The DD Boost storage plugin is available only in the commercial release of Pivotal Greenplum Database.

Dell EMC Data Domain Boost (DD Boost) is Dell EMC software that can be used with the gpbackup and gprestore utilities to perform faster backups to the Dell EMC Data Domain storage appliance. You can also replicate a backup on a separate, remote Data Domain system for disaster recovery.

To use the DD Boost storage plugin application, you first create a configuration file to specify the location of the plugin, the DD Boost login, and the backup location. When you run gpbackup or gprestore, you specify the configuration file with the option --plugin-config. For information about the configuration file, see DD Boost Storage Plugin Configuration File Format.

If you perform a backup operation with the gpbackup option --plugin-config, you must also specify the --plugin-config option when you restore the backup with gprestore.
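
As a minimal sketch of this pairing, the following Python snippet builds a gpbackup and a gprestore command line that share one configuration file. The helper name plugin_commands and the timestamp value are hypothetical and used only for illustration. The snippet only prints the commands and does not run them.

```python
import shlex


def plugin_commands(dbname, config_path, timestamp):
    """Return (backup_cmd, restore_cmd) argument lists that share one plugin config."""
    backup_cmd = ["gpbackup", "--dbname", dbname, "--plugin-config", config_path]
    # The restore passes the same --plugin-config value that the backup used.
    restore_cmd = ["gprestore", "--timestamp", timestamp, "--plugin-config", config_path]
    return backup_cmd, restore_cmd


# Hypothetical timestamp; use the timestamp of the backup you want to restore.
backup_cmd, restore_cmd = plugin_commands(
    "demo", "/home/gpadmin/ddboost-test-config.yaml", "20240101120000"
)
print(shlex.join(backup_cmd))
print(shlex.join(restore_cmd))
```

Building both commands from one function keeps the configuration file path identical for the backup and the restore.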

To replicate a backup set on a separate Data Domain system for disaster recovery, add the backup replication options to the configuration file. Set the replication option to on and add the options that the plugin uses to access the remote Data Domain system that stores the replicated backup. During the backup operation, the DD Boost storage plugin replicates the backup set on the remote Data Domain system with DD Boost managed file replication.

To restore data from a replicated backup, you can use gprestore with the DD Boost storage plugin and specify the location of the backup in the DD Boost configuration file.

DD Boost Storage Plugin Configuration File Format

The configuration file specifies the absolute path to the Greenplum Database DD Boost storage plugin executable, DD Boost connection credentials, and Data Domain location.

The DD Boost storage plugin configuration file uses the YAML 1.1 document format and implements its own schema for specifying the DD Boost information.

The configuration file must be a valid YAML document. The gpbackup and gprestore utilities process the configuration file document in order and use indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. The use of white space is significant. White space should not be used simply for formatting purposes, and tabs should not be used at all.
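
Because tabs are not allowed, it can help to scan a configuration file for them before running gpbackup. The following Python sketch is one possible check, not part of the plugin. The function name find_tab_lines is hypothetical.

```python
def find_tab_lines(text):
    """Return the 1-based numbers of the lines that contain a tab character."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if "\t" in line
    ]


# Sample text in which the third line is indented with a tab instead of spaces.
sample = (
    "executablepath: $GPHOME/bin/gpbackup_ddboost_plugin\n"
    "options:\n"
    '\thostname: "192.0.2.230"\n'
)
print(find_tab_lines(sample))
```

An empty list means that no tabs were found. The check does not verify that the file is otherwise valid YAML.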

This is the structure of a DD Boost storage plugin configuration file.

executablepath: <absolute-path-to-gpbackup_ddboost_plugin>
options: 
  hostname: "<data-domain-host>"
  username: "<ddboost-ID>"
  password: "<ddboost-pwd>"
  storage_unit: "<data-domain-id>"
  directory: "<data-domain-dir>"
  replication: "on" | "off"
  remote_hostname: "<remote-dd-host>"
  remote_username: "<remote-ddboost-ID>"
  remote_password: "<remote-dd-pwd>"
  remote_storage_unit: "<remote-dd-ID>"
  remote_directory: "<remote-dd-dir>"

executablepath
Required. Absolute path to the plugin executable. For example, in a Pivotal Greenplum Database installation the location is $GPHOME/bin/gpbackup_ddboost_plugin. The plugin must be in the same location on every Greenplum Database host.
options
Required. Begins the DD Boost storage plugin options section.
hostname
Required. The IP address or hostname of the Data Domain system. There is a 30-character limit.
username
Required. The Data Domain Boost user name. There is a 30-character limit.
password
Required. The passcode for the DD Boost user to access the Data Domain storage unit.
storage_unit
Required. A valid storage unit name for the Data Domain system that is used for backup and restore operations.
directory
Required. The location for the backup files, configuration files, and global objects on the Data Domain system. The location on the system is /<data-domain-dir> in the storage unit of the system.
During a backup operation, the plugin creates the directory location if it does not exist in the storage unit and stores the backup in this directory /<data-domain-dir>/YYYYMMDD/YYYYMMDDHHMMSS/.
replication
Optional. Enables or disables backup replication with DD Boost managed file replication when gpbackup performs a backup operation. The value is either on or off. The default value is off, which disables backup replication. When the value is on, the DD Boost plugin replicates the backup on the Data Domain system that you specify with the remote_* options.
The replication option and remote_* options are ignored when performing a restore operation with gprestore. The remote_* options are ignored if replication is off.
remote_hostname
Required if replication is on. The IP address or hostname of the Data Domain system that is used for remote backup storage. There is a 30-character limit.
remote_username
Required if replication is on. The Data Domain Boost user name that accesses the remote Data Domain system. There is a 30-character limit.
remote_password
Required if replication is on. The passcode for the DD Boost user to access the Data Domain storage unit on the remote system.
remote_storage_unit
Required if replication is on. A valid storage unit name for the remote Data Domain system that is used for backup replication.
remote_directory
Required if replication is on. The location for the replicated backup files, configuration files, and global objects on the remote Data Domain system. The location on the system is /<remote-dd-dir> in the storage unit of the remote system.
During a backup operation, the plugin creates the directory location if it does not exist in the storage unit of the remote Data Domain system and stores the replicated backup in this directory /<remote-dd-dir>/YYYYMMDD/YYYYMMDDHHMMSS/.
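
The rules in the option descriptions above can be summarized as a small check. The following Python sketch assumes that the options section has already been loaded into a dictionary of strings. The name check_options is hypothetical, and the plugin itself might validate the file differently.

```python
REQUIRED = ("hostname", "username", "password", "storage_unit", "directory")
REMOTE_REQUIRED = (
    "remote_hostname",
    "remote_username",
    "remote_password",
    "remote_storage_unit",
    "remote_directory",
)
# Options that the descriptions above limit to 30 characters.
LIMITED = ("hostname", "username", "remote_hostname", "remote_username")
MAX_LEN = 30


def check_options(options):
    """Return a list of problems found in the options mapping (empty if none)."""
    problems = []
    required = list(REQUIRED)
    replication = options.get("replication", "off")  # documented default is off
    if replication not in ("on", "off"):
        problems.append("replication must be 'on' or 'off'")
    if replication == "on":
        # The remote_* options are required only when replication is on.
        required.extend(REMOTE_REQUIRED)
    for key in required:
        value = options.get(key)
        if not value:
            problems.append(f"missing required option: {key}")
        elif key in LIMITED and len(value) > MAX_LEN:
            problems.append(f"{key} exceeds {MAX_LEN} characters")
    return problems


options = {
    "hostname": "192.0.2.230",
    "username": "test-ddb-user",
    "password": "asdf1234asdf",
    "storage_unit": "gpdb-backup",
    "directory": "test/backup",
}
print(check_options(options))
```

When replication is off, the sketch skips the remote_* options entirely, which mirrors the statement that those options are ignored in that case.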

Example

This is an example DD Boost storage plugin configuration file that is used in the next gpbackup example command. The name of the file is ddboost-test-config.yaml.

executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options: 
  hostname: "192.0.2.230"
  username: "test-ddb-user"
  password: "asdf1234asdf"
  storage_unit: "gpdb-backup"
  directory: "test/backup"

This gpbackup example backs up the database demo using the DD Boost storage plugin. The absolute path to the DD Boost storage plugin configuration file is /home/gpadmin/ddboost-test-config.yaml.

gpbackup --dbname demo --single-data-file --plugin-config /home/gpadmin/ddboost-test-config.yaml

The DD Boost storage plugin writes the backup files to this directory of the Data Domain storage unit gpdb-backup.

/test/backup/YYYYMMDD/YYYYMMDDHHMMSS/
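
To illustrate how the directory option and the backup time combine into this path, the following Python sketch builds the path for a given time. The function name backup_directory and the sample time are hypothetical. The layout follows the pattern shown in this example.

```python
from datetime import datetime


def backup_directory(directory, when):
    """Build /<directory>/YYYYMMDD/YYYYMMDDHHMMSS/ for a backup taken at 'when'."""
    base = "/" + directory.strip("/")
    return f"{base}/{when:%Y%m%d}/{when:%Y%m%d%H%M%S}/"


# Hypothetical backup time: 1 January 2024, 12:00:00.
print(backup_directory("test/backup", datetime(2024, 1, 1, 12, 0, 0)))
# /test/backup/20240101/20240101120000/
```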

This is an example DD Boost storage plugin configuration file that enables replication.

executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.2.230"
  username: "test-ddb-user"
  password: "asdf1234asdf"
  storage_unit: "gpdb-backup"
  directory: "test/backup"
  replication: "on"
  remote_hostname: "192.0.3.20"
  remote_username: "test-dd-remote"
  remote_password: "qwer2345erty"
  remote_storage_unit: "gpdb-remote"
  remote_directory: "test/replication"

To restore from the replicated backup in the previous example, you can run gprestore with the DD Boost storage plugin and specify a configuration file with this information.

executablepath: $GPHOME/bin/gpbackup_ddboost_plugin
options:
  hostname: "192.0.3.20"
  username: "test-dd-remote"
  password: "qwer2345erty"
  storage_unit: "gpdb-remote"
  directory: "test/replication"
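
Because gprestore ignores the remote_* options, a restore from the replica presumably needs the remote system's values under the standard option names. The following Python sketch derives such a set of options from a backup configuration that has replication enabled. The function name restore_options_from_replica is hypothetical, and the mapping is an assumption based on the option descriptions rather than documented plugin behavior.

```python
def restore_options_from_replica(options):
    """Map the remote_* values of a backup config onto the standard option names."""
    return {
        "hostname": options["remote_hostname"],
        "username": options["remote_username"],
        "password": options["remote_password"],
        "storage_unit": options["remote_storage_unit"],
        "directory": options["remote_directory"],
    }


# The remote_* values from the replication example above.
backup_options = {
    "replication": "on",
    "remote_hostname": "192.0.3.20",
    "remote_username": "test-dd-remote",
    "remote_password": "qwer2345erty",
    "remote_storage_unit": "gpdb-remote",
    "remote_directory": "test/replication",
}
print(restore_options_from_replica(backup_options))
```

The returned dictionary contains the values to place in the options section of the configuration file that you pass to gprestore.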

Notes

Dell EMC DD Boost is integrated with Pivotal Greenplum Database and requires a DD Boost license. Open source Greenplum Database cannot use the DD Boost software, but can back up to a Dell EMC Data Domain system mounted as an NFS share on the Greenplum master and segment hosts.

The DD Boost storage plugin application must be in the same location on every Greenplum Database host. The configuration file is required only on the master host.

When you perform a backup with the DD Boost storage plugin, the plugin stores the backup files in this location in the Data Domain storage unit.

<directory>/backups/<datestamp>/<timestamp>

Where <directory> is the location you specified in the DD Boost configuration file, and <datestamp> and <timestamp> are the backup date and time stamps.

When performing a backup operation with replication, the Data Domain system where the backup is stored must have access to the remote Data Domain system where the replicated backup is stored.

Performing a backup operation with replication increases the time required to perform a backup. The backup set is copied to the local Data Domain system, and then replicated on the remote Data Domain system using DD Boost managed file replication. The backup operation completes after the backup set is replicated on the remote system.