Backing Up Databases with Data Domain Boost

EMC Data Domain Boost (DD Boost) is EMC software that can be used with the gpcrondump and gpdbrestore utilities to perform faster backups to the EMC Data Domain storage appliance. Data Domain performs deduplication on the data it stores, so after the initial backup operation, the appliance stores only pointers to data that is unchanged. This reduces the size of backups on disk. When DD Boost is used with gpcrondump, Greenplum Database participates in the deduplication process, reducing the volume of data sent over the network. When you restore files from the Data Domain system with Data Domain Boost, some files are copied to the master local disk and are restored from there, and others are restored directly.

With Data Domain Boost managed file replication, you can replicate Greenplum Database backup images that are stored on a Data Domain system for disaster recovery purposes. The gpmfr utility manages the Greenplum Database backup sets that are on the primary and a remote Data Domain system. For information about gpmfr, see the Greenplum Database Utility Guide.

Managed file replication requires network configuration when a replication network is being used between two Data Domain systems:

  • The Greenplum Database system requires the Data Domain login credentials to be configured using gpcrondump. Credentials must be created for both the local and remote Data Domain systems.
  • When the non-management network interface is used for replication on the Data Domain systems, static routes must be configured on the systems to pass the replication data traffic to the correct interfaces.

Do not use Data Domain Boost with pg_dump or pg_dumpall.

Refer to Data Domain Boost documentation for detailed information.

Important: For incremental backup sets, a full backup and the associated incremental backups must be on a single device. For example, a backup set must all be on a file system. The backup set cannot have some backups on the local file system and others on a Data Domain system.
Note: You can use a Data Domain server as an NFS file system (without Data Domain Boost) to perform incremental backups.

Data Domain Boost Requirements

Using Data Domain Boost requires the following.

  • Data Domain Boost is included only with Pivotal Greenplum Database.
  • Purchase and install EMC Data Domain Boost and Replicator licenses on the Data Domain systems.
  • Obtain sizing recommendations for Data Domain Boost. Make sure the Data Domain system supports sufficient write and read streams for the number of segment hosts in your Greenplum cluster.

Contact your EMC Data Domain account representative for assistance.

One-Time Data Domain Boost Credential Setup

There is a one-time process to set up credentials to use Data Domain Boost. Credential setup connects one Greenplum Database instance to one Data Domain instance. If you are using the gpcrondump --replicate option or Data Domain Boost managed file replication capabilities for disaster recovery purposes, you must set up credentials for both the local and remote Data Domain systems.

To set up credentials, run gpcrondump with the following options:

--ddboost-host ddboost_hostname --ddboost-user ddboost_user
--ddboost-backupdir backup_directory

To remove credentials, run gpcrondump with the --ddboost-config-remove option.

To manage credentials for the remote Data Domain system that is used for backup replication, include the --ddboost-remote option with the other gpcrondump options. For example, the following options set up credentials for a Data Domain system that is used for backup replication. The system IP address is 192.0.2.230, the user ID is ddboostmyuser, and the location for the backups on the system is GPDB/gp_production:

--ddboost-host 192.0.2.230 --ddboost-user ddboostmyuser
--ddboost-backupdir gp_production --ddboost-remote

For details, see gpcrondump in the Greenplum Database Utility Guide.

If you use two or more network connections to connect to the Data Domain system, use gpcrondump to set up the login credentials for the Data Domain hostnames associated with the network interfaces. To perform this setup for two network connections, run gpcrondump with the following options:

--ddboost-host ddboost_hostname1
--ddboost-host ddboost_hostname2 --ddboost-user ddboost_user
--ddboost-backupdir backup_directory
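
The option lists above can be assembled into complete command lines. The following is a minimal sketch, not part of the product: the helper name ddboost_setup_cmd is hypothetical, and it only prints the gpcrondump command so that you can review it before running it yourself. It uses only the options shown in this topic. Whether --ddboost-remote can be combined with more than one --ddboost-host is not stated here, so treat that combination as untested.

```shell
#!/bin/sh
# Print (do not run) a gpcrondump credential-setup command.
# Hypothetical helper for illustration only.
# Usage: ddboost_setup_cmd USER BACKUPDIR [--remote] HOST [HOST...]
ddboost_setup_cmd() {
    user=$1
    backupdir=$2
    shift 2

    # Optional marker for the remote (replication target) system.
    remote=""
    if [ "$1" = "--remote" ]; then
        remote=" --ddboost-remote"
        shift
    fi

    if [ $# -eq 0 ]; then
        echo "error: at least one Data Domain hostname is required" >&2
        return 1
    fi

    # One --ddboost-host option per network connection.
    cmd="gpcrondump"
    for host in "$@"; do
        cmd="$cmd --ddboost-host $host"
    done

    printf '%s --ddboost-user %s --ddboost-backupdir %s%s\n' \
        "$cmd" "$user" "$backupdir" "$remote"
}

# Two network connections to the local Data Domain system (placeholder names).
ddboost_setup_cmd ddboost_user backup_directory ddboost_hostname1 ddboost_hostname2

# Remote system used for backup replication, matching the earlier example.
ddboost_setup_cmd ddboostmyuser gp_production --remote 192.0.2.230
```

Printing the command rather than executing it keeps the sketch safe to run on any host, including one where Greenplum Database is not installed.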

Configuring Data Domain Boost for Greenplum Database

After you set up credentials for Data Domain Boost in Greenplum Database, perform the following tasks on the Data Domain system to allow Data Domain Boost to work with Greenplum Database:

Configuring Distributed Segment Processing in Data Domain

Configure the distributed segment processing option on the Data Domain system. The configuration applies to all the DCA servers and the Data Domain Boost plug-in installed on them. This option is enabled by default, but verify that it is enabled before using Data Domain Boost backups:

# ddboost option show

To enable or disable distributed segment processing:

# ddboost option set distributed-segment-processing {enabled | disabled}

Configuring Advanced Load Balancing and Link Failover in Data Domain

If you have multiple network connections on a network subnet, you can create an interface group to provide load balancing and higher network throughput on your Data Domain system. When a Data Domain system on an interface group receives data from the media server clients, the data transfer is load balanced and distributed as separate jobs on the private network. You can achieve optimal throughput with multiple 1 GbE connections.

Note: To ensure that interface groups function properly, use interface groups only when using multiple network connections on the same networking subnet.

To create an interface group on the Data Domain system, create interfaces with the net command (if the interfaces do not already exist), add the interfaces to the group, and register the Data Domain system with the backup application.

  1. Add the interfaces to the group:
    # ddboost ifgroup add interface 192.0.2.1
    # ddboost ifgroup add interface 192.0.2.2
    # ddboost ifgroup add interface 192.0.2.3
    # ddboost ifgroup add interface 192.0.2.4
    
    Note: You can create only one interface group and this group cannot be named.
  2. Select one interface on the Data Domain system to register with the backup application. Create a failover aggregated interface and register that interface with the backup application.
    Note: You do not have to register one of the ifgroup interfaces with the backup application. You can use an interface that is not part of the ifgroup to register with the backup application.
  3. Enable the interface group on the Data Domain system:
    # ddboost ifgroup enable
  4. Verify the Data Domain system configuration as follows:
    # ddboost ifgroup show config
    Results similar to the following are displayed.
    Interface
    -------------
    192.0.2.1
    192.0.2.2
    192.0.2.3
    192.0.2.4
    -------------
    

You can add or delete interfaces from the group at any time.

Note: Manage Advanced Load Balancing and Link Failover (an interface group) using the ddboost ifgroup command or from the Enterprise Manager Data Management > DD Boost view.
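
When the group has many interfaces, it can help to generate the command list instead of typing it. The sketch below is an illustration only: the function name ifgroup_commands is hypothetical, and it runs on any workstation and just prints the Data Domain commands from the steps above so that you can paste them into a Data Domain session.

```shell
#!/bin/sh
# Print the Data Domain commands that add each interface to the group,
# then enable the group and show its configuration.
# Hypothetical helper: it prints commands and does not contact any system.
ifgroup_commands() {
    for ip in "$@"; do
        printf 'ddboost ifgroup add interface %s\n' "$ip"
    done
    printf 'ddboost ifgroup enable\n'
    printf 'ddboost ifgroup show config\n'
}

# The four example addresses used in the steps above.
ifgroup_commands 192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4
```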

Export the Data Domain Path to the DCA Network

The commands and options in this topic apply to DDOS 5.0.x and 5.1.x. See the Data Domain documentation for details.

Use the following Data Domain commands to export the /backup/ost directory to the DCA for Data Domain Boost backups.
# nfs add /backup/ost 192.0.2.0/24, 198.51.100.0/24 (insecure)
Note: The IP addresses refer to the Greenplum system working with the Data Domain Boost system.

Create the Data Domain Login Credentials for the DCA

Create a username and password for the DCA to access the DD Boost Storage Unit (SU) at the time of backup and restore:

# user add user [password password] [priv {admin | security | user}]

Backup Options for Data Domain Boost

Specify the gpcrondump options to match the setup.

Data Domain Boost backs up files to the Data Domain system. Status and report files remain on the local disk.

To configure Data Domain Boost to remove old backup directories before starting a backup operation, specify a gpcrondump backup expiration option:

  • The -c option clears all backup directories.
  • The -o option clears the oldest backup directory.

To remove the oldest dump directory, specify gpcrondump --ddboost with the -o option. For example, if your retention period is 30 days, use gpcrondump --ddboost with the -o option on day 31.

Use gpcrondump --ddboost with the -c option to clear out all the old dump directories in db_dumps. The -c option deletes all dump directories that are at least one day old.
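
As an illustration of how the expiration options combine with a backup command, the following sketch prints a gpcrondump command line for a chosen retention mode. The helper name ddboost_backup_cmd and the mode names oldest, all, and none are hypothetical. The -x, -z, and -v options are taken from the scheduling example in this topic. Check the gpcrondump reference for the exact behavior of -o and -c in your release before relying on them.

```shell
#!/bin/sh
# Print (do not run) a DD Boost backup command with an expiration option.
# Hypothetical helper for illustration only.
# Usage: ddboost_backup_cmd DATABASE {oldest|all|none}
ddboost_backup_cmd() {
    db=$1
    case $2 in
        oldest) expire=" -o" ;;   # clear the oldest backup directory
        all)    expire=" -c" ;;   # clear all old backup directories
        none)   expire="" ;;      # keep every existing backup directory
        *)
            echo "error: mode must be oldest, all, or none" >&2
            return 1
            ;;
    esac
    printf 'gpcrondump -x %s -z -v --ddboost%s\n' "$db" "$expire"
}

# With a 30-day retention period, this is the form to use from day 31 onward.
ddboost_backup_cmd mydatabase oldest
```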

Using CRON to Schedule a Data Domain Boost Backup

  1. Ensure the One-Time Data Domain Boost Credential Setup is complete.
  2. Add the option --ddboost to the gpcrondump command:
    gpcrondump -x mydatabase -z -v --ddboost 
Important: Do not use compression with Data Domain Boost backups. The -z option turns backup compression off.

Some of the options available in gpcrondump have different implications when using Data Domain Boost. For details, see gpcrondump in the Greenplum Database Utility Reference.
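
The steps above do not show the crontab entry itself. The following sketch prints one possible entry for the gpadmin user's crontab. Several details are assumptions and are not taken from this topic: the helper name ddboost_cron_entry, the path /usr/local/greenplum-db/greenplum_path.sh, and the gpcrondump -a option, which suppresses confirmation prompts that a cron job cannot answer. Your environment may also need other variables, such as MASTER_DATA_DIRECTORY, set before gpcrondump runs.

```shell
#!/bin/sh
# Print a crontab entry that runs a daily DD Boost backup of one database.
# Hypothetical helper for illustration only.
# Usage: ddboost_cron_entry MINUTE HOUR DATABASE
ddboost_cron_entry() {
    minute=$1
    hour=$2
    db=$3
    # ASSUMPTION: install location of the Greenplum environment script.
    gp_env=/usr/local/greenplum-db/greenplum_path.sh
    # ASSUMPTION: -a (do not prompt) is added so the job can run unattended.
    printf '%s %s * * * . %s && gpcrondump -x %s -z -v --ddboost -a\n' \
        "$minute" "$hour" "$gp_env" "$db"
}

# Back up mydatabase every day at 01:00.
ddboost_cron_entry 0 1 mydatabase
```

Review the printed line and add it with crontab -e as the Greenplum administrative user.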

Restoring From a Data Domain System with Data Domain Boost

  1. Ensure the One-Time Data Domain Boost Credential Setup is complete.
  2. Add the option --ddboost to the gpdbrestore command:
    $ gpdbrestore -t backup_timestamp -v --ddboost
Note: Some of the gpdbrestore options available have different implications when using Data Domain. For details, see gpdbrestore in the Greenplum Database Utility Reference.
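
A mistyped timestamp is an easy error to make during a restore. The sketch below checks the value before printing the gpdbrestore command. It assumes the backup timestamp is a 14-digit key of the form YYYYMMDDHHMMSS; the helper name ddboost_restore_cmd is hypothetical, and the function prints the command instead of running it.

```shell
#!/bin/sh
# Print (do not run) a gpdbrestore command for a DD Boost restore
# after a basic sanity check of the timestamp.
# Hypothetical helper for illustration only.
# Usage: ddboost_restore_cmd BACKUP_TIMESTAMP
ddboost_restore_cmd() {
    ts=$1
    # Reject empty values and anything containing a non-digit.
    case $ts in
        ""|*[!0-9]*)
            echo "error: timestamp must contain digits only" >&2
            return 1
            ;;
    esac
    # ASSUMPTION: timestamps are 14 digits (YYYYMMDDHHMMSS).
    if [ "${#ts}" -ne 14 ]; then
        echo "error: expected 14 digits (YYYYMMDDHHMMSS)" >&2
        return 1
    fi
    printf 'gpdbrestore -t %s -v --ddboost\n' "$ts"
}

# Example with a made-up timestamp.
ddboost_restore_cmd 20240115010000
```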