Migrating Data from Greenplum 5

You can migrate data from Greenplum Database 5 to Greenplum Database 6 using standard backup and restore procedures (gpbackup and gprestore) or by using gpcopy.

Note: Upgrading a Pivotal Greenplum Database 5 system directly to Pivotal Greenplum Database 6 is not supported.

Follow these general guidelines to migrate your Greenplum 5 data:
  • If you have configured PXF in your Greenplum Database 5 installation, review Migrating PXF from Greenplum 5 to plan for the PXF migration.
  • Greenplum 6 removes the previously deprecated gphdfs protocol. If you have created external tables that use gphdfs, remove those external table definitions and, optionally, recreate them to use the Greenplum Platform Extension Framework (PXF) before you migrate the data to Greenplum 6. Refer to Migrating gphdfs External Tables to PXF in the PXF documentation for the migration procedure.
  • Make sure that you have a complete backup of all data in the Greenplum 5 cluster, and that you can successfully restore the Greenplum 5 cluster if necessary.
  • Install and initialize a new Greenplum 6 cluster using the version 6 gpinitsystem utility.
    Note: Unless you modify file locations manually, gprestore only supports restoring data to a cluster that has the same number of hosts and the same number of segments per host, with each segment having the same content ID as the corresponding segment in the original cluster. Use gpcopy if you need to migrate data to a Greenplum 6 cluster of a different size.
  • If you intend to install Greenplum 6 on the same hardware as your Greenplum 5 system, you need enough disk space to hold more than 5 times the original data set in order to migrate data with gpbackup and gprestore: 2 full copies of the primary and mirror data sets (one for each cluster), plus the backup of the original data in ASCII format. Keep in mind that the ASCII backup data requires more disk space than the original data, which may be stored in compressed binary format. Offline backup solutions such as Dell EMC Data Domain or Veritas NetBackup can reduce the required disk space on each host.

    If you attempt to migrate your data on the same hardware but run out of free space, gpcopy provides the --truncate-source-after option. This option truncates each source table after copying it to the destination cluster and validating that the copy succeeded, which reduces the free space needed to migrate clusters that reside on the same hardware. See Migrating Data with gpcopy for more information.

  • Use the Greenplum 6 gprestore utility to load the Greenplum 5 backup data into the new cluster.
  • If you use gpcopy to migrate data, initiate the gpcopy operation from the Greenplum 5 cluster. See Migrating Data with gpcopy for more information.
  • After migrating data, you may need to modify SQL scripts, administration scripts, and user-defined functions to account for changes in Greenplum 6. Review the Pivotal Greenplum 6.0.0 Release Notes for features and changes that may require post-migration tasks.
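Planning disk space for a same-hardware migration with gpbackup and gprestore comes down to simple arithmetic. The Python sketch below makes that arithmetic explicit. It assumes each mirror is the same size as its primary, and it treats the ASCII expansion ratio as a value you estimate yourself. The function name and its parameters are hypothetical and are not part of any Greenplum utility.

```python
def estimate_same_hardware_bytes(primary_bytes: float,
                                 ascii_expansion: float = 1.0,
                                 mirrored: bool = True) -> float:
    """Rough lower bound on disk space for a gpbackup/gprestore migration
    onto the same hardware as the Greenplum 5 cluster (hypothetical helper).

    primary_bytes   -- size of the Greenplum 5 primary data set
    ascii_expansion -- assumed ratio of ASCII backup size to primary data size
    mirrored        -- True if each cluster keeps a full mirror of its primaries
    """
    if primary_bytes < 0:
        raise ValueError("primary_bytes must be non-negative")
    if ascii_expansion <= 0:
        raise ValueError("ascii_expansion must be positive")

    # Primary plus mirror data held by one cluster (mirrors assumed equal in size).
    copies_per_cluster = 2 if mirrored else 1
    one_cluster = primary_bytes * copies_per_cluster

    # The Greenplum 5 and Greenplum 6 clusters both exist during the migration.
    both_clusters = 2 * one_cluster

    # Backup files written by gpbackup and later read by gprestore.
    backup = primary_bytes * ascii_expansion

    return both_clusters + backup


if __name__ == "__main__":
    tb = 1024 ** 4
    need = estimate_same_hardware_bytes(1 * tb, ascii_expansion=1.5)
    print(f"{need / tb:.1f} TB")  # 5.5 TB
```

For example, 1 TB of primary data with an assumed expansion ratio of 1.5 needs 4 TB for the two mirrored clusters plus 1.5 TB of backup data, or 5.5 TB in total. Treat the result as a lower bound, because it does not account for any other space requirements on each host.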
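Before you choose gprestore, you can check whether the new Greenplum 6 cluster meets the topology requirement for restoring without relocating files. The following Python sketch is a hypothetical helper, not a Greenplum tool. It takes (hostname, content) rows, for example the output of `SELECT hostname, content FROM gp_segment_configuration WHERE role = 'p';` run on each cluster, and groups the primary segment content IDs by host. It conservatively assumes that every host must carry the same set of content IDs as some host in the original cluster. It does not model the case where you modify file locations manually.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rows_to_layout(rows: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group (hostname, content) rows into {hostname: [content IDs]}.

    Rows with a negative content ID (the master entry, content -1) are
    skipped, so only segment instances are counted.
    """
    layout: Dict[str, List[int]] = defaultdict(list)
    for hostname, content in rows:
        if content >= 0:
            layout[hostname].append(content)
    return dict(layout)


def layouts_match(source: Dict[str, List[int]],
                  dest: Dict[str, List[int]]) -> bool:
    """Return True if dest has the same number of hosts, the same number of
    segments per host, and the same content IDs grouped the same way as source.

    Hostnames are ignored because the new cluster may use different hosts.
    """
    def signature(layout: Dict[str, List[int]]) -> Counter:
        # One sorted tuple of content IDs per host; Counter ignores host order.
        return Counter(tuple(sorted(ids)) for ids in layout.values())

    return len(source) == len(dest) and signature(source) == signature(dest)


if __name__ == "__main__":
    gp5 = rows_to_layout([("mdw", -1), ("sdw1", 0), ("sdw1", 1),
                          ("sdw2", 2), ("sdw2", 3)])
    gp6 = rows_to_layout([("mdw6", -1), ("sdw3", 2), ("sdw3", 3),
                          ("sdw4", 0), ("sdw4", 1)])
    print(layouts_match(gp5, gp6))  # True
```

If the check returns False, the guidelines above point to gpcopy as the way to migrate to a Greenplum 6 cluster of a different size.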