gpcopy

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 5.x documentation.

gpcopy

The gpcopy utility copies objects from databases in a source Greenplum Database system to databases in a destination Greenplum Database system.

Note: The gpcopy utility is available only in the commercial release of Pivotal Greenplum Database.

Synopsis

gpcopy
   { --full |
     { --dbname database1[, database2 ... ] 
        [ --dest-dbname dest-db1[, dest-db2 ... ] ] } |
     --include-table db.schema.table[, db.schema1.table1 ... ] 
        [ --dest-table db.schema.table[, db.schema1.table1 ... ] |
     --include-table-file table-file1
        [ --include-table-file table-file2] ... ] | 
     --include-table-json json-table-file1
        [ --include-table-json json-table-file2] ... ] }
   [ --metadata-only ]
   [ --exclude-table db.schema.table[, db.schema1.table1 ... ] ]
   [ --exclude-table-file table-file1 ]
      [ --exclude-table-file table-file2 ] ... ] ]

   { --dest-host dest_host [ --dest-port dest_port ]
      [ --dest-user dest_user ] }
   [ --source-host source_host [ --source-port source_port ]
   [ --source-user source_user ] ]
   [ --jobs int ]
   [ --on-segment-threshold int ]
   [ --parallelize-leaf-partitions ] 
   [ --data-port-range lower_port-upper_port ]

   { --skip-existing | --truncate | --drop | --append }
   [ --analyze ]
   [ --no-compression ]
   [ --no-distribution-check ]
   [ --truncate-source-after [--yes ] ]
   [ --validate type ]

   [ --dry-run ]
   [ --quiet | --debug ]

gpcopy --version
gpcopy --help

Description

The gpcopy utility copies database objects from a source Greenplum Database system to a destination system. You can perform one of the following types of copy operations:

  • Copy a Greenplum Database system with the --full option. This option copies all database objects including, tables, table data, indexes, views, users, roles, functions, and resource queues for all user-defined databases to a different destination system.
  • Copy a set of user-defined database tables to a destination system.
    • The --dbname option copies all user-defined tables, table data, and re-creates the table indexes from specified databases.
    • The --include-table --include-table-file, or --include-table-json option copies a specified set of user-defined tables, table data, and re-creates the table indexes.
    • The --exclude-table and --exclude-table-file options exclude a specified set of user-defined tables and table data to be copied.
  • Copy only the database schemas with the --metadata-only option.
When running gpcopy, you must specify the data to copy from the source database and how to manage data in the destination database.
  • You must use one and only one of these options to specify the data to be copied from the source database: --full, --dbname, --include-table, --include-table-file, or --include-table-json.
  • You must use one of these options to specify how to manage data in the destination database: --skip-existing, --truncate, --drop, or --append.

    If you specify both the --append and --validation options, validation of source table data fails if a destination table contains data.

If you specify the option --truncate-source-after, you must also specify the --validate option. When --truncate-source-after is specified, gpcopy truncates the source table after the table data is copied and destination table data has been validated.

Prerequisites

The user IDs connecting to the source and destination Greenplum Database systems, the --source-user and --dest-user, must have appropriate access to the systems.

If Kerberos authentication is enabled for Greenplum Database, gpcopy can authenticate with Kerberos. Run the kinit command to obtain a ticket-granting ticket from the KDC server before you execute gpcopy. The PGKRBSRVNAME environment variable specifies the Kerberos service name for Greenplum Database. If your Greenplum Database service name is different than the default (postgres) set the PGKRBSRVNAME environment variable with the correct service name before you run gpcopy. See Using Kerberos Authentication for information about enabling Kerberos authentication with Greenplum Database.

The source and destination Greenplum Database segment hosts need to be able to communicate with each other. To ensure that the segment hosts can communicate, you can use a tool such as the Linux netperf utility.

When the --full option is specified, resource groups and table spaces are copied, however, the utility does not configure the destination system. You must configure the system to use resource groups and create filespaces for the tablespaces.

Options for Choosing Data to Copy

gpcopy provides a range of options to define the scope of data that is copied. You can choose options to perform a full Greenplum cluster migration, copy specific databases or tables, or only portions of a table using a SQL query. Additional options enable you to exclude certain tables from being copied, or to change the destination database into which a table's data is copied. The special --metadata-only instructs gpcopy to create the necessary schema for the selected source tables, but copy no table data.

You must use at least one of the options --full, --dbname, --include-table, --include-table-file, or --include-table-json. Use other options as needed to exclude data from the copy or to change the destination database for copied tables.

--full
This option performs a migration of a Greenplum Database source system to a destination system.
A migration copies all database objects including, tables, indexes, views, roles, functions, user-defined types (UDT), resource queues, and resource groups for all user-defined databases. The default databases, postgres, template0, and template1, are not copied.
This option cannot be specified with the --dbname, --include-table, --include-table-file, or --include-table-json options.
--dbname database
A source database to copy. To copy multiple databases to the destination system specify a comma-separated list of databases with no spaces between the names. All the user-defined tables and table data are copied to the destination system.
If the source database does not exist, gpcopy returns an error and quits. If a destination database does not exist a database is created.
Not valid with the --full, --include-table, --include-table-file, or --include-table-json options.
Alternatively, you can copy a set of tables with the --include-table, --include-table-file, or --include-table-json option.
--dest-dbname database
To copy a database to a different destination database, specify the name of the destination database. For multiple databases, specify a comma-separated list of databases with no spaces between the names. The number of database names must match the number of names specified in the --dbname option. The utility copies the source databases to the destination databases in the listed order. In this example, db1 is copied to destdb1, db2 is copied to destdb2, and db3 is copied to db3.
gpcopy --dest-host mdw-2 --dbname=db1,db2,db3 --dest-dbname=destdb1,destdb2,db3 --drop
If the source database does not exist, gpcopy returns an error and quits. If a destination database does not exist a database is created.
Valid only with the --dbname option.
--include-table db.schema.table
One or more tables from the source database system to copy. You must provide fully-qualified table names (database.schema.table). You cannot specify views or system catalog tables. To copy multiple tables, include a comma-separated list of table names or use regular expressions to describe a set of tables. You can optionally use --dest-table to change the databases into which tables in --include-table are copied.
You can use Go language regular expressions in the database, schema, and table portions of the fully-qualified table name to define a set of input tables. The regular expression pattern must be enclosed in slashes (/RE_pattern/). For example, --include-table mytest.public.demo/.*/ specifies all tables that begin with demo in the mytest database in the public schema.
The following two examples for the --include-table option are equivalent. They both specify a set of tables that begins with demo and ends with zero or more digits.
--include-table testdb.schema1.demo/[0-9]*/
--include-table testdb.schema1./demo[0-9]*/
Regular expression capture groups in the database portion of the fully-qualified name can be referenced in --dest-table to change the destination database for a table. (Using capture groups to change the schema or table name is not currently supported.)
If the source table does not exist, gpcopy returns an error and quits.
If the destination table or database does not exist, it is created. Only the table and table data are copied, not dependent objects. Indexes are re-created only if the --drop option is specified. Dependent objects are not copied.
This option is not allowed with the options: --full, --dbname, --include-table-file, or --include-table-json.
--dest-table db.schema.table
(Optional.) Changes the database where tables defined with --include-table are copied. Greenplum does not currently support changing the destination schema or table name.
You must provide fully-qualified table names (database.schema.table). Specify multiple tables using either a comma-separated list or by referencing regular expression capture groups that were defined with --include-table. If you use a comma-separated list of tables with --include-table, use the same number position in the --dest-table list to change the destination database of the corresponding table. For example, to move only the second table provided with an option like
--include-table mytest.public.table1,mytest.public.table2
use an option similar to:
--dest-table production.public.table1,production.public.table2
If you use Go language regular expressions capture groups in --include-table to define a set of tables, you can reference the capture groups in --dest-table to rename the destination database. Capture groups referenced in --include-table or --dest-table must use forward slash delimiters (/) to indicate regular expression processing. For example, to reference the capture group defined in the database string of
--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/
use an option similar to:
--dest-table productiondb/$1/.myschema/$1/.mytable/$1/
Note that the capture group numbering starts at 1 and restarts for each component of the fully-qualified table name (database, schema, and table). Therefore, in the above example each capture group is referenced with /$1/ in --dest-table. You cannot rename the destination schema or table name components.
If regular expression rules would cause more than one source table name to be remapped to the same destination table, then the --skip-existing, --truncate, --drop, or --append options determine how gpcopy handles subsequent copy requests to the existing table.
--include-table-file table-file
The location and name of a text file that defines the tables and data to copy. To use multiple files, specify this option for each file.
--include-table-file <path_to_file1> --include-table-file <path_to_file2>
In the text file, specify a single fully qualified table per line (database.schema.table). You cannot specify views or system catalog tables.
You can use Go language regular expression syntax to select multiple tables. See the --dest-table option for information about using regular expressions to select tables.
This option cannot be specified with the --full, --dbname, or --include-table options.
--include-table-json json-table-file
The location and name of a JSON-format file that defines the tables and data to copy. In contrast to the text file used with --include-table-file, the JSON file can include a destination table name used to change the database into which the table data is copied.
The JSON file that you provide must define one or more objects with key-value pairs to describe a source table, an optional query of the source table data to copy, and an optional destination table to indicate the database where the data is copied. Place the name-value pairs for multiple table objects in a JSON array. For example:
[
     {
         "source": "database.schema.table",
         "sql": "query"
         "dest": "database.schema.table"
     },
     {
         ...     }
]
You cannot specify views or system catalog tables as source tables.
Any query that you provide must reference a single source table and output to the same columns of the source table.
If the query includes an ORDER BY clause, then the target Greenplum cluster must be the same size (the same number of segments) as the source cluster.
Keep in mind that certain characters must be escaped in strings in order for them to be parsed in JSON, specifically:
Character Escape Sequence
Backspace \b
Form Feed \f
Newline \n
Tab \t
Double Quote \"
Backslash \\
If the file cannot be parsed as JSON, gpcopy exits with an error.
You can use Go language regular expression syntax to select multiple tables. Capture groups defined in the source: key can be referenced in the dest: key to change the destination database where the data is copied. For example:
[
     {
         "source": "testdb/(\\d+)/.myschema/(\\d+)/.mytable/(\\d+)/",
         "dest": "productiondb/$1/.myschema/$1/.mytable/$1/"
     },
]
See the --dest-table option for more information about using regular expressions and capture groups.
Note: If you use regular expressions to copy multiple tables, you cannot include the sql: key.
This option cannot be specified with the --full, --dbname, --include-table options. You cannot use this option with --parallelize-leaf-partitions if the JSON file includes a sql: key that queries a partitioned table.
--metadata-only
Create only the schemas specified by the command. Data is not transferred.
If specified with the --full option, gpcopy replicates the complete database schema, including all tables, indexes, views, user-defined types (UDT), and user-defined functions (UDF) for the source databases. No data is transferred.
If you specify databases with the --dbname option or tables with the --include-table --include-table-file, or --include-table-json options, gpcopy creates only the tables and indexes. No data is transferred.
This option cannot be used with --truncate option.
--exclude-table db.schema.table
A table from the source database system to exclude from transfer. The fully qualified table name must be specified (database.schema.table).
To exclude multiple tables, specify a comma-separated list of table names.
A set of tables can be specified using the Go language regular expression syntax. See the --include-table option for information about using regular expressions.
Only the specified tables are excluded, not dependent objects. You cannot specify views or system catalog tables.
This option must be specified with one of these options: --full, --dbname, --include-table, --include-table-file, or --include-table-json. If the option --exclude-table results in no table to copy, the database or schema is not created in the destination system.
--exclude-table-file table-file
The location and name of file containing a list of fully qualified table names to exclude from copying to the destination system. In the text file, specify a single fully qualified table per line (database.schema.table). To specify multiple files, specify this option for each file.
--exclude-table-file <path_to_file1> --exclude-table-file <path_to_file2>
In the file, a set of tables can be specified using the Go language regular expression syntax. See the --include-table option for information about using regular expressions.
If a source table does not exist, gpcopy displays a warning.
Only the specified tables are excluded. You cannot specify views or system catalog tables.
This option must be specified with one of these options: --full, --dbname, --include-table, --include-table-file, or --include-table-json. If the option --exclude-table results in no tables to copy, the database or schema is not created in the destination system.

Connection Options

The following options specify connection information for the destination and source Greenplum clusters. Only --dest-host is required. --jobs, --on-segment-threshold, and --parallelize-leaf-partitions affect the number of simultaneous connections used for data transfer. --data-port-range defines the ports used for data transfer to destination segments or the destination master.
Note: gpcopy does not currently support SSL encryption for its connections.
--dest-host dest_host
Required. The destination Greenplum Database master segment hostname or IP address.
--dest-port dest_port
The destination Greenplum Database master segment port number. If --dest-port is not specified, then the default is 5432.
--dest-user dest_user
The user ID that is used to connect to the destination Greenplum master. If not specified, the default is gpadmin.
--source-host source_host
The source Greenplum Database master segment host name or IP address. If not specified, the default host is the system running gpcopy (127.0.0.1).
--source-port source_port
The source Greenplum Database master port number. If not specified, the default is 5432.
--source-user source_user
The user ID that is used to connect to the source Greenplum Database system. If not specified, the default is gpadmin.
--jobs int
The number processes that gpcopy runs in parallel. The default is 4. The range is from 1 to 64.
The option --jobs produces 2*n+1 database connections. The default value, 4, creates 9 connections.
If you increase this option, ensure that Greenplum Database systems are configured with a sufficient maximum concurrent connection value to accommodate the gpcopy connections and other concurrent connections such as user connections. See the Greenplum Database server configuration parameter max_connections.
--on-segment-threshold int
Specifies the number of rows that determines when gpcopy copies a table using the Greenplum Database source and destination master instead of the source and destination segment instances. The default value is 10000 rows. If a table contains 10000 rows or less, the table is copied using the Greenplum Database master.
The value -1 disables copying tables using the master. All tables are copied using the segment instances.
For smaller tables, copying tables using the Greenplum Database master is more efficient than using segment instances.
--parallelize-leaf-partitions
If specified, the utility copies the leaf partition tables of a partitioned table in parallel. The default is to copy the partitioned table as a single table based on the root partition table.
If the --validate option is also specified, the utility validates each leaf partition table during the copy process and then validates the entire partitioned table.
This option cannot be specified with --include-table-json if the JSON file includes a sql: key that queries a partitioned table.
--data-port-range lower_port-upper_port
A range of port numbers to use on Greenplum Database destination hosts for data transfer. This applies to destination segment hosts or, if data is transferred using the master segment, only the master segment host. gpcopy uses the first available port specified in the range (inclusive). lower_port must be greater than or equal to 1024 (to avoid reserved system ports), and upper_port must be a greater value.
The number of ports specified by the range must be greater than or equal to the number of parallel processes created with --jobs, if specified.
If --data-port-range is not specified, then gpcopy uses any available port.

Options for Configuring how Data is Copied

gpcopy provides additional options that affect the way data is copied between systems. You must use one of these options to specify how to manage data in the destination database: --skip-existing, --truncate, --drop, or --append. Other options can be included as necessary, for example, to perform additional data validation.

--skip-existing
Specify this option to skip copying a table from the source database if the table already exists in the destination database.
At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--truncate
Specify this option to truncate the table that is in the destination database if it already exists.
At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--drop
Specify this option to drop the table that is in the destination database if it already exists. Before copying table data, gpcopy drops the table and creates it again.
At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--append
Append data to the table in the destination database if it already exists.
At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--analyze
Run the ANALYZE command on non-system tables. The default is to not run the ANALYZE command. The operation is performed for each table after the table data is copied.
--no-compression
If specified, data is transferred without compression. By default, gpcopy compresses data during transfer from the source to the destination database when copying data to a different host.
The utility does not compress data when copying data to the same host.
--no-distribution-check
Specify this option to disable table data distribution checking. By default, gpcopy performs data distribution checking to ensure data is distributed to segment instances correctly. If distribution checking fails, the table copy fails.
The utility does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table.
Warning: Before you perform a gpcopy operation with the --no-distribution-check option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.
--truncate-source-after
Specify this option to truncate the table that is in the source database after gpcopy copies the table and validates the table data in the destination database.
If you specify this option, you must also specify the --validate option.
--yes
Optional. Automatic confirmation to truncate source table data after copying and validating table data. The prompt to truncate source tables does not appear. The default is to prompt to confirm truncating source table data after copying and validating the data.
--validate type
Perform data validation on the table data in the destination database after the table data is copied. These are the supported type of validation.
  • count - compares row counts between source and destination table data.
  • md5xor - calculates the MD5 value of all rows, then performs an XOR over the MD5 values
If you specify the --append option, and the destination table contains data, validation fails for the table.
If validation for a table fails, gpcopy rolls back the destination table.

Additional Options

--dry-run
When you specify this option, gpcopy generates a list of the migration operations that would have been performed with the specified options. The data is not migrated.
The information is displayed at the command line and written to the log file.
--quiet
If specified, suppress status messages at the command prompt. The messages are sent only to the log file. Higher level messages such as warning and error messages are still displayed.
This option cannot be specified with the --debug option.
--debug
If specified, debug messages are displayed at the command prompt.
This option cannot be specified with the --quiet option.
--version
Displays the version of this utility.
--help
Displays the online help.

Notes

If a gpcopy command specifies an invalid option, or specifies a source table or database that does not exist, the utility returns an error and quits. No data is copied.

The source table data can change while the data is being copied. A lock is not acquired on the source table when data is copied.

The utility cannot copy a row with a width greater than 1GB (a PostgreSQL limitation).

If you copy a set of database tables with the --dbname, --include-table, --include-table-file, or --include-table-json options, and the destination database does not exist, the utility creates the database before copying the tables. If the destination database exists, the utility creates the tables in the database if required.

The gpcopy utility does not copy dependent database objects unless you specify the --full option. For example, if a table has a default value on a column that is a user-defined function, that function must exist in the destination system database when using the --dbname, --include-table, --include-table-file, or --include-table-json options.

When copying tables, sequences defined for tables are considered table data and are copied. A sequence is copied if a table is created with a serial column or if a sequence is specified as a default value. The sequences are reset if you specify the --truncate option.

The utility re-creates table indexes only with --full or --drop options.

The gpcopy utility does not copy configuration files such as postgresql.conf and pg_hba.conf. You must set up the destination system configuration separately.

The gpcopy utility does not copy external objects such as Greenplum Database extensions, third party jar files, and shared object files. You must install the external objects separately.

gpcopy does not currently support SSL encryption for its connections.

Specifying Table Names with Special Characters

When you list a table with the option --include-table or --exclude-table, and the table name or schema name contains single quote ('), double quote (") , or backslash (\), you must escape the character with a backslash (\). These are the escaped characters: \', \", \\. If the table or schema name contains a period (.), comma (,), or a space character, you must enclose the entire name in single quotes (') and enclose the table and schema in double quotes (") at the command prompt shell. In this example, the fully qualified name of the table is testdb.schema'1.table"test.
--include-table 'testdb."schema\'1"."table\"test"'
For fully qualified table names listed in a file that is used with the options --include-table-file, --include-table-json, or --exclude-table-file, if the table name or schema name contains a period (.), the name must be enclosed in double quotes ("). In this example, the table table.test is in the testdb, and belongs to the schema schema.1.
testdb."schema.1"."table.test"

Copying Partitioned Tables

When copying data for a partitioned table, if a leaf partition has been exchanged with an external table, that leaf partition is created, but data is not copied.

If you specify copying leaf partitions of a partitioned table with an option such as --include-table or --exclude-table, gpcopy creates the partitioned table if it does not exist. Only the data for the specified leaf partitions are added to the partitioned table. Specifying individual leaf partitions is useful when the entire partitioned tabled does not need to be copied.

gpcopy does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table. You can copy those tables in a gpcopy operation and specify the option --no-distribution-check to disable checking of data distribution.

Warning: Before you perform a gpcopy operation with the --no-distribution-check option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.

Handling gpcopy Errors

When gpcopy encounters errors and quits or is cancelled by the user, current copy operations on tables in the destination database are rolled back. Copy operations that have completed are not rolled back.

If an error occurs during the process of copying a table, or table validation fails, gpcopy continues copying the other specified tables. After gpcopy finishes, it displays a list of tables where errors occurred or validation failed and displays a gpcopy command. You can use the provided command to retry copying the failed tables.

The gpcopy utility logs messages in log file gpcopy_date.log in the ~/gpAdminLogs directory on the master host. If you run multiple gpcopy commands on the same day, the utility appends messages to that day's log file.

After gpcopy completes, it displays a summary of the operations performed. If the utility fails to copy tables, they are highlighted in summary, and there is a gpcopy command provided in summary for user to just copy the failed tables. The information is displayed at the command prompt and in the gpcopy log file.

After resolving the issues that caused the copy operations to fail, you can run the provided command to copy the tables that failed in the previous gpcopy command.

Database Connections Created by gpcopy

The option --jobs produces 2*n+1 database connections. The default value 4, creates 9 connections.

If you increase this option, ensure that both source and destination Greenplum Database is configured with a sufficient maximum concurrent connection value to accommodate the gpcopy connections and other concurrent connections such as user connections. See the Greenplum Database server configuration parameter max_connections.

Examples

This command copies all user created databases in a source system to a destination system with the --full option. And drops the table and creates it again if it already exists in the destination.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
    --dest-host demohost --dest-port 1234 --dest-user gpuser \
    --full --drop

This command copies the specified databases in a source system to a destination system with the --dbname option. The --truncate option truncates the table data before copying table data from the source table.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1, database2 --truncate

This command copies the specified tables in a source system to a destination system with the --include-table option. The --skip-existing option skips the table if it already exists in the destination database.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --include-table database.schema.table1, database.schema.table2 --skip-existing

This command copies the tables from the source database to the destination system, excluding the tables specified in the specified tables in /home/gpuser/mytables with --exclude-table-file option. The --truncate option truncates tables that already exist in the destination system. With the options --analyze and --validate count, the utility performs an ANALYZE operation on the copied tables, and validates the copied table data by comparing row counts between source and destination tables.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1 --exclude-table-file /home/gpuser/mytables \
   --truncate --analyze --validate count

This command specifies the --full and --metadata-only options to copy the complete database schema, including all tables, indexes, views, user-defined types (UDT), and user-defined functions (UDF) from all the source databases. No data is copied, The --drop option specifies that the table is dropped in the destination database before it is created again if the table exists in both the source and destination database.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --full --metadata-only --drop

This command copies the specified databases in a source system to a destination system with the --dbname option and specifies 8 parallel processes with the --jobs option. The command specifies the --truncate option to truncate the table and create it again if it already exists in the destination database, and uses uses ports in the range 2000-2010 for the parallel process connections.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1, database2 --truncate --jobs 8 --data-port-range 2000-2010

This command copies the specified database in a source system to a destination system with the --dbname option and specifies 16 parallel processes with the --jobs option. The --truncate option truncates the table and creates it again if it already exists in the destination database. The --truncate-source-after option truncates the tables in source database after that table data has been validated in the destination database.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1 --truncate --jobs 16 --truncate-source-after --validate count

In the previous example, if --truncate was not specified and the destination table contained data, validation would fail.

This is an example table file that uses regular expressions.

"test1.arc/.*/./.*/"
"test1.c/(..)/y./.*/"

In the first line, the regular expressions for the schemas, arc/.*/, and for the tables, /.*/, limit the transfer to all tables with the schema names that start with arc.

In the second line, the regular expressions for the schemas, c/(..)/y, and for the tables, /.*/, limit the transfer to all tables with the schema names that are four characters long and that start with c and end with y, for example, crty.

When the command is run, tables in the database test1 that satisfy either condition are copied to the destination database.

See Also

For information about migrating data, see the Greenplum Database Administrator Guide.