Expanding a Greenplum System

To scale up performance and storage capacity, you expand the system by adding hosts to the array.

This chapter provides information about adding resources to an existing Greenplum Database system to scale performance and storage capacity. The expansion process requires planning and careful execution.

This chapter gives a general overview of preparing new hardware platforms and focuses on the software aspects of expansion. To configure hardware resources for Greenplum Database expansion, work with Greenplum Database platform engineers.

Planning Greenplum System Expansion

This topic provides a synopsis and checklist for the system expansion process. Careful planning helps ensure a successful system expansion. To minimize risk and downtime for Greenplum Database, prepare all new hardware and plan each step of the expansion procedure.

Planning Table Redistribution describes performance considerations for large-scale systems.

System Expansion Overview

System expansion consists of the following phases. The segment initialization phase is performed with the system offline; the gpexpand utility shuts down the database during initialization if an administrator has not already done so.

  • Adding and testing new hardware — Planning New Hardware Platforms describes general considerations for deploying new hardware. For more information about hardware platforms, consult Greenplum platform engineers. After you provision the new hardware platforms and set up their networks, run performance tests using Greenplum utilities.
  • Initializing new segments — After you install Greenplum Database on new hardware, initialize new segments using gpexpand. gpexpand creates a data directory, copies user tables from all existing databases on the new segments, and captures metadata for each table in an expansion schema for status tracking. After this process completes, the expansion operation is committed and irrevocable.
  • Redistributing tables — At initialization, gpexpand nullifies hash distribution policies on tables in all existing databases, except for parent tables of a partitioned table, and sets the distribution policy for all tables to random distribution.

    Users can access Greenplum Database after initialization completes and the system is back online, but they may experience performance degradation on systems that rely heavily on hash distribution of tables.

    During redistribution, normal operations such as ETL jobs, user queries, and reporting can continue, though users might experience slower response times.

    When a table has a random distribution policy, Greenplum Database cannot enforce unique constraints (such as PRIMARY KEY). This can affect your ETL and loading processes until table redistribution completes, because duplicate rows do not raise a constraint violation error.

To complete system expansion, you must run gpexpand to redistribute data tables across the newly added segments. Depending on the size and scale of your system, redistribution can be accomplished in a single session during low-use hours, or you can divide the process into batches over an extended period. Each table or partition is unavailable for read or write operations during redistribution. As each table is redistributed across the new segments, database performance should incrementally improve until it exceeds pre-expansion performance levels.

In a typical operation, you run the gpexpand utility four times with different options during the complete expansion process.

  1. To interactively create an expansion input file:
    gpexpand -f hosts_file
  2. To initialize segments and create expansion schema:
    gpexpand -i input_file -D database_name
  3. To redistribute tables:
    gpexpand -d duration
  4. To remove the expansion schema:
    gpexpand -c

You may need to run gpexpand several times to complete the expansion in large-scale systems that require multiple redistribution sessions. gpexpand can benefit from explicit table redistribution ranking; see Planning Table Redistribution.

For information about the gpexpand utility and the other utilities that are used for system expansion, see the Greenplum Database Utility Guide.

System Expansion Checklist

This checklist summarizes the tasks for a system expansion.

Online Pre-Expansion Tasks

* System is up and available.

  • Devise and execute a plan for ordering, building, and networking new hardware platforms.
  • Devise a database expansion plan. Map the number of segments per host, schedule the offline period for testing performance and creating the expansion schema, and schedule the intervals for table redistribution.
  • Perform a complete schema dump.
  • Install Greenplum Database binaries on new hosts.
  • Copy SSH keys to the new hosts (gpssh-exkeys).
  • Validate the operating system environment of the new hardware (gpcheck).
  • Validate disk I/O and memory bandwidth of the new hardware (gpcheckperf).
  • Validate that the master data directory has no extremely large files in the pg_log or gpperfmon/data directories.
  • Validate that there are no catalog issues (gpcheckcat).
  • Prepare an expansion input file (gpexpand).

Offline Expansion Tasks

* The system is locked and unavailable to all user activity during this process.

  • Validate the operating system environment of the combined existing and new hardware (gpcheck).
  • Validate disk I/O and memory bandwidth of the combined existing and new hardware (gpcheckperf).
  • Initialize new segments into the array and create an expansion schema (gpexpand -i input_file).

Online Expansion and Table Redistribution

* System is up and available.

  • Before you start table redistribution, stop any automated snapshot processes or other processes that consume disk space.
  • Redistribute tables through the expanded system (gpexpand).
  • Remove the expansion schema (gpexpand -c).
  • Run ANALYZE to update distribution statistics. During the expansion, use gpexpand -a; post-expansion, run ANALYZE (see the example below).
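A minimal sketch of the final two steps, assuming a 60-hour redistribution window and a target database named database_name (adjust both for your system):

$ gpexpand -d 60:00:00 -a                # redistribute tables, analyzing each one as it completes
$ psql -d database_name -c "ANALYZE;"    # after expansion, refresh statistics database-wide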

Planning New Hardware Platforms

A deliberate, thorough approach to deploying compatible hardware greatly minimizes risk to the expansion process.

Hardware resources and configurations for new segment hosts should match those of the existing hosts. Work with Greenplum Platform Engineering before making a hardware purchase to expand Greenplum Database.

The steps to plan and set up new hardware platforms vary for each deployment. Some considerations include how to:

  • Prepare the physical space for the new hardware; consider cooling, power supply, and other physical factors.
  • Determine the physical networking and cabling required to connect the new and existing hardware.
  • Map the existing IP address spaces and develop a networking plan for the expanded system.
  • Capture the system configuration (users, profiles, NICs, and so on) from existing hardware to use as a detailed list for ordering new hardware.
  • Create a custom build plan for deploying hardware with the desired configuration in the particular site and environment.

After selecting and adding new hardware to your network environment, ensure you perform the burn-in tasks described in Verifying OS Settings.

Planning New Segment Initialization

Expanding Greenplum Database requires a limited period of system down time. During this period, run gpexpand to initialize new segments into the array and create an expansion schema.

The time required depends on the number of schema objects in the Greenplum system and other factors related to hardware performance. In most environments, the initialization of new segments requires less than thirty minutes offline.

Note: After you begin initializing new segments, you can no longer restore the system using backup files created for the pre-expansion system. When initialization successfully completes, the expansion is committed and cannot be rolled back.

Planning Mirror Segments

If your existing array has mirror segments, the new segments must have mirroring configured. If there are no mirrors configured for existing segments, you cannot add mirrors to new hosts with the gpexpand utility.

For Greenplum Database arrays with mirror segments, ensure you add enough new host machines to accommodate new mirror segments. The number of new hosts required depends on your mirroring strategy:

  • Spread Mirroring — Add at least one more host to the array than the number of segments per host. The number of separate hosts must be greater than the number of segment instances per host to ensure even spreading.
  • Grouped Mirroring — Add at least two new hosts so the mirrors for the first host can reside on the second host, and the mirrors for the second host can reside on the first. For more information, see About Segment Mirroring.

Increasing Segments Per Host

By default, new hosts are initialized with as many primary segments as existing hosts have. You can increase the segments per host or add new segments to existing hosts.

For example, if existing hosts currently have two segments per host, you can use gpexpand to initialize two additional segments on existing hosts for a total of four segments and four new segments on new hosts.

The interactive process for creating an expansion input file prompts for this option; the input file format also allows you to specify new segment directories manually. For more information, see Creating an Input File for System Expansion.

About the Expansion Schema

At initialization, gpexpand creates an expansion schema. If you do not specify a database at initialization (gpexpand -D), the schema is created in the database indicated by the PGDATABASE environment variable.

The expansion schema stores metadata for each table in the system so its status can be tracked throughout the expansion process. The expansion schema consists of two tables and a view for tracking expansion operation progress:

  • gpexpand.status
  • gpexpand.status_detail
  • gpexpand.expansion_progress

Control expansion process aspects by modifying gpexpand.status_detail. For example, removing a record from this table prevents the system from expanding the table across new segments. Control the order in which tables are processed for redistribution by updating the rank value for a record. For more information, see Ranking Tables for Redistribution.

Planning Table Redistribution

Table redistribution is performed while the system is online. For many Greenplum systems, table redistribution completes in a single gpexpand session scheduled during a low-use period. Larger systems may require multiple sessions and setting the order of table redistribution to minimize performance impact. Pivotal recommends completing the table redistribution in one session if possible.

Important: To perform table redistribution, your segment hosts must have enough disk space to temporarily hold a copy of your largest table. All tables are unavailable for read and write operations during redistribution.

The performance impact of table redistribution depends on the size, storage type, and partitioning design of a table. Redistributing a table with gpexpand takes about as much time as a CREATE TABLE AS SELECT operation on that table. When redistributing a terabyte-scale fact table, the expansion utility can use much of the available system resources, with resulting impact on query performance or other database workloads.
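
To plan for the disk space and time that redistribution needs, it can help to list your largest tables before you begin. A minimal sketch, assuming the standard pg_size_pretty and pg_total_relation_size functions and a target database named database_name:

$ psql -d database_name -c "
    SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 10;"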

Managing Redistribution in Large-Scale Greenplum Systems

You can manage the order in which tables are redistributed by adjusting their ranking. See Ranking Tables for Redistribution. Manipulating the redistribution order can help adjust for limited disk space and restore optimal query performance.

When planning the redistribution phase, consider the impact of the exclusive lock taken on each table during redistribution. User activity on a table can delay its redistribution. Tables are unavailable during redistribution.

  • Systems with Abundant Free Disk Space

    In systems with abundant free disk space (required to store a copy of the largest table), you can focus on restoring optimum query performance as soon as possible by first redistributing important tables that queries use heavily. Assign high ranking to these tables, and schedule redistribution operations for times of low system usage. Run one redistribution process at a time until large or critical tables have been redistributed.

  • Systems with Limited Free Disk Space

    If your existing hosts have limited disk space, you may prefer to first redistribute smaller tables (such as dimension tables) to clear space to store a copy of the largest table. Disk space on the original segments will increase as each table is redistributed across the expanded array. When enough free space exists on all segments to store a copy of the largest table, you can redistribute large or critical tables. Redistribution of large tables requires exclusive locks; schedule this procedure for off-peak hours.

  • Also consider the following:
    • Run multiple parallel redistribution processes during off-peak hours to maximize available system resources; see the sketch following this list.
    • When running multiple processes, operate within the connection limits for your Greenplum system. For information about limiting concurrent connections, see Limiting Concurrent Connections.
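
One way to parallelize, assuming your release of gpexpand supports the -n option for the number of tables redistributed simultaneously (verify against the gpexpand reference for your version), is a single session with a modest degree of parallelism during an off-peak window:

$ gpexpand -d 08:00:00 -n 4    # redistribute for up to 8 hours, 4 tables at a time

Each parallel worker consumes a database connection, so keep the total within the limits discussed in Limiting Concurrent Connections.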

Redistributing Append-Optimized and Compressed Tables

gpexpand redistributes append-optimized and compressed append-optimized tables at different rates from heap tables. The CPU capacity required to compress and decompress data tends to increase the impact on system performance. For similar-sized tables with similar data, you may find overall performance differences like the following:

  • Uncompressed append-optimized tables expand 10% faster than heap tables.
  • zlib-compressed append-optimized tables expand at a significantly slower rate than uncompressed append-optimized tables, potentially up to 80% slower.
  • Systems with data compression such as ZFS/LZJB take longer to redistribute.

Important: If your system nodes use data compression, use identical compression on new nodes to avoid disk space shortage.

Redistributing Tables with Primary Key Constraints

Between the initialization of new segments and the successful redistribution of a table, the primary key constraint on that table cannot be enforced. Duplicate data inserted into tables during this period prevents the expansion utility from redistributing the affected tables.

After a table is redistributed, the primary key constraint is properly enforced again. If an expansion process violates constraints, the expansion utility logs errors and displays warnings when it completes. To fix constraint violations, perform one of the following (a sketch follows this list):
  • Clean up duplicate data in the primary key columns, and re-run gpexpand.
  • Drop the primary key constraints, and re-run gpexpand.
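
A minimal sketch of both options, assuming a hypothetical table public.customers with PRIMARY KEY (id) and a constraint named customers_pkey; substitute your own table, key columns, and constraint name:

$ psql -d database_name <<'SQL'
-- Option 1: find duplicate key values so they can be cleaned up before re-running gpexpand.
SELECT id, count(*) FROM public.customers GROUP BY id HAVING count(*) > 1;

-- Option 2: drop the primary key constraint instead (assumed constraint name).
-- ALTER TABLE public.customers DROP CONSTRAINT customers_pkey;
SQL
$ gpexpand -d 60:00:00    # then re-run table redistribution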

Redistributing Tables with User-Defined Data Types

You cannot perform redistribution with the expansion utility on tables with dropped columns of user-defined data types. To redistribute tables with dropped columns of user-defined types, first re-create the table using CREATE TABLE AS SELECT. After this process removes the dropped columns, redistribute the table with gpexpand.
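
A minimal sketch of the workaround, assuming a hypothetical table public.events that has a dropped column of a user-defined type; re-create any indexes, constraints, and privileges the original table carried:

$ psql -d database_name <<'SQL'
-- Rebuild the table without the dropped column, then swap it into place.
-- DISTRIBUTED RANDOMLY matches the policy gpexpand sets at initialization.
CREATE TABLE public.events_new AS SELECT * FROM public.events DISTRIBUTED RANDOMLY;
DROP TABLE public.events;
ALTER TABLE public.events_new RENAME TO events;
SQL
$ gpexpand -d 60:00:00    # then redistribute the rebuilt table with gpexpand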

Redistributing Partitioned Tables

Because the expansion utility can process each individual partition of a large table, an efficient partition design reduces the performance impact of table redistribution. Only the child tables of a partitioned table are set to a random distribution policy. The read/write lock for redistribution applies to only one child table at a time.

Redistributing Indexed Tables

Because the gpexpand utility must re-index each indexed table after redistribution, a high level of indexing has a large performance impact. Systems with intensive indexing have significantly slower rates of table redistribution.

Preparing and Adding Nodes

To prepare new system nodes for expansion, install the Greenplum Database software binaries, exchange the required SSH keys, and run performance tests. Pivotal recommends running performance tests first on the new nodes and then on all nodes. Run the tests on all nodes with the system offline so user activity does not distort results.

Generally, Pivotal recommends running performance tests when an administrator modifies node networking or other special conditions in the system. For example, if you will run the expanded system on two network clusters, run tests on each cluster.

This topic describes how to run Greenplum administrative utilities to verify your new nodes are ready for integration into the existing Greenplum system.

Adding New Nodes to the Trusted Host Environment

New nodes must exchange SSH keys with the existing nodes to enable Greenplum administrative utilities to connect to all segments without a password prompt. Pivotal recommends performing the key exchange process twice.

First perform the process as root, for administration convenience, and then as the user gpadmin, for management utilities. Perform the following tasks in order:

  1. To exchange SSH keys as root
  2. To create the gpadmin user
  3. To exchange SSH keys as the gpadmin user
Note: The Greenplum Database segment host naming convention is sdwN where sdw is a prefix and N is an integer. For example, on a Greenplum Database DCA system, segment host names would be sdw1, sdw2 and so on. For hosts with multiple interfaces, the convention is to append a dash (-) and number to the host name. For example, sdw1-1 and sdw1-2 are the two interface names for host sdw1.

To exchange SSH keys as root

  1. Create a host file with the existing host names in your array and a separate host file with the new expansion host names. For existing hosts, you can use the same host file used to set up SSH keys in the system. In the files, list all hosts (master, backup master, and segment hosts) with one name per line and no extra lines or spaces. Exchange SSH keys using the configured host names for a given host if you use a multi-NIC configuration. In this example, mdw is configured with a single NIC, and sdw1, sdw2, and sdw3 are configured with 4 NICs:
    mdw
    sdw1-1
    sdw1-2
    sdw1-3
    sdw1-4
    sdw2-1
    sdw2-2
    sdw2-3
    sdw2-4
    sdw3-1
    sdw3-2
    sdw3-3
    sdw3-4
  2. Log in as root on the master host, and source the greenplum_path.sh file from your Greenplum installation.
    $ su - 
    # source /usr/local/greenplum-db/greenplum_path.sh
  3. Run the gpssh-exkeys utility referencing the host list files. For example:
    # gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
  4. gpssh-exkeys checks the remote hosts and performs the key exchange between all hosts. Enter the root user password when prompted. For example:
    ***Enter password for root@hostname: <root_password>

To create the gpadmin user

  1. Use gpssh to create the gpadmin user on all the new segment hosts (if it does not exist already). Use the list of new hosts you created for the key exchange. For example:
    # gpssh -f new_hosts_file '/usr/sbin/useradd gpadmin -d /home/gpadmin -s /bin/bash'
  2. Set a password for the new gpadmin user. On Linux, you can do this on all segment hosts simultaneously using gpssh. For example:
    # gpssh -f new_hosts_file 'echo gpadmin_password | passwd gpadmin --stdin'
  3. Verify the gpadmin user has been created by looking for its home directory:
    # gpssh -f new_hosts_file ls -l /home

To exchange SSH keys as the gpadmin user

  1. Log in as gpadmin and run the gpssh-exkeys utility referencing the host list files. For example:
    $ gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
  2. gpssh-exkeys will check the remote hosts and perform the key exchange between all hosts. Enter the gpadmin user password when prompted. For example:
    ***Enter password for gpadmin@hostname: <gpadmin_password>

Verifying OS Settings

Use the gpcheck utility to verify all new hosts in your array have the correct OS settings to run Greenplum Database software.

To run gpcheck

  1. Log in on the master host as the user who will run your Greenplum Database system (for example, gpadmin).
    $ su - gpadmin
  2. Run the gpcheck utility using your host file for new hosts. For example:
    $ gpcheck -f new_hosts_file

Validating Disk I/O and Memory Bandwidth

Use the gpcheckperf utility to test disk I/O and memory bandwidth.

To run gpcheckperf

  1. Run the gpcheckperf utility using the host file for new hosts. Use the -d option to specify the file systems you want to test on each host. You must have write access to these directories. For example:
    $ gpcheckperf -f new_hosts_file -d /data1 -d /data2 -v 
  2. The utility may take a long time to perform the tests because it is copying very large files between the hosts. When it is finished, you will see the summary results for the Disk Write, Disk Read, and Stream tests.

For a network divided into subnets, repeat this procedure with a separate host file for each subnet.
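
For example, with hypothetical per-subnet host files new_hosts_subnet1_file and new_hosts_subnet2_file, run the utility once per subnet:

$ gpcheckperf -f new_hosts_subnet1_file -d /data1 -d /data2 -v
$ gpcheckperf -f new_hosts_subnet2_file -d /data1 -d /data2 -v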

Integrating New Hardware into the System

Before initializing the system with the new segments, shut down the system with gpstop to prevent user activity from skewing performance test results. Then repeat the performance tests described above using host files that include all nodes, existing and new.
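
A minimal sketch, assuming a hypothetical combined host file named all_hosts_file and the same test directories used earlier:

$ gpstop -a                                               # shut down Greenplum Database
$ gpcheck -f all_hosts_file                               # verify OS settings on all hosts
$ gpcheckperf -f all_hosts_file -d /data1 -d /data2 -v    # test disk I/O and memory bandwidth on all hosts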

Initializing New Segments

Use the gpexpand utility to initialize the new segments, create the expansion schema, and set a system-wide random distribution policy for the database. The utility performs these tasks by default the first time you run it with a valid input file on a Greenplum Database master. Subsequently, it will detect if an expansion schema has been created and, if so, perform table redistribution.

Creating an Input File for System Expansion

To begin expansion, gpexpand requires an input file containing information about the new segments and hosts. If you run gpexpand without specifying an input file, the utility displays an interactive interview that collects the required information and automatically creates an input file.

If you create the input file using the interactive interview, you may specify a file with a list of expansion hosts in the interview prompt. If your platform or command shell limits the length of the host list, specifying the hosts with -f may be mandatory.

Creating an Input File in Interactive Mode

Before you run gpexpand to create an input file in interactive mode, ensure you know:

  • The number of new hosts (or a hosts file)
  • The new hostnames (or a hosts file)
  • The mirroring strategy used in existing hosts, if any
  • The number of segments to add per host, if any

The utility automatically generates an input file based on this information and on the dbid, content ID, and data directory values stored in gp_segment_configuration, and saves the file in the current directory.
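
To review the existing values before the interview, you can query gp_segment_configuration directly. A sketch (column names as in Greenplum Database 4.x; verify against your release):

$ psql -d database_name -c "
    SELECT dbid, content, preferred_role, port, hostname, address
    FROM gp_segment_configuration
    ORDER BY content, preferred_role;"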

To create an input file in interactive mode
  1. Log in on the master host as the user who will run your Greenplum Database system; for example, gpadmin.
  2. Run gpexpand. The utility displays messages about how to prepare for an expansion operation, and it prompts you to quit or continue.

    Optionally, specify a hosts file using -f. For example:

    $ gpexpand -f /home/gpadmin/new_hosts_file
  3. At the prompt, select Y to continue.
  4. Unless you specified a hosts file using -f, you will be prompted to enter hostnames. Enter a comma-separated list of the hostnames of the new expansion hosts. Do not include interface hostnames. For example:
    > sdw4, sdw5, sdw6, sdw7

    To add segments to existing hosts only, enter a blank line at this prompt. Do not specify localhost or any existing host name.

  5. Enter the mirroring strategy used in your system, if any. Options are spread|grouped|none. The default setting is grouped.

    Ensure you have enough hosts for the selected grouping strategy. For more information about mirroring, see Planning Mirror Segments.

  6. Enter the number of new primary segments to add, if any. By default, new hosts are initialized with the same number of primary segments as existing hosts. Increase segments per host by entering a number greater than zero. The number you enter will be the number of additional segments initialized on all hosts. For example, if existing hosts currently have two segments each, entering a value of 2 initializes two more segments on existing hosts, and four segments on new hosts.
  7. If you are adding new primary segments, enter the new primary data directory root for the new segments. Do not specify the actual data directory name, which is created automatically by gpexpand based on the existing data directory names.

    For example, if your existing data directories are as follows:

    /gpdata/primary/gp0
    /gpdata/primary/gp1

    then enter the following (one at each prompt) to specify the data directories for two new primary segments:

    /gpdata/primary
    /gpdata/primary

    When the initialization runs, the utility creates the new directories gp2 and gp3 under /gpdata/primary.

  8. If you are adding new mirror segments, enter the new mirror data directory root for the new segments. Do not specify the data directory name; it is created automatically by gpexpand based on the existing data directory names.

    For example, if your existing data directories are as follows:

    /gpdata/mirror/gp0
    /gpdata/mirror/gp1

    enter the following (one at each prompt) to specify the data directories for two new mirror segments:

    /gpdata/mirror
    /gpdata/mirror

    When the initialization runs, the utility will create the new directories gp2 and gp3 under /gpdata/mirror.

    These primary and mirror root directories for new segments must exist on the hosts, and the user running gpexpand must have permissions to create directories in them.

    After you have entered all required information, the utility generates an input file and saves it in the current directory. For example:

    gpexpand_inputfile_yyyymmdd_145134

Expansion Input File Format

Pivotal recommends using the interactive interview process to create your own input file unless your expansion scenario has atypical needs.

The format for expansion input files is:

hostname:address:port:fselocation:dbid:content:preferred_role:replication_port

For example:

sdw5:sdw5-1:50011:/gpdata/primary/gp9:11:9:p:53011
sdw5:sdw5-2:50012:/gpdata/primary/gp10:12:10:p:53012
sdw5:sdw5-2:60011:/gpdata/mirror/gp9:13:9:m:63011
sdw5:sdw5-1:60012:/gpdata/mirror/gp10:14:10:m:63012

For each new segment, this format of expansion input file requires the following:

Table 1. Data for the expansion configuration file

  • hostname — Valid values: a hostname. The hostname of the segment host.
  • address — Valid values: a hostname or interface alias. The interface name used to reach the segment; on hosts with multiple NICs this can differ from hostname (for example, sdw5-1).
  • port — Valid values: an available port number. Database listener port for the segment, incremented on the existing segment port base number.
  • fselocation — Valid values: a directory name. The data directory (filespace) location for the segment, as per the pg_filespace_entry system catalog.
  • dbid — Valid values: an integer that does not conflict with existing dbid values. Database ID for the segment. The values you enter should be incremented sequentially from the existing dbid values shown in the system catalog gp_segment_configuration. For example, to add four segment instances to an existing ten-segment array with dbid values of 1-10, list new dbid values of 11, 12, 13, and 14.
  • content — Valid values: an integer that does not conflict with existing content values. The content ID of the segment. A primary segment and its mirror should have the same content ID, incremented sequentially from existing values. For more information, see content in the reference for gp_segment_configuration.
  • preferred_role — Valid values: p | m. Determines whether the segment is a primary or a mirror. Specify p for primary and m for mirror.
  • replication_port — Valid values: an available port number. File replication port for the segment, incremented on the existing segment replication_port base number.

Running gpexpand to Initialize New Segments

After you have created an input file, run gpexpand to initialize new segments. The utility automatically shuts down Greenplum Database for segment initialization and restarts the system when the process finishes.

To run gpexpand with an input file

  1. Log in on the master host as the user who will run your Greenplum Database system; for example, gpadmin.
  2. Run the gpexpand utility, specifying the input file with -i. Optionally, use -D to specify the database in which to create the expansion schema. For example:
    $ gpexpand -i input_file -D database1

    The utility detects if an expansion schema exists for the Greenplum Database system. If a schema exists, remove it with gpexpand -c before you start a new expansion operation. See Removing the Expansion Schema.

    When the new segments are initialized and the expansion schema is created, the utility prints a success message and exits.

When the initialization process completes, you can connect to Greenplum Database and view the expansion schema. The schema resides in the database you specified with -D or in the database specified by the PGDATABASE environment variable. For more information, see About the Expansion Schema.
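
For example, to confirm the schema exists and see the initial status rows (using the database1 example above):

$ psql -d database1 -c "\dt gpexpand.*"                  # list the expansion schema tables
$ psql -d database1 -c "SELECT * FROM gpexpand.status;"  # show initialization and session status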

Rolling Back a Failed Expansion Setup

You can roll back an expansion setup operation only if the operation fails. To roll back a failed expansion setup, use the following command, specifying the database that contains the expansion schema:

gpexpand --rollback -D database_name

Redistributing Tables

After creating an expansion schema, you can bring Greenplum Database back online and redistribute tables across the entire array with gpexpand. Target low-use hours when the utility's CPU usage and table locks have minimal impact on operations. Rank tables to redistribute the largest or most critical tables in preferential order.

Note: When redistributing data, Greenplum Database must be running in production mode. Greenplum Database cannot be running in restricted mode or in master-only mode. The gpstart options -R or -m cannot be specified to start Greenplum Database.
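
For example, if the system was brought up in restricted or master-only mode, stop it and start it normally before running the redistribution (the -a option only suppresses the confirmation prompts):

$ gpstop -a
$ gpstart -a    # normal production mode; do not use -R or -m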

While table redistribution is underway:

  • Any new tables or partitions created are distributed across all segments exactly as they would be under normal operating conditions.
  • Queries can access all segments, even if the relevant data is not yet redistributed to tables on the new segments.
  • The table or partition being redistributed is locked and unavailable for read or write operations. When its redistribution completes, normal operations resume.

Ranking Tables for Redistribution

For large systems, Pivotal recommends controlling table redistribution order. Adjust tables' rank values in the expansion schema to prioritize heavily-used tables and minimize performance impact. Available free disk space can affect table ranking; see Managing Redistribution in Large-Scale Greenplum Systems.

To rank tables for redistribution by updating rank values in gpexpand.status_detail, connect to Greenplum Database using psql or another supported client. Update gpexpand.status_detail with commands such as:

=> UPDATE gpexpand.status_detail SET rank= 10; 
=> UPDATE gpexpand.status_detail SET rank=1 WHERE fq_name = 'public.lineitem';
=> UPDATE gpexpand.status_detail SET rank=2 WHERE fq_name = 'public.orders';

These commands lower the priority of all tables to 10 and then assign a rank of 1 to lineitem and a rank of 2 to orders. When table redistribution begins, lineitem is redistributed first, followed by orders and all other tables in gpexpand.status_detail. To exclude a table from redistribution, remove the table from gpexpand.status_detail.
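
For example, to exclude a hypothetical table public.archived_logs so that gpexpand skips it (per About the Expansion Schema, removing its record prevents the table from being expanded across the new segments):

$ psql -d database_name -c "DELETE FROM gpexpand.status_detail WHERE fq_name = 'public.archived_logs';"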

Redistributing Tables Using gpexpand

To redistribute tables with gpexpand

  1. Log in on the master host as the user who will run your Greenplum Database system; for example, gpadmin.
  2. Run the gpexpand utility. You can use the -d (duration) or -e (end time) option to define the expansion session time period (an end-time example follows these steps). For example, to run the utility for up to 60 consecutive hours:
    $ gpexpand -d 60:00:00

    The utility redistributes tables until the last table in the schema completes or it reaches the specified duration or end time. gpexpand updates the status and time in gpexpand.status when a session starts and finishes.
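
To stop at a fixed wall-clock time instead, the -e option takes an end timestamp; a sketch, assuming your release accepts the 'YYYY-MM-DD hh:mm:ss' format (check the gpexpand reference for your version):

$ gpexpand -e '2016-03-01 05:00:00'    # redistribute tables until 5:00 AM on March 1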

Monitoring Table Redistribution

You can query the expansion schema during the table redistribution process. The view gpexpand.expansion_progress provides a current progress summary, including the estimated rate of table redistribution and estimated time to completion. You can query the table gpexpand.status_detail for per-table status information.

Viewing Expansion Status

After the first table completes redistribution, gpexpand.expansion_progress calculates its estimates and refreshes them based on all tables' redistribution rates. Calculations restart each time you start a table redistribution session with gpexpand. To monitor progress, connect to Greenplum Database using psql or another supported client; query gpexpand.expansion_progress with a command like the following:

=# select * from gpexpand.expansion_progress;
             name             |         value
------------------------------+-----------------------
 Bytes Left                   | 5534842880
 Bytes Done                   | 142475264
 Estimated Expansion Rate     | 680.75667095996092 MB/s
 Estimated Time to Completion | 00:01:01.008047
 Tables Expanded              | 4
 Tables Left                  | 4
(6 rows)

Viewing Table Status

The table gpexpand.status_detail stores status, time of last update, and more facts about each table in the schema. To see a table's status, connect to Greenplum Database using psql or another supported client and query gpexpand.status_detail:

=> SELECT status, expansion_started, source_bytes FROM 
gpexpand.status_detail WHERE fq_name = 'public.sales';
  status   |     expansion_started      | source_bytes
-----------+----------------------------+--------------
 COMPLETED | 2009-02-20 10:54:10.043869 |   4929748992
(1 row)

Removing the Expansion Schema

You can safely remove the expansion schema after the expansion operation is complete and verified. To run another expansion operation on a Greenplum system, first remove the existing expansion schema.

To remove the expansion schema

  1. Log in on the master host as the user who will be running your Greenplum Database system (for example, gpadmin).
  2. Run the gpexpand utility with the -c option. For example:
    $ gpexpand -c 
    $ 
    Note: Some systems require two returns.