Expanding a Greenplum System
To scale up performance and storage capacity, you expand the system by adding hosts to the array.
This chapter provides information about adding resources to an existing Greenplum Database system to scale performance and storage capacity. The expansion process requires planning and careful execution.
This chapter focuses on the software aspects of expansion and gives a general overview of preparing new hardware platforms. To configure hardware resources for Greenplum Database expansion, work with Greenplum Database platform engineers.
Planning Greenplum System Expansion
This topic provides a synopsis and checklist for the system expansion process. Careful planning helps ensure a successful system expansion. To minimize risk and downtime for Greenplum Database, prepare all new hardware and plan each step of the expansion procedure.
Planning Table Redistribution describes performance considerations for large-scale systems.
System Expansion Overview
System expansion consists of the following phases. Perform the initialization phase with the system offline; the gpexpand utility shuts down the database during initialization if an administrator has not already done so.
- Adding and testing new hardware —Planning New Hardware Platforms describes general considerations for deploying new hardware. For more information about hardware platforms, consult Greenplum platform engineers. After you provision the new hardware platforms and set up their networks, run performance tests using Greenplum utilities.
- Initializing new segments — After you install Greenplum Database on new hardware, initialize new segments using gpexpand. gpexpand creates a data directory, copies user tables from all existing databases on the new segments, and captures metadata for each table in an expansion schema for status tracking. After this process completes, the expansion operation is committed and irrevocable.
- Redistributing tables — At initialization, gpexpand nullifies hash distribution policies on tables in all existing databases, except for parent tables of a partitioned table, and sets the distribution policy for all tables to random distribution.
Users can access Greenplum Database after initialization completes and the system is back online, but they may experience performance degradation on systems that rely heavily on hash distribution of tables.
During redistribution, normal operations such as ETL jobs, user queries, and reporting can continue, though users might experience slower response times.
When a table has a random distribution policy, Greenplum Database cannot enforce unique constraints (such as PRIMARY KEY). This can affect your ETL and loading processes until table redistribution completes because duplicate rows do not issue a constraint violation error.
To complete system expansion, you must run gpexpand to redistribute data tables across the newly added segments. Depending on the size and scale of your system, redistribution can be accomplished in a single session during low-use hours, or you can divide the process into batches over an extended period. Each table or partition is unavailable for read or write operations during redistribution. As each table is redistributed across the new segments, database performance should incrementally improve until it exceeds pre-expansion performance levels.
In a typical operation, you run the gpexpand utility four times with different options during the complete expansion process.
- To interactively create an expansion input file:

  gpexpand -f hosts_file

- To initialize segments and create the expansion schema:

  gpexpand -i input_file -D database_name

- To redistribute tables:

  gpexpand -d duration

- To remove the expansion schema:

  gpexpand -c
You may need to run gpexpand several times to complete the expansion in large-scale systems that require multiple redistribution sessions. gpexpand can benefit from explicit table redistribution ranking; see Planning Table Redistribution.
For information about the gpexpand utility and the other utilities that are used for system expansion, see the Greenplum Database Utility Guide.
System Expansion Checklist
Online Pre-Expansion Tasks
The system is up and available during these tasks.
- Devise and execute a plan for ordering, building, and networking new hardware platforms.
- Devise a database expansion plan. Map the number of segments per host, schedule the offline period for testing performance and creating the expansion schema, and schedule the intervals for table redistribution.
- Perform a complete schema dump.
- Install Greenplum Database binaries on new hosts.
- Copy SSH keys to the new hosts (gpssh-exkeys).
- Validate the operating system environment of the new hardware (gpcheck).
- Validate disk I/O and memory bandwidth of the new hardware (gpcheckperf).
- Validate that the master data directory has no extremely large files in the pg_log or gpperfmon/data directories.
- Validate that there are no catalog issues (gpcheckcat).
- Prepare an expansion input file (gpexpand).
Offline Expansion Tasks
The system is locked and unavailable to all user activity during these tasks.
- Validate the operating system environment of the combined existing and new hardware (gpcheck).
- Validate disk I/O and memory bandwidth of the combined existing and new hardware (gpcheckperf).
- Initialize new segments into the array and create an expansion schema (gpexpand -i input_file).
Online Expansion and Table Redistribution
The system is up and available during these tasks.
- Before you start table redistribution, stop any automated snapshot processes or other processes that consume disk space.
- Redistribute tables through the expanded system (gpexpand).
- Remove the expansion schema (gpexpand -c).
- Run analyze to update distribution statistics. During the expansion, use gpexpand -a; post-expansion, use analyze.
Planning New Hardware Platforms
A deliberate, thorough approach to deploying compatible hardware greatly minimizes risk to the expansion process.
Hardware resources and configurations for new segment hosts should match those of the existing hosts. Work with Greenplum Platform Engineering before making a hardware purchase to expand Greenplum Database.
The steps to plan and set up new hardware platforms vary for each deployment. Some considerations include how to:
- Prepare the physical space for the new hardware; consider cooling, power supply, and other physical factors.
- Determine the physical networking and cabling required to connect the new and existing hardware.
- Map the existing IP address spaces and develop a networking plan for the expanded system.
- Capture the system configuration (users, profiles, NICs, and so on) from existing hardware to use as a detailed list for ordering new hardware.
- Create a custom build plan for deploying hardware with the desired configuration in the particular site and environment.
After selecting and adding new hardware to your network environment, ensure you perform the burn-in tasks described in Verifying OS Settings.
Planning New Segment Initialization
Expanding Greenplum Database requires a limited period of system down time. During this period, run gpexpand to initialize new segments into the array and create an expansion schema.
The time required depends on the number of schema objects in the Greenplum system and other factors related to hardware performance. In most environments, the initialization of new segments requires less than thirty minutes offline.
Planning Mirror Segments
If your existing array has mirror segments, the new segments must have mirroring configured. If there are no mirrors configured for existing segments, you cannot add mirrors to new hosts with the gpexpand utility.
For Greenplum Database arrays with mirror segments, ensure you add enough new host machines to accommodate new mirror segments. The number of new hosts required depends on your mirroring strategy:
- Spread Mirroring — Add at least one more host to the array than the number of segments per host. The number of separate hosts must be greater than the number of segment instances per host to ensure even spreading.
- Grouped Mirroring — Add at least two new hosts so the mirrors for the first host can reside on the second host, and the mirrors for the second host can reside on the first. For more information, see About Segment Mirroring.
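As a rough planning aid, the host-count rules above can be sketched in a few lines of Python. This is a hypothetical helper of our own, not a Greenplum utility:

```python
# Hypothetical planning helper (not part of Greenplum): the minimum number
# of NEW hosts to add so mirrors can be placed under each strategy.
def min_new_hosts(strategy, segments_per_host):
    if strategy == "spread":
        # Spread mirroring needs more hosts than segment instances per host,
        # so add at least one more host than the per-host segment count.
        return segments_per_host + 1
    if strategy == "grouped":
        # Grouped mirroring pairs hosts, so add at least two new hosts.
        return 2
    raise ValueError("strategy must be 'spread' or 'grouped'")

print(min_new_hosts("spread", 4))   # 5
print(min_new_hosts("grouped", 4))  # 2
```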
Increasing Segments Per Host
By default, new hosts are initialized with as many primary segments as existing hosts have. You can increase the segments per host or add new segments to existing hosts.
For example, if existing hosts currently have two segments per host, you can use gpexpand to initialize two additional segments on existing hosts for a total of four segments and four new segments on new hosts.
The interactive process for creating an expansion input file prompts for this option; the input file format allows you to specify new segment directories manually, also. For more information, see Creating an Input File for System Expansion.
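The per-host arithmetic in the example above can be sketched as follows. This is illustrative only; the function names are our own, not part of gpexpand:

```python
# Illustrative arithmetic for planning segment counts; not a Greenplum API.
def primaries_per_host(existing_per_host, added_per_host):
    # New hosts are initialized with the same expanded per-host count.
    return existing_per_host + added_per_host

def total_primaries(existing_hosts, new_hosts, existing_per_host, added_per_host):
    per_host = primaries_per_host(existing_per_host, added_per_host)
    return (existing_hosts + new_hosts) * per_host

# Two segments per host today, two more added: four per host everywhere.
print(primaries_per_host(2, 2))      # 4
print(total_primaries(4, 2, 2, 2))   # 24
```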
About the Expansion Schema
At initialization, gpexpand creates an expansion schema. If you do not specify a database at initialization (gpexpand -D), the schema is created in the database indicated by the PGDATABASE environment variable.
The expansion schema stores metadata for each table in the system so its status can be tracked throughout the expansion process. The expansion schema consists of two tables and a view for tracking expansion operation progress: gpexpand.status, gpexpand.status_detail, and the view gpexpand.expansion_progress.
Control expansion process aspects by modifying gpexpand.status_detail. For example, removing a record from this table prevents the system from expanding the table across new segments. Control the order in which tables are processed for redistribution by updating the rank value for a record. For more information, see Ranking Tables for Redistribution.
Planning Table Redistribution
Table redistribution is performed while the system is online. For many Greenplum systems, table redistribution completes in a single gpexpand session scheduled during a low-use period. Larger systems may require multiple sessions and setting the order of table redistribution to minimize performance impact. Pivotal recommends completing the table redistribution in one session if possible.
The performance impact of table redistribution depends on the size, storage type, and partitioning design of a table. Per table, redistributing a table with gpexpand takes as much time as a CREATE TABLE AS SELECT operation does. When redistributing a terabyte-scale fact table, the expansion utility can use much of the available system resources, with resulting impact on query performance or other database workloads.
Managing Redistribution in Large-Scale Greenplum Systems
You can manage the order in which tables are redistributed by adjusting their ranking. See Ranking Tables for Redistribution. Manipulating the redistribution order can help adjust for limited disk space and restore optimal query performance.
When planning the redistribution phase, consider the impact of the exclusive lock taken on each table during redistribution. User activity on a table can delay its redistribution. Tables are unavailable during redistribution.
Systems with Abundant Free Disk Space
In systems with abundant free disk space (required to store a copy of the largest table), you can focus on restoring optimum query performance as soon as possible by first redistributing important tables that queries use heavily. Assign high ranking to these tables, and schedule redistribution operations for times of low system usage. Run one redistribution process at a time until large or critical tables have been redistributed.
Systems with Limited Free Disk Space
If your existing hosts have limited disk space, you may prefer to first redistribute smaller tables (such as dimension tables) to clear space to store a copy of the largest table. Disk space on the original segments will increase as each table is redistributed across the expanded array. When enough free space exists on all segments to store a copy of the largest table, you can redistribute large or critical tables. Redistribution of large tables requires exclusive locks; schedule this procedure for off-peak hours.
Also consider the following:
- Run multiple parallel redistribution processes during off-peak hours to maximize available system resources.
- When running multiple processes, operate within the connection limits for your Greenplum system. For information about limiting concurrent connections, see Limiting Concurrent Connections.
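The smallest-first ordering described above can be illustrated with a short sketch. This is our own illustration of the ranking idea, not gpexpand logic; lower rank values are redistributed first:

```python
# Our own sketch of smallest-first ranking for gpexpand.status_detail;
# gpexpand itself does not compute ranks for you.
def smallest_first_ranks(table_sizes):
    """table_sizes: dict mapping table name -> size in bytes.
    Returns table name -> rank, where rank 1 is redistributed first."""
    ordered = sorted(table_sizes, key=table_sizes.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

ranks = smallest_first_ranks({
    "public.lineitem": 500_000_000_000,  # large fact table, goes last
    "public.region": 4_096,
    "public.nation": 8_192,
})
print(ranks["public.region"])    # 1
print(ranks["public.lineitem"])  # 3
```

You could then apply the resulting rank values with UPDATE statements against gpexpand.status_detail, as described in Ranking Tables for Redistribution.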
Redistributing Append-Optimized and Compressed Tables
gpexpand redistributes tables of different storage types at different rates:
- Uncompressed append-optimized tables expand 10% faster than heap tables.
- zlib-compressed append-optimized tables expand at a significantly slower rate than uncompressed append-optimized tables, potentially up to 80% slower.
- Systems with data compression such as ZFS/LZJB take longer to redistribute.
Redistributing Tables with Primary Key Constraints
Between the initialization of new segments and successful table redistribution, there is a period during which primary key constraints cannot be enforced. Duplicate data inserted into tables during this period prevents the expansion utility from redistributing the affected tables. If duplicate data was inserted, take one of the following actions:
- Clean up the duplicate data in the primary key columns, and re-run gpexpand.
- Drop the primary key constraints, and re-run gpexpand.
Redistributing Tables with User-Defined Data Types
You cannot perform redistribution with the expansion utility on tables with dropped columns of user-defined data types. To redistribute tables with dropped columns of user-defined types, first re-create the table using CREATE TABLE AS SELECT. After this process removes the dropped columns, redistribute the table with gpexpand.
Redistributing Partitioned Tables
Because the expansion utility can process each individual partition on a large table, an efficient partition design reduces the performance impact of table redistribution. Only the child tables of a partitioned table are set to a random distribution policy. The read/write lock for redistribution applies to only one child table at a time.
Redistributing Indexed Tables
Because the gpexpand utility must re-index each indexed table after redistribution, a high level of indexing has a large performance impact. Systems with intensive indexing have significantly slower rates of table redistribution.
Preparing and Adding Nodes
To prepare new system nodes for expansion, install the Greenplum Database software binaries, exchange the required SSH keys, and run performance tests. Pivotal recommends running performance tests first on the new nodes, and then on all nodes. Run the tests on all nodes with the system offline so user activity does not distort results.
Generally, Pivotal recommends running performance tests when an administrator modifies node networking or other special conditions in the system. For example, if you will run the expanded system on two network clusters, run tests on each cluster.
This topic describes how to run Greenplum administrative utilities to verify your new nodes are ready for integration into the existing Greenplum system.
Adding New Nodes to the Trusted Host Environment
New nodes must exchange SSH keys with the existing nodes to enable Greenplum administrative utilities to connect to all segments without a password prompt. Pivotal recommends performing the key exchange process twice.
First perform the process as root, for administration convenience, and then as the user gpadmin, for management utilities. Perform the following tasks in order:
To exchange SSH keys as root
- Create a host file with the existing host names in your array and a separate host file with the new expansion host names. For existing hosts, you can use the same host file used to set up SSH keys in the system. In the files, list all hosts (master, backup master, and segment hosts) with one name per line and no extra lines or spaces. If you use a multi-NIC configuration, exchange SSH keys using the configured host names for a given host. In this example, mdw is configured with a single NIC, and sdw1, sdw2, and sdw3 are configured with four NICs:

  mdw
  sdw1-1
  sdw1-2
  sdw1-3
  sdw1-4
  sdw2-1
  sdw2-2
  sdw2-3
  sdw2-4
  sdw3-1
  sdw3-2
  sdw3-3
  sdw3-4
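If you script host file creation, the per-NIC host names above can be generated mechanically. The sketch below is our own helper and assumes the `<host>-<nic>` naming convention shown in the example:

```python
# Hypothetical helper: expand (host, nic_count) pairs into one configured
# host name per line, matching the example host file above.
def host_file_lines(hosts):
    lines = []
    for name, nic_count in hosts:
        if nic_count == 1:
            lines.append(name)
        else:
            # Assumes the "<host>-<nic>" naming convention from the example.
            lines.extend(f"{name}-{i}" for i in range(1, nic_count + 1))
    return lines

lines = host_file_lines([("mdw", 1), ("sdw1", 4)])
print("\n".join(lines))  # mdw, then sdw1-1 through sdw1-4, one per line
```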
- Log in as root on the master host, and source the greenplum_path.sh file from your Greenplum installation. For example:

  $ su -
  # source /usr/local/greenplum-db/greenplum_path.sh
- Run the gpssh-exkeys utility referencing the host list files. For example:

  # gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
- gpssh-exkeys checks the remote hosts and performs the key exchange between all hosts. Enter the root user password when prompted. For example:

  ***Enter password for root@hostname: <root_password>
To create the gpadmin user
- Use gpssh to create the gpadmin user on all the new segment hosts (if it does not already exist). Use the list of new hosts you created for the key exchange. For example:

  # gpssh -f new_hosts_file '/usr/sbin/useradd gpadmin -d /home/gpadmin -s /bin/bash'
- Set a password for the new gpadmin user. On Linux, you can do this on all segment hosts simultaneously using gpssh. For example:

  # gpssh -f new_hosts_file 'echo gpadmin_password | passwd gpadmin --stdin'
- Verify that the gpadmin user has been created by looking for its home directory:

  # gpssh -f new_hosts_file ls -l /home
To exchange SSH keys as the gpadmin user
- Log in as gpadmin and run the gpssh-exkeys utility referencing the host list files. For example:

  $ gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file

  gpssh-exkeys checks the remote hosts and performs the key exchange between all hosts. Enter the gpadmin user password when prompted. For example:

  ***Enter password for gpadmin@hostname: <gpadmin_password>
Verifying OS Settings
Use the gpcheck utility to verify all new hosts in your array have the correct OS settings to run Greenplum Database software.
To run gpcheck
- Log in on the master host as the user who will run your Greenplum Database system (for example, gpadmin):

  $ su - gpadmin
- Run the gpcheck utility using your host file for new
hosts. For example:
$ gpcheck -f new_hosts_file
Validating Disk I/O and Memory Bandwidth
Use the gpcheckperf utility to test disk I/O and memory bandwidth.
To run gpcheckperf
- Run the gpcheckperf utility using the host file for new hosts. Use the -d option to specify the file systems you want to test on each host. You must have write access to these directories. For example:

  $ gpcheckperf -f new_hosts_file -d /data1 -d /data2 -v
- The utility may take a long time to perform the tests because it is copying very large files between the hosts. When it is finished, you will see the summary results for the Disk Write, Disk Read, and Stream tests.
For a network divided into subnets, repeat this procedure with a separate host file for each subnet.
Integrating New Hardware into the System
Before initializing the system with the new segments, shut down the system with gpstop to prevent user activity from skewing performance test results. Then, repeat the performance tests using host files that include all nodes, existing and new.
Initializing New Segments
Use the gpexpand utility to initialize the new segments, create the expansion schema, and set a system-wide random distribution policy for the database. The utility performs these tasks by default the first time you run it with a valid input file on a Greenplum Database master. Subsequently, it will detect if an expansion schema has been created and, if so, perform table redistribution.
Creating an Input File for System Expansion
To begin expansion, gpexpand requires an input file containing information about the new segments and hosts. If you run gpexpand without specifying an input file, the utility displays an interactive interview that collects the required information and automatically creates an input file.
If you create the input file using the interactive interview, you may specify a file with a list of expansion hosts in the interview prompt. If your platform or command shell limits the length of the host list, specifying the hosts with -f may be mandatory.
Creating an input file in Interactive Mode
Before you run gpexpand to create an input file in interactive mode, ensure you know:
- The number of new hosts (or a hosts file)
- The new hostnames (or a hosts file)
- The mirroring strategy used in existing hosts, if any
- The number of segments to add per host, if any
The utility automatically generates an input file based on this information, dbid, content ID, and data directory values stored in gp_segment_configuration, and saves the file in the current directory.
To create an input file in interactive mode
- Log in on the master host as the user who will run your Greenplum Database system; for example, gpadmin.
- Run gpexpand. The utility displays messages about how to prepare for an expansion operation, and it prompts you to quit or continue. Optionally, specify a hosts file using -f. For example:

  $ gpexpand -f /home/gpadmin/new_hosts_file
- At the prompt, select Y to continue.
- Unless you specified a hosts file using -f, you will be prompted to enter hostnames. Enter a comma-separated list of the hostnames of the new expansion hosts. Do not include interface hostnames. For example:

  > sdw4, sdw5, sdw6, sdw7
To add segments to existing hosts only, enter a blank line at this prompt. Do not specify localhost or any existing host name.
- Enter the mirroring strategy used in your system, if any. Options are spread|grouped|none. The default setting is grouped. Ensure you have enough hosts for the selected grouping strategy. For more information about mirroring, see Planning Mirror Segments.
- Enter the number of new primary segments to add, if any. By default, new hosts are initialized with the same number of primary segments as existing hosts. Increase segments per host by entering a number greater than zero. The number you enter will be the number of additional segments initialized on all hosts. For example, if existing hosts currently have two segments each, entering a value of 2 initializes two more segments on existing hosts, and four segments on new hosts.
- If you are adding new primary segments, enter the new primary data directory root for the new segments. Do not specify the actual data directory name, which is created automatically by gpexpand based on the existing data directory names.

  For example, if your existing data directories are as follows:

  /gpdata/primary/gp0
  /gpdata/primary/gp1

  then enter the following (one at each prompt) to specify the data directories for two new primary segments:

  /gpdata/primary
  /gpdata/primary

  When the initialization runs, the utility creates the new directories gp2 and gp3 under /gpdata/primary.
- If you are adding new mirror segments, enter the new mirror data directory root for the new segments. Do not specify the data directory name; it is created automatically by gpexpand based on the existing data directory names.

  For example, if your existing data directories are as follows:

  /gpdata/mirror/gp0
  /gpdata/mirror/gp1

  enter the following (one at each prompt) to specify the data directories for two new mirror segments:

  /gpdata/mirror
  /gpdata/mirror

  When the initialization runs, the utility creates the new directories gp2 and gp3 under /gpdata/mirror.
These primary and mirror root directories for new segments must exist on the hosts, and the user running gpexpand must have permissions to create directories in them.
After you have entered all required information, the utility generates an input file and saves it in the current directory.
Expansion Input File Format
Pivotal recommends using the interactive interview process to create your own input file unless your expansion scenario has atypical needs.
The format for expansion input files is:

  hostname:address:port:fselocation:dbid:content:preferred_role:replication_port

For example:

  sdw5:sdw5-1:50011:/gpdata/primary/gp9:11:9:p:53011
  sdw5:sdw5-2:50012:/gpdata/primary/gp10:12:10:p:53011
  sdw5:sdw5-2:60011:/gpdata/mirror/gp9:13:9:m:63011
  sdw5:sdw5-1:60012:/gpdata/mirror/gp10:14:10:m:63011
For each new segment, this format of expansion input file requires the following:

|Parameter|Valid Values|Description|
|---|---|---|
|hostname|Hostname|Hostname for the segment host.|
|address|Hostname|The per-interface (NIC) hostname for the segment; on single-NIC hosts this may be the same as hostname.|
|port|An available port number|Database listener port for the segment, incremented on the existing segment port base number.|
|fselocation|Directory name|The data directory (filespace) location for a segment as per the pg_filespace_entry system catalog.|
|dbid|Integer. Must not conflict with existing dbid values.|Database ID for the segment. The values you enter should be incremented sequentially from existing dbid values shown in the system catalog gp_segment_configuration. For example, to add four nodes to an existing ten-segment array with dbid values of 1-10, list new dbid values of 11, 12, 13, and 14.|
|content|Integer. Must not conflict with existing content values.|The content ID of the segment. A primary segment and its mirror should have the same content ID, incremented sequentially from existing values. For more information, see content in the reference for gp_segment_configuration.|
|preferred_role|p or m|Determines whether the segment is a primary or mirror. Specify p for primary and m for mirror.|
|replication_port|An available port number|File replication port for the segment, incremented on the existing segment replication_port base number.|
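A colon-delimited line in this format can be assembled mechanically. The sketch below is our own illustration of the field order, not a Greenplum tool:

```python
# Our own sketch: build one expansion input file line from its eight fields,
# in the order hostname:address:port:fselocation:dbid:content:role:replication_port.
def input_file_line(hostname, address, port, fselocation,
                    dbid, content, preferred_role, replication_port):
    fields = (hostname, address, port, fselocation,
              dbid, content, preferred_role, replication_port)
    return ":".join(str(f) for f in fields)

line = input_file_line("sdw5", "sdw5-1", 50011, "/gpdata/primary/gp9",
                       11, 9, "p", 53011)
print(line)  # sdw5:sdw5-1:50011:/gpdata/primary/gp9:11:9:p:53011
```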
Running gpexpand to Initialize New Segments
After you have created an input file, run gpexpand to initialize new segments. The utility automatically stops the database before segment initialization, if an administrator has not already done so, and restarts the system when the process finishes.
To run gpexpand with an input file
- Log in on the master host as the user who will run your Greenplum Database system; for example, gpadmin.
- Run the gpexpand utility, specifying the input file
with -i. Optionally, use -D to specify the database
in which to create the expansion schema. For
$ gpexpand -i input_file -D database1
The utility detects if an expansion schema exists for the Greenplum Database system. If a schema exists, remove it with gpexpand -c before you start a new expansion operation. See Removing the Expansion Schema.
When the new segments are initialized and the expansion schema is created, the utility prints a success message and exits.
When the initialization process completes, you can connect to Greenplum Database and view the expansion schema. The schema resides in the database you specified with -D or in the database specified by the PGDATABASE environment variable. For more information, see About the Expansion Schema.
Rolling Back a Failed Expansion Setup
You can roll back an expansion setup operation only if the operation fails. To roll back a failed expansion setup, use the following command, specifying the database that contains the expansion schema:
gpexpand --rollback -D database_name
Redistributing Tables
After creating an expansion schema, you can bring Greenplum Database back online and redistribute tables across the entire array with gpexpand. Target low-use hours when the utility's CPU usage and table locks have minimal impact on operations. Rank tables to redistribute the largest or most critical tables in preferential order.
While table redistribution is underway:
- Any new tables or partitions created are distributed across all segments exactly as they would be under normal operating conditions.
- Queries can access all segments, even if the relevant data is not yet redistributed to tables on the new segments.
- The table or partition being redistributed is locked and unavailable for read or write operations. When its redistribution completes, normal operations resume.
Ranking Tables for Redistribution
For large systems, Pivotal recommends controlling table redistribution order. Adjust tables' rank values in the expansion schema to prioritize heavily-used tables and minimize performance impact. Available free disk space can affect table ranking; see Managing Redistribution in Large-Scale Greenplum Systems.
To rank tables for redistribution by updating rank values in gpexpand.status_detail, connect to Greenplum Database using psql or another supported client. Update gpexpand.status_detail with commands such as:
=> UPDATE gpexpand.status_detail SET rank=10;
=> UPDATE gpexpand.status_detail SET rank=1 WHERE fq_name = 'public.lineitem';
=> UPDATE gpexpand.status_detail SET rank=2 WHERE fq_name = 'public.orders';
These commands lower the priority of all tables to 10 and then assign a rank of 1 to lineitem and a rank of 2 to orders. When table redistribution begins, lineitem is redistributed first, followed by orders and all other tables in gpexpand.status_detail. To exclude a table from redistribution, remove the table from gpexpand.status_detail.
Redistributing Tables Using gpexpand
To redistribute tables with gpexpand
- Log in on the master host as the user who will run your Greenplum Database system; for example, gpadmin.
- Run the gpexpand utility. You can use the -d or -e option to define the expansion session time period. For example, to run the utility for up to 60 consecutive hours:

  $ gpexpand -d 60:00:00
The utility redistributes tables until the last table in the schema completes or it reaches the specified duration or end time. gpexpand updates the status and time in gpexpand.status when a session starts and finishes.
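The duration argument uses an hh:mm:ss form (60:00:00 is 60 hours). The conversion can be sketched as follows; this is our own parser for illustration, not gpexpand's:

```python
# Our own sketch of hh:mm:ss duration arithmetic; gpexpand parses -d itself.
def duration_seconds(spec):
    hours, minutes, seconds = (int(part) for part in spec.split(":"))
    return hours * 3600 + minutes * 60 + seconds

print(duration_seconds("60:00:00"))  # 216000 seconds (60 hours)
```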
Monitoring Table Redistribution
You can query the expansion schema during the table redistribution process. The view gpexpand.expansion_progress provides a current progress summary, including the estimated rate of table redistribution and estimated time to completion. You can query the table gpexpand.status_detail for per-table status information.
Viewing Expansion Status
After the first table completes redistribution, gpexpand.expansion_progress calculates its estimates and refreshes them based on all tables' redistribution rates. Calculations restart each time you start a table redistribution session with gpexpand. To monitor progress, connect to Greenplum Database using psql or another supported client; query gpexpand.expansion_progress with a command like the following:
=# select * from gpexpand.expansion_progress;
             name             |         value
------------------------------+-----------------------
 Bytes Left                   | 5534842880
 Bytes Done                   | 142475264
 Estimated Expansion Rate     | 680.75667095996092 MB/s
 Estimated Time to Completion | 00:01:01.008047
 Tables Expanded              | 4
 Tables Left                  | 4
(6 rows)
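The estimate in this view amounts to dividing the bytes remaining by the observed redistribution rate. A minimal re-creation of that arithmetic, using our own made-up numbers:

```python
# Illustrative only: estimated time to completion = bytes left / observed rate.
def eta_seconds(bytes_left, rate_mb_per_sec):
    return bytes_left / (rate_mb_per_sec * 1024 * 1024)

# e.g. 1 GiB left at 128 MB/s takes about 8 seconds
print(round(eta_seconds(1024**3, 128)))  # 8
```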
Viewing Table Status
The table gpexpand.status_detail stores status, time of last update, and more facts about each table in the schema. To see a table's status, connect to Greenplum Database using psql or another supported client and query gpexpand.status_detail:
=> SELECT status, expansion_started, source_bytes FROM gpexpand.status_detail
   WHERE fq_name = 'public.sales';
  status   |     expansion_started      | source_bytes
-----------+----------------------------+--------------
 COMPLETED | 2009-02-20 10:54:10.043869 |   4929748992
(1 row)
Removing the Expansion Schema
You can safely remove the expansion schema after the expansion operation is complete and verified. To run another expansion operation on a Greenplum system, first remove the existing expansion schema.
To remove the expansion schema
- Log in on the master host as the user who will be running your Greenplum Database system (for example, gpadmin).
- Run the gpexpand utility with the -c option. For example:

  $ gpexpand -c

  Note: Some systems require two returns.