Enabling High Availability Features

This chapter describes the high-availability features of Greenplum Database and the process to recover a segment or master instance.

For information about the Greenplum Database utilities that are used to enable high availability, see the Greenplum Database Utility Guide.

Overview of High Availability in Greenplum Database

Greenplum Database includes features to ensure maximum uptime and high availability of your system. The following topics summarize these features:

Overview of Segment Mirroring

When Greenplum Database High Availability is enabled, there are two types of segments: primary and mirror. Each primary segment has one corresponding mirror segment. A primary segment receives requests from the master to make changes to the segment's database and then replicates those changes to the corresponding mirror. If a primary segment becomes unavailable, database queries fail over to the mirror segment.
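
To see how primary and mirror segments are paired on a running system, you can query the gp_segment_configuration catalog table. The following is a sketch; dbname is a placeholder for any database in the system:

$ psql dbname -c "SELECT content, role, hostname, port FROM gp_segment_configuration WHERE content >= 0 ORDER BY content, role;"

In the output, each content value appears twice, once with role p (primary) and once with role m (mirror). The content value -1, excluded here, identifies the master.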

Segment mirroring employs a physical file replication scheme—data file I/O at the primary is replicated to the secondary so that the mirror's files are identical to the primary's files. Data in Greenplum Database are represented with tuples, which are packed into blocks. Database tables are stored in disk files consisting of one or more blocks. A change to a tuple changes the block it is saved in, which is then written to disk on the primary and copied over the network to the mirror. The mirror updates the corresponding block in its copy of the file.

For heap tables, blocks are saved in an in-memory cache until they are evicted to make room for newly changed blocks. This allows the system to read or update a block in memory multiple times without performing expensive disk I/O. When the block is evicted from the cache, it is written to disk and replicated to the secondary. While the block is held in cache, the primary and mirror have different images of the block. However, the databases are still consistent because the transaction log has been replicated. If a mirror takes over for a failed primary, the transactions in its log are applied to the database tables.

Other database objects—for example filespaces, which are tablespaces internally represented with directories—also use file replication to perform various file operations in a synchronous way.

Append-optimized tables do not use the in-memory caching mechanism. Changes made to append-optimized table blocks are replicated to the mirror immediately. Typically, file write operations are asynchronous, while opening, creating, and synchronizing files are "sync-replicated," which means the primary blocks until it receives the acknowledgment from the secondary.

To configure mirroring, your Greenplum Database system must have enough nodes for a primary segment and its mirror to reside on different hosts. Only primary segments are active during database operations.

Figure 1. Segment Data Mirroring in Greenplum Database

If a primary segment fails, the file replication process stops and the mirror segment automatically starts as the active segment instance. The now active mirror’s system state becomes Change Tracking, which means the mirror maintains a system table and change-log of all blocks updated while the primary segment is unavailable. When the failed primary segment is repaired and ready to be brought back online, an administrator initiates a recovery process and the system goes into Resynchronization state. The recovery process applies the logged changes to the repaired primary segment. The system state changes to Synchronized when the recovery process completes.

If the mirror segment fails or becomes inaccessible while the primary is active, the primary's system state changes to Change Tracking, and it tracks changes to be applied to the mirror when it is recovered.
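
These states are visible in the mode and status columns of the gp_segment_configuration catalog table. As a rough mapping (a sketch; dbname is a placeholder): mode s corresponds to Synchronized, r to Resynchronization, and c to Change Tracking, and status is u (up) or d (down):

$ psql dbname -c "SELECT content, role, mode, status FROM gp_segment_configuration ORDER BY content, role;"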

Overview of Master Mirroring

You can deploy a backup or mirror of the master instance on a separate host machine or on the same host machine. A backup master or standby master serves as a warm standby if the primary master becomes nonoperational. You create a standby master from the primary master while the primary is online.

The primary master continues to provide service to users while a transactional snapshot of the primary master instance is taken. While the transactional snapshot is taken and deployed on the standby master, changes to the primary master are also recorded. After the snapshot is deployed on the standby master, the updates are deployed to synchronize the standby master with the primary master.

Once the primary master and standby master are synchronized, the standby master is kept up to date by the walsender and walreceiver replication processes. The walreceiver is a standby master process. The walsender is a primary master process. The two processes use WAL-based streaming replication to keep the primary and standby masters synchronized.
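
A quick way to confirm that these replication processes are running is to list WAL-related processes on each master host, which should include the walsender on the primary and the walreceiver on the standby. For example, assuming mdw is the primary master host and smdw is the standby master host (hypothetical host names; exact process titles vary by version):

$ gpssh -h mdw -h smdw -e 'ps -ef | grep [w]al'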

Since the master does not house user data, only system catalog tables are synchronized between the primary and standby masters. When these tables are updated, changes are automatically copied to the standby master to keep it current with the primary.

Figure 2. Master Mirroring in Greenplum Database

If the primary master fails, the replication process stops, and an administrator can activate the standby master. Upon activation of the standby master, the replicated logs reconstruct the state of the primary master at the time of the last successfully committed transaction. The activated standby then functions as the Greenplum Database master, accepting connections on the port specified when the standby master was initialized.

Overview of Fault Detection and Recovery

The Greenplum Database server (postgres) subprocess named ftsprobe handles fault detection. ftsprobe monitors the Greenplum array; it connects to and scans all segments and database processes at intervals that you can configure.

If ftsprobe cannot connect to a segment, it marks the segment as “down” in the Greenplum Database system catalog. The segment remains nonoperational until an administrator initiates the recovery process.
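
The probe interval and timeout are controlled by the gp_fts_probe_interval and gp_fts_probe_timeout server configuration parameters. For example, you can display their current settings with gpconfig (a sketch):

$ gpconfig -s gp_fts_probe_interval
$ gpconfig -s gp_fts_probe_timeout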

With mirroring enabled, Greenplum Database automatically fails over to a mirror copy if a primary copy becomes unavailable. The system remains operational if a segment instance or host fails, provided all data is available on the remaining active segments.

To recover failed segments, a Greenplum administrator runs the gprecoverseg recovery utility. This utility locates the failed segments, verifies they are valid, and compares the transactional state with the currently active segment to determine changes made while the segment was offline. gprecoverseg synchronizes the changed database files with the active segment and brings the segment back online. Administrators perform the recovery while Greenplum Database is up and running.
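
Before starting a recovery, you can preview which segments gprecoverseg would act on by generating a recovery configuration file rather than performing the recovery. For example (a sketch; the output file path is arbitrary):

$ gprecoverseg -o /home/gpadmin/recover_config

Where -o writes the default recovery configuration for the currently failed segments to the named file, which can be reviewed or edited and then supplied to gprecoverseg -i.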

With mirroring disabled, the system automatically shuts down if a segment instance fails. Administrators manually recover all failed segments before operations resume.

Enabling Mirroring in Greenplum Database

You can configure your Greenplum Database system with mirroring at setup time using gpinitsystem or enable mirroring later using gpaddmirrors and gpinitstandby. This topic assumes you are adding mirrors to an existing system that was initialized without mirrors.

You can enable the following types of mirroring:

Enabling Segment Mirroring

Mirror segments allow database queries to fail over to a backup segment if the primary segment is unavailable. To configure mirroring, your Greenplum Database system must have enough nodes to allow the mirror segment to reside on a different host than its primary. By default, mirrors are configured on the same array of hosts as the primary segments. You may choose a completely different set of hosts for your mirror segments so they do not share machines with any of your primary segments.

To add segment mirrors to an existing system (same hosts as primaries)

  1. Allocate the data storage area for mirror data on all segment hosts. The data storage area must be different from your primary segments’ file system location.
  2. Use gpssh-exkeys to ensure that the segment hosts can SSH and SCP to each other without a password prompt.
  3. Run the gpaddmirrors utility to enable mirroring in your Greenplum Database system. For example, to add 10000 to your primary segment port numbers to calculate the mirror segment port numbers:
    $ gpaddmirrors -p 10000

    Where -p specifies the number to add to your primary segment port numbers.
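
After gpaddmirrors completes, a quick way to confirm that the mirrors were created and are synchronizing with their primaries is to check mirror status (a sketch):

$ gpstate -m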

To add segment mirrors to an existing system (different hosts from primaries)

  1. Ensure the Greenplum Database software is installed on all hosts. See the Greenplum Database Installation Guide for detailed installation instructions.
  2. Allocate the data storage area for mirror data on all segment hosts.
  3. Use gpssh-exkeys to ensure the segment hosts can SSH and SCP to each other without a password prompt.
  4. Create a configuration file that lists the host names, ports, and data directories on which to create mirrors. To create a sample configuration file to use as a starting point, run:
    $ gpaddmirrors -o filename

    The format of the mirror configuration file is:

    filespaceOrder=[filespace1_fsname[:filespace2_fsname:...]]
    mirror[content]=content:address:port:mir_replication_port:pri_replication_port:fselocation[:fselocation:...]

    For example, here is a configuration for two segment hosts with two segments per host, and no additional filespaces configured (other than the default pg_system filespace):

    filespaceOrder=
    mirror0=0:sdw1:sdw1-1:52001:53001:54001:/gpdata/mir1/gp0
    mirror1=1:sdw1:sdw1-2:52002:53002:54002:/gpdata/mir1/gp1
    mirror2=2:sdw2:sdw2-1:52001:53001:54001:/gpdata/mir1/gp2
    mirror3=3:sdw2:sdw2-2:52002:53002:54002:/gpdata/mir1/gp3
    
  5. Run the gpaddmirrors utility to enable mirroring in your Greenplum Database system:
    $ gpaddmirrors -i mirror_config_file

    Where -i names the mirror configuration file you just created.
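
After the utility completes, you can verify that each mirror was created on the intended host by querying the segment configuration (a sketch; dbname is a placeholder):

$ psql dbname -c "SELECT content, hostname, address, port FROM gp_segment_configuration WHERE role='m' ORDER BY content;"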

Enabling Master Mirroring

You can configure a new Greenplum Database system with a standby master using gpinitsystem or enable it later using gpinitstandby. This topic assumes you are adding a standby master to an existing system that was initialized without one.

To add a standby master to an existing system

  1. Ensure the standby master host is installed and configured: gpadmin system user created, Greenplum Database binaries installed, environment variables set, SSH keys exchanged, and data directory created. See the Greenplum Database Installation Guide for detailed installation instructions.
  2. Run the gpinitstandby utility on the currently active primary master host to add a standby master host to your Greenplum Database system. For example:
    $ gpinitstandby -s smdw

    Where -s specifies the standby master host name.

  3. To switch operations to a standby master, see Recovering a Failed Master.

You can display the information in the Greenplum Database system view pg_stat_replication. The view lists information about the walsender process that is used for Greenplum Database master mirroring. For example, this command displays the process ID and state of the walsender process:

$ psql dbname -c 'SELECT procpid, state FROM pg_stat_replication;'

For information about the pg_stat_replication system view, see the Greenplum Database Reference Guide.

Detecting a Failed Segment

With mirroring enabled, Greenplum Database automatically fails over to a mirror segment when a primary segment goes down. Provided one segment instance is online per portion of data, users may not realize a segment is down. If a transaction is in progress when a fault occurs, the in-progress transaction rolls back and restarts automatically on the reconfigured set of segments.

If the entire Greenplum Database system becomes nonoperational due to a segment failure (for example, if mirroring is not enabled or not enough segments are online to access all user data), users will see errors when trying to connect to a database. The errors returned to the client program may indicate the failure. For example:

ERROR: All segment databases are unavailable

Enabling Alerts and Notifications

To receive notifications of system events such as segment failures, enable email and/or SNMP alerts. See Enabling System Alerts and Notifications.

Checking for Failed Segments

With mirroring enabled, you may have failed segments in the system without interruption of service or any indication that a failure has occurred. You can verify the status of your system using the gpstate utility. gpstate provides the status of each individual component of a Greenplum Database system, including primary segments, mirror segments, master, and standby master.

To check for failed segments

  1. On the master, run the gpstate utility with the -e option to show segments with error conditions:
    $ gpstate -e

    Segments in Change Tracking mode indicate the corresponding mirror segment is down. When a segment is not in its preferred role, the segment does not operate in the role to which it was assigned at system initialization. This means the system is in a potentially unbalanced state, as some segment hosts may have more active segments than is optimal for top system performance.

    See Recovering From Segment Failures for instructions to fix this situation.

  2. To get detailed information about a failed segment, check the gp_segment_configuration catalog table. For example:
    $ psql -c "SELECT * FROM gp_segment_configuration WHERE status='d';"
  3. For failed segment instances, note the host, port, preferred role, and data directory. This information will help determine the host and segment instances to troubleshoot.
  4. To show information about mirror segment instances, run:
    $ gpstate -m

Checking the Log Files

Log files can provide information to help determine an error’s cause. The master and segment instances each have their own log file in pg_log of the data directory. The master log file contains the most information and you should always check it first.

Use the gplogfilter utility to check the Greenplum Database log files for additional information. To check the segment log files, run gplogfilter on the segment hosts using gpssh.

To check the log files

  1. Use gplogfilter to check the master log file for WARNING, ERROR, FATAL or PANIC log level messages:
    $ gplogfilter -t
  2. Use gpssh to check for WARNING, ERROR, FATAL, or PANIC log level messages on each segment instance. For example:
    $ gpssh -f seg_hosts_file -e 'source /usr/local/greenplum-db/greenplum_path.sh ; gplogfilter -t /data1/primary/*/pg_log/gpdb*.log' > seglog.out
    

Recovering a Failed Segment

If the master cannot connect to a segment instance, it marks that segment as down in the Greenplum Database system catalog. The segment instance remains offline until an administrator takes steps to bring the segment back online. The process for recovering a failed segment instance or host depends on the failure cause and whether or not mirroring is enabled. A segment instance can be unavailable for many reasons:

  • A segment host is unavailable; for example, due to network or hardware failures.
  • A segment instance is not running; for example, there is no postgres database listener process.
  • The data directory of the segment instance is corrupt or missing; for example, data is not accessible, the file system is corrupt, or there is a disk failure.

Figure 3 shows the high-level steps for each of the preceding failure scenarios.

Figure 3. Segment Failure Troubleshooting Matrix

Recovering From Segment Failures

Segment host failures usually cause multiple segment failures: all primary or mirror segments on the host are marked as down and nonoperational. If mirroring is not enabled and a segment goes down, the system automatically becomes nonoperational.

To recover with mirroring enabled

  1. Ensure you can connect to the segment host from the master host. For example:
    $ ping failed_seg_host_address
  2. Troubleshoot the problem that prevents the master host from connecting to the segment host. For example, the host machine may need to be restarted or replaced.
  3. After the host is online and you can connect to it, run the gprecoverseg utility from the master host to reactivate the failed segment instances. For example:
    $ gprecoverseg
  4. The recovery process brings up the failed segments and identifies the changed files that need to be synchronized. The process can take some time; wait for the process to complete. During this process, database write activity is suspended.
  5. After gprecoverseg completes, the system goes into Resynchronizing mode and begins copying the changed files. This process runs in the background while the system is online and accepting database requests.
  6. When the resynchronization process completes, the system state is Synchronized. Run the gpstate utility to verify the status of the resynchronization process:
    $ gpstate -m

When a primary segment goes down, the mirror activates and becomes the primary segment. After running gprecoverseg, the currently active segment remains the primary and the failed segment becomes the mirror. The segment instances are not returned to the preferred role that they were given at system initialization time. This means that the system could be in a potentially unbalanced state if segment hosts have more active segments than is optimal for top system performance. To check for segments that are not in their preferred role, run:

$ gpstate -e
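
You can also list any segment instances that are not operating in their preferred role directly from the catalog (a sketch; dbname is a placeholder):

$ psql dbname -c "SELECT content, role, preferred_role, mode, status FROM gp_segment_configuration WHERE role <> preferred_role;"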

All segments must be online and fully synchronized to rebalance the system. Database sessions remain connected during rebalancing, but queries in progress are canceled and rolled back.

  1. Run gpstate -m to ensure all mirrors are Synchronized.
    $ gpstate -m
  2. If any mirrors are in Resynchronizing mode, wait for them to complete.
  3. Run gprecoverseg with the -r option to return the segments to their preferred roles.
    $ gprecoverseg -r
  4. After rebalancing, run gpstate -e to confirm all segments are in their preferred roles.
    $ gpstate -e

In a double fault, both a primary segment and its mirror are down. This can occur if hardware failures on different segment hosts happen simultaneously. Greenplum Database is unavailable if a double fault occurs. To recover from a double fault:

  1. Restart Greenplum Database:
    $ gpstop -r
  2. After the system restarts, run gprecoverseg:
    $ gprecoverseg
  3. After gprecoverseg completes, use gpstate to check the status of your mirrors:
    $ gpstate -m
  4. If you still have segments in Change Tracking mode, run a full copy recovery:
    $ gprecoverseg -F

To recover without mirroring enabled

  1. Ensure you can connect to the segment host from the master host. For example:
    $ ping failed_seg_host_address
    
  2. Troubleshoot the problem that is preventing the master host from connecting to the segment host. For example, the host machine may need to be restarted.
  3. After the host is online, verify that you can connect to it and restart Greenplum Database. For example:
    $ gpstop -r
  4. Run the gpstate utility to verify that all segment instances are online:
    $ gpstate

If a segment host is not recoverable and you lost one or more segments, recreate your Greenplum Database system from backup files. See Backing Up and Restoring Databases.

When a segment host is not recoverable

If a host is nonoperational, for example, due to hardware failure, recover the segments onto a spare set of hardware resources. If mirroring is enabled, you can recover a segment from its mirror onto an alternate host using gprecoverseg. For example:

$ gprecoverseg -i recover_config_file

Where the format of recover_config_file is:

filespaceOrder=[filespace1_name[:filespace2_name:...]]
failed_host_address:port:fselocation [recovery_host_address:port:replication_port:fselocation[:fselocation:...]]

For example, to recover to a different host than the failed host without additional filespaces configured (besides the default pg_system filespace):

filespaceOrder=
sdw5-2:50002:/gpdata/gpseg2 sdw9-2:50002:53002:/gpdata/gpseg2

The gp_segment_configuration and pg_filespace_entry system catalog tables can help determine your current segment configuration so you can plan your mirror recovery configuration. For example, run the following query:

=# SELECT dbid, content, hostname, address, port, 
   replication_port, fselocation as datadir 
   FROM gp_segment_configuration, pg_filespace_entry 
   WHERE dbid=fsedbid 
   ORDER BY dbid;

The new recovery segment host must be pre-installed with the Greenplum Database software and configured exactly as the existing segment hosts.
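
Before running the recovery, it can help to confirm that the replacement host has the same Greenplum Database software version as the existing segment hosts. For example, assuming sdw9 is the new host (a hypothetical host name):

$ gpssh -h sdw9 -e 'source /usr/local/greenplum-db/greenplum_path.sh ; postgres --gp-version'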

Recovering a Failed Master

If the primary master fails, log replication stops. Use gpactivatestandby to activate the standby master. Upon activation of the standby master, Greenplum Database reconstructs the master host state at the time of the last successfully committed transaction.

To activate the standby master

  1. Ensure a standby master host is configured for the system. See Enabling Master Mirroring.
  2. Run the gpactivatestandby utility from the standby master host you are activating. For example:
    $ gpactivatestandby -d /data/master/gpseg-1

    Where -d specifies the data directory of the master host you are activating.

    After you activate the standby, it becomes the active or primary master for your Greenplum Database array.

  3. After the utility finishes, run gpstate to check the status:
    $ gpstate -f

    The newly activated master’s status should be Active. If you configured a new standby host, its status is Passive. If not configured, its status is Not Configured.

  4. After switching to the newly active master host, run ANALYZE on it. For example:
    $ psql dbname -c 'ANALYZE;'
  5. Optional: If you did not specify a new standby host when running the gpactivatestandby utility, use gpinitstandby to configure a new standby master at a later time. Run gpinitstandby on your active master host. For example:
    $ gpinitstandby -s new_standby_master_hostname

Restoring Master Mirroring After a Recovery

After you activate a standby master for recovery, the standby master becomes the primary master. You can continue running that instance as the primary master if it has the same capabilities and dependability as the original master host.

You must initialize a new standby master to continue providing master mirroring unless you have already done so while activating the prior standby master. Run gpinitstandby on the active master host to configure a new standby master.

You may restore the primary and standby master instances on the original hosts. This process swaps the roles of the primary and standby master hosts, and it should be performed only if you strongly prefer to run the master instances on the same hosts they occupied prior to the recovery scenario.

To restore the master and standby instances on original hosts (optional)

  1. Ensure the original master host is in dependable running condition; ensure the cause of the original failure is fixed.
  2. Initialize a standby master on the original master host. For example:
    $ gpinitstandby -s original_master_hostname
  3. Run the gpactivatestandby utility from the original master host (currently a standby master). For example:
    $ gpactivatestandby -d $MASTER_DATA_DIRECTORY

    Where -d specifies the data directory of the host you are activating.

  4. After the utility finishes, run gpstate to check the status:
    $ gpstate -f

    Verify the original primary master status is Active, and the standby master status is Not Configured.

  5. Once the original master host is running as the primary Greenplum Database master, you can initialize a standby master on the original standby master host. For example:
    $ gpinitstandby -s original_standby_master_hostname

You can display the information in the Greenplum Database system view pg_stat_replication. The view lists information about the walsender process that is used for Greenplum Database master mirroring. For example, this command displays the process ID and state of the walsender process:

$ psql dbname -c 'SELECT procpid, state FROM pg_stat_replication;'