Installing the Greenplum Database Software

Describes how to install the Greenplum Database software binaries on all of the hosts that will comprise your Greenplum Database system, how to enable passwordless SSH for the gpadmin user, and how to verify the installation.

Perform the following tasks in order:

  1. Installing Greenplum Database
  2. Enabling Passwordless SSH
  3. Confirming Your Installation
  4. Next Steps

Installing Greenplum Database

You must install Greenplum Database on each host machine of the Greenplum Database system. Pivotal distributes the Greenplum Database software as a downloadable package that you install on each host system with the operating system's package management system. You can download the package from Pivotal Network.

Before you begin installing Greenplum Database, be sure you have completed the steps in Configuring Your Systems to configure each of the master, standby master, and segment host machines for Greenplum Database.

Important: After installing Greenplum Database, you must set Greenplum Database environment variables. See Setting Greenplum Environment Variables.

See Example Ansible Playbook for an example script that shows how you can automate creating the gpadmin user and installing Greenplum Database.

Follow these instructions to install Greenplum Database.

Important: You will need sudo or root user access to install from a pre-built binary.
  1. Download and copy the Greenplum Database package to the gpadmin user's home directory on the master, standby master, and every segment host machine. The distribution file name has the format greenplum-db-<version>-<platform>.rpm for RHEL and CentOS systems, or greenplum-db-<version>-<platform>.deb for Ubuntu systems, where <platform> is similar to rhel7-x86_64 (Red Hat 7 64-bit).
  2. With sudo (or as root), install the Greenplum Database package on each host machine using your system's package manager software.
    • For RHEL/CentOS systems, execute the yum command:
      $ sudo yum install ./greenplum-db-<version>-<platform>.rpm
    • For Ubuntu systems, execute the apt command:
      $ sudo apt install ./greenplum-db-<version>-<platform>.deb

    The yum or apt command installs software dependencies, copies the Greenplum Database software files into a version-specific directory, /usr/local/greenplum-db-<version>, and creates the symbolic link /usr/local/greenplum-db to the installation directory.

  3. Change the owner and group of the installed files to gpadmin:
    $ sudo chown -R gpadmin:gpadmin /usr/local/greenplum*
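
The choice between yum and apt in step 2 depends only on the package file's extension. The following is a minimal sketch of that selection, assuming a hypothetical helper named gp_install_cmd that is not part of Greenplum Database. It only prints the command and does not run it, so you can review the output before executing anything with sudo.

```shell
# Hypothetical helper (not a Greenplum utility): print, but do not run,
# the install command from step 2 for a given package file name.
gp_install_cmd() {
    pkg=$1
    case "$pkg" in
        *.rpm) echo "sudo yum install ./$pkg" ;;   # RHEL/CentOS systems
        *.deb) echo "sudo apt install ./$pkg" ;;   # Ubuntu systems
        *)     echo "unsupported package type: $pkg" >&2
               return 1 ;;
    esac
}
```

For example, gp_install_cmd 'greenplum-db-<version>-<platform>.rpm' prints the yum command shown in step 2 for that file name.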

Enabling Passwordless SSH

The gpadmin user on each Greenplum host must be able to SSH from any host in the cluster to any other host in the cluster without entering a password or passphrase (called "passwordless SSH"). If you enable passwordless SSH from the master host to every other host in the cluster ("1-n passwordless SSH"), you can use the Greenplum Database gpssh-exkeys command-line utility to enable passwordless SSH from every host to every other host ("n-n passwordless SSH").

  1. Log in to the master host as the gpadmin user.
  2. Source the path file in the Greenplum Database installation directory.
    $ source /usr/local/greenplum-db-<version>/greenplum_path.sh
    Note: Add the above source command to the gpadmin user's .bashrc or other shell startup file so that the Greenplum Database path and environment variables are set whenever you log in as gpadmin.
  3. Use the ssh-copy-id command to add the gpadmin user's public key to the authorized_keys SSH file on every other host in the cluster.
    $ ssh-copy-id smdw
    $ ssh-copy-id sdw1
    $ ssh-copy-id sdw2
    $ ssh-copy-id sdw3
    . . .
    This enables 1-n passwordless SSH. You will be prompted to enter the gpadmin user's password for each host. If you have the sshpass command on your system, you can use a command like the following to avoid the prompt.
    $ SSHPASS=<password> sshpass -e ssh-copy-id smdw
  4. In the gpadmin home directory, create a file named hostfile_exkeys that lists the configured host names and host addresses (interface names) of every host in your Greenplum system (master, standby master, and segment hosts), one per line. Make sure there are no blank lines or extra spaces. Check the /etc/hosts file on your systems for the correct host names to use for your environment. For example, if you have a master, standby master, and three segment hosts with two unbonded network interfaces per host, your file would look something like this:
    mdw
    mdw-1
    mdw-2
    smdw
    smdw-1
    smdw-2
    sdw1
    sdw1-1
    sdw1-2
    sdw2
    sdw2-1
    sdw2-2
    sdw3
    sdw3-1
    sdw3-2
  5. Run the gpssh-exkeys utility with your hostfile_exkeys file to enable n-n passwordless SSH for the gpadmin user.
    $ gpssh-exkeys -f hostfile_exkeys
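
On larger clusters, typing the host list and the ssh-copy-id commands by hand is error-prone. The sketch below shows one way to generate both. The function names gen_hostfile_exkeys and print_copy_id_cmds are hypothetical, and the <host>-<n> interface naming is an assumption taken from the example file above; confirm the real names against /etc/hosts before using the output.

```shell
# Hypothetical helper: print a host list in the hostfile_exkeys layout.
# Each host name is followed by its interface names <host>-1 .. <host>-N.
# The "-<n>" suffix is an assumption based on the example file; your
# /etc/hosts may use a different naming scheme.
gen_hostfile_exkeys() {
    nics=$1; shift              # number of interfaces per host
    for host in "$@"; do
        echo "$host"
        i=1
        while [ "$i" -le "$nics" ]; do
            echo "$host-$i"
            i=$((i + 1))
        done
    done
}

# Hypothetical helper: print, but do not run, one ssh-copy-id command
# per host for the 1-n setup in step 3.
print_copy_id_cmds() {
    for host in "$@"; do
        echo "ssh-copy-id $host"
    done
}
```

Running gen_hostfile_exkeys 2 mdw smdw sdw1 sdw2 sdw3 > ~/hostfile_exkeys reproduces the example file from step 4, with no blank lines or trailing spaces.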

Confirming Your Installation

To make sure the Greenplum software was installed and configured correctly, run the following confirmation steps from your Greenplum master host. If necessary, correct any problems before continuing on to the next task.

  1. Log in to the master host as gpadmin:
    $ su - gpadmin
  2. Use the gpssh utility to see if you can log in to all hosts without a password prompt, and to confirm that the Greenplum software was installed on all hosts. Use the hostfile_exkeys file you used to set up passwordless SSH. For example:
    $ gpssh -f hostfile_exkeys -e 'ls -l /usr/local/greenplum-db-<version>'

    If the installation was successful, you should be able to log in to all hosts without a password prompt. All hosts should show that they have the same contents in their installation directories, and that the directories are owned by the gpadmin user.

    If you are prompted for a password, run the following command to redo the SSH key exchange:

    $ gpssh-exkeys -f hostfile_exkeys
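
If gpssh prompts for a password, it can help to find out which host is responsible before redoing the key exchange. The following is a minimal sketch using plain ssh rather than a Greenplum utility. The function name check_passwordless and the SSH_CMD override are hypothetical. The sketch relies on the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper: report which hosts in a host file accept a
# non-interactive SSH login. SSH_CMD is an illustrative override that
# defaults to ssh; it exists only so the check can be dry-run.
check_passwordless() {
    hostfile=$1
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue      # skip blank lines
        # BatchMode=yes disables password prompts, so a host without a
        # working key fails here. stdin is redirected so that ssh does
        # not consume the remaining lines of the host file.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

Run it as gpadmin on the master host, for example check_passwordless ~/hostfile_exkeys. A nonzero exit status means at least one host still requires a password or could not be reached.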

About Your Greenplum Database Installation

The Greenplum Database installation directory contains the following files and directories:

  • greenplum_path.sh — This file contains the environment variables for Greenplum Database. See Setting Greenplum Environment Variables.
  • bin — This directory contains the Greenplum Database management utilities. This directory also contains the PostgreSQL client and server programs, most of which are also used in Greenplum Database.
  • docs/cli_help — This directory contains help files for Greenplum Database command-line utilities.
  • docs/cli_help/gpconfigs — This directory contains sample gpinitsystem configuration files and host files that can be modified and used when installing and initializing a Greenplum Database system.
  • ext — Bundled programs (such as Python) used by some Greenplum Database utilities.
  • include — The C header files for Greenplum Database.
  • lib — Greenplum Database and PostgreSQL library files.
  • sbin — Supporting/Internal scripts and programs.
  • share — Shared files for Greenplum Database.
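
To confirm that a host has the layout listed above, you can compare its installation directory against that list. The following is a minimal sketch, assuming a hypothetical helper named gp_missing_entries. The list of entries is taken from this section only, so other Greenplum Database releases may differ.

```shell
# Hypothetical helper: print each entry from the list above that is
# missing under the given installation directory. It prints nothing
# when every listed entry is present.
gp_missing_entries() {
    gphome=$1
    for entry in greenplum_path.sh bin docs/cli_help docs/cli_help/gpconfigs \
                 ext include lib sbin share; do
        [ -e "$gphome/$entry" ] || echo "$entry"
    done
}
```

For example, gp_missing_entries /usr/local/greenplum-db checks the directory behind the symbolic link created by the package installation.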