Amazon EC2 Configuration (Amazon Web Services)

You can install and configure Greenplum Database on virtual servers provided by the Amazon Elastic Compute Cloud (Amazon EC2) web service. Amazon EC2 is a service provided by Amazon Web Services (AWS). The following overview information describes how to install Greenplum Database in an Amazon EC2 environment.

About Amazon EC2

You can use Amazon EC2 to launch as many virtual servers as you need, configure security and networking, and manage storage. An EC2 instance is a virtual server in the AWS cloud virtual computing environment.

EC2 instances are managed by AWS. AWS isolates your EC2 instances from other users in a virtual private cloud (VPC) and lets you control access to the instances. You can configure instance features such as operating system, network connectivity (network ports and protocols, IP address access), access to the Internet, and size and type of disk storage.

When you launch an instance, you use a preconfigured template for your instance, known as an Amazon Machine Image (AMI). The AMI packages the bits you need for your server (including the operating system and additional software). You can use images supplied by Amazon or use customized images. You launch instances in an Availability Zone of an AWS region. An Availability Zone is a distinct location within a region that is engineered to be insulated from failures in other Availability Zones.

For information about Amazon EC2, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html

Launching EC2 Instances with the EC2 Console

With the Amazon EC2 Console, you can launch, configure, start, stop, and terminate (delete) virtual servers. When you launch instances, you select these features.

Amazon Machine Image (AMI)
An AMI is a template that contains the software configuration (operating system, application server, and applications).
For Greenplum Database - Select an AMI that runs a supported operating system. See the Greenplum Database Release Notes for the release that you are installing.
Note: To create and launch a customized AMI, see About Amazon Machine Image (AMI).
EC2 Instance Type
A predefined set of performance characteristics. Instance types comprise varying combinations of CPU, memory, default storage, and networking capacity. You can modify storage options when you add storage.
For Greenplum Database - The instance type must be an EBS-Optimized instance type when using Amazon EBS storage for Greenplum Database. See Configure storage for information about Greenplum Database storage requirements. For information about EBS-Optimized instances, see the Amazon documentation about EBS-Optimized instances.
For sufficient network performance, the instance type must also support EC2 enhanced networking. For information about EC2 enhanced networking, see the Amazon documentation about Enhanced Networking on Linux Instances.
The instances should be in a single VPC and subnet. Instances are always assigned a VPC internal IP address and can be assigned a public IP address for external and Internet access.
The internal IP address is used for Greenplum Database communication between hosts. You can also use the internal IP address to access an instance from another instance within the EC2 VPC. For information about configuring launched instances to communicate with each other, see Working with EC2 Instances.
A public IP address for the instance and an Internet gateway configured for the EC2 VPC are required for accessing the instance from an external source and for the instance to access the Internet. Internet access is required when installing Linux packages. When you launch a set of instances, you can enable or disable the automatic assignment of public IP addresses when the instances are started.
If automatic assignment of public IP addresses is enabled, instances are always assigned a public IP address when the instance starts. If automatic assignment of public IP addresses is disabled, you can allocate an EC2 Elastic IP address and temporarily associate it with an instance to connect to and configure the instance.
To control whether a public IP is assigned when launching an instance, see the Amazon documentation about Subnet Public IP Addressing.
EC2 Instance Details
Information about starting, running, and stopping EC2 instances, such as the number of instances of the same AMI, network information, and EC2 VPC and subnet membership.
Configure storage
Adjust and add storage. For example, you can change the size of the root volume and add volumes.
For Greenplum Database - Greenplum Database supports either EC2 instance store or Amazon EBS storage in a production environment.
  • EC2 instance store provides temporary block-level storage. This storage is located on disks that are physically attached to the host computer. With instance store, powering off the instance causes data loss. Soft reboots preserve instance store data. However, EC2 instance store can provide higher and more consistent I/O performance.
  • EBS storage provides block-level storage volumes with long-term persistence. For EBS storage to be a supported configuration, the storage must be a RAID of Amazon EBS volumes and must be mounted with the XFS file system. All other file systems are explicitly not supported by Pivotal.

    There are several classes of EBS. For Greenplum Database, select the EBS volume type gp2 or io1. See the Amazon documentation about Block Device Mapping.

For more information about the Amazon storage types, see Notes.
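
This example sketches how the EBS requirement described above (a RAID of Amazon EBS volumes mounted with the XFS file system) might be met with mdadm. It is a minimal sketch under stated assumptions, not a supported procedure. RAID level 0, the device names /dev/xvdf through /dev/xvdi, the array name /dev/md0, and the mount point /data are all illustrative and do not come from this topic. The function only prints the commands so that you can review them before running them as root.

```shell
# Print (do not run) the commands that build a RAID 0 array from EBS
# volumes and mount it with XFS. Device names and paths are hypothetical.
print_raid_xfs_cmds() {
    md_dev=$1        # array device to create, for example /dev/md0
    mount_point=$2   # where to mount the file system, for example /data
    shift 2          # the remaining arguments are the member EBS devices

    echo "mdadm --create $md_dev --level=0 --raid-devices=$# $*"
    echo "mkfs.xfs $md_dev"
    echo "mkdir -p $mount_point"
    echo "mount -t xfs $md_dev $mount_point"
}

print_raid_xfs_cmds /dev/md0 /data /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
```

Check the device names on your own instance (for example, with lsblk) before you run any of the printed commands, because creating the array destroys existing data on the member volumes.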

Create Tag
An optional label that consists of a case-sensitive key-value pair that is used for organizing and searching a large number of EC2 resources.
Security Group
A set of firewall rules that control the network traffic for instances.
For external access to an instance with ssh, create a rule that enables ssh for inbound network traffic.
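
The inbound rules can also be created from the command line instead of the console. This sketch assumes that the AWS CLI is installed and configured, which this topic does not otherwise require. It opens the default ports listed in Notes (22, 5432, and 28080). The function name open_greenplum_ports is hypothetical, and the security group ID and source address range in the example call are placeholders.

```shell
# Add one inbound TCP rule per default Greenplum Database port.
# Requires a configured AWS CLI. Stops at the first rule that fails.
open_greenplum_ports() {
    group_id=$1   # ID of the security group to modify
    cidr=$2       # source address range that is allowed to connect
    for port in 22 5432 28080; do
        aws ec2 authorize-security-group-ingress --group-id "$group_id" \
            --protocol tcp --port "$port" --cidr "$cidr" || return 1
    done
}

# Example call (both values are placeholders):
# open_greenplum_ports sg-0123456789abcdef0 203.0.113.0/24
```

Restricting the source range to the addresses that need access is usually preferable to opening the ports to all addresses.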

Working with EC2 Instances

After the EC2 instances have started, you connect to and configure the instances. The Instances page of the EC2 Console lists the running instances and network information. If the instance does not have a public IP address, you can create an Elastic IP and associate it with the instance. See About Amazon Elastic IP Addresses.

AWS uses public-key cryptography to secure the login information for your EC2 instances. A Linux instance has no password; you use a key pair to log in to your instance securely. You specify the name of the key pair when you launch your instance, then provide the private key when you log in using SSH. See the Amazon documentation about EC2 Key Pairs.

A key pair consists of a public key that AWS stores, and a private key file that you store. Together, they allow you to connect to your instance securely.

This example logs in to an EC2 instance from an external location with the private key file my-test.pem and user ec2-user. The user ec2-user is the default user for some Linux AMI templates. This example assumes that the instance is configured with a public IP address 192.0.2.82 and that the pem file is the private key file that is used to access the instance.

ssh -i my-test.pem ec2-user@192.0.2.82
You can also copy the private key file to your home .ssh directory as the id_rsa file. This example creates and configures the id_rsa file.
cp my-test.pem ~/.ssh/id_rsa
chmod 400 ~/.ssh/id_rsa
You can also copy the id_rsa file to your EC2 instances. This scp command copies the file to the .ssh directory of the ec2-user.
scp ~/.ssh/id_rsa ec2-user@192.0.2.86:~/.ssh/id_rsa
Note: gpssh-extkey is not used with Greenplum Database hosts that are EC2 instances. You must copy the private key file to the .ssh directory in the user home directory for each instance.
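
Because the key file must be copied to every instance, a small loop can save repetition. This is a minimal sketch: the function name copy_key_to_hosts and the SCP override variable are hypothetical helpers introduced for illustration, and the addresses are the example addresses used in this topic. Setting SCP=echo previews the commands without copying anything.

```shell
# Copy a private key file to the .ssh directory of ec2-user on each host.
# Set SCP=echo to preview the commands instead of running scp.
copy_key_to_hosts() {
    key_file=$1   # local private key file, for example ~/.ssh/id_rsa
    shift         # the remaining arguments are host names or addresses
    for host in "$@"; do
        ${SCP:-scp} "$key_file" "ec2-user@${host}:~/.ssh/id_rsa" || return 1
    done
}

# Preview the copy for two example instances.
SCP=echo copy_key_to_hosts "$HOME/.ssh/id_rsa" 192.0.2.82 192.0.2.86
```

To perform the copy, call the function without the SCP=echo prefix.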

This example logs in to an EC2 instance using the id_rsa file.

ssh ec2-user@192.0.2.82

After the key file is installed on all Greenplum Database hosts, you can use Greenplum Database utilities such as gpseginstall, gpssh, and gpscp that access multiple Greenplum Database hosts.

Before installing Greenplum Database, you configure the EC2 instances as you would local host server machines. Configure the host operating system, configure host network information (for example, update the /etc/hosts file), set operating system parameters, and install operating system packages. For information about how to prepare your operating system environment for Greenplum Database, see Configuring Your Systems and Installing Greenplum.
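
As an illustration of the /etc/hosts update, this sketch prints one entry per instance from a list of internal VPC addresses. The host names mdw and sdwN, the function name print_hosts_entries, and the 10.0.0.x addresses are assumptions made for the example; use the names and addresses of your own instances.

```shell
# Print /etc/hosts lines for a master host and its segment hosts.
# The first address is named mdw; the rest are named sdw1, sdw2, and so on.
print_hosts_entries() {
    n=0
    for ip in "$@"; do
        if [ "$n" -eq 0 ]; then
            name=mdw
        else
            name="sdw$n"
        fi
        printf '%s %s\n' "$ip" "$name"
        n=$((n + 1))
    done
}

# The internal addresses below are placeholders.
print_hosts_entries 10.0.0.10 10.0.0.11 10.0.0.12
```

After reviewing the output, you could append it to /etc/hosts on each instance, for example by piping it to sudo tee -a /etc/hosts.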

These example commands use yum to install the Linux packages ed, unzip, and vim.
sudo yum install -y ed
sudo yum install -y unzip
sudo yum install -y vim
This example uploads the Greenplum Database install file to the ec2-user home directory on an EC2 instance.
scp greenplum-db-4.3.7.2-build-2-RHEL5-x86_64.zip ec2-user@192.0.2.82:~/.
These example commands log in to the instance as ec2-user, extract the uploaded zip file, and run the Greenplum Database install file that is in the ec2-user home directory.
ssh ec2-user@192.0.2.82
unzip greenplum-db-4.3.7.2-build-2-RHEL5-x86_64.zip
./greenplum-db-4.3.7.2-build-2-RHEL5-x86_64.bin
This example command runs the gpseginstall utility and specifies the user as ec2-user. This example assumes the file my-hosts contains the instances that are used as Greenplum Database segment hosts and that the instances have been prepared for Greenplum Database.
gpseginstall -u ec2-user -f my-hosts
Note: During the Greenplum Database installation process, you might see ssh messages to confirm the authenticity of host connections. Enter yes to confirm the authenticity.
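
The my-hosts file used above is a plain text file with one segment host per line. This sketch writes such a file. The function name write_host_file and the host names sdw1 and sdw2 are placeholders for illustration; list the host names or internal addresses of your own segment instances.

```shell
# Write a host file that contains one segment host name per line.
write_host_file() {
    out_file=$1   # file to create, for example my-hosts
    shift         # the remaining arguments are the segment host names
    printf '%s\n' "$@" > "$out_file"
}

# Create my-hosts for two example segment hosts and display it.
write_host_file my-hosts sdw1 sdw2
cat my-hosts
```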

About Amazon Machine Image (AMI)

An Amazon Machine Image (AMI) is a template that contains a software configuration (for example, an operating system, an application server, and applications). From an AMI, you launch an instance, which is a copy of the AMI running as a virtual server in the cloud. You can launch multiple instances of an AMI.

After you launch an instance, it acts like a traditional host, and you can interact with it as you would any computer. You have complete control of your instances; you can use sudo to run commands that require root privileges.

You can create a customized Amazon EBS-backed Linux AMI from an instance that you've launched from an existing Amazon EBS-backed Linux AMI. After you've customized the instance to suit your needs, create and register a new AMI, which you can use to launch new instances with these customizations.
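
If you prefer the command line to the console, creating an AMI from a customized instance might look like the following. This sketch assumes a configured AWS CLI. The function name create_custom_ami is hypothetical, and the instance ID and image name in the example call are placeholders.

```shell
# Create and register an AMI from an EBS-backed instance and print the
# ID of the new image. Requires a configured AWS CLI.
create_custom_ami() {
    instance_id=$1   # ID of the customized instance
    ami_name=$2      # name to give the new AMI
    # --query and --output reduce the JSON response to the bare image ID.
    aws ec2 create-image --instance-id "$instance_id" --name "$ami_name" \
        --query ImageId --output text
}

# Example call (both values are placeholders):
# create_custom_ami i-0123456789abcdef0 gpdb-host-base
```

By default, the create-image command reboots the instance before it creates the image, so plan to run it when the instance is not in use.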

For information about AMI, see the Amazon documentation about AMIs.

About Amazon Elastic IP Addresses

An EC2 Elastic IP address is a public IP address that you can allocate (create) for your account. You can associate it with and disassociate it from instances as you require, and it remains allocated to your account until you choose to release it.

Your default VPC is configured with an Internet gateway. When you allocate an EC2 Elastic IP address, AWS configures the VPC to allow internet access to the IP address using the gateway.

To enable an instance in your VPC to communicate with the Internet, it must have a public IP address or an EC2 Elastic IP address that's associated with a private IP address on your instance.

To ensure that your instances can communicate with the Internet, you must also attach an Internet gateway to your EC2 VPC. For information about VPC Internet Gateways, see the Amazon documentation about Internet gateways.

For information about EC2 Elastic IP addresses and how to use them, see the Amazon documentation about Elastic IP Addresses.
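
The allocate and associate steps described in this section can also be performed with the AWS CLI. This is a minimal sketch that assumes a configured AWS CLI and an instance in a VPC. The function name attach_elastic_ip is hypothetical, and the instance ID in the example call is a placeholder.

```shell
# Allocate an Elastic IP address, associate it with one instance, and
# print the allocation ID. Requires a configured AWS CLI.
attach_elastic_ip() {
    instance_id=$1
    # --query and --output reduce the JSON response to the allocation ID.
    alloc_id=$(aws ec2 allocate-address --domain vpc \
        --query AllocationId --output text) || return 1
    aws ec2 associate-address --instance-id "$instance_id" \
        --allocation-id "$alloc_id" > /dev/null || return 1
    echo "$alloc_id"
}

# Example call (the instance ID is a placeholder):
# attach_elastic_ip i-0123456789abcdef0
```

Keep the printed allocation ID. You need it to release the address when it is no longer required, because an Elastic IP address stays allocated to your account until you release it.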

Notes

  • The Greenplum Database utility gpssh-extkey is not used with Greenplum Database hosts that are EC2 instances. You must copy the private key file to the .ssh directory in the user home directory for each instance.
  • When you use Amazon EBS storage for Greenplum Database storage, the storage should be a RAID of Amazon EBS volumes and mounted with the XFS file system for it to be a supported configuration.

    For information about EBS storage, see the Amazon documentation about Amazon EBS. Also, see the Amazon EC2 documentation for configuring the Amazon EBS volumes and managing storage and file systems used with EC2 instances.

  • For an EC2 instance with instance store, the virtual devices for instance store volumes are ephemeralN (N is between 0 and 23). On an instance running CentOS, the instance store block device names appear as /dev/xvdletter.

    Two examples of EC2 instance types that were configured with instance store and that showed acceptable performance are the d2.8xlarge instance type configured with four raid0 volumes of 6 disks each, and the i2.8xlarge instance type configured with two raid0 volumes of 4 disks each.

    For information about EC2 instance store, see the Amazon documentation about EC2 Instance Store.

  • These are default ports in a Greenplum Database environment. These ports need to be open in the security group to allow access from a source external to a VPC.
    Port     Used by this application
    22       ssh - connect to host with ssh
    5432     Greenplum Database (master)
    28080    Greenplum Command Center
  • For a non-default VPC, you can configure the VPC with an Internet gateway and allocate an Elastic IP address for the VPC. AWS will automatically configure the Elastic IP for Internet access. For information about EC2 Internet gateways, see the Amazon documentation about Internet Gateways.
  • A placement group is a logical grouping of instances within a single Availability Zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both.

    Placement groups provide the ability for EC2 instances to be separated from other instances. Configuring instances in different placement groups can improve performance, but it might create a configuration where an instance in a placement group cannot be replaced.

    See the Amazon documentation about Placement Groups.

  • Amazon EC2 provides enhanced networking capabilities using single root I/O virtualization (SR-IOV) on some instance types. Enabling enhanced networking on your instance results in higher performance (packets per second), lower latency, and lower jitter.

    To enable enhanced networking on your RHEL or CentOS instance, you must ensure that the ixgbevf kernel module version 2.14.2 or higher is installed and that the sriovNetSupport attribute is set.

    For information about EC2 enhanced networking, see the Amazon documentation about Enhanced Networking on Linux Instances.
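
The ixgbevf version requirement in the enhanced networking note can be checked on a running instance. This sketch compares the version reported by modinfo with the 2.14.2 minimum. The function names version_ge and check_ixgbevf are hypothetical helpers, and the comparison relies on the -V (version sort) option of GNU sort.

```shell
# Return success if version $1 is greater than or equal to version $2.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Report whether the installed ixgbevf module meets the 2.14.2 minimum.
# Returns 2 if modinfo cannot find the module.
check_ixgbevf() {
    installed=$(modinfo -F version ixgbevf 2>/dev/null) || return 2
    if version_ge "$installed" 2.14.2; then
        echo "ixgbevf $installed: ok"
    else
        echo "ixgbevf $installed: upgrade required"
        return 1
    fi
}

# Run the check on the instance:
# check_ixgbevf
```

The sriovNetSupport attribute is set on the instance rather than inside the operating system. With a configured AWS CLI, you can inspect it with aws ec2 describe-instance-attribute and the --attribute sriovNetSupport option.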

References

References to AWS and EC2 features and related information.