One-time HDFS Protocol Installation
- Install Java 1.6 or later on all Greenplum Database hosts: master, segment, and standby master.
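You can confirm the Java requirement on each host by comparing the version that java -version reports against 1.6. The sketch below is a hypothetical POSIX shell helper, java_version_ok, and is not a Greenplum utility. It assumes the two version formats Java has used: the legacy 1.x scheme, where 1.6.0_45 means Java 6, and the newer scheme, where 11.0.2 means Java 11.

```shell
# java_version_ok VERSION  (hypothetical helper, not part of Greenplum)
# Returns 0 if VERSION, as printed by "java -version" (e.g. "1.8.0_181" or
# "11.0.2"), is Java 1.6 or later; returns 1 otherwise.
java_version_ok() {
  v=$1
  major=${v%%.*}              # text before the first dot
  rest=${v#*.}                # text after the first dot
  if [ "$major" = "1" ]; then
    # Legacy scheme 1.MINOR.x: the release number is the second field
    major=${rest%%.*}
  fi
  major=${major%%[!0-9]*}     # drop a non-numeric suffix such as "-ea"
  [ -n "$major" ] && [ "$major" -ge 6 ]
}

# Example on a host (java -version writes to stderr):
# java_version_ok "$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')" && echo "Java OK"
```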
- Install a supported Hadoop distribution on all hosts. The distribution must be the same on all hosts. For Hadoop installation information, see the Hadoop distribution documentation.
Greenplum Database supports the following Hadoop distributions:
Table 1. Hadoop Distributions

Hadoop Distribution        Version                      gp_hadoop_target_version
Pivotal HD (3)             Pivotal HD 3.0, 3.0.1        gphd-3.0
                           Pivotal HD 2.0, 2.1          gphd-2.0
                           Pivotal HD 1.0 (1)           gphd-2.0
Greenplum HD (3)           Greenplum HD 1.2             gphd-1.2
                           Greenplum HD 1.1             gphd-1.1 (default)
Cloudera                   CDH 5.2, 5.3, 5.4.x - 5.8.x  cdh5
                           CDH 5.0, 5.1                 cdh4.1
Hortonworks Data Platform  HDP 2.x                      hdp2
MapR (2)                   MapR 4.x, MapR 5.x           gpmr-1.2
Apache Hadoop              2.x                          hadoop2

Note: For the latest information regarding supported Hadoop distributions, see the Greenplum Database Release Notes for your release.

(1) Pivotal HD 1.0 is a distribution of Hadoop 2.0.
(2) MapR requires the MapR client software.
(3) Support for these Hadoop distributions has been deprecated and will be removed in a future release: Pivotal HD and Greenplum HD.
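Table 1 can also be read as a lookup from a distribution and version to the gp_hadoop_target_version value. The following sketch encodes it as a hypothetical shell function, hadoop_target_version. The short distribution keys (phd, ghd, cdh, hdp, mapr, apache) are made up for illustration, and the Release Notes remain the authority on supported versions.

```shell
# hadoop_target_version DISTRIBUTION VERSION  (hypothetical helper)
# Prints the gp_hadoop_target_version value that Table 1 lists for the
# given distribution and version, or fails if no row matches.
# Keys are illustrative: phd = Pivotal HD, ghd = Greenplum HD, cdh = Cloudera,
# hdp = Hortonworks, mapr = MapR, apache = Apache Hadoop.
hadoop_target_version() {
  case "$1:$2" in
    phd:3.0|phd:3.0.1)                      echo gphd-3.0 ;;
    phd:2.0|phd:2.1|phd:1.0)                echo gphd-2.0 ;;
    ghd:1.2)                                echo gphd-1.2 ;;
    ghd:1.1)                                echo gphd-1.1 ;;
    cdh:5.0|cdh:5.0.*|cdh:5.1|cdh:5.1.*)    echo cdh4.1 ;;
    cdh:5.2|cdh:5.2.*|cdh:5.3|cdh:5.3.*)    echo cdh5 ;;
    cdh:5.[4-8]|cdh:5.[4-8].*)              echo cdh5 ;;
    hdp:2|hdp:2.*)                          echo hdp2 ;;
    mapr:4|mapr:4.*|mapr:5|mapr:5.*)        echo gpmr-1.2 ;;
    apache:2|apache:2.*)                    echo hadoop2 ;;
    *)
      echo "no supported gp_hadoop_target_version for $1 $2" >&2
      return 1 ;;
  esac
}
```

For example, hadoop_target_version cdh 5.3 prints cdh5, while hadoop_target_version cdh 5.1 prints cdh4.1, matching the two Cloudera rows of Table 1.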
- After installation, ensure that the Greenplum system user (gpadmin) has read and execute access to the Hadoop libraries or to the Greenplum MR client.
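One way to check the access requirement is to walk the Hadoop (or MR client) directory as gpadmin and list anything that user cannot reach. The hypothetical helper unreadable_hadoop_paths below treats "access" as read permission on files and as read plus execute (traverse) permission on directories. That interpretation is an assumption, not a stated Greenplum rule.

```shell
# unreadable_hadoop_paths DIR  (hypothetical helper)
# Prints each directory under DIR that the current user cannot read and
# traverse, and each non-directory that the current user cannot read.
# Prints nothing when everything is accessible. Run it as gpadmin.
unreadable_hadoop_paths() {
  # Directories need read (list) and execute (traverse) permission
  find "$1" -type d | while IFS= read -r p; do
    if [ ! -r "$p" ] || [ ! -x "$p" ]; then
      echo "$p"
    fi
  done
  # Files such as JARs and native libraries need read permission
  find "$1" ! -type d | while IFS= read -r p; do
    [ -r "$p" ] || echo "$p"
  done
}
```

For example, after su - gpadmin, running unreadable_hadoop_paths "$HADOOP_HOME" should print nothing.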
- Set the following environment variables on all segments:
- JAVA_HOME – the Java home directory
- HADOOP_HOME – the Hadoop home directory
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/lib/gphd
The variables must be set in the ~gpadmin/.bashrc or the ~gpadmin/.bash_profile file so that the gpadmin user shell environment can locate the Java home and Hadoop home.
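The variables must persist in the gpadmin login files, so a helper that appends each export only once avoids duplicate lines if you repeat this step. add_export is a hypothetical sketch. It does not change an existing export that has a different value, and you still need to run it on every segment host, for example through the Greenplum gpssh utility.

```shell
# add_export FILE NAME VALUE  (hypothetical helper)
# Appends "export NAME=VALUE" to FILE unless FILE already has a line that
# exports NAME. An existing export with a different value is left unchanged.
add_export() {
  file=$1 name=$2 value=$3
  if [ -f "$file" ] && grep -q "^export $name=" "$file"; then
    return 0
  fi
  printf 'export %s=%s\n' "$name" "$value" >> "$file"
}

# Example, run as gpadmin on each host:
# add_export ~gpadmin/.bashrc JAVA_HOME /usr/java/default
# add_export ~gpadmin/.bashrc HADOOP_HOME /usr/lib/gphd
```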
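- Set the following Greenplum Database server configuration parameters, then restart Greenplum Database or reload its configuration. Both parameters have the reload set classification, so reloading with gpstop -u is sufficient.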
Table 2. Server Configuration Parameters for Hadoop Targets

gp_hadoop_target_version
  Description: The Hadoop target. Choose one of the following: gphd-1.1, gphd-1.2, gphd-2.0, gphd-3.0, gpmr-1.2, hadoop2, hdp2, cdh5, cdh4.1.
  Default value: gphd-1.1
  Set classifications: master, session, reload

gp_hadoop_home
  Description: When using Pivotal HD, specify the installation directory for Hadoop. For example, the default installation directory is /usr/lib/gphd. When using Greenplum HD 1.2 or earlier, specify the same value as the HADOOP_HOME environment variable.
  Default value: NULL
  Set classifications: master, session, reload
gpconfig -c gp_hadoop_target_version -v 'gphd-2.0'
gpconfig -c gp_hadoop_home -v '/usr/lib/gphd'
gpstop -u
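A mistyped gp_hadoop_target_version value is easy to miss. A wrapper can check the value against the list in Table 2 before calling gpconfig. The functions valid_hadoop_target and set_hadoop_target are hypothetical. They run the same gpconfig and gpstop -u commands as the example above and are meant to be run as gpadmin on the master host.

```shell
# valid_hadoop_target VALUE  (hypothetical helper)
# Succeeds only for the gp_hadoop_target_version values listed in Table 2.
valid_hadoop_target() {
  case "$1" in
    gphd-1.1|gphd-1.2|gphd-2.0|gphd-3.0|gpmr-1.2|hadoop2|hdp2|cdh5|cdh4.1)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# set_hadoop_target VALUE HADOOP_DIR  (hypothetical wrapper)
# Rejects an unknown target, then sets both parameters and reloads the
# configuration. Stops at the first command that fails.
set_hadoop_target() {
  if ! valid_hadoop_target "$1"; then
    echo "invalid gp_hadoop_target_version: $1" >&2
    return 1
  fi
  gpconfig -c gp_hadoop_target_version -v "$1" &&
    gpconfig -c gp_hadoop_home -v "$2" &&
    gpstop -u
}

# Example: set_hadoop_target gphd-2.0 /usr/lib/gphd
```

Passing "$1" is equivalent to the quoted literals in the original commands, because the shell strips those single quotes before gpconfig sees the value.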
For information about the Greenplum Database utilities gpconfig and gpstop, see the Greenplum Database Utility Guide.
- If needed, ensure that the CLASSPATH environment variable generated by the $GPHOME/lib/hadoop/hadoop_env.sh file on every Greenplum Database host contains the paths to the JAR files that contain the Java classes required by gphdfs.
For example, if gphdfs returns a class not found exception, ensure that the JAR file containing the class is on every Greenplum Database host, and update the $GPHOME/lib/hadoop/hadoop_env.sh file so that the CLASSPATH environment variable it creates includes that JAR file.
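When gphdfs reports a class not found exception, a useful first check is whether every CLASSPATH entry actually exists on the host. classpath_report is a hypothetical helper that prints the missing entries. It treats an entry ending in /* as a directory wildcard, which is how Java interprets such entries. The usage example assumes that sourcing $GPHOME/lib/hadoop/hadoop_env.sh sets CLASSPATH in the current shell.

```shell
# classpath_report CLASSPATH  (hypothetical helper)
# Prints "missing: ENTRY" for each colon-separated entry that does not
# exist on this host. Prints nothing when every entry is present.
classpath_report() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    [ -z "$entry" ] && continue
    case "$entry" in
      */\*) path=${entry%\*} ;;   # "dir/*" wildcard: check that dir/ exists
      *)    path=$entry ;;
    esac
    [ -e "$path" ] || echo "missing: $entry"
  done
}

# Example, on each host:
# . "$GPHOME/lib/hadoop/hadoop_env.sh" && classpath_report "$CLASSPATH"
```

To find out which JAR file holds a missing class, you can list a JAR's entries with the JDK jar tool, for example jar tf some.jar | grep 'org/apache/hadoop/fs/FileSystem.class', where some.jar stands for the JAR file you are inspecting.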