About gphdfs JVM Memory

When Greenplum Database accesses external table data from an HDFS location with the gphdfs protocol, each Greenplum Database segment on a host system starts a JVM for use by the protocol. The default JVM heap size is 1GB, which is sufficient for most workloads.

If the gphdfs JVM runs out of memory, the cause might be the density of tuples inside the Hadoop HDFS block assigned to the gphdfs segment worker: the more tuples a block contains, the more memory gphdfs requires to process it. For example, a block filled with 100-byte tuples holds roughly ten times as many tuples as the same block filled with 1KB tuples. HDFS block size is usually 128MB, 256MB, or 512MB, depending on the Hadoop cluster configuration.

You can increase the JVM heap size by changing the GP_JAVA_OPT variable in the file $GPHOME/lib/hadoop/hadoop_env.sh. In this example line, the option -Xmx1000m sets the maximum JVM heap size to 1000MB (approximately 1GB).

export GP_JAVA_OPT='-Xmx1000m -XX:+DisplayVMOutputToStderr'
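For example, to raise the maximum heap to 2GB, you could increase the -Xmx value as shown below. The 2048m figure is illustrative; choose a value sized for your hosts.

export GP_JAVA_OPT='-Xmx2048m -XX:+DisplayVMOutputToStderr'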

The $GPHOME/lib/hadoop/hadoop_env.sh file must be updated on every segment host in the Greenplum Database system.
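One way to distribute the edited file is with the gpscp and gpssh utilities. The following is a minimal sketch, assuming passwordless SSH between hosts, an identical $GPHOME path on every host, and a hypothetical hostfile named seg_hosts that lists one segment host per line.

# Copy the edited file to the same path on each segment host
gpscp -f seg_hosts $GPHOME/lib/hadoop/hadoop_env.sh =:$GPHOME/lib/hadoop/hadoop_env.sh

# Confirm the new setting on all hosts
gpssh -f seg_hosts "grep GP_JAVA_OPT $GPHOME/lib/hadoop/hadoop_env.sh"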

Important: Before increasing the gphdfs JVM memory, ensure that sufficient memory is available on the host. For example, with the default 1GB heap, 8 primary segments consume 8GB of virtual memory for their gphdfs JVMs. Increasing the Java -Xmx value to 2GB doubles that to 16GB on a host running 8 segments.
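As a rough sizing check, you can compute the total per-host heap before editing hadoop_env.sh. This is a sketch; the segment count and heap value below are assumptions to replace with your own.

# Hypothetical sizing check: total gphdfs JVM heap per host
SEGMENTS_PER_HOST=8   # assumed count of primary segments on the host
XMX_GB=2              # assumed -Xmx value, in GB
echo "gphdfs JVM heap per host: $((SEGMENTS_PER_HOST * XMX_GB)) GB"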