Greenplum PL/Container Extension

Warning: PL/Container is an experimental feature and is not intended for use in a production environment. Experimental features are subject to change without notice in future releases.

PL/Container is compatible with Greenplum Database 5.2.0. PL/Container has not been tested for compatibility with Greenplum Database 5.1.0 or 5.0.0.

About the PL/Container Extension

The Greenplum Database PL/Container extension is an interface that allows Greenplum Database to interact with a Docker container to execute a user-defined function (UDF) in the container. Docker containers ensure the user code cannot access the file system of the source host. Also, containers are started with limited network access and cannot connect back to Greenplum Database or open any other external connections. For information about available UDF languages, see PL/Container Language Docker Images.

Generally speaking, a Docker container is a Linux process that runs in a managed way by using Linux kernel features such as cgroups, namespaces and union file systems. A Docker image is the basis of a container. A Docker container is a running instance of a Docker image. When you start a Docker container you specify a Docker image. A Docker image is the collection of root filesystem changes and execution parameters that are used when you run a Docker container on the host system. An image does not have state and never changes. For information about Docker, see the Docker web site https://www.docker.com/.

Greenplum Database starts a container only on the first call to a function in that container. For example, consider a query that selects table data using all available segments, and applies a transformation to the data using a PL/Container function. In this case, Greenplum Database would start the Docker container only once on each segment, and then contact the running container to obtain the results.

After starting a full cycle of a query execution, the query executor sends a call to the container. The container might respond with an SPI request, which is a SQL query that the container runs to get data back from the database, and then return the result to the query executor. For set-returning functions, these steps might be executed many times.

The container shuts down when the connection to it is closed. This occurs when you close the Greenplum Database session that started the container. A container running in standby mode has almost no consumption of CPU resources as it is waiting on the socket. PL/Container memory consumption depends on the amount of data you cache in global dictionaries.

The PL/Container extension is available as an open source module. For information about the module, see the README file in the GitHub repository at https://github.com/greenplum-db/plcontainer.

PL/Container Language Docker Images

Pivotal provides two Docker images for customers, a Python image and an R image. The Docker images are available under the pivotaldata organization in Docker Hub (https://hub.docker.com/r/pivotaldata/):

  • plc_python_shared - Docker image with Python 2.7.12 installed.

    The Python Data Science Module is also installed. The module contains a set of Python libraries related to data science. For information about the module, see Python Data Science Module Package.

  • plc_r_shared - Docker image with R 3.3.3 installed.

    The R Data Science package is also installed. The package contains a set of R libraries related to data science. For information about the package, see R Data Science Library Package.

The Docker image tag represents the PL/Container extension release version (for example, 1.0.0). For example, the full image name for plc_python_shared at version 1.0.0 is similar to pivotaldata/plc_python_shared:1.0.0. This is the name that is referred to in the default PL/Container configuration. You can also create custom Docker images and add them to the PL/Container configuration.
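
The naming scheme above (organization, image name, release tag) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of PL/Container, and the split helper assumes the simple organization/image:tag form used on this page, not registry host names or ports.

```python
def full_image_name(image, tag, organization="pivotaldata"):
    """Build a full Docker image name of the form organization/image:tag."""
    if not image or not tag:
        raise ValueError("image and tag must be non-empty")
    return "{}/{}:{}".format(organization, image, tag)


def split_image_name(full_name):
    """Split 'organization/image:tag' into its three parts."""
    repository, colon, tag = full_name.rpartition(":")
    organization, slash, image = repository.partition("/")
    if not colon or not slash:
        raise ValueError("expected organization/image:tag, got %r" % full_name)
    return organization, image, tag


if __name__ == "__main__":
    print(full_image_name("plc_python_shared", "1.0.0"))
```

The value produced this way is what you would pass to the -c option of plcontainer install, or expect to see in the image element of the configuration file.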

Prerequisites

Ensure your Greenplum Database system meets the following prerequisites:

  • PL/Container is supported on Pivotal Greenplum Database 5.2.x on Red Hat Enterprise Linux (RHEL) 7.x or 6.6 or later, and on CentOS 7.x or 6.6 or later.
  • The Docker host operating system must meet these prerequisites:

    RHEL or CentOS 7.x - The minimum supported Linux kernel version is 3.10. RHEL 7.x and CentOS 7.x use this kernel version.

    RHEL or CentOS 6.6+ - The minimum supported Linux kernel version is 2.6.32-431.

    You can check your kernel version with the command uname -r.

    Note: The Red Hat provided, maintained, and supported version of Docker is available only on RHEL 7. Red Hat does not recommend running any version of Docker on any RHEL 6 release. Docker feature development is tied to RHEL 7.x infrastructure components for kernel, devicemapper (thin provisioning, direct lvm), sVirt, and systemd.
  • Docker is installed on Greenplum Database hosts (master, primary, and all standby hosts):
    • RHEL or CentOS 7.x - Docker 17.05
    • RHEL or CentOS 6.6+ - Docker 1.7

    See Installing Docker.

  • On each Greenplum Database host the gpadmin user should be part of the docker group for the user to be able to manage Docker images and containers.
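
The kernel prerequisite can be checked programmatically on each host. The following is a minimal sketch, assuming the release string has the usual RHEL/CentOS shape (for example 3.10.0-693.el7.x86_64). The names MIN_KERNEL, parse_kernel_release, and kernel_is_supported are illustrative only; the minimum versions are the ones listed above.

```python
import platform
import re

# Minimum kernel versions from the prerequisites, keyed by OS major version.
MIN_KERNEL = {
    7: (3, 10),
    6: (2, 6, 32, 431),
}


def parse_kernel_release(release):
    """Turn a `uname -r` string into a tuple (major, minor, patch, build)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    # Missing fields are treated as 0 so that tuples compare position by position.
    return tuple(int(part or 0) for part in match.groups())


def kernel_is_supported(release, os_major):
    """Compare the parsed release against the minimum for the given OS major version."""
    minimum = MIN_KERNEL[os_major]
    return parse_kernel_release(release)[:len(minimum)] >= minimum


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(platform.release(), kernel_is_supported(platform.release(), 7))
```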

Installing the PL/Container Extension

To use PL/Container languages, install PL/Container, install Docker images, and configure PL/Container to use the images.
  1. Ensure the Greenplum Database hosts meet the prerequisites. See Prerequisites.
  2. Install the PL/Container extension. See Installing the PL/Container Extension Package.
  3. Install Docker images and configure PL/Container. See Installing PL/Container Language Docker Images.

Installing the PL/Container Extension Package

Install the PL/Container extension with the Greenplum Database gppkg utility.

  1. Copy the PL/Container extension package to the Greenplum Database master host as the gpadmin user.
  2. Make sure Greenplum Database is up and running. If not, bring it up with this command.
    gpstart -a
  3. Run the package installation command.
    gppkg -i plcontainer-1.0.0-rhel7-x86_64.gppkg
  4. Source the file $GPHOME/greenplum_path.sh.
    source $GPHOME/greenplum_path.sh
  5. Restart Greenplum Database.
    gpstop -ra
  6. Enable PL/Container for specific databases by running this command for each database:
    psql -d your_database -f $GPHOME/share/postgresql/plcontainer/plcontainer_install.sql

    The SQL script registers the language plcontainer in the database and creates PL/Container specific UDFs.

  7. Initialize PL/Container configuration on the Greenplum Database hosts by running the plcontainer configure command.
    plcontainer configure --reset

    The plcontainer utility is included with the PL/Container extension.

Installing PL/Container Language Docker Images

The PL/Container extension includes the plcontainer utility that installs Docker images in the host Docker repository and adds the installed image to the PL/Container configuration. The utility adds the Docker image to all Greenplum Database hosts and updates configuration information on all the hosts. For information about plcontainer, see plcontainer Utility.

Download the tar.gz file that contains the Docker images from Pivotal Network.
  • plcontainer-python-images-1.0.0-beta1.tar.gz
  • plcontainer-r-images-1.0.0-beta1.tar.gz

Install the Docker images on the Greenplum Database hosts. These examples use the plcontainer utility to install Docker images for Python and R and add the images to the PL/Container configuration. The utility installs the images and configures all the Greenplum Database hosts. The examples assume the Docker images are in /home/gpadmin.

This example runs plcontainer to install the Docker image for PL/Python and add the image to the PL/Container configuration.
plcontainer install -n plc_python_shared -i /home/gpadmin/plcontainer-python-images-1.0.0-beta1.tar.gz \
  -c  pivotaldata/plc_python_shared:1.0.0 -l python

This example runs plcontainer to install the Docker image for PL/R and add the image to the PL/Container configuration.

plcontainer install -n plc_r -i /home/gpadmin/plcontainer-r-images-1.0.0-beta1.tar.gz \
  -c pivotaldata/plc_r_shared:1.0.0 -l r
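
If you script the installation, it can help to assemble the plcontainer install arguments in one place. This sketch only builds the argument list from the options documented on this page (-n, -i, -c, -l, and the optional -v); the helper names are hypothetical, and it assumes that plcontainer is on the PATH of the gpadmin user on the master host.

```python
import shlex


def plcontainer_install_args(name, image_file, image_name, language, volumes=None):
    """Return the argument list for a `plcontainer install` call."""
    if language not in ("python", "r"):
        raise ValueError("language must be 'python' or 'r'")
    args = ["plcontainer", "install",
            "-n", name,
            "-i", image_file,
            "-c", image_name,
            "-l", language]
    if volumes:
        # Multiple shared volumes are passed as one comma-separated value.
        args += ["-v", ",".join(volumes)]
    return args


def as_shell(args):
    """Render the argument list as a shell command line for logging or review."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(as_shell(plcontainer_install_args(
        "plc_python_shared",
        "/home/gpadmin/plcontainer-python-images-1.0.0-beta1.tar.gz",
        "pivotaldata/plc_python_shared:1.0.0",
        "python")))
```

To run the command rather than print it, the list can be passed to subprocess.run(args, check=True).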

You can view the host system Docker repository with the docker images command. The image name specified with the -c option appears in the list of Docker images.

You can view the updated PL/Container configuration file with the plcontainer configure -s command. The file contains a container element whose name is the value specified with the -n option.

Uninstalling PL/Container

When you remove support for the PL/Container extension, the plcontainer user-defined functions that you created in the database will no longer work.

Remove PL/Container Support for a Database

For a database that no longer requires PL/Container languages, remove support for PL/Container. Run the plcontainer_uninstall.sql script as the gpadmin user. For example, this command removes the plcontainer language from the mytest database.

psql -d mytest -f $GPHOME/share/postgresql/plcontainer/plcontainer_uninstall.sql

The script drops the plcontainer language with CASCADE to drop functions that depend on the language.

Uninstalling PL/Container Extension

If no databases have plcontainer as a registered language, uninstall the Greenplum Database PL/Container extension with the gppkg utility.

  1. Use the Greenplum Database gppkg utility with the -r option to uninstall the PL/Container extension. This example uninstalls the PL/Container extension on a Linux system:
    $ gppkg -r plcontainer-1.0.0-rhel7

    You can run the gppkg utility with the options -q --all to list the installed extensions and their versions.

  2. Reload greenplum_path.sh.
    $ source $GPHOME/greenplum_path.sh
  3. Restart the database.
    $ gpstop -ra

Uninstall Docker Containers and Images

On the Greenplum Database hosts, uninstall the Docker containers and images that are no longer required.
  • The command docker ps -a lists the containers on a host. The command docker stop stops a container.
  • The command docker images lists the images on a host.
  • The command docker rmi removes images.
  • The command docker rm removes containers.

Using PL/Container Languages

When you have enabled the plcontainer language, you can create and run user-defined functions in the procedural languages supported by the PL/Container Docker images. A UDF that uses PL/Container must have these items:

  • The first line of the UDF must be # container: name
  • The LANGUAGE attribute must be plcontainer

The name is the name that PL/Container uses to identify the Docker container that runs the UDF. In the XML configuration file plcontainer_configuration.xml, there should be a container XML element with a corresponding name XML element that specifies the detailed Docker container information. See Configuring PL/Container for information about how PL/Container maps the name to a Docker container.
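
As an illustration of the first-line rule, the following sketch extracts the container name from a function body. It is not the parser that PL/Container uses. The tolerance for leading blank lines and extra spaces, and the set of characters allowed in a name, are assumptions made for this example.

```python
import re

# Assumed shape of the declaration line: "# container: name".
_CONTAINER_LINE = re.compile(r"^#\s*container\s*:\s*([A-Za-z0-9_]+)\s*$")


def container_name(udf_body):
    """Return the container name declared on the first non-blank line of a UDF body."""
    for line in udf_body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip the empty line that usually follows the opening $$
        match = _CONTAINER_LINE.match(stripped)
        if match is None:
            raise ValueError("first line must be '# container: name'")
        return match.group(1)
    raise ValueError("empty function body")


if __name__ == "__main__":
    body = """
# container: plc_python_shared
import math
return math.log10(100)
"""
    print(container_name(body))
```

A check like this can be used in a review script to confirm that every plcontainer function names a container that exists in the configuration file.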

The PL/Container configuration file is read only on the first invocation of a PL/Container function in each Greenplum Database session that runs PL/Container functions. You can force the configuration file to be re-read by performing a SELECT command on the view plcontainer_refresh_config during the session. For example, this SELECT command forces the configuration file to be re-read.

select * from plcontainer_refresh_config;

Running the command executes a PL/Container function that updates the configuration on the master and segment instances.

Also, you can show all the configurations in the session by performing a SELECT command on the view plcontainer_show_config. For example, this SELECT command returns the PL/Container configurations.

select * from plcontainer_show_config;

Running the command executes a PL/Container function that displays configuration information from the master and segment instances.

Examples

This is an example of a PL/Python function that runs using the plc_python_shared container:
CREATE OR REPLACE FUNCTION pylog100() RETURNS double precision AS $$
# container: plc_python_shared
import math
return math.log10(100)
$$ LANGUAGE plcontainer;
This is an example of a similar function using the plc_r_shared container:
CREATE OR REPLACE FUNCTION rlog100() RETURNS text AS $$
# container: plc_r_shared
return(log10(100))
$$ LANGUAGE plcontainer;

The container names that you specify, plc_python_shared and plc_r_shared in the examples, are name elements defined in the plcontainer_configuration.xml file. Each name is mapped to an image XML element that specifies the Docker image to be started. Removing a container XML element from the configuration file makes it impossible for end users to start that container.

About PL/Container Running PL/Python

In the Python language container, the module plpy is implemented. The module contains these methods:

  • plpy.execute(stmt) - Executes the query string stmt and returns query result in a list of dictionary objects. To be able to access the result fields ensure your query returns named fields.
  • plpy.prepare(stmt[, argtypes]) - Prepares the execution plan for a query. It is called with a query string and, if the query has parameter references, a list of parameter types.
  • plpy.execute(plan[, argtypes]) - Executes a prepared plan.
  • plpy.debug(msg) - Send a DEBUG2 message to the Greenplum Database log.
  • plpy.log(msg) - Send a LOG message to the Greenplum Database log.
  • plpy.info(msg) - Send an INFO message to the Greenplum Database log.
  • plpy.notice(msg) - Send a NOTICE message to the Greenplum Database log.
  • plpy.warning(msg) - Send a WARNING message to the Greenplum Database log.
  • plpy.error(msg) - Send an ERROR message to the Greenplum Database log. An ERROR message raised in Greenplum Database causes the query execution process to stop and the transaction to roll back.
  • plpy.fatal(msg) - Send a FATAL message to the Greenplum Database log. A FATAL message causes the Greenplum Database session to be closed and the transaction to be rolled back.
  • plpy.subtransaction() - Manage plpy.execute calls in an explicit subtransaction. See Explicit Subtransactions in the PostgreSQL documentation for additional information about plpy.subtransaction().

Also, the Python module has two global dictionary objects that retain data between function calls. They are named GD and SD. GD shares data between all the functions running within the same container, while SD shares data between multiple calls of each separate function. The data is accessible only within the same session, while the container process lives on a segment or the master. For idle sessions, Greenplum Database terminates segment processes, which means the related containers are shut down and the data in GD and SD is lost.
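
The usual way to use GD is to build an expensive object on the first call and reuse it on later calls in the same session. The sketch below shows that pattern as plain Python so that it can be run outside the database. The module-level GD dictionary is a stand-in for the one that the container provides, and log_table_lookup is a hypothetical function body, not part of PL/Container.

```python
import math

# Stand-in for the GD dictionary that the Python container provides.
# Inside a real PL/Container function, GD already exists and this line is omitted.
GD = {}


def log_table_lookup(n):
    """Body of a hypothetical UDF: cache a table of base-10 logarithms in GD."""
    if "log_table" not in GD:
        # Built once per container; later calls in the session reuse it.
        GD["log_table"] = {i: math.log10(i) for i in range(1, 1001)}
    return GD["log_table"].get(n)


if __name__ == "__main__":
    print(log_table_lookup(100))   # first call builds the table
    print(log_table_lookup(10))    # second call reuses the cached table
```

Because the cache disappears when the container shuts down, code written this way should always be able to rebuild the cached data. Keep the cached objects small, since container memory consumption depends on the amount of data held in these dictionaries.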

For information about PL/Python, see Greenplum PL/Python Language Extension.

For information about the plpy methods, see https://www.postgresql.org/docs/8.4/static/plpython-database.html.

About PL/Container Running PL/R

In the R language container, the module pg.spi is implemented. The module contains these methods:

  • pg.spi.exec(stmt) - Executes the query string stmt and returns query result in R data.frame. To be able to access the result fields make sure your query returns named fields.
  • pg.spi.prepare(stmt[, argtypes]) - Prepares the execution plan for a query. It is called with a query string and, if the query has parameter references, a list of parameter types.
  • pg.spi.execp(plan[, argtypes]) - Executes a prepared plan.
  • pg.spi.debug(msg) - Send a DEBUG2 message to the Greenplum Database log.
  • pg.spi.log(msg) - Send a LOG message to the Greenplum Database log.
  • pg.spi.info(msg) - Send an INFO message to the Greenplum Database log.
  • pg.spi.notice(msg) - Send a NOTICE message to the Greenplum Database log.
  • pg.spi.warning(msg) - Send a WARNING message to the Greenplum Database log.
  • pg.spi.error(msg) - Send an ERROR message to the Greenplum Database log. An ERROR message raised in Greenplum Database causes the query execution process to stop and the transaction to roll back.
  • pg.spi.fatal(msg) - Send a FATAL message to the Greenplum Database log. A FATAL message causes the Greenplum Database session to be closed and the transaction to be rolled back.

For information about PL/R, see Greenplum PL/R Language Extension.

For information about the pg.spi methods, see http://www.joeconway.com/plr/doc/plr-spi-rsupport-funcs-normal.html.

Configuring PL/Container

The Greenplum Database utility plcontainer manages the PL/Container configuration files in a Greenplum Database system. The utility ensures that the configuration files are consistent across the Greenplum Database master and segment hosts.

Warning: Modifying the configuration files manually might create different, incompatible configurations on different Greenplum Database segments that could cause unexpected behavior.

Configuration changes that are made with the utility are applied to the XML files on all Greenplum Database segments. However, PL/Container configurations of currently running sessions use the configuration that existed during session start up. To update the PL/Container configuration in a running session, execute this command in the session.

select * from plcontainer_refresh_config;

Running the command executes a PL/Container function that updates the configuration on the master and segment instances.

When you change the plcontainer_configuration.xml configuration file with the plcontainer utility, the utility creates a backup of the original configuration file in the same directory. The backup file name is plcontainer_configuration.xml.bakYYYYMMDD_hhmmss. The timestamp of the change is appended to the file name. Using the plcontainer configure command with the --restore option, you can roll back the configuration changes to the previous version.
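
The backup naming convention is easy to work with in a script, for example to find the most recent backup in a directory listing. This is a minimal sketch with hypothetical helper names. It assumes that hh in the pattern is a 24-hour value, which the text above does not state.

```python
from datetime import datetime

CONFIG_NAME = "plcontainer_configuration.xml"
BACKUP_PREFIX = CONFIG_NAME + ".bak"
STAMP_FORMAT = "%Y%m%d_%H%M%S"  # YYYYMMDD_hhmmss, assuming a 24-hour clock


def backup_name(when):
    """Return the backup file name for a change made at the given time."""
    return BACKUP_PREFIX + when.strftime(STAMP_FORMAT)


def backup_time(file_name):
    """Parse the timestamp out of a backup file name."""
    if not file_name.startswith(BACKUP_PREFIX):
        raise ValueError("not a PL/Container backup file: %r" % file_name)
    return datetime.strptime(file_name[len(BACKUP_PREFIX):], STAMP_FORMAT)


def newest_backup(file_names):
    """Return the most recent backup among the given file names, or None."""
    backups = [name for name in file_names if name.startswith(BACKUP_PREFIX)]
    return max(backups, key=backup_time) if backups else None


if __name__ == "__main__":
    print(backup_name(datetime(2017, 11, 3, 14, 5, 9)))
```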

plcontainer Utility

The plcontainer utility installs Docker images and manages the PL/Container configuration. The utility consists of two commands.

  • plcontainer configure - Manages the PL/Container configuration file on the hosts. You can add Docker image information to the PL/Container configuration file including the image name, location, and shared folder information. You can also edit the configuration file.
  • plcontainer install - Installs a Docker image in the Docker repository and adds the image information to the PL/Container configuration file on each host.
The plcontainer utility syntax:
plcontainer configure {{-n | --name} container-name
               {-i | --image} image-location
               {-l | --language} language
               [{-v | --volume} shared-volumes]} |
             {{-e | --editor} [editor]} |
             {--reset | --restore} |
             {-s | --show} |
             {{-f | --file} config-file}
             [-y | --yes]
             [--verbose]

plcontainer install {-n | --name} container-name
             {-i | --image} image-location
             {-c | --imagename} docker-image
             {-l | --language} language
             [{-v | --volume} shared-volumes]

plcontainer {configure | install} {-h | --help}

Options

{-c | --imagename} local-image
The utility installs the Docker image on the Greenplum Database hosts with the specified Docker name and uses the name in the PL/Container configuration file element image when creating a container element in the configuration file.
{-e | --editor } [editor]
Open the file plcontainer_configuration.xml with the specified editor. The default is the vi editor.
Saving the file updates the configuration file on all Greenplum Database hosts and saves the previous version of the file.
{-f | --file} config-file

The utility replaces the existing PL/Container configuration file with the specified file. Specify the absolute path to a configuration file. The configuration file is replaced on all Greenplum Database hosts.

{-i | --image} docker-image
Specify a full Docker image. For example pivotaldata/plcontainer_python:1.0.0.
  • configure - When creating a container entry in PL/Container configuration this is the value of configuration file element image. The Docker image must be installed.
  • install - Installs the Docker image from the specified location. You can specify a URL to a Docker registry or the absolute path to a tar.gz file that contains a docker image. When installing a docker image, the utility uses --imagename local-image for the value of configuration file element image.
{-l | --language} language

Specify the PL/Container language type. Supported values are python (PL/Python) and r (PL/R).

{-n | --name} container-name
When adding a container element in the PL/Container configuration file, this is the value of the name element. You specify the name in the Greenplum Database UDF on the # container line. For example, this line in a PL/Container UDF specifies using the information in the plc_r_shared container element to create a Docker container.
# container: plc_r_shared
--reset
Reset the configuration file to the default.
--restore
Restore the previous version of the PL/Container configuration file.
-s | --show
Display the contents of the PL/Container configuration file.
{-v | --volume} shared-volume
Optional. Specify a Docker volume to bind mount. You can specify multiple volumes as a comma-separated list.
The format for a shared volume: host-dir:container-dir:[rw|ro]. The information is stored as attributes in the shared_directory element of the container element in the PL/Container configuration file.
  • host-dir - absolute path to a directory on the host system. The Greenplum Database administrator user (gpadmin) must have appropriate access to the directory.
  • container-dir - absolute path to a directory in the Docker container.
  • [rw|ro] - read-write or read-only access to the host directory from the container. Information is stored in the configuration file element shared_directory.
The utility sets a read-only shared volume when the Docker images are installed.
This is the shared-volume that the utility specifies for the Greenplum PL/R Docker image.
/usr/local/greenplum-db/./bin/rclient:/clientdir:ro 
This is the shared-volume that the utility specifies for the Greenplum PL/Python Docker image.
/usr/local/greenplum-db/./bin/pyclient:/clientdir:ro
If needed, you can specify other shared directories. Specifying the same shared directory as the one that is automatically set by the utility will cause a Docker container startup failure.
When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions. Also, if a Docker image managed by PL/Container is configured with read-write access to a host directory, PL/Container could run multiple Docker containers on a host that change data in the directory. This might cause issues when running PL/Container user-defined functions that access the shared directory.
--verbose
Enable verbose logging.
-y | --yes
Continue without confirmation prompts.
-h | --help
Display help text.
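
The shared volume format described for the -v option (host-dir:container-dir:[rw|ro], with multiple volumes separated by commas) can be validated before it is passed to the utility. The following sketch is illustrative only; SharedVolume and parse_shared_volumes are hypothetical names, and the checks cover only the rules stated above.

```python
from collections import namedtuple

SharedVolume = namedtuple("SharedVolume", "host_dir container_dir access")


def parse_shared_volumes(spec):
    """Parse a comma-separated list of host-dir:container-dir:[rw|ro] entries."""
    volumes = []
    for item in spec.split(","):
        parts = item.strip().split(":")
        if len(parts) != 3:
            raise ValueError("expected host-dir:container-dir:[rw|ro], got %r" % item)
        host_dir, container_dir, access = parts
        # Both directories must be absolute paths.
        if not host_dir.startswith("/") or not container_dir.startswith("/"):
            raise ValueError("paths must be absolute: %r" % item)
        if access not in ("rw", "ro"):
            raise ValueError("access must be rw or ro: %r" % item)
        volumes.append(SharedVolume(host_dir, container_dir, access))
    return volumes


if __name__ == "__main__":
    for volume in parse_shared_volumes("/data/in:/in:ro,/data/out:/out:rw"):
        print(volume)
```

The fields map directly to the host, container, and access attributes of the shared_directory element in the configuration file.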

Examples

These are examples of common commands to manage PL/Container:

  • Initialize the Greenplum Database installation with default configuration file after installing a PL/Container package:
    plcontainer configure --reset
  • Edit the configuration in an interactive editor of your choice:
    plcontainer configure -e vim
  • Show the current configuration file:
    plcontainer configure --show
  • Restore the previous configuration from a backup:
    plcontainer configure --restore
  • Overwrite the PL/Container configuration file with an XML file:
    plcontainer configure -f new_plcontainer_configuration.xml 
  • Add a container entry to the PL/Container configuration file:
    plcontainer configure -n plc_python_newpy -l python \
      -i pivotaldata/plc_python_newimage:latest
  • Install a Docker image and add a container entry for the image in the PL/Container configuration file.
    plcontainer install -n plc_r_newr -i plc_newr.tar.gz -c pivotaldata/plc_r_newr:latest \
      -l r

PL/Container Configuration File

The default PL/Container configuration file is $GPHOME/share/postgresql/plcontainer/plcontainer_configuration.xml on each host. The PL/Container configuration file is an XML file. In the XML file, the root element configuration contains one or more container elements, one element for each PL/Container language in the Greenplum Database installation.
<configuration>
   <container>
      <name>plc_python_shared</name>
      <image>pivotaldata/plcontainer_python:1.0.0</image>
      <command>./client</command>
      <memory_mb>128</memory_mb>
      <use_network>no</use_network>
      <shared_directory access="ro" container="/clientdir" host="/path/to/pyclient"/>
   </container>
   <container>
      <name>plc_r</name>
      <image>pivotaldata/plcontainer_r:1.0.0</image>
      <command>/rclient.sh</command>
      <memory_mb>256</memory_mb>
      <use_network>yes</use_network>
      <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/./bin/rclient"/>
   </container>
</configuration>
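
A configuration file with this layout can be read with the Python standard library, for example to list which container names are defined and which image each one uses. This is a sketch that relies only on the element names shown in the example above; list_containers is a hypothetical helper, not part of the plcontainer utility.

```python
import xml.etree.ElementTree as ET


def list_containers(xml_text):
    """Return a dict that maps each container name to its settings."""
    root = ET.fromstring(xml_text)
    containers = {}
    for element in root.findall("container"):
        name = element.findtext("name")
        containers[name] = {
            "image": element.findtext("image"),
            "command": element.findtext("command"),
            "memory_mb": element.findtext("memory_mb"),
            # use_network is optional and defaults to no.
            "use_network": element.findtext("use_network", default="no"),
            "shared_directories": [dict(d.attrib)
                                   for d in element.findall("shared_directory")],
        }
    return containers


if __name__ == "__main__":
    sample = """<configuration>
       <container>
          <name>plc_python_shared</name>
          <image>pivotaldata/plcontainer_python:1.0.0</image>
          <command>./client</command>
          <shared_directory access="ro" container="/clientdir" host="/path/to/pyclient"/>
       </container>
    </configuration>"""
    for name, settings in list_containers(sample).items():
        print(name, settings["image"])
```

The output of plcontainer configure --show can be used as the input text. Edit the file only through the plcontainer utility so that all hosts stay consistent.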

These are the XML elements and attributes in a PL/Container configuration file.

configuration
Root element for the XML file.
container
One element for each specific container available in the system. Child elements of the configuration element.
name
Required. The value is used to reference a Docker container from a function. Only containers defined in the PL/Container configuration file can be specified in PL/Container functions. A Docker container cannot be referenced by its full Docker name (container ID) for security reasons. This name must be unique in the configuration file.
image

Required. The value is the full Docker image name, including the image tag, specified the same way as when you start the container in Docker. The configuration can contain many container elements that reference the same image name. In Docker, they are represented by identical containers.

For example, you might have two containers named plc_python_128 and plc_python_256, both referencing the Docker image pivotaldata/plcontainer_python:1.0.0, the first with a 128MB RAM limit and the second with a 256MB limit, as specified by the memory_mb element.

command
Required. The value is the command to run inside the container to start the client process.
You should modify it only if you build a custom container and want to implement additional initialization logic before the container starts.
Note: This element cannot be set with the plcontainer install command. You can update the configuration file with the plcontainer configure -e command.
memory_mb
The value specifies the amount of memory that the container is allowed to use, in MB. Each container is started with this amount of RAM and twice the amount of swap space. The container memory consumption is limited by the host system cgroups configuration, which means that in case of memory overcommit, the container is killed by the system.
Note: You can add this element by editing the configuration file with the plcontainer configure -e command.
shared_directory
Required. This element specifies one or more shared directories for a container, with different sharing options. There must be at least one shared directory that maps the client location on the host to a directory in the container, usually /clientdir in the Pivotal provided images.
XML attributes allowed:
  • host - specifies a shared directory location on the host system.
  • container - specifies a directory location inside of container.
  • access - specifies access level to this shared directory, which can be either ro (read-only) or rw (read-write).
The plcontainer utility sets a read-only shared volume when the Docker images are installed.
This is the shared_directory element that the utility creates for the Greenplum PL/R Docker image.
<shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/./bin/rclient"/> 
This is the shared_directory element that the utility creates for the Greenplum PL/Python Docker image.
<shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/./bin/pyclient"/>
If needed, you can specify other shared directories. Specifying the same shared directory as the one that is automatically set by the utility will cause a Docker container startup failure.
When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions. Also, if a PL/Container container is configured with read-write access to a host directory, PL/Container could run multiple Docker containers on a host that change data in the directory. This might cause issues when running PL/Container user-defined functions that access the shared directory.
use_network

Optional. The value can be either yes or no and specifies whether to use TCP (yes) or IPC (no) for communication between the Greenplum Database process and the Docker container process. The default is no (use IPC).
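
The rules in the element descriptions above can be turned into a simple pre-flight check for a configuration file that you plan to load with plcontainer configure -f. This sketch is not the validation that PL/Container performs. The function name is hypothetical, and it uses the element names that appear in the example configuration file (name, image, command, shared_directory, memory_mb, use_network).

```python
import xml.etree.ElementTree as ET

REQUIRED = ("name", "image", "command", "shared_directory")


def check_configuration(xml_text):
    """Return a list of problems found in a PL/Container configuration document."""
    problems = []
    seen = set()
    containers = ET.fromstring(xml_text).findall("container")
    for index, container in enumerate(containers, 1):
        label = container.findtext("name") or "container #%d" % index
        for tag in REQUIRED:
            if container.find(tag) is None:
                problems.append("%s: missing <%s>" % (label, tag))
        # The name must be unique in the configuration file.
        if label in seen:
            problems.append("%s: duplicate name" % label)
        seen.add(label)
        network = container.findtext("use_network", default="no")
        if network not in ("yes", "no"):
            problems.append("%s: use_network must be yes or no" % label)
        for shared in container.findall("shared_directory"):
            if shared.get("access") not in ("ro", "rw"):
                problems.append("%s: access must be ro or rw" % label)
        memory = container.findtext("memory_mb")
        if memory is not None and not memory.strip().isdigit():
            problems.append("%s: memory_mb must be a whole number" % label)
    return problems


if __name__ == "__main__":
    sample = ("<configuration><container><name>plc_python_128</name>"
              "<image>pivotaldata/plcontainer_python:1.0.0</image>"
              "<command>./client</command><memory_mb>128</memory_mb>"
              "</container></configuration>")
    for problem in check_configuration(sample):
        print(problem)
```

An empty result means only that the document passes these checks; the Docker image must still be installed on every host.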

Updating the PL/Container Configuration

You can add a container element to the PL/Container configuration file with the plcontainer configure command, using options that specify values such as the name, Docker image, command, and shared directory. You can also use the plcontainer configure command with the -e option to edit the configuration file. The utility updates the configuration file on all hosts.

The PL/Container configuration file can contain multiple container elements that reference the same Docker image specified by the XML element image. The example configuration file contains container elements named plc_python_128 and plc_python_256, both referencing the Docker image pivotaldata/plcontainer_python:1.0.0. The first element is defined with a 128MB RAM limit and the second with a 256MB RAM limit.

<configuration>
  <container>
    <name>plc_python_128</name>
    <image>pivotaldata/plcontainer_python:1.0.0</image>
    <command>./client</command>
    <memory_mb>128</memory_mb>
  </container>
  <container>
    <name>plc_python_256</name>
    <image>pivotaldata/plcontainer_python:1.0.0</image>
    <command>./client</command>
    <memory_mb>256</memory_mb>
  </container>
</configuration>

Notes

  • PL/Container configuration file plcontainer_configuration.xml is stored in all the Greenplum Database data directories for all the Greenplum Database segment instances: master, standby master, primary and mirror. This query lists the Greenplum Database system data directories:
    select g.hostname, fe.fselocation as directory 
       from pg_filespace as f, pg_filespace_entry as fe, 
           gp_segment_configuration as g
       where f.oid = fe.fsefsoid and g.dbid = fe.fsedbid 
           and f.fsname = 'pg_system';
  • In some cases, when PL/Container is running in a high concurrency environment, the Docker daemon hangs with log entries that indicate a memory shortage. This can happen even when the system seems to have adequate free memory.

    The issue seems to be triggered by a combination of two factors, the aggressive virtual memory requirement of the Go language (golang) runtime that is used by PL/Container, and the Greenplum Database Linux server kernel parameter setting for overcommit_memory. The parameter is set to 2 which does not allow memory overcommit.

    A workaround that might help is to increase the amount of swap space and increase the Linux server kernel parameter overcommit_ratio. If the issue still occurs after the changes, there might be memory shortage. You should check free memory on the system and add more RAM if needed. You can also decrease the cluster load.

  • PL/Container does not limit the Docker base device size, that is, the size of the Docker container. In some cases, the Docker daemon controls the base device size. For example, if the Docker storage driver is devicemapper, the Docker daemon --storage-opt option flag dm.basesize controls the base device size. The default base device size for devicemapper is 10GB. The Docker command docker info displays Docker system information including the storage driver. The base device size is displayed in Docker 1.12 and later. For information about Docker storage drivers, see the Docker daemon documentation for the storage-driver option.

    When setting the Docker base device size, the size must be set on all Greenplum Database hosts.
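
For the memory shortage note above, it can help to see how swap space and overcommit_ratio determine the commit limit when overcommit_memory is 2. The formula below follows the Linux kernel overcommit accounting documentation: the limit is the swap size plus overcommit_ratio percent of RAM, excluding huge pages. The function name and the sample numbers are hypothetical.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio, hugetlb_kb=0):
    """Commit limit in kB when vm.overcommit_memory is 2."""
    if not 0 <= overcommit_ratio:
        raise ValueError("overcommit_ratio must not be negative")
    return swap_kb + (ram_kb - hugetlb_kb) * overcommit_ratio // 100


if __name__ == "__main__":
    gib = 1024 * 1024  # kB per GiB
    # Hypothetical host with 64 GiB RAM: compare two settings.
    before = commit_limit_kb(64 * gib, 8 * gib, 50)
    after = commit_limit_kb(64 * gib, 16 * gib, 95)
    print(before // gib, "GiB ->", after // gib, "GiB")
```

On a running host, the CommitLimit and Committed_AS values in /proc/meminfo show the current limit and how much of it is in use.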

Installing Docker

To use PL/Container, Docker must be installed on all Greenplum Database host systems. These instructions show how to set up the Docker service on CentOS 6 and CentOS 7. Installing on RHEL 6 or RHEL 7 is a similar process.

Before performing the Docker installation ensure these requirements are met.
  • The CentOS extras repository is accessible.
  • The user has sudo privileges or is root.

See also the Docker site installation instructions for CentOS at https://docs.docker.com/engine/installation/linux/centos/. For a list of Docker commands, see the Docker engine Run Reference at https://docs.docker.com/engine/reference/run/.

Installing Docker on CentOS 7

These steps install the docker package and start the docker service as a user with sudo privileges.

  1. Install dependencies required for Docker
    sudo yum install -y yum-utils device-mapper-persistent-data lvm2
  2. Add the Docker repo
    sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  3. Update yum cache
    sudo yum makecache fast
  4. Install Docker
    sudo yum -y install docker-ce
  5. Start Docker daemon.
    sudo systemctl start docker
  6. To give access to the Docker daemon and docker commands, assign the Greenplum Database administrator (gpadmin) to the group docker.
    sudo usermod -aG docker gpadmin
  7. Exit the session and log in again to update the privileges.
  8. Run a Docker command to test the Docker installation. This command lists the currently running Docker containers.
    docker ps
This command configures Docker to start when the host system starts.
sudo systemctl enable docker.service
After you have installed Docker on all Greenplum Database hosts, restart the Greenplum Database system to give Greenplum Database access to Docker.
gpstop -ra

Installing Docker on CentOS 6

These steps install the Docker package and start the docker service as a user with sudo privileges.

  1. Install EPEL package
    sudo yum -y install epel-release
  2. Install Docker
    sudo yum -y install docker-io
  3. Start Docker
    sudo service docker start
  4. To give access to the Docker daemon and docker commands, assign the Greenplum Database administrator (gpadmin) to the group docker.
    sudo usermod -aG docker gpadmin
  5. Exit the session and log in again to update the privileges.
  6. Run a Docker command to test the Docker installation. This command lists the currently running Docker containers.
    docker ps
This command configures Docker to start when the host system starts.
sudo chkconfig docker on
After you have installed Docker on all Greenplum Database hosts, restart the Greenplum Database system to give Greenplum Database access to Docker.
gpstop -ra

References