Greenplum PL/Container Language Extension
This section includes the following information:
- About the PL/Container Language Extension
- PL/Container Docker Images
- Installing the PL/Container Language Extension
- Uninstalling PL/Container
- Using PL/Container
- About PL/Container Running PL/Python
- About PL/Container Running PL/R
- Configuring PL/Container
- Installing Docker
- References
About the PL/Container Language Extension
The Greenplum Database PL/Container language extension (PL/Container) is an interface that allows Greenplum Database to interact with a Docker container to execute a user-defined function (UDF) in the container. Docker containers ensure the user code cannot access the file system of the source host. Also, containers are started without network access or with limited network access and cannot connect back to Greenplum Database or open any other external connections. For information about available UDF languages, see PL/Container Docker Images.
Generally speaking, a Docker container is a Linux process that runs in a managed way by using Linux kernel features such as cgroups, namespaces and union file systems. A Docker image is the basis of a container. A Docker container is a running instance of a Docker image. When you start a Docker container you specify a Docker image. A Docker image is the collection of root filesystem changes and execution parameters that are used when you run a Docker container on the host system. An image does not have state and never changes. For information about Docker, see the Docker web site https://www.docker.com/.
Greenplum Database starts a container only on the first call to a function in that container. For example, consider a query that selects table data using all available segments, and applies a transformation to the data using a PL/Container function. In this case, Greenplum Database would start the Docker container only once on each segment, and then contact the running container to obtain the results.
After starting a full cycle of query execution, the executor sends a call to the container. The container might respond with an SPI call (an SQL query that the container executes to get some data back from the database) before returning the result to the query executor.
The container shuts down when the connection to it is closed. This occurs when you close the Greenplum Database session that started the container. A container running in standby mode has almost no consumption of CPU resources as it is waiting on the socket. PL/Container memory consumption depends on the amount of data you cache in global dictionaries.
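The lifecycle described above (start on the first call, reuse for later calls, shut down when the session closes) can be pictured with a toy model. This is only an illustrative sketch: the class and method names are hypothetical, keying containers by runtime ID is an assumption, nothing here talks to Docker or Greenplum Database, and it is not the PL/Container implementation.

```python
class SessionContainers(object):
    """Toy model of one session's containers on one segment (hypothetical names)."""

    def __init__(self):
        self.running = {}   # runtime ID -> number of calls served so far
        self.starts = 0     # how many containers this session has started

    def call(self, runtime_id):
        # A container is started only on the first call that needs it.
        if runtime_id not in self.running:
            self.running[runtime_id] = 0
            self.starts += 1
        # Every later call reuses the container that is already running.
        self.running[runtime_id] += 1
        return self.running[runtime_id]

    def close(self):
        # Closing the session closes the connection, so the containers shut down.
        self.running.clear()


session = SessionContainers()
for _ in range(1000):            # for example, 1000 rows processed on this segment
    session.call("plc_python_shared")
print(session.starts)            # 1
session.close()
print(len(session.running))      # 0
```

The point of the model is the cost profile: the container startup cost is paid once per session and segment, not once per row.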
The PL/Container language extension is available as an open source module. For information about the module, see the README file in the GitHub repository at https://github.com/greenplum-db/plcontainer.
PL/Container Docker Images
A PL/Python image and a PL/R image are available from the Greenplum Database product download site of Pivotal Network at https://network.pivotal.io/.
- PL/Container for Python - Docker image with Python 2.7.12 installed.
The Python Data Science Module is also installed. The module contains a set of Python libraries related to data science. For information about the module, see Python Data Science Module Package.
- PL/Container for R - Docker image with R 3.3.3 installed.
The R Data Science package is also installed. The package contains a set of R libraries related to data science. For information about the package, see R Data Science Library Package.
The Docker image tag represents the PL/Container release version (for example, 1.0.0). For example, the full Docker image name for the PL/Container for Python Docker image is similar to pivotaldata/plc_python_shared:1.0.0. This is the name that is referred to in the default PL/Container configuration. You can also create custom Docker images, install them, and add them to the PL/Container configuration.
Prerequisites
Ensure your Greenplum Database system meets the following prerequisites:
- PL/Container is supported on Pivotal Greenplum Database 5.2.x on Red Hat Enterprise Linux (RHEL) 7.x or 6.6+ and CentOS 7.x or 6.6+.
- These are Docker host operating system prerequisites.
RHEL or CentOS 7.x - Minimum supported Linux OS kernel version is 3.10. RHEL 7.x and CentOS 7.x use this kernel version.
RHEL or CentOS 6.6+ - Minimum supported Linux OS kernel version is 2.6.32-431.
You can check your kernel version with the command uname -r.
Note: The Red Hat provided, maintained, and supported version of Docker is only available on RHEL 7. Red Hat does not recommend running any version of Docker on any RHEL 6 releases. Docker feature developments are tied to RHEL7.x infrastructure components for kernel, devicemapper (thin provisioning, direct lvm), sVirt and systemd.
- Docker is installed on Greenplum Database hosts (master, primary, and all standby hosts):
- For RHEL or CentOS 7.x - Docker 17.05
- For RHEL or CentOS 6.6+ - Docker 1.7
See Installing Docker.
- On each Greenplum Database host the gpadmin user should be part of the docker group for the user to be able to manage Docker images and containers.
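The kernel prerequisite above can be checked programmatically. The following is a minimal sketch, assuming the release string has the usual `uname -r` shape (for example 3.10.0-693.el7.x86_64); the function names and the MIN_KERNEL table are illustrative, and only the minimum versions themselves come from the prerequisites.

```python
import re

# Minimum kernel versions from the prerequisites, keyed by RHEL/CentOS major version.
MIN_KERNEL = {
    7: (3, 10),
    6: (2, 6, 32, 431),
}


def kernel_tuple(release):
    """Turn a `uname -r` style string into a tuple of its leading integers."""
    # "2.6.32-431.el6.x86_64" -> (2, 6, 32, 431)
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g) for g in m.groups() if g is not None)


def kernel_ok(release, os_major):
    """Return True if the kernel meets the minimum for the given OS major version."""
    return kernel_tuple(release) >= MIN_KERNEL[os_major]


print(kernel_ok("3.10.0-693.el7.x86_64", 7))   # True
print(kernel_ok("2.6.32-358.el6.x86_64", 6))   # False
```

On a live host, the standard library call platform.release() returns the same string that uname -r prints, so kernel_ok(platform.release(), 7) would check the local machine.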
Installing the PL/Container Language Extension
- Ensure the Greenplum Database hosts meet the prerequisites, see Prerequisites.
- Install the PL/Container extension, see Installing the PL/Container Language Extension Package.
- Install Docker images and configure PL/Container, see Installing PL/Container Docker Images.
Installing the PL/Container Language Extension Package
Install the PL/Container language extension with the Greenplum Database gppkg utility.
- Copy the PL/Container language extension package to the Greenplum Database master host as the gpadmin user.
- Make sure Greenplum Database is up and running. If not, bring it up with this command.
gpstart -a
- Run the package installation command.
gppkg -i plcontainer-1.0.0-rhel7-x86_64.gppkg
- Source the file $GPHOME/greenplum_path.sh.
source $GPHOME/greenplum_path.sh
- Restart Greenplum Database.
gpstop -ra
- Enable PL/Container for specific databases by running this command.
psql -d your_database -f $GPHOME/share/postgresql/plcontainer/plcontainer_install.sql
The SQL script registers the language plcontainer in the database and creates PL/Container specific UDFs.
After installing PL/Container, you can manage Docker images and manage the PL/Container configuration with the Greenplum Database plcontainer utility.
Installing PL/Container Docker Images
The PL/Container language extension includes the plcontainer utility that installs Docker images on the Greenplum Database hosts and adds configuration information to the PL/Container configuration file. The configuration information allows PL/Container to create Docker containers with the Docker images. For information about plcontainer, see The plcontainer Utility.
- plcontainer-python-images-1.0.0.tar.gz
- plcontainer-r-images-1.0.0.tar.gz
Install the Docker images on the Greenplum Database hosts. This example uses the plcontainer utility to install a Docker image for Python and to update the PL/Container configuration. The example assumes the Docker image to be installed is in a file in /home/gpadmin.
plcontainer image-add -f /home/gpadmin/plcontainer-python-images-1.0.0.tar.gz
The utility displays progress information as it installs the Docker image on the Greenplum Database hosts.
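Use the plcontainer image-list command to display the installed Docker images on the local host.
Add the image information to the PL/Container configuration file with the plcontainer runtime-add command. This example adds a Python runtime with the ID plc_py.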
plcontainer runtime-add -r plc_py -i pivotaldata/plcontainer_python_shared:devel -l python
The utility displays progress information as it updates the PL/Container configuration file on the Greenplum Database instances.
You can view the PL/Container configuration information with the plcontainer runtime-show -r plc_py command. You can view the PL/Container configuration XML file with the plcontainer runtime-edit command.
Uninstalling PL/Container
To uninstall PL/Container, remove Docker containers and images, and then remove the PL/Container support from Greenplum Database.
When you remove support for PL/Container, the plcontainer user-defined functions that you created in the database will no longer work.
Uninstall Docker Containers and Images
On the Greenplum Database hosts, uninstall the Docker containers and images that are no longer required.
The plcontainer image-list command lists the Docker images that are installed on the local Greenplum Database host.
The plcontainer image-delete command deletes a specified Docker image from all Greenplum Database hosts.
- The command docker ps -a lists all containers on a host. The command docker stop stops a container.
- The command docker images lists the images on a host.
- The command docker rmi removes images.
- The command docker rm removes containers.
Remove PL/Container Support for a Database
For a database that no longer requires PL/Container, remove support for PL/Container. Run the plcontainer_uninstall.sql script as the gpadmin user. For example, this command removes the plcontainer language in the mytest database.
psql -d mytest -f $GPHOME/share/postgresql/plcontainer/plcontainer_uninstall.sql
The script drops the plcontainer language with CASCADE to drop functions that depend on the language.
Uninstalling PL/Container Language Extension
If no databases have plcontainer as a registered language, uninstall the Greenplum Database PL/Container language extension with the gppkg utility.
- Use the Greenplum Database gppkg utility with the -r option to uninstall the PL/Container language extension. This example uninstalls the PL/Container language extension on a Linux system:
$ gppkg -r plcontainer-1.0.0-rhel7
You can run the gppkg utility with the options -q --all to list the installed extensions and their versions.
- Reload greenplum_path.sh.
$ source $GPHOME/greenplum_path.sh
- Restart the database.
$ gpstop -ra
Using PL/Container
When you enable PL/Container in a database of a Greenplum Database system, the language plcontainer is registered in the database. You can create and run user-defined functions in the procedural languages supported by the PL/Container Docker images when you specify plcontainer as a language in a UDF definition.
A UDF definition that uses PL/Container must have these items.
- The first line of the UDF must be # container: ID
- The LANGUAGE attribute must be plcontainer
The ID is the name that PL/Container uses to identify a Docker image. When Greenplum Database executes a UDF on a host, the Docker image on the host is used to start a Docker container that runs the UDF. In the XML configuration file plcontainer_configuration.xml, there is a runtime XML element that contains a corresponding id XML element that specifies the Docker container startup information. See Configuring PL/Container for information about how PL/Container maps the ID to a Docker image. See Examples for example UDF definitions.
The PL/Container configuration file is read only on the first invocation of a PL/Container function in each Greenplum Database session that runs PL/Container functions. You can force the configuration file to be re-read by performing a SELECT command on the view plcontainer_refresh_config during the session. For example, this SELECT command forces the configuration file to be read.
select * from plcontainer_refresh_config;
 gp_segment_id | plcontainer_refresh_local_config
---------------+----------------------------------
             1 | ok
             0 | ok
            -1 | ok
(3 rows)
Also, you can show all the configurations in the session by performing a SELECT command on the view plcontainer_show_config. For example, this SELECT command returns the PL/Container configurations.
select * from plcontainer_show_config;
INFO: plcontainer: Container 'plc_py_test' configuration
INFO: plcontainer: image = 'pivotaldata/plcontainer_python_shared:devel'
INFO: plcontainer: memory_mb = '1024'
INFO: plcontainer: use container network = 'no'
INFO: plcontainer: use container logging = 'no'
INFO: plcontainer: shared directory from host '/usr/local/greenplum-db/./bin/plcontainer_clients' to container '/clientdir'
INFO: plcontainer: access = readonly
...
INFO: plcontainer: Container 'plc_r_example' configuration (seg0 slice3 192.168.180.45:40000 pid=3304)
INFO: plcontainer: image = 'pivotaldata/plcontainer_r_without_clients:0.2' (seg0 slice3 192.168.180.45:40000 pid=3304)
INFO: plcontainer: memory_mb = '1024' (seg0 slice3 192.168.180.45:40000 pid=3304)
INFO: plcontainer: use container network = 'no' (seg0 slice3 192.168.180.45:40000 pid=3304)
INFO: plcontainer: use container logging = 'yes' (seg0 slice3 192.168.180.45:40000 pid=3304)
INFO: plcontainer: shared directory from host '/usr/local/greenplum-db/bin/plcontainer_clients' to container '/clientdir' (seg0 slice3 192.168.180.45:40000 pid=3304)
INFO: plcontainer: access = readonly (seg0 slice3 192.168.180.45:40000 pid=3304)
 gp_segment_id | plcontainer_show_local_config
---------------+-------------------------------
             0 | ok
            -1 | ok
             1 | ok
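The PL/Container function plcontainer_containers_summary() displays information about the currently running Docker containers.
select * from plcontainer_containers_summary();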
If a normal (non-superuser) Greenplum Database user runs the function, the function displays information only for containers created by the user. If a Greenplum Database superuser runs the function, information for all containers created by Greenplum Database users is displayed. This is sample output when 2 containers are running.
 SEGMENT_ID |                           CONTAINER_ID                           |   UP_TIME    |  OWNER  | MEMORY_USAGE(KB)
------------+------------------------------------------------------------------+--------------+---------+------------------
 1          | 693a6cb691f1d2881ec0160a44dae2547a0d5b799875d4ec106c09c97da422ea | Up 8 seconds | gpadmin | 12940
 1          | bc9a0c04019c266f6d8269ffe35769d118bfb96ec634549b2b1bd2401ea20158 | Up 2 minutes | gpadmin | 13628
(2 rows)
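A monitoring script might want to total the container memory per segment from this summary. The sketch below parses psql-style text output such as the sample above; the function name is hypothetical, and it assumes the five-column layout shown, with the memory figure in the last column.

```python
SAMPLE = """\
 SEGMENT_ID |                           CONTAINER_ID                           |   UP_TIME    |  OWNER  | MEMORY_USAGE(KB)
------------+------------------------------------------------------------------+--------------+---------+------------------
 1          | 693a6cb691f1d2881ec0160a44dae2547a0d5b799875d4ec106c09c97da422ea | Up 8 seconds | gpadmin | 12940
 1          | bc9a0c04019c266f6d8269ffe35769d118bfb96ec634549b2b1bd2401ea20158 | Up 2 minutes | gpadmin | 13628
(2 rows)
"""


def memory_by_segment(text):
    """Sum MEMORY_USAGE(KB) per SEGMENT_ID from psql-style table text."""
    totals = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows have five cells and an integer first cell;
        # the header, the rule line, and the "(2 rows)" footer do not.
        if len(cells) != 5 or not cells[0].lstrip("-").isdigit():
            continue
        segment, memory_kb = int(cells[0]), int(cells[4])
        totals[segment] = totals.get(segment, 0) + memory_kb
    return totals


print(memory_by_segment(SAMPLE))   # {1: 26568}
```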
Examples
The values in the # container lines of the examples, plc_python_shared and plc_r_shared, are the id XML elements defined in the plcontainer_configuration.xml file. The id element is mapped to the image element that specifies the Docker image to be started. If you configured PL/Container with a different ID, change the value of the # container line. For information about configuring PL/Container and viewing the configuration settings, see Configuring PL/Container.
CREATE OR REPLACE FUNCTION pylog100() RETURNS double precision AS $$
# container: plc_python_shared
import math
return math.log10(100)
$$ LANGUAGE plcontainer;

CREATE OR REPLACE FUNCTION rlog100() RETURNS text AS $$
# container: plc_r_shared
return(log10(100))
$$ LANGUAGE plcontainer;
If the # container line in a UDF specifies an ID that is not in the PL/Container configuration file, Greenplum Database returns an error when you try to execute the UDF.
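Before creating a UDF, it can help to check its # container line against the runtime IDs you have configured. The following is a minimal sketch with hypothetical helper names; it treats the first non-blank line of the body as the # container line and is lenient about spacing, which are assumptions and not a description of how PL/Container itself parses the function body.

```python
import re

# Lenient pattern for "# container: ID"; the exact spacing rules are an assumption.
CONTAINER_LINE = re.compile(r"^#\s*container\s*:\s*(\S+)\s*$")


def container_id(udf_body):
    """Return the ID named on the first non-blank line of a UDF body, or None."""
    for line in udf_body.splitlines():
        if not line.strip():
            continue                      # skip the blank line right after $$
        m = CONTAINER_LINE.match(line.strip())
        return m.group(1) if m else None
    return None


def check_udf(udf_body, configured_ids):
    """Describe whether the body names a runtime ID that is configured."""
    cid = container_id(udf_body)
    if cid is None:
        return "missing '# container: ID' line"
    if cid not in configured_ids:
        return "runtime '%s' is not configured" % cid
    return "ok"


body = """
# container: plc_python_shared
import math
return math.log10(100)
"""
print(check_udf(body, {"plc_python_shared", "plc_r_shared"}))   # ok
print(check_udf(body, {"plc_r_shared"}))   # runtime 'plc_python_shared' is not configured
```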
About PL/Container Running PL/Python
In the Python language container, the module plpy is implemented. The module contains these methods:
- plpy.execute(stmt) - Executes the query string stmt and returns the query result as a list of dictionary objects. To be able to access the result fields, ensure your query returns named fields.
- plpy.prepare(stmt[, argtypes]) - Prepares the execution plan for a query. It is called with a query string and a list of parameter types, if you have parameter references in the query.
- plpy.execute(plan[, argtypes]) - Executes a prepared plan.
- plpy.debug(msg) - Sends a DEBUG2 message to the Greenplum Database log.
- plpy.log(msg) - Sends a LOG message to the Greenplum Database log.
- plpy.info(msg) - Sends an INFO message to the Greenplum Database log.
- plpy.notice(msg) - Sends a NOTICE message to the Greenplum Database log.
- plpy.warning(msg) - Sends a WARNING message to the Greenplum Database log.
- plpy.error(msg) - Sends an ERROR message to the Greenplum Database log. An ERROR message raised in Greenplum Database causes the query execution process to stop and the transaction to roll back.
- plpy.fatal(msg) - Sends a FATAL message to the Greenplum Database log. A FATAL message causes the Greenplum Database session to be closed and the transaction to be rolled back.
- plpy.subtransaction() - Manages plpy.execute calls in an explicit subtransaction. See Explicit Subtransactions in the PostgreSQL documentation for additional information about plpy.subtransaction().
- Multi-dimensional arrays.
Also, the Python module has two global dictionary objects that retain data between function calls. They are named GD and SD. GD shares data between all the functions running within the same container, while SD shares data between multiple calls of each separate function. The data is accessible only within the same session, while the container process lives on a segment or the master. For idle sessions, Greenplum Database terminates segment processes, which means the related containers are shut down and the data in GD and SD is lost.
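A common use of GD is to cache something expensive so that it is computed once per container rather than once per call. The sketch below shows the pattern in plain Python; here GD is an ordinary dictionary standing in for the one available inside a UDF, and the helper names cached and load_model are hypothetical.

```python
def cached(GD, key, compute):
    """Return GD[key], computing and storing it on the first request only."""
    if key not in GD:
        GD[key] = compute()
    return GD[key]


GD = {}        # stand-in dictionary; inside a UDF, GD already exists
calls = []     # records how many times the expensive step actually runs


def load_model():
    calls.append(1)
    return {"weights": [0.1, 0.2, 0.3]}


first = cached(GD, "model", load_model)
second = cached(GD, "model", load_model)
print(len(calls))        # 1
print(first is second)   # True
```

Because the cache disappears when the container shuts down, code that relies on it should always be able to rebuild the value, as cached does here.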
For information about PL/Python, see Greenplum PL/Python Language Extension.
For information about the plpy methods, see https://www.postgresql.org/docs/8.4/static/plpython-database.htm.
About PL/Container Running PL/R
In the R language container, the module pg.spi is implemented. The module contains these methods:
- pg.spi.exec(stmt) - Executes the query string stmt and returns the query result in an R data.frame. To be able to access the result fields, make sure your query returns named fields.
- pg.spi.prepare(stmt[, argtypes]) - Prepares the execution plan for a query. It is called with a query string and a list of parameter types if you have parameter references in the query.
- pg.spi.execp(plan[, argtypes]) - Executes a prepared plan.
- pg.spi.debug(msg) - Sends a DEBUG2 message to the Greenplum Database log.
- pg.spi.log(msg) - Sends a LOG message to the Greenplum Database log.
- pg.spi.info(msg) - Sends an INFO message to the Greenplum Database log.
- pg.spi.notice(msg) - Sends a NOTICE message to the Greenplum Database log.
- pg.spi.warning(msg) - Sends a WARNING message to the Greenplum Database log.
- pg.spi.error(msg) - Sends an ERROR message to the Greenplum Database log. An ERROR message raised in Greenplum Database causes the query execution process to stop and the transaction to roll back.
- pg.spi.fatal(msg) - Sends a FATAL message to the Greenplum Database log. A FATAL message causes the Greenplum Database session to be closed and the transaction to be rolled back.
- Multi-dimensional arrays.
For information about PL/R, see Greenplum PL/R Language Extension.
For information about the pg.spi methods, see http://www.joeconway.com/plr/doc/plr-spi-rsupport-funcs-normal.html
Configuring PL/Container
The Greenplum Database utility plcontainer manages the PL/Container configuration files in a Greenplum Database system. The utility ensures that the configuration files are consistent across the Greenplum Database master and segment instances.
Configuration changes that are made with the utility are applied to the XML files on all Greenplum Database segments. However, PL/Container configurations of currently running sessions use the configuration that existed during session start up. To update the PL/Container configuration in a running session, execute this command in the session.
select * from plcontainer_refresh_config;
Running the command executes a PL/Container function that updates the session configuration on the master and segment instances.
The plcontainer Utility
The plcontainer utility installs Docker images and manages the PL/Container configuration. The utility consists of two sets of commands.
- image-* commands manage Docker images on the Greenplum Database system hosts.
- runtime-* commands manage the PL/Container configuration file on the Greenplum Database instances. You can add Docker image information to the PL/Container configuration file including the image name, location, and shared folder information. You can also edit the configuration file.
To configure PL/Container to use a Docker image, you install the Docker image on all the Greenplum Database hosts and then add configuration information to the PL/Container configuration.
PL/Container configuration values, such as image names, runtime IDs, and parameter values and names are case sensitive.
plcontainer Syntax
plcontainer [command] [-h | --help] [--verbose]
Where command is one of the following.
image-add {{-f | --file} image_file} | {{-u | --URL} image_URL}
image-delete {-i | --image} image_name
image-list
runtime-add {-r | --runtime} runtime_id
   {-i | --image} image_name {-l | --language} {python | r}
   [{-v | --volume} shared_volume [{-v | --volume} shared_volume...]]
   [{-s | --setting} param=value [{-s | --setting} param=value ...]]
runtime-replace {-r | --runtime} runtime_id
   {-i | --image} image_name -l {r | python}
   [{-v | --volume} shared_volume [{-v | --volume} shared_volume...]]
   [{-s | --setting} param=value [{-s | --setting} param=value ...]]
runtime-show [{-r | --runtime} runtime_id]
runtime-delete {-r | --runtime} runtime_id
runtime-edit [{-e | --editor} editor]
runtime-backup {-f | --file} config_file
runtime-restore {-f | --file} config_file
runtime-verify
plcontainer Commands and Options
- image-add location
- Install a Docker image on the Greenplum Database hosts. Specify either the location of the Docker image file on the host or the URL to the Docker image. These are the supported location options.
- {-f | --file} image_file Specify the tar archive file on the host that contains the Docker image. This example points to an image file in the gpadmin home directory: /home/gpadmin/test_image.tar.gz
- {-u | --URL} image_URL Specify the URL of the Docker repository and image. This example URL points to a local Docker repository: 192.168.0.1:5000/images/mytest_plc_r:devel
- After installing the Docker image, use the runtime-add command to configure PL/Container to use the Docker image.
- image-delete {-i | --image} image_name
- Remove an installed Docker image from all Greenplum Database hosts. Specify the full Docker image name including the tag, for example pivotaldata/plcontainer_python_shared:1.0.0.
- image-list
- List the Docker images installed on the host. The command lists only the images on the local host, not remote hosts. The command lists all installed Docker images, including images installed with Docker commands.
- runtime-add options
- Add configuration information to the PL/Container configuration file on all Greenplum Database hosts. If the specified runtime_id exists, the utility returns an error and the configuration information is not added.
- For information about PL/Container configuration, see PL/Container Configuration File.
- These are the supported options:
- {-i | --image} docker-image
- Required. Specify the full Docker image name, including the tag, that is installed on the Greenplum Database hosts. For example pivotaldata/plcontainer_python:1.0.0.
- The utility returns a warning if the specified Docker image is not installed.
- The plcontainer image-list command displays installed image information including the name and tag (the Repository and Tag columns).
- {-l | --language} python | r
- Required. Specify the PL/Container language type. Supported values are python (PL/Python) and r (PL/R). When adding configuration information for a new runtime, the utility adds a startup command to the configuration based on the language you specify.
- Startup command for the Python language.
/clientdir/pyclient.sh
- Startup command for the R language.
/clientdir/rclient.sh
- {-r | --runtime} runtime_id
- Required. Add the runtime ID. When adding a runtime element in the PL/Container configuration file, this is the value of the id element. Maximum length is 63 bytes.
- You specify the name in the Greenplum Database UDF on the # container line. See Examples.
- {-s | --setting} param=value
- Optional. Specify a setting to add to the runtime configuration information. You can specify this option multiple times. The setting applies to the runtime configuration specified by the runtime_id. The parameter is the XML attribute of the setting element in the PL/Container configuration file. These are valid parameters.
- memory_mb - Set the memory allocated for the container. The value is an integer that specifies the amount of memory in MB.
- use_container_network - Set the type of networking for communication between the container and Greenplum Database. The value is either yes (use TCP) or no (use IPC). The default is no.
We recommend not changing the value. The default value (IPC) performs better than TCP in most environments.
- use_container_logging - Enable or disable Docker logging for the container. The value is either yes (enable logging) or no (disable logging, the default).
The Greenplum Database server configuration parameter log_min_messages controls the log level. The default log level is warning. For information about PL/Container log information, see Notes.
- {-v | --volume} shared-volume
- Optional. Specify a Docker volume to bind mount. You can specify this option multiple times to define multiple volumes.
- The format for a shared volume: host-dir:container-dir:[rw|ro]. The information is stored as attributes in the shared_directory element of the runtime element in the PL/Container configuration file.
- host-dir - absolute path to a directory on the host system. The Greenplum Database administrator user (gpadmin) must have appropriate access to the directory.
- container-dir - absolute path to a directory in the Docker container.
- [rw|ro] - read-write or read-only access to the host directory from the container.
- When adding configuration information for a new runtime, the utility adds this read-only shared volume information.
- greenplum-home/bin/plcontainer_clients:/clientdir:ro
- If needed, you can specify other shared directories. The utility returns an error if the specified container-dir is the same as the one that is added by the utility, or if you specify multiple shared volumes with the same container-dir.
Warning: Allowing read-write access to a host directory requires special considerations.
- When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions.
- When running PL/Container user-defined functions, multiple concurrent Docker containers that are running on a host could change data in the host directory. Ensure that the functions support multiple concurrent access to the data in the host directory.
- runtime-backup {-f | --file} config_file
- Copies the PL/Container configuration file to the specified file on the local host.
- runtime-delete {-r | --runtime} runtime_id
- Removes runtime configuration information in the PL/Container configuration file on all Greenplum Database instances. The utility returns a message if the specified runtime_id does not exist in the file.
- runtime-edit [{-e | --editor} editor]
- Edit the XML file plcontainer_configuration.xml with the specified editor. The default editor is vi.
Saving the file updates the configuration file on all Greenplum Database hosts. If errors exist in the updated file, the utility returns an error and does not update the file.
- runtime-replace options
- Replaces runtime configuration information in the PL/Container configuration file on all Greenplum Database instances. If the runtime_id does not exist, the information is added to the configuration file. The utility adds a startup command and shared directory to the configuration.
See runtime-add for command options and information added to the configuration.
- runtime-restore {-f | --file} config_file
- Replaces information in the PL/Container configuration file plcontainer_configuration.xml on all Greenplum Database instances with the information from the specified file on the local host.
- runtime-show [{-r | --runtime} runtime_id]
- Displays formatted PL/Container runtime configuration information. If a runtime_id is not specified, the configuration for all runtime IDs is displayed.
- runtime-verify
- Checks the PL/Container configuration information on the Greenplum Database instances with the configuration information on the master. If the utility finds inconsistencies, you are prompted to replace the remote copy with the local copy. The utility also performs XML validation.
- -h | --help
- Display help text. If specified without a command, displays help for all plcontainer commands. If specified with a command, displays help for the command.
- --verbose
- Enable verbose logging for the command.
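The rules given above for the -v and -s options (volume format, the reserved /clientdir mount, no repeated container-dir, and the three valid settings) can be checked before calling the utility. This is a minimal sketch with hypothetical function names; the checks and messages performed by the plcontainer utility itself may differ.

```python
VALID_SETTINGS = ("memory_mb", "use_container_network", "use_container_logging")
RESERVED_CONTAINER_DIR = "/clientdir"   # mount point the utility adds by itself


def parse_volume(spec):
    """Split 'host-dir:container-dir:[rw|ro]' into its three parts."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected host-dir:container-dir:[rw|ro]: %r" % spec)
    host_dir, container_dir, access = parts
    if not host_dir.startswith("/") or not container_dir.startswith("/"):
        raise ValueError("both directories must be absolute paths: %r" % spec)
    if access not in ("rw", "ro"):
        raise ValueError("access must be rw or ro: %r" % spec)
    return host_dir, container_dir, access


def check_volumes(specs):
    """Reject a reserved or repeated container-dir across all -v values."""
    seen = set()
    for spec in specs:
        container_dir = parse_volume(spec)[1]
        if container_dir == RESERVED_CONTAINER_DIR or container_dir in seen:
            raise ValueError("container-dir already in use: %r" % container_dir)
        seen.add(container_dir)


def parse_setting(text):
    """Split 'param=value' and check it against the documented parameters."""
    name, sep, value = text.partition("=")
    if not sep or name not in VALID_SETTINGS:
        raise ValueError("unknown setting: %r" % text)
    if name == "memory_mb":
        if not value.isdigit():
            raise ValueError("memory_mb must be an integer number of MB")
    elif value not in ("yes", "no"):
        raise ValueError("%s must be yes or no" % name)
    return name, value


check_volumes(["/host_dir2/shared2:/container_dir2/shared2:ro"])
print(parse_setting("memory_mb=512"))   # ('memory_mb', '512')
```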
Examples
These are examples of common commands to manage PL/Container:
- Install a Docker image on all Greenplum Database hosts. This example loads a Docker image from a file. The utility displays progress information on the command line as the utility installs the Docker image on all the hosts.
plcontainer image-add -f plc_newr.tar.gz
After installing the Docker image, you add or update a runtime entry in the PL/Container configuration file to give PL/Container access to the Docker image to start Docker containers.
- Add a container entry to the PL/Container configuration file. This example adds configuration information for a PL/R runtime, and specifies a shared volume and settings for memory and logging.
plcontainer runtime-add -r runtime2 -i test_image2:0.1 -l r \
  -v /host_dir2/shared2:/container_dir2/shared2:ro \
  -s memory_mb=512 -s use_container_logging=yes
The utility displays progress information on the command line as it adds the runtime configuration to the configuration file and distributes the updated configuration to all instances.
- Show the configuration of the runtime with the specified runtime ID.
plcontainer runtime-show -r plc_python_shared
The utility displays the configuration information similar to this output.
PL/Container Runtime Configuration:
---------------------------------------------------------
 Runtime ID: plc_python_shared
 Linked Docker Image: test1:latest
 Runtime Setting(s):
 Shared Directory:
 ---- Shared Directory From HOST '/usr/local/greenplum-db/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
 ---- Shared Directory From HOST '/home/gpadmin/share/' to Container '/opt/share', access mode is 'rw'
---------------------------------------------------------
- Edit the configuration in an interactive editor of your choice. This example edits the configuration file with the vim editor.
plcontainer runtime-edit -e vim
When you save the file, the utility displays progress information on the command line as it distributes the file to the Greenplum Database hosts.
- Save the current PL/Container configuration to a file. This example saves the file to the local file /home/gpadmin/saved_plc_config.xml.
plcontainer runtime-backup -f /home/gpadmin/saved_plc_config.xml
- Overwrite the PL/Container configuration file with an XML file. This example replaces the information in the configuration file with the information from the file in the /home/gpadmin directory.
plcontainer runtime-restore -f /home/gpadmin/new_plcontainer_configuration.xml
The utility displays progress information on the command line as it distributes the updated file to the Greenplum Database instances.
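When runtimes are managed from a script, it can be convenient to assemble the runtime-add command from structured values. The following sketch builds the argument list for the runtime-add example shown earlier; the function name runtime_add_argv is hypothetical, and only the options documented above (-r, -i, -l, -v, -s) are used.

```python
def runtime_add_argv(runtime_id, image, language, volumes=(), settings=None):
    """Build the argument list for a `plcontainer runtime-add` call."""
    if language not in ("python", "r"):
        raise ValueError("language must be 'python' or 'r'")
    argv = ["plcontainer", "runtime-add",
            "-r", runtime_id, "-i", image, "-l", language]
    for volume in volumes:
        argv += ["-v", volume]            # -v may be repeated
    for name, value in sorted((settings or {}).items()):
        argv += ["-s", "%s=%s" % (name, value)]   # -s may be repeated
    return argv


argv = runtime_add_argv(
    "runtime2", "test_image2:0.1", "r",
    volumes=["/host_dir2/shared2:/container_dir2/shared2:ro"],
    settings={"memory_mb": 512, "use_container_logging": "yes"},
)
print(" ".join(argv))
```

Keeping the command as a list avoids shell quoting problems; on the master host, as the gpadmin user, the list could be passed directly to subprocess.check_call(argv).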
PL/Container Configuration File
PL/Container maintains a configuration file plcontainer_configuration.xml in the data directory of all Greenplum Database segments. The PL/Container configuration file is an XML file. In the XML file, the root element configuration contains one or more runtime elements. You specify the id of the runtime element in the # container: line of a PL/Container function definition.
In an XML file, names, such as element and attribute names, and values are case sensitive.
<?xml version="1.0" ?>
<configuration>
    <runtime>
        <id>plc_python_example1</id>
        <image>pivotaldata/plcontainer_python_with_clients:0.1</image>
        <command>./pyclient</command>
    </runtime>
    <runtime>
        <id>plc_python_example2</id>
        <image>pivotaldata/plcontainer_python_without_clients:0.1</image>
        <command>/clientdir/pyclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting memory_mb="512"/>
        <setting use_container_network="yes"/>
        <setting use_container_logging="yes"/>
    </runtime>
    <runtime>
        <id>plc_r_example</id>
        <image>pivotaldata/plcontainer_r_without_clients:0.2</image>
        <command>/clientdir/rclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting use_container_logging="yes"/>
    </runtime>
</configuration>
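To see how runtime IDs map to images and settings, a configuration file of this shape can be read with the Python standard library. This is a minimal sketch: the function name summarize is hypothetical, the embedded XML is a shortened copy of the sample configuration, and in practice you would read a file saved with plcontainer runtime-backup.

```python
import xml.etree.ElementTree as ET

CONFIG = """<?xml version="1.0" ?>
<configuration>
    <runtime>
        <id>plc_python_example2</id>
        <image>pivotaldata/plcontainer_python_without_clients:0.1</image>
        <command>/clientdir/pyclient.sh</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
        <setting memory_mb="512"/>
        <setting use_container_logging="yes"/>
    </runtime>
    <runtime>
        <id>plc_r_example</id>
        <image>pivotaldata/plcontainer_r_without_clients:0.2</image>
        <command>/clientdir/rclient.sh</command>
    </runtime>
</configuration>
"""


def summarize(xml_text):
    """Map each runtime id to its image name and merged setting attributes."""
    runtimes = {}
    for runtime in ET.fromstring(xml_text).findall("runtime"):
        settings = {}
        for setting in runtime.findall("setting"):
            settings.update(setting.attrib)   # each setting element holds one attribute
        runtimes[runtime.findtext("id")] = {
            "image": runtime.findtext("image"),
            "settings": settings,
        }
    return runtimes


for runtime_id, info in sorted(summarize(CONFIG).items()):
    print(runtime_id, info["image"], info["settings"])
```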
These are the XML elements and attributes in a PL/Container configuration file.
- configuration
- Root element for the XML file.
- runtime
- One element for each specific container available in the system. These are child elements of the configuration element.
- id
- Required. The value is used to reference a Docker container from a PL/Container user-defined function. The id value must be unique in the configuration. The id must start with a letter or digit (a-z, A-Z, or 0-9) and can contain letters, digits, or the characters _ (underscore), . (period), or - (dash). The maximum length is 63 bytes.
The id specifies which Docker image to use when PL/Container creates a Docker container to execute a user-defined function.
- image
- Required. The value is the full Docker image name, including the image tag, specified the same way as when you start the container in Docker. Multiple runtime elements can reference the same image name; in Docker, they are represented by identical containers.
For example, you might have two runtime elements, with different id elements, plc_python_128 and plc_python_256, both referencing the Docker image pivotaldata/plcontainer_python:1.0.0. The first runtime specifies a 128MB RAM limit and the second specifies a 256MB limit; each limit is set with the memory_mb attribute of a setting element.
- command
- Required. The value is the command that is run inside the container to start the client process. When creating a runtime element, the plcontainer utility adds a command element based on the language (the -l option).
- command element for the python language.
<command>/clientdir/pyclient.sh</command>
- command element for the R language.
<command>/clientdir/rclient.sh</command>
- You should modify the value only if you build a custom container and want to implement some additional initialization logic before the container starts.
Note: This element cannot be set with the plcontainer utility. You can update the configuration file with the plcontainer runtime-edit command.
- shared_directory
- Optional. This element specifies a shared Docker volume for a container, with access information. Multiple shared_directory elements are allowed. Each shared_directory element specifies a single shared volume. XML attributes for the shared_directory element:
- host - a directory location on the host system.
- container - a directory location inside the container.
- access - access level to the host directory, which can be either ro (read-only) or rw (read-write).
- When creating a runtime element, the
plcontainer utility adds a shared_directory
element.
<shared_directory access="ro" container="/clientdir" host="/usr/local/greenplum-db/bin/plcontainer_clients"/>
- For each runtime element, the container attribute of the shared_directory elements must be unique. For example, a runtime element cannot have two shared_directory elements with attribute container="/clientdir".
Warning: Allowing read-write access to a host directory requires special consideration.
- When specifying read-write access to a host directory, ensure that the specified host directory has the correct permissions.
- When running PL/Container user-defined functions, multiple concurrent Docker containers that are running on a host could change data in the host directory. Ensure that the functions support multiple concurrent access to the data in the host directory.
- setting
- Optional. This element specifies Docker container configuration information. Each setting element contains one attribute. The element attribute specifies logging, memory, or networking information. For example, this element enables logging.
<setting use_container_logging="yes"/>
- These are the valid attributes.
- memory_mb="size"
- Optional. The value specifies the amount of memory, in MB, that a container is allowed to use. Each container is started with this amount of RAM and twice the amount of swap space. The container memory consumption is limited by the host system cgroups configuration, which means that if memory is overcommitted, the system kills the container.
- use_container_logging="{yes | no}"
- Optional. Enables or disables Docker logging for the container. The attribute value yes enables logging. The attribute value no disables logging (the default).
- The Greenplum Database server configuration parameter log_min_messages controls the PL/Container log level. The default log level is warning. For information about PL/Container log information, see Notes.
- By default, the PL/Container log information is sent to a system service. On Red Hat 7 or CentOS 7 systems, the log information is sent to the journald service. On Red Hat 6 or CentOS 6 systems, the log information is sent to the syslogd service.
- use_container_network="{yes | no}"
- Optional. The value can be either yes or no to specify whether to use TCP (the value yes) or IPC (the value no) for communication between the Greenplum Database process and the Docker container process. The default is no (use IPC).
We recommend not changing the value. The default value (IPC) performs better than TCP in most environments.
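The element and attribute rules in this section can be checked mechanically before a configuration file is distributed. The following is a minimal Python sketch, not part of PL/Container; the function name check_configuration and the message wording are hypothetical, and it covers only the rules stated above (required elements, the id format and uniqueness, ro or rw access, unique container paths, and the three setting attributes).

```python
import re
import xml.etree.ElementTree as ET

# Starts with a letter or digit; then letters, digits, "_", ".", or "-".
ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")
SETTING_ATTRIBUTES = {"memory_mb", "use_container_logging", "use_container_network"}


def check_configuration(xml_text):
    """Return a list of problems found in a PL/Container configuration string."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        return ["root element must be <configuration>"]
    problems = []
    seen_ids = set()
    for runtime in root.findall("runtime"):
        runtime_id = (runtime.findtext("id") or "").strip()
        label = runtime_id or "(runtime without id)"
        for required in ("id", "image", "command"):
            if not (runtime.findtext(required) or "").strip():
                problems.append("%s: missing required element %s" % (label, required))
        if runtime_id:
            too_long = len(runtime_id.encode("utf-8")) > 63
            if too_long or not ID_PATTERN.fullmatch(runtime_id):
                problems.append("%s: invalid id" % label)
            if runtime_id in seen_ids:
                problems.append("%s: duplicate id" % label)
            seen_ids.add(runtime_id)
        container_paths = set()
        for shared in runtime.findall("shared_directory"):
            if shared.get("access") not in ("ro", "rw"):
                problems.append("%s: access must be ro or rw" % label)
            target = shared.get("container")
            if target in container_paths:
                problems.append("%s: duplicate container path %s" % (label, target))
            container_paths.add(target)
        for setting in runtime.findall("setting"):
            for name in setting.attrib:
                if name not in SETTING_ATTRIBUTES:
                    problems.append("%s: unknown setting attribute %s" % (label, name))
    return problems
```

An empty list means that none of these checks failed; it does not guarantee that the Docker image exists or that the host directories are present.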
Updating the PL/Container Configuration
You can add a runtime element to the PL/Container configuration file with the plcontainer runtime-add command. The command options specify information such as the runtime ID, Docker image, and language. You can use the plcontainer runtime-replace command to update an existing runtime element. The utility updates the configuration file on the master and all segment instances.
The PL/Container configuration file can contain multiple runtime elements that reference the same Docker image specified by the XML element image. In this example configuration file, the runtime elements contain id elements named plc_python_128 and plc_python_256, both referencing the Docker image pivotaldata/plcontainer_python:1.0.0. The first runtime element is defined with a 128MB RAM limit and the second one with a 256MB RAM limit.
<configuration>
    <runtime>
        <id>plc_python_128</id>
        <image>pivotaldata/plcontainer_python:1.0.0</image>
        <command>./client</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/gpdb/bin/plcontainer_clients"/>
        <setting memory_mb="128"/>
    </runtime>
    <runtime>
        <id>plc_python_256</id>
        <image>pivotaldata/plcontainer_python:1.0.0</image>
        <command>./client</command>
        <shared_directory access="ro" container="/clientdir" host="/usr/local/gpdb/bin/plcontainer_clients"/>
        <setting memory_mb="256"/>
    </runtime>
</configuration>
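To see which runtime IDs share a Docker image and how their memory limits differ, the configuration can be summarized with a short script. This is a minimal Python sketch under the assumption that the file follows the structure described in this section; the function name runtimes_by_image is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def runtimes_by_image(xml_text):
    """Map each Docker image name to a list of (runtime id, memory_mb) pairs.

    memory_mb is None for a runtime that has no memory_mb setting.
    """
    grouped = defaultdict(list)
    for runtime in ET.fromstring(xml_text).findall("runtime"):
        memory_mb = None
        for setting in runtime.findall("setting"):
            if "memory_mb" in setting.attrib:
                memory_mb = int(setting.get("memory_mb"))
        grouped[runtime.findtext("image")].append((runtime.findtext("id"), memory_mb))
    return dict(grouped)
```

For a configuration like the one above, the result has a single key, the image name, whose list holds both runtime IDs with their limits of 128 and 256.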
Notes
- PL/Container maintains the configuration file
plcontainer_configuration.xml in the data directory of all Greenplum
Database segment instances: master, standby master, primary, and mirror. This query
lists the Greenplum Database system data
directories:
SELECT g.hostname, fe.fselocation AS directory
FROM pg_filespace AS f, pg_filespace_entry AS fe, gp_segment_configuration AS g
WHERE f.oid = fe.fsefsoid
  AND g.dbid = fe.fsedbid
  AND f.fsname = 'pg_system';
A sample PL/Container configuration file is in $GPHOME/share/postgresql/plcontainer.
- When Greenplum Database executes a PL/Container UDF, Query Executor (QE) processes start Docker containers and reuse them as needed. After a certain amount of idle time, a QE process quits and destroys its Docker containers. You can control the amount of idle time with the Greenplum Database server configuration parameter gp_vmem_idle_resource_timeout. Controlling the idle time might help with Docker container reuse and avoid the overhead of creating and starting a Docker container.
Warning: Changing the gp_vmem_idle_resource_timeout value might affect performance due to resource issues. The parameter also controls the freeing of Greenplum Database resources other than Docker containers.
- In some cases, when PL/Container is running in a high concurrency environment, the
Docker daemon hangs with log entries that indicate a memory shortage. This can happen
even when the system seems to have adequate free memory.
The issue seems to be triggered by a combination of two factors: the aggressive virtual memory requirement of the Go language (golang) runtime that is used by PL/Container, and the Greenplum Database Linux server kernel parameter setting for overcommit_memory. The parameter is set to 2, which does not allow memory overcommit.
A workaround that might help is to increase the amount of swap space and increase the Linux server kernel parameter overcommit_ratio. If the issue still occurs after the changes, there might be a memory shortage. You should check free memory on the system and add more RAM if needed. You can also decrease the cluster load.
- PL/Container does not limit the Docker base device size, which is the size of the Docker
container. In some cases, the Docker daemon controls the base device size. For example,
if the Docker storage driver is devicemapper, the Docker daemon
--storage-opt option flag dm.basesize controls the
base device size. The default base device size for devicemapper is 10GB. The Docker
command docker info displays Docker system information including the
storage driver. The base device size is displayed in Docker 1.12 and later. For
information about Docker storage drivers, see the Docker information Daemon storage-driver.
When setting the Docker base device size, the size must be set on all Greenplum Database hosts.
- When PL/Container logging is enabled, you can set the log level with the Greenplum
Database server configuration parameter log_min_messages. The default log level is warning.
The parameter controls the PL/Container log level and also controls the Greenplum
Database log level.
- PL/Container logging is enabled or disabled for each runtime ID with the setting attribute use_container_logging. The default is no logging.
- The PL/Container log information is the information from the UDF that is run in the Docker container. By default, the PL/Container log information is sent to a system service. On Red Hat 7 or CentOS 7 systems, the log information is sent to the journald service. On Red Hat 6 or CentOS 6 systems, the log information is sent to the syslogd service. The PL/Container log information is sent to the log file of the host where the Docker container runs.
- The Greenplum Database log information is sent to the log file on the Greenplum Database master.
When testing or troubleshooting a PL/Container UDF, you can change the Greenplum Database log level with the SET command. You can set the parameter in the session before you run your PL/Container UDF. This example sets the log level to debug1.
SET log_min_messages='debug1' ;
Note: The parameter log_min_messages controls both the Greenplum Database and PL/Container logging. Increasing the log level might affect Greenplum Database performance even if a PL/Container UDF is not running.
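For the note above about overcommit_memory and overcommit_ratio, a rough calculation shows why the suggested workaround helps. When overcommit_memory is 2, the Linux kernel caps committed memory at approximately the swap size plus overcommit_ratio percent of physical RAM. The following Python sketch applies that formula; it ignores huge pages, and the function name and the sample sizes are assumptions chosen for illustration, not recommended settings.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio):
    """Approximate Linux commit limit, in GB, when overcommit_memory is 2.

    Simplified form: swap + RAM * overcommit_ratio / 100 (huge pages ignored).
    """
    return swap_gb + ram_gb * overcommit_ratio / 100.0


# Hypothetical host with 64GB RAM: compare a smaller and a larger configuration.
before = commit_limit_gb(64, 8, 50)    # 8GB swap, overcommit_ratio 50
after = commit_limit_gb(64, 16, 75)    # 16GB swap, overcommit_ratio 75
print(before, after)
```

In this hypothetical case the limit rises from 40GB to 64GB, which gives processes with large virtual memory reservations more room before allocations fail.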
Installing Docker
To use PL/Container, Docker must be installed on all Greenplum Database host systems. These instructions show how to set up the Docker service on CentOS 6 and CentOS 7. Installing on RHEL 6 or RHEL 7 is a similar process. The instructions have these prerequisites:
- The CentOS extras repository is accessible.
- The user has sudo privileges or is root.
See also the Docker site installation instructions for CentOS https://docs.docker.com/engine/installation/linux/centos/. For a list of Docker commands, see the Docker engine Run Reference https://docs.docker.com/engine/reference/run/.
Installing Docker on CentOS 7
These steps install the docker package and start the docker service as a user with sudo privileges.
- Install dependencies required for
Docker
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
- Add the Docker
repo
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
- Update yum cache
sudo yum makecache fast
- Install Docker
sudo yum -y install docker-ce
- Start Docker daemon.
sudo systemctl start docker
- To give access to the Docker daemon and docker commands, assign the Greenplum Database
administrator (gpadmin) to the group
docker.
sudo usermod -aG docker gpadmin
- Exit the session and log in again to update the privileges.
- Run a Docker command to test the Docker installation. This command lists the currently
running Docker containers.
docker ps
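- Configure Docker to start when the host system starts.
sudo systemctl enable docker.service
- After you have installed Docker on all Greenplum Database hosts, restart the Greenplum Database system to give Greenplum Database access to Docker.
gpstop -ra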
Installing Docker on CentOS 6
These steps install the Docker package and start the docker service as a user with sudo privileges.
- Install EPEL package
sudo yum -y install epel-release
- Install Docker
sudo yum -y install docker-io
- Create a docker group
sudo groupadd docker
- Start Docker
sudo service docker start
- To give access to the Docker daemon and docker commands, assign the Greenplum Database
administrator (gpadmin) to the group
docker.
sudo usermod -aG docker gpadmin
- Exit the session and log in again to update the privileges.
- Run a Docker command to test the Docker installation. This command lists the currently
running Docker containers.
docker ps
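- Configure Docker to start when the host system starts.
sudo chkconfig docker on
- After you have installed Docker on all Greenplum Database hosts, restart the Greenplum Database system to give Greenplum Database access to Docker.
gpstop -ra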
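On either operating system, the docker ps test from the steps above can be scripted so that it is easy to repeat on every host. The following is a minimal Python sketch, assuming it runs as the gpadmin user after that user has been added to the docker group; the function name docker_ps_ok and its run and which parameters (used only to make the check testable) are hypothetical.

```python
import shutil
import subprocess


def docker_ps_ok(run=subprocess.run, which=shutil.which):
    """Return True if the docker client is on PATH and `docker ps` succeeds."""
    if which("docker") is None:
        return False
    try:
        result = run(["docker", "ps"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError:
        # The client could not be executed at all.
        return False
    # A nonzero exit code usually means the daemon is not running or the
    # current user is not allowed to reach it.
    return result.returncode == 0
```

A False result after installation suggests checking that the docker service is started and that the session was restarted after the usermod command.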
References
Docker home page https://www.docker.com/
Docker command line interface https://docs.docker.com/engine/reference/commandline/cli/
Dockerfile reference https://docs.docker.com/engine/reference/builder/
Installing Docker on Linux systems https://docs.docker.com/engine/installation/linux/centos/
Control and configure Docker with systemd https://docs.docker.com/engine/admin/systemd/