Accessing the Database

A newer version of this documentation is available. Click here to view the most up-to-date release of the Greenplum 4.x documentation.

Accessing the Database

This chapter explains the various client tools you can use to connect to Greenplum Database, and how to establish a database session.

Establishing a Database Session

Users can connect to Greenplum Database using a PostgreSQL-compatible client program, such as psql. Users and administrators always connect to Greenplum Database through the master - the segments cannot accept client connections.

In order to establish a connection to the Greenplum Database master, you will need to know the following connection information and configure your client program accordingly.

Table 1. Connection Parameters
Connection Parameter Description Environment Variable
Application name The application name that is connecting to the database. The default value, held in the application_name connection parameter is psql. $PGAPPNAME
Database name The name of the database to which you want to connect. For a newly initialized system, use the template1 database to connect for the first time. $PGDATABASE
Host name The host name of the Greenplum Database master. The default host is the local host. $PGHOST
Port The port number that the Greenplum Database master instance is running on. The default is 5432. $PGPORT
User name The database user (role) name to connect as. This is not necessarily the same as your OS user name. Check with your Greenplum administrator if you are not sure what you database user name is. Note that every Greenplum Database system has one superuser account that is created automatically at initialization time. This account has the same name as the OS name of the user who initialized the Greenplum system (typically gpadmin). $PGUSER

Supported Client Applications

Users can connect to Greenplum Database using various client applications:

  • A number of Greenplum Database Client Applications are provided with your Greenplum installation. The psql client application provides an interactive command-line interface to Greenplum Database.
  • pgAdmin III for Greenplum Database is an enhanced version of the popular management tool pgAdmin III. Since version 1.10.0, the pgAdmin III client available from PostgreSQL Tools includes support for Greenplum-specific features. Installation packages are available for download from the pgAdmin download site.
  • Using standard Database Application Interfaces, such as ODBC and JDBC, users can create their own client applications that interface to Greenplum Database. Because Greenplum Database is based on PostgreSQL, it uses the standard PostgreSQL database drivers.
  • Most Third-Party Client Tools that use standard database interfaces, such as ODBC and JDBC, can be configured to connect to Greenplum Database.

Greenplum Database Client Applications

Greenplum Database comes installed with a number of client applications located in $GPHOME/bin of your Greenplum Database master host installation. The following are the most commonly used client applications:

Table 2. Commonly used client applications
Name Usage
createdb create a new database
createlang define a new procedural language
createuser define a new database role
dropdb remove a database
droplang remove a procedural language
dropuser remove a role
psql PostgreSQL interactive terminal
reindexdb reindex a database
vacuumdb garbage-collect and analyze a database

When using these client applications, you must connect to a database through the Greenplum master instance. You will need to know the name of your target database, the host name and port number of the master, and what database user name to connect as. This information can be provided on the command-line using the options -d, -h, -p, and -U respectively. If an argument is found that does not belong to any option, it will be interpreted as the database name first.

All of these options have default values which will be used if the option is not specified. The default host is the local host. The default port number is 5432. The default user name is your OS system user name, as is the default database name. Note that OS user names and Greenplum Database user names are not necessarily the same.

If the default values are not correct, you can set the environment variables PGDATABASE, PGHOST, PGPORT, and PGUSER to the appropriate values, or use a psql~/.pgpass file to contain frequently-used passwords. For information about Greenplum Database environment variables, see the Greenplum Database Reference Guide. For information about psql, see the Greenplum Database Utility Guide.

Connecting with psql

Depending on the default values used or the environment variables you have set, the following examples show how to access a database via psql:

$ psql -d gpdatabase -h master_host -p 5432 -U gpadmin
$ psql gpdatabase
$ psql

If a user-defined database has not yet been created, you can access the system by connecting to the template1 database. For example:

$ psql template1

After connecting to a database, psql provides a prompt with the name of the database to which psql is currently connected, followed by the string => (or =# if you are the database superuser). For example:

gpdatabase=>

At the prompt, you may type in SQL commands. A SQL command must end with a ; (semicolon) in order to be sent to the server and executed. For example:

=> SELECT * FROM mytable;

See the Greenplum Reference Guide for information about using the psql client application and SQL commands and syntax.

pgAdmin III for Greenplum Database

If you prefer a graphic interface, use pgAdmin III for Greenplum Database. This GUI client supports PostgreSQL databases with all standard pgAdmin III features, while adding support for Greenplum-specific features.

pgAdmin III for Greenplum Database supports the following Greenplum-specific features:

  • External tables
  • Append-optimized tables, including compressed append-optimized tables
  • Table partitioning
  • Resource queues
  • Graphical EXPLAIN ANALYZE
  • Greenplum server configuration parameters
    Figure 1. Greenplum Options in pgAdmin III

Installing pgAdmin III for Greenplum Database

The installation package for pgAdmin III for Greenplum Database is available for download from the official pgAdmin III download site (http://www.pgadmin.org). Installation instructions are included in the installation package.

Documentation for pgAdmin III for Greenplum Database

For general help on the features of the graphical interface, select Help contents from the Help menu.

For help with Greenplum-specific SQL support, select Greenplum Database Help from the Help menu. If you have an active internet connection, you will be directed to online Greenplum SQL reference documentation. Alternately, you can install the Greenplum Client Tools package. This package contains SQL reference documentation that is accessible to the help links in pgAdmin III.

Performing Administrative Tasks with pgAdmin III

This topic highlights two of the many Greenplum Database administrative tasks you can perform with pgAdmin III: editing the server configuration, and viewing a graphical representation of a query plan.

Editing Server Configuration

The pgAdmin III interface provides two ways to update the server configuration in postgresql.conf: locally, through the File menu, and remotely on the server through the Tools menu. Editing the server configuration remotely may be more convenient in many cases, because it does not require you to upload or copy postgresql.conf.

To edit server configuration remotely
  1. Connect to the server whose configuration you want to edit. If you are connected to multiple servers, make sure that the correct server is highlighted in the object browser in the left pane.
  2. Select Tools > Server Configuration > postgresql.conf. The Backend Configuration Editor opens, displaying the list of available and enabled server configuration parameters.
  3. Locate the parameter you want to edit, and double click on the entry to open the Configuration settings dialog.
  4. Enter the new value for the parameter, or select/deselect Enabled as desired and click OK.
  5. If the parameter can be enabled by reloading server configuration, click the green reload icon, or select File > Reload server. Many parameters require a full restart of the server.
Viewing a Graphical Query Plan

Using the pgAdmin III query tool, you can run a query with EXPLAIN to view the details of the query plan. The output includes details about operations unique to Greenplum distributed query processing such as plan slices and motions between segments. You can view a graphical depiction of the plan as well as the text-based data output.

To view a graphical query plan
  1. With the correct database highlighted in the object browser in the left pane, select Tools > Query tool.
  2. Enter the query by typing in the SQL Editor, dragging objects into the Graphical Query Builder, or opening a file.
  3. Select Query > Explain options and verify the following options:
    • Verbose — this must be deselected if you want to view a graphical depiction of the query plan
    • Analyze — select this option if you want to run the query in addition to viewing the plan
  4. Trigger the operation by clicking the Explain query option at the top of the pane, or by selecting Query > Explain.

    The query plan displays in the Output pane at the bottom of the screen. Select the Explain tab to view the graphical output. For example:

    Figure 2. Graphical Query Plan in pgAdmin III

Database Application Interfaces

You may want to develop your own client applications that interface to Greenplum Database. PostgreSQL provides a number of database drivers for the most commonly used database application programming interfaces (APIs), which can also be used with Greenplum Database. These drivers are not packaged with the Greenplum Database base distribution. Each driver is an independent PostgreSQL development project and must be downloaded, installed and configured to connect to Greenplum Database. The following drivers are available:

Table 3. Greenplum Database Interfaces
API PostgreSQL Driver Download Link
ODBC pgodbc Available in the Greenplum Database Connectivity package, which can be downloaded from Pivotal Network.
JDBC pgjdbc Available in the Greenplum Database Connectivity package, which can be downloaded from Pivotal Network.
Perl DBI pgperl https://www.postgresql.org/ftp/projects/gborg/pgperl/
Python DBI pygresql http://www.pygresql.org

General instructions for accessing a Greenplum Database with an API are:

  1. Download your programming language platform and respective API from the appropriate source. For example, you can get the Java development kit (JDK) and JDBC API from Sun.
  2. Write your client application according to the API specifications. When programming your application, be aware of the SQL support in Greenplum Database so you do not include any unsupported SQL syntax. See the Greenplum Database Reference Guide for more information.

Download the appropriate PostgreSQL driver and configure connectivity to your Greenplum Database master instance. Greenplum provides a client tools package that contains the supported database drivers for Greenplum Database. Download the client tools packagefrom Pivotal Network and documentation from Pivotal Documentation.

Third-Party Client Tools

Most third-party extract-transform-load (ETL) and business intelligence (BI) tools use standard database interfaces, such as ODBC and JDBC, and can be configured to connect to Greenplum Database. Greenplum has worked with the following tools on previous customer engagements and is in the process of becoming officially certified:

  • Business Objects
  • Microstrategy
  • Informatica Power Center
  • Microsoft SQL Server Integration Services (SSIS) and Reporting Services (SSRS)
  • Ascential Datastage
  • SAS
  • Cognos

Greenplum Professional Services can assist users in configuring their chosen third-party tool for use with Greenplum Database.

Troubleshooting Connection Problems

A number of things can prevent a client application from successfully connecting to Greenplum Database. This topic explains some of the common causes of connection problems and how to correct them.

Table 4. Common connection problems
Problem Solution
No pg_hba.conf entry for host or user To enable Greenplum Database to accept remote client connections, you must configure your Greenplum Database master instance so that connections are allowed from the client hosts and database users that will be connecting to Greenplum Database. This is done by adding the appropriate entries to the pg_hba.conf configuration file (located in the master instance’s data directory). For more detailed information, see Allowing Connections to Greenplum Database.
Greenplum Database is not running If the Greenplum Database master instance is down, users will not be able to connect. You can verify that the Greenplum Database system is up by running the gpstate utility on the Greenplum master host.
Network problems

Interconnect timeouts

If users connect to the Greenplum master host from a remote client, network problems can prevent a connection (for example, DNS host name resolution problems, the host system is down, and so on.). To ensure that network problems are not the cause, connect to the Greenplum master host from the remote client host. For example: pinghostname

If the system cannot resolve the host names and IP addresses of the hosts involved in Greenplum Database, queries and connections will fail. For some operations, connections to the Greenplum Database master use localhost and others use the actual host name, so you must be able to resolve both. If you encounter this error, first make sure you can connect to each host in your Greenplum Database array from the master host over the network. In the /etc/hosts file of the master and all segments, make sure you have the correct host names and IP addresses for all hosts involved in the Greenplum Database array. The 127.0.0.1 IP must resolve to localhost.

Too many clients already By default, Greenplum Database is configured to allow a maximum of 250 concurrent user connections on the master and 750 on a segment. A connection attempt that causes that limit to be exceeded will be refused. This limit is controlled by the max_connections parameter in the postgresql.conf configuration file of the Greenplum Database master. If you change this setting for the master, you must also make appropriate changes at the segments.