Defining Database Objects

This section covers data definition language (DDL) in Greenplum Database and how to create and manage database objects.

Creating objects in a Greenplum Database includes making up-front choices about data distribution, storage options, data loading, and other Greenplum features that will affect the ongoing performance of your database system. Understanding the options that are available and how the database will be used will help you make the right choices.

Most of the advanced Greenplum features are enabled with extensions to the SQL CREATE DDL statements.

Creating and Managing Databases

A Greenplum Database system is a single instance of Greenplum Database. There can be several separate Greenplum Database systems installed, but usually just one is selected by environment variable settings. See your Greenplum administrator for details.

There can be multiple databases in a Greenplum Database system. This is different from some database management systems (such as Oracle) where the database instance is the database. Although you can create many databases in a Greenplum system, client programs can connect to and access only one database at a time — you cannot cross-query between databases.

About Template Databases

Each new database you create is based on a template. Greenplum provides a default database, template1. Use template1 to connect to Greenplum Database for the first time. Greenplum Database uses template1 to create databases unless you specify another template. Do not create any objects in template1 unless you want those objects to be in every database you create.

Greenplum uses two other database templates, template0 and postgres, internally. Do not drop or modify template0 or postgres. You can use template0 to create a completely clean database containing only the standard objects predefined by Greenplum Database at initialization, especially if you modified template1.
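
For example, to create a completely clean database from template0 (the database name is illustrative):

=> CREATE DATABASE cleandb TEMPLATE template0;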

Creating a Database

The CREATE DATABASE command creates a new database. For example:

=> CREATE DATABASE new_dbname;

To create a database, you must have privileges to create a database or be a Greenplum Database superuser. If you do not have the necessary privilege, contact your Greenplum administrator to grant it or to create a database for you.
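
For example, a superuser can grant the privilege to a role with a command such as the following (the role name is hypothetical):

=# ALTER ROLE jsmith CREATEDB;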

You can also use the client program createdb to create a database. For example, running the following command in a command line terminal connects to Greenplum Database using the provided host name and port and creates a database named mydatabase:

$ createdb -h masterhost -p 5432 mydatabase

The host name and port must match the host name and port of the installed Greenplum Database system.

Some objects, such as roles, are shared by all the databases in a Greenplum Database system. Other objects, such as tables that you create, are known only in the database in which you create them.

Cloning a Database

By default, a new database is created by cloning the standard system database template, template1. Any database can be used as a template when creating a new database, thereby providing the capability to 'clone' or copy an existing database and all objects and data within that database. For example:

=> CREATE DATABASE new_dbname TEMPLATE old_dbname;

Viewing the List of Databases

If you are working in the psql client program, you can use the \l meta-command to show the list of databases and templates in your Greenplum Database system. If using another client program and you are a superuser, you can query the list of databases from the pg_database system catalog table. For example:

=> SELECT datname from pg_database;

Altering a Database

The ALTER DATABASE command changes database attributes such as owner, name, or default configuration attributes. For example, the following command alters a database by setting its default schema search path (the search_path configuration parameter):

=> ALTER DATABASE mydatabase SET search_path TO myschema, public, pg_catalog;

To alter a database, you must be the owner of the database or a superuser.
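
For example, a superuser can reassign ownership of a database (the role name is hypothetical):

=# ALTER DATABASE mydatabase OWNER TO sally;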

Dropping a Database

The DROP DATABASE command drops (or deletes) a database. It removes the system catalog entries for the database and deletes the database directory on disk that contains the data. You must be the database owner or a superuser to drop a database, and you cannot drop a database while you or anyone else is connected to it. Connect to template1 (or another database) before dropping a database. For example:

=> \c template1
=> DROP DATABASE mydatabase;

You can also use the client program dropdb to drop a database. For example, the following command connects to Greenplum Database using the provided host name and port and drops the database mydatabase:

$ dropdb -h masterhost -p 5432 mydatabase
Warning: Dropping a database cannot be undone.

Creating and Managing Tablespaces

Tablespaces allow database administrators to have multiple file systems per machine and decide how to best use physical storage to store database objects. They are named locations within a filespace in which you can create objects. Tablespaces allow you to assign different storage for frequently and infrequently used database objects or to control the I/O performance on certain database objects. For example, place frequently-used tables on file systems that use high performance solid-state drives (SSD), and place other tables on standard hard drives.

A tablespace requires a file system location to store its database files. In Greenplum Database, the master and each segment (primary and mirror) require a distinct storage location. The collection of file system locations for all components in a Greenplum system is a filespace. Filespaces can be used by one or more tablespaces.

Creating a Filespace

A filespace sets aside storage for your Greenplum system. A filespace is a symbolic storage identifier that maps onto a set of locations in your Greenplum hosts' file systems. To create a filespace, prepare the logical file systems on all of your Greenplum hosts, then use the gpfilespace utility to define the filespace. You must be a database superuser to create a filespace.

Note: Greenplum Database is not directly aware of the file system boundaries on your underlying systems. It stores files in the directories that you tell it to use. You cannot control the location on disk of individual files within a logical file system.

To create a filespace using gpfilespace

  1. Log in to the Greenplum Database master as the gpadmin user.
    $ su - gpadmin
  2. Create a filespace configuration file:
    $ gpfilespace -o gpfilespace_config
  3. At the prompt, enter a name for the filespace, the primary segment file system locations, the mirror segment file system locations, and a master file system location. For example, if your configuration has 2 primary and 2 mirror segments per host:
    Enter a name for this filespace> fastdisk
    primary location 1> /gpfs1/seg1
    primary location 2> /gpfs1/seg2
    mirror location 1> /gpfs2/mir1
    mirror location 2> /gpfs2/mir2
    master location> /gpfs1/master
    
  4. gpfilespace creates a configuration file. Examine the file to verify that the gpfilespace configuration is correct.
  5. Run gpfilespace again to create the filespace based on the configuration file:
    $ gpfilespace -c gpfilespace_config

Moving the Location of Temporary or Transaction Files

You can move temporary or transaction files to a specific filespace to improve database performance when running queries or creating backups, and to store data more sequentially.

The dedicated filespace for temporary and transaction files is tracked in two separate flat files, gp_temporary_files_filespace and gp_transaction_files_filespace, located in the pg_system directory on each primary and mirror segment and on the master and standby master. You must be a superuser to move temporary or transaction files. Only the gpfilespace utility can write to these files.

About Temporary and Transaction Files

Unless otherwise specified, temporary and transaction files are stored together with all user data. The default location of temporary files, <filespace_directory>/<tablespace_oid>/<database_oid>/pgsql_tmp, changes when you use gpfilespace --movetempfilespace for the first time.

Also note the following information about temporary or transaction files:

  • You can dedicate only one filespace for temporary or transaction files, although you can use the same filespace to store other types of files.
  • You cannot drop a filespace if it is used by temporary files.
  • You must create the filespace in advance. See Creating a Filespace.
To move temporary files using gpfilespace
  1. Check that the filespace exists and is different from the filespace used to store all other user data.
  2. Issue a smart shutdown to bring Greenplum Database offline.

    If any connections are still in progress, the gpfilespace --movetempfilespace utility will fail.

  3. Bring Greenplum Database online with no active session and run the following command:
    gpfilespace --movetempfilespace filespace_name

    The location of the temporary files is stored in the segment configuration shared memory (PMModuleState) and used whenever temporary files are created, opened, or dropped.

To move transaction files using gpfilespace
  1. Check that the filespace exists and is different from the filespace used to store all other user data.
  2. Issue a smart shutdown to bring Greenplum Database offline.

    If any connections are still in progress, the gpfilespace --movetransfilespace utility will fail.

  3. Bring Greenplum Database online with no active session and run the following command:
    gpfilespace --movetransfilespace filespace_name

    The location of the transaction files is stored in the segment configuration shared memory (PMModuleState) and used whenever transaction files are created, opened, or dropped.

Creating a Tablespace

After you create a filespace, use the CREATE TABLESPACE command to define a tablespace that uses that filespace. For example:

=# CREATE TABLESPACE fastspace FILESPACE fastdisk;

Database superusers define tablespaces and grant access to database users with the GRANT CREATE command. For example:

=# GRANT CREATE ON TABLESPACE fastspace TO admin;

Using a Tablespace to Store Database Objects

Users with the CREATE privilege on a tablespace can create database objects in that tablespace, such as tables, indexes, and databases. The command is:

CREATE TABLE tablename(options) TABLESPACE spacename

For example, the following command creates a table in the tablespace space1:

CREATE TABLE foo(i int) TABLESPACE space1;

You can also use the default_tablespace parameter to specify the default tablespace for CREATE TABLE and CREATE INDEX commands that do not specify a tablespace:

SET default_tablespace = space1;
CREATE TABLE foo(i int);

The tablespace associated with a database stores that database's system catalogs and the temporary files created by server processes using that database. It is also the default tablespace for tables and indexes created within the database when no TABLESPACE is specified. If you do not specify a tablespace when you create a database, the database uses the same tablespace as its template database.
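
For example, to create a database that uses a specific tablespace as its default (a sketch, assuming the fastspace tablespace created earlier; the database name is illustrative):

=# CREATE DATABASE salesdb TABLESPACE fastspace;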

You can use a tablespace from any database if you have appropriate privileges.

Viewing Existing Tablespaces and Filespaces

Every Greenplum Database system has the following default tablespaces.

  • pg_global for shared system catalogs.
  • pg_default, the default tablespace. Used by the template1 and template0 databases.

These tablespaces use the system default filespace, pg_system, the data directory location created at system initialization.

To see filespace information, look in the pg_filespace and pg_filespace_entry catalog tables. You can join these tables with pg_tablespace to see the full definition of a tablespace. For example:

=# SELECT spcname as tblspc, fsname as filespc, 
          fsedbid as seg_dbid, fselocation as datadir 
   FROM   pg_tablespace pgts, pg_filespace pgfs, 
          pg_filespace_entry pgfse 
   WHERE  pgts.spcfsoid=pgfse.fsefsoid 
          AND pgfse.fsefsoid=pgfs.oid 
   ORDER BY tblspc, seg_dbid;

Dropping Tablespaces and Filespaces

To drop a tablespace, you must be the tablespace owner or a superuser. You cannot drop a tablespace until all objects in all databases using the tablespace are removed.

Only a superuser can drop a filespace. A filespace cannot be dropped until all tablespaces using that filespace are removed.

The DROP TABLESPACE command removes an empty tablespace.

The DROP FILESPACE command removes an empty filespace.

Note: You cannot drop a filespace if it stores temporary or transaction files.

Creating and Managing Schemas

Schemas logically organize objects and data in a database. Schemas allow you to have more than one object (such as tables) with the same name in the database without conflict if the objects are in different schemas.

The Default "Public" Schema

Every database has a default schema named public. If you do not create any schemas, objects are created in the public schema. All database roles (users) have CREATE and USAGE privileges in the public schema. When you create a schema, you grant privileges to your users to allow access to the schema.

Creating a Schema

Use the CREATE SCHEMA command to create a new schema. For example:

=> CREATE SCHEMA myschema;

To create or access objects in a schema, write a qualified name consisting of the schema name and table name separated by a period. For example:

myschema.table

See Schema Search Paths for information about accessing a schema.

You can create a schema owned by someone else, for example, to restrict the activities of your users to well-defined namespaces. The syntax is:

=> CREATE SCHEMA schemaname AUTHORIZATION username;
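
After you create a schema, grant other roles access to it as needed. For example, to let a role (hypothetical name) use the schema and create objects in it:

=> GRANT USAGE, CREATE ON SCHEMA myschema TO webuser;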

Schema Search Paths

To specify an object's location in a database, use the schema-qualified name. For example:

=> SELECT * FROM myschema.mytable;

You can set the search_path configuration parameter to specify the order in which to search the available schemas for objects. The schema listed first in the search path becomes the default schema. If a schema is not specified, objects are created in the default schema.

Setting the Schema Search Path

The search_path configuration parameter sets the schema search order. The ALTER DATABASE command sets the search path. For example:

=> ALTER DATABASE mydatabase SET search_path TO myschema, 
public, pg_catalog;

You can also set search_path for a particular role (user) using the ALTER ROLE command. For example:

=> ALTER ROLE sally SET search_path TO myschema, public, 
pg_catalog;

Viewing the Current Schema

Use the current_schema() function to view the current schema. For example:

=> SELECT current_schema();

Use the SHOW command to view the current search path. For example:

=> SHOW search_path;

Dropping a Schema

Use the DROP SCHEMA command to drop (delete) a schema. For example:

=> DROP SCHEMA myschema;

By default, the schema must be empty before you can drop it. To drop a schema and all of its objects (tables, data, functions, and so on) use:

=> DROP SCHEMA myschema CASCADE;

System Schemas

The following system-level schemas exist in every database:

  • pg_catalog contains the system catalog tables, built-in data types, functions, and operators. It is always part of the schema search path, even if it is not explicitly named in the search path.
  • information_schema consists of a standardized set of views that contain information about the objects in the database. These views get system information from the system catalog tables in a standardized way.
  • pg_toast stores large objects such as records that exceed the page size. This schema is used internally by the Greenplum Database system.
  • pg_bitmapindex stores bitmap index objects such as lists of values. This schema is used internally by the Greenplum Database system.
  • pg_aoseg stores append-optimized table objects. This schema is used internally by the Greenplum Database system.
  • gp_toolkit is an administrative schema that contains external tables, views, and functions that you can access with SQL commands. All database users can access gp_toolkit to view and query the system log files and other system metrics.

Creating and Managing Tables

Greenplum Database tables are similar to tables in any relational database, except that table rows are distributed across the different segments in the system. When you create a table, you specify the table's distribution policy.

Creating a Table

The CREATE TABLE command creates a table and defines its structure. When you create a table, you define:

  • The columns of the table and their associated data types. See Choosing Column Data Types.
  • Any table or column constraints to limit the data that a column or table can contain. See Setting Table and Column Constraints.
  • The distribution policy of the table, which determines how Greenplum Database divides data across the segments. See Choosing the Table Distribution Policy.
  • The way the table is stored on disk. See Choosing the Table Storage Model.
  • The table partitioning strategy for large tables. See Partitioning Large Tables.

Choosing Column Data Types

The data type of a column determines the types of data values the column can contain. Choose the data type that uses the least possible space but can still accommodate your data and that best constrains the data. For example, use character data types for strings, date or timestamp data types for dates, and numeric data types for numbers.

There are no performance differences among the character data types CHAR, VARCHAR, and TEXT apart from the increased storage size when you use the blank-padded type. In most situations, use TEXT or VARCHAR rather than CHAR.

Use the smallest numeric data type that will accommodate your numeric data and allow for future expansion. For example, using BIGINT for data that fits in INT or SMALLINT wastes storage space. If you expect that your data values will expand over time, consider that changing from a smaller data type to a larger data type after loading large amounts of data is costly. For example, if your current data values fit in a SMALLINT but it is likely that the values will expand, INT is the better long-term choice.

Use the same data types for columns that you plan to use in cross-table joins. Cross-table joins usually use the primary key in one table and a foreign key in the other table. When the data types are different, the database must convert one of them so that the data values can be compared correctly, which adds unnecessary overhead.
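
For example, declaring the join column with the same data type in both tables avoids conversions at query time (table and column names are illustrative):

=> CREATE TABLE customers (customer_id integer, name text)
       DISTRIBUTED BY (customer_id);
=> CREATE TABLE orders (order_id integer, customer_id integer, amount numeric)
       DISTRIBUTED BY (order_id);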

Greenplum Database has a rich set of native data types available to users. See the Greenplum Database Reference Guide for information about the built-in data types.

Setting Table and Column Constraints

You can define constraints on columns and tables to restrict the data in your tables. Greenplum Database support for constraints is the same as PostgreSQL with some limitations, including:

  • CHECK constraints can refer only to the table on which they are defined.
  • UNIQUE and PRIMARY KEY constraints must be compatible with their table's distribution key and partitioning key, if any.
  • FOREIGN KEY constraints are allowed, but not enforced.
  • Constraints that you define on partitioned tables apply to the partitioned table as a whole. You cannot define constraints on the individual parts of the table.
Check Constraints

Check constraints allow you to specify that the value in a certain column must satisfy a Boolean (truth-value) expression. For example, to require positive product prices:

=> CREATE TABLE products 
            ( product_no integer, 
              name text, 
              price numeric CHECK (price > 0) );
Not-Null Constraints

Not-null constraints specify that a column must not assume the null value. A not-null constraint is always written as a column constraint. For example:

=> CREATE TABLE products 
       ( product_no integer NOT NULL,
         name text NOT NULL,
         price numeric );
Unique Constraints

Unique constraints ensure that the data contained in a column or a group of columns is unique with respect to all the rows in the table. The table must be hash-distributed (not DISTRIBUTED RANDOMLY), and the constraint columns must be the same as (or a superset of) the table's distribution key columns. For example:

=> CREATE TABLE products 
       ( product_no integer UNIQUE, 
         name text, 
         price numeric)
      DISTRIBUTED BY (product_no);
Primary Keys

A primary key constraint is a combination of a UNIQUE constraint and a NOT NULL constraint. The table must be hash-distributed (not DISTRIBUTED RANDOMLY), and the primary key columns must be the same as (or a superset of) the table's distribution key columns. If a table has a primary key, this column (or group of columns) is chosen as the distribution key for the table by default. For example:

=> CREATE TABLE products 
       ( product_no integer PRIMARY KEY, 
         name text, 
         price numeric)
      DISTRIBUTED BY (product_no);
Foreign Keys

Foreign keys are not supported. You can declare them, but referential integrity is not enforced.

Foreign key constraints specify that the values in a column or a group of columns must match the values appearing in some row of another table to maintain referential integrity between two related tables. Referential integrity checks cannot be enforced between the distributed table segments of a Greenplum database.

Choosing the Table Distribution Policy

All Greenplum Database tables are distributed. When you create or alter a table, you optionally specify DISTRIBUTED BY (hash distribution) or DISTRIBUTED RANDOMLY (round-robin distribution) to determine the table row distribution.

Consider the following points when deciding on a table distribution policy.

  • Even Data Distribution — For the best possible performance, all segments should contain equal portions of data. If the data is unbalanced or skewed, the segments with more data must work harder to perform their portion of the query processing. Choose a distribution key that is unique for each record, such as the primary key.
  • Local and Distributed Operations — Local operations are faster than distributed operations. Query processing is fastest if the work associated with join, sort, or aggregation operations is done locally, at the segment level. Work done at the system level requires distributing tuples across the segments, which is less efficient. When tables share a common distribution key, the work of joining or sorting on their shared distribution key columns is done locally. With a random distribution policy, local join operations are not an option.
  • Even Query Processing — For best performance, all segments should handle an equal share of the query workload. Query workload can be skewed if a table's data distribution policy and the query predicates are not well matched. For example, suppose that a sales transactions table is distributed based on a column that contains corporate names (the distribution key), and the hashing algorithm distributes the data based on those values. If a predicate in a query references a single value from the distribution key, query processing runs on only one segment. This works if your query predicates usually select data on criteria other than corporation name. For queries that use corporation name in their predicates, it's possible that only one segment instance will handle the query workload.
Declaring Distribution Keys

CREATE TABLE's optional clauses DISTRIBUTED BY and DISTRIBUTED RANDOMLY specify the distribution policy for a table. The default is a hash distribution policy that uses either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns with geometric or user-defined data types are not eligible as Greenplum distribution key columns. If a table does not have an eligible column, Greenplum distributes the rows randomly or in round-robin fashion.

To ensure even distribution of data, choose a distribution key that is unique for each record. If that is not possible, choose DISTRIBUTED RANDOMLY. For example:

=> CREATE TABLE products
                        (name varchar(40),
                         prod_id integer,
                         supplier_id integer)
             DISTRIBUTED BY (prod_id);
=> CREATE TABLE random_stuff
                        (things text,
                         doodads text,
                         etc text)
             DISTRIBUTED RANDOMLY;

Choosing the Table Storage Model

Greenplum Database supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This topic explains the options for table storage and how to choose the best storage model for your workload.

Heap Storage

By default, Greenplum Database uses the same heap storage model as PostgreSQL. Heap table storage works best with OLTP-type workloads where the data is often modified after it is initially loaded. UPDATE and DELETE operations require storing row-level versioning information to ensure reliable database transaction processing. Heap tables are best suited for smaller tables, such as dimension tables, that are often updated after they are initially loaded.

Append-Optimized Storage

Append-optimized table storage works best with denormalized fact tables in a data warehouse environment. Denormalized fact tables are typically the largest tables in the system. Fact tables are usually loaded in batches and accessed by read-only queries. Moving large fact tables to an append-optimized storage model eliminates the storage overhead of the per-row update visibility information, saving about 20 bytes per row. This allows for a leaner and easier-to-optimize page structure. The storage model of append-optimized tables is optimized for bulk data loading. Single row INSERT statements are not recommended.

To create a heap table

Row-oriented heap tables are the default storage type.

=> CREATE TABLE foo (a int, b text) DISTRIBUTED BY (a);

Use the WITH clause of the CREATE TABLE command to declare the table storage options. The default is to create the table as a regular row-oriented heap-storage table. For example, to create an append-optimized table with no compression:

=> CREATE TABLE bar (a int, b text) 
    WITH (appendonly=true)
    DISTRIBUTED BY (a);

UPDATE and DELETE are not allowed on append-optimized tables in a serializable transaction and will cause the transaction to abort. CLUSTER, DECLARE...FOR UPDATE, and triggers are not supported with append-optimized tables.

Choosing Row or Column-Oriented Storage

Greenplum provides a choice of storage orientation models: row, column, or a combination of both. This topic provides general guidelines for choosing the optimum storage orientation for a table. Evaluate performance using your own data and query workloads.

  • Row-oriented storage: good for OLTP types of workloads with many iterative transactions, where many columns of a single row are needed at once, so retrieving the row is efficient.
  • Column-oriented storage: good for data warehouse workloads with aggregations of data computed over a small number of columns, or for single columns that require regular updates without modifying other column data.

For most general purpose or mixed workloads, row-oriented storage offers the best combination of flexibility and performance. However, there are use cases where a column-oriented storage model provides more efficient I/O and storage. Consider the following requirements when deciding on the storage orientation model for a table:

  • Updates of table data. If you load and update the table data frequently, choose a row-oriented heap table. Column-oriented table storage is only available on append-optimized tables. See Heap Storage for more information.
  • Frequent INSERTs. If rows are frequently inserted into the table, consider a row-oriented model. Column-oriented tables are not optimized for write operations, as column values for a row must be written to different places on disk.
  • Number of columns requested in queries. If you typically request all or the majority of columns in the SELECT list or WHERE clause of your queries, consider a row-oriented model. Column-oriented tables are best suited to queries that aggregate many values of a single column where the WHERE or HAVING predicate is also on the aggregate column. For example:
    SELECT SUM(salary)...
    SELECT AVG(salary)... WHERE salary > 10000

    Or where the WHERE predicate is on a single column and returns a relatively small number of rows. For example:

    SELECT salary, dept ... WHERE state='CA'
  • Number of columns in the table. Row-oriented storage is more efficient when many columns are required at the same time, or when the row-size of a table is relatively small. Column-oriented tables can offer better query performance on tables with many columns where you access a small subset of columns in your queries.
  • Compression. Column data has the same data type, so storage size optimizations are available in column-oriented data that are not available in row-oriented data. For example, many compression schemes use the similarity of adjacent data to compress. However, the greater adjacent compression achieved, the more difficult random access can become, as data must be uncompressed to be read.

To create a column-oriented table

The WITH clause of the CREATE TABLE command specifies the table's storage options. The default is a row-oriented heap table. Tables that use column-oriented storage must be append-optimized tables. For example, to create a column-oriented table:

=> CREATE TABLE bar (a int, b text) 
    WITH (appendonly=true, orientation=column)
    DISTRIBUTED BY (a);

Using Compression (Append-Optimized Tables Only)

There are two types of in-database compression available in Greenplum Database for append-optimized tables:

  • Table-level compression is applied to an entire table.
  • Column-level compression is applied to a specific column. You can apply different column-level compression algorithms to different columns.

The following table summarizes the available compression algorithms.

Table 1. Compression Algorithms for Append-Optimized Tables
Table Orientation | Available Compression Types | Supported Algorithms
Row               | Table                       | ZLIB and QUICKLZ
Column            | Column and Table            | RLE_TYPE, ZLIB, and QUICKLZ

When choosing a compression type and level for append-optimized tables, consider these factors:

  • CPU usage. Your segment systems must have the available CPU power to compress and uncompress the data.
  • Compression ratio/disk size. Minimizing disk size is one factor, but also consider the time and CPU capacity required to compress and scan data. Find the optimal settings for efficiently compressing data without causing excessively long compression times or slow scan rates.
  • Speed of compression. QuickLZ compression generally uses less CPU capacity and compresses data faster at a lower compression ratio than zlib. zlib provides higher compression ratios at lower speeds.

    For example, at compression level 1 (compresslevel=1), QuickLZ and zlib have comparable compression ratios, though at different speeds. Using zlib with compresslevel=6 can significantly increase the compression ratio compared to QuickLZ, though with lower compression speed.

  • Speed of decompression/scan rate. Performance with compressed append-optimized tables depends on hardware, query tuning settings, and other factors. Perform comparison testing to determine the actual performance in your environment.
    Note: Do not use compressed append-optimized tables on file systems that use compression. If the file system on which your segment data directory resides is a compressed file system, your append-optimized table must not use compression.

Note: QuickLZ compression level can only be set to level 1; no other options are available. Compression level with zlib can be set at values from 1 - 9. Compression level with RLE can be set at values from 1 - 4.

When an ENCODING clause conflicts with a WITH clause, the ENCODING clause has higher precedence than the WITH clause.

To create a compressed table

The WITH clause of the CREATE TABLE command declares the table storage options. Tables that use compression must be append-optimized tables. For example, to create an append-optimized table with zlib compression at a compression level of 5:

=> CREATE TABLE foo (a int, b text) 
   WITH (appendonly=true, compresstype=zlib, compresslevel=5);

Checking the Compression and Distribution of an Append-Optimized Table

Greenplum provides built-in functions to check the compression ratio and the distribution of an append-optimized table. The functions take either the object ID or a table name. You can qualify the table name with a schema name.

Table 2. Functions for compressed append-optimized table metadata
get_ao_distribution(name)
get_ao_distribution(oid)
  Return type: set of (dbid, tuplecount) rows.
  Shows the distribution of an append-optimized table's rows across the array. Returns a set of rows, each of which includes a segment dbid and the number of tuples stored on the segment.

get_ao_compression_ratio(name)
get_ao_compression_ratio(oid)
  Return type: float8.
  Calculates the compression ratio for a compressed append-optimized table. If information is not available, the function returns a value of -1.

The compression ratio is returned as a common ratio. For example, a returned value of 3.19, or 3.19:1, means that the uncompressed table is slightly larger than three times the size of the compressed table.
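
For example, the following query returns the compression ratio for the compressed table used in the distribution example below:

=# SELECT get_ao_compression_ratio('lineitem_comp');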

The distribution of the table is returned as a set of rows that indicate how many tuples are stored on each segment. For example, in a system with four primary segments with dbid values ranging from 0 - 3, the function returns four rows similar to the following:

=# SELECT get_ao_distribution('lineitem_comp');
 get_ao_distribution
---------------------
(0,7500721)
(1,7501365)
(2,7499978)
(3,7497731)
(4 rows)

Support for Run-length Encoding

Greenplum Database supports Run-length Encoding (RLE) for column-level compression. RLE data compression stores repeated data as a single data value and a count. For example, in a table with two columns, a date and a description, that contains 200,000 entries containing the value date1 and 400,000 entries containing the value date2, RLE compression for the date field is similar to date1 200000 date2 400000. RLE is not useful with files that do not have large sets of repeated data as it can greatly increase the file size.
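
As a minimal sketch, you can assign RLE_TYPE compression to such a column when you create the table (table and column names are illustrative):

=> CREATE TABLE sales_dates
       (id int,
        date date ENCODING (compresstype=RLE_TYPE),
        description text)
   WITH (appendonly=true, orientation=column)
   DISTRIBUTED BY (id);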

There are four levels of RLE compression available. The levels progressively increase the compression ratio, but decrease the compression speed.

Greenplum Database versions 4.2.1 and later support column-oriented RLE compression. To back up a table with RLE compression that you intend to restore to an earlier version of Greenplum Database, alter the table to have no compression or a compression type supported in the earlier version (ZLIB or QUICKLZ) before you start the backup operation.

In Greenplum Database 4.3.3 and later, Greenplum Database combines delta compression with RLE compression for data in columns of type BIGINT, INTEGER, DATE, TIME, or TIMESTAMP. The delta compression algorithm is based on the change between consecutive column values and is designed to improve compression when data is loaded in sorted order or when the compression is applied to data in sorted order.

When Greenplum Database is upgraded to 4.3.3, these rules apply to data in columns that are compressed with RLE:
  • Existing column data are compressed with only RLE compression.
  • New data are compressed with delta compression combined with RLE compression in columns of the types that support it.

If switching the Greenplum Database binary from 4.3.3 to 4.3.2 is required, the following steps are recommended.

  1. Alter append-optimized, column-oriented tables that have RLE-compressed columns to use either no compression or the ZLIB or QUICKLZ compression type.
  2. Back up the database.
Note: If you back up a table that uses RLE column compression from Greenplum Database 4.3.3, you can restore the table in Greenplum Database 4.3.2. However, the compression in Greenplum Database 4.3.2 is RLE compression only, not RLE compression combined with delta compression.

Adding Column-level Compression

You can add the following storage directives to a column for append-optimized tables with row or column orientation:

  • Compression type
  • Compression level
  • Block size for a column

Add storage directives using the CREATE TABLE, ALTER TABLE, and CREATE TYPE commands.

The following table details the types of storage directives and possible values for each.

Table 3. Storage Directives for Column-level Compression
COMPRESSTYPE
  Definition: Type of compression.
  Values (not case-sensitive):
    zlib: deflate algorithm
    quicklz: fast compression
    RLE_TYPE: run-length encoding
    none: no compression

COMPRESSLEVEL
  Definition: Compression level.
  Values:
    zlib compression: 1 - 9. 1 is the fastest method with the least compression and is the default; 9 is the slowest method with the most compression.
    QuickLZ compression: 1 (use compression). 1 is the default.
    RLE_TYPE compression: 1 - 4.
      1 - apply RLE only
      2 - apply RLE, then apply zlib compression level 1
      3 - apply RLE, then apply zlib compression level 5
      4 - apply RLE, then apply zlib compression level 9
      1 is the fastest method with the least compression and is the default; 4 is the slowest method with the most compression.

BLOCKSIZE
  Definition: The size, in bytes, for each block in the table.
  Values: 8192 - 2097152. The value must be a multiple of 8192.

The following is the format for adding storage directives.

[ ENCODING ( storage_directive [,…] ) ] 

where the word ENCODING is required and the storage directive has three parts:

  • The name of the directive
  • An equals sign
  • The specification

Separate multiple storage directives with a comma. Apply a storage directive to a single column or designate it as the default for all columns, as shown in the following CREATE TABLE clauses.

General Usage:

column_name data_type ENCODING ( storage_directive [, … ] ), …  
COLUMN column_name ENCODING ( storage_directive [, … ] ), … 
DEFAULT COLUMN ENCODING ( storage_directive [, … ] )

Example:

C1 char ENCODING (compresstype=quicklz, blocksize=65536) 
COLUMN C1 ENCODING (compresstype=quicklz, blocksize=65536)
DEFAULT COLUMN ENCODING (compresstype=quicklz)

Default Compression Values

If the compression type, compression level and block size are not defined, the default is no compression, and the block size is set to the Server Configuration Parameter block_size.

Precedence of Compression Settings

Column compression settings are inherited from the table level to the partition level to the subpartition level. The lowest-level settings have priority.

  • Column compression settings specified for subpartitions override any compression settings at the partition, column or table levels.
  • Column compression settings specified for partitions override any compression settings at the column or table levels.
  • Column compression settings specified at the table level override any compression settings for the entire table.
  • When an ENCODING clause conflicts with a WITH clause, the ENCODING clause has higher precedence than the WITH clause.
Note: The INHERITS clause is not allowed in a table that contains a storage directive or a column reference storage directive.

Tables created using the LIKE clause ignore storage directives and column reference storage directives.

Optimal Location for Column Compression Settings

The best practice is to set the column compression settings at the level where the data resides. See Example 5, which shows a table with a partition depth of 2. RLE_TYPE compression is added to a column at the subpartition level.

Storage Directives Examples

The following examples show the use of storage directives in CREATE TABLE statements.

Example 1

In this example, column c1 is compressed using zlib and uses the block size defined by the system. Column c2 is compressed with quicklz, and uses a block size of 65536. Column c3 is not compressed and uses the block size defined by the system.

CREATE TABLE T1 (c1 int ENCODING (compresstype=zlib),
                  c2 char ENCODING (compresstype=quicklz, blocksize=65536),
                  c3 char)
    WITH (appendonly=true, orientation=column);
Example 2

In this example, column c1 is compressed using zlib and uses the block size defined by the system. Column c2 is compressed with quicklz, and uses a block size of 65536. Column c3 is compressed using RLE_TYPE and uses the block size defined by the system.

CREATE TABLE T2 (c1 int ENCODING (compresstype=zlib),
                  c2 char ENCODING (compresstype=quicklz, blocksize=65536),
                  c3 char,
                  COLUMN c3 ENCODING (compresstype=RLE_TYPE)
                  )
    WITH (appendonly=true, orientation=column);
Example 3

In this example, column c1 is compressed using zlib and uses the block size defined by the system. Column c2 is compressed with quicklz, and uses a block size of 65536. Column c3 is compressed using zlib and uses the block size defined by the system. Note that column c3 uses zlib (not RLE_TYPE) in the partitions, because the column storage in the partition clause has precedence over the storage directive in the column definition for the table.

CREATE TABLE T3 (c1 int ENCODING (compresstype=zlib),
                  c2 char ENCODING (compresstype=quicklz, blocksize=65536),
                  c3 char, COLUMN c3 ENCODING (compresstype=RLE_TYPE) )
    WITH (appendonly=true, orientation=column)
    PARTITION BY RANGE (c3) (START ('1900-01-01'::DATE)          
                             END ('2100-12-31'::DATE),
                             COLUMN c3 ENCODING (compresstype=zlib));
Example 4

In this example, CREATE TABLE assigns a storage directive to c1. Column c2 has no storage directive and inherits the compression type (quicklz) and block size (65536) from the DEFAULT COLUMN ENCODING clause.

Column c3's ENCODING clause defines its compression type, RLE_TYPE. The DEFAULT COLUMN ENCODING clause defines c3's block size, 65536.

The ENCODING clause defined for a specific column overrides the DEFAULT COLUMN ENCODING clause, so column c4 has a compression type of none and the default block size.

CREATE TABLE T4 (c1 int ENCODING (compresstype=zlib),
                  c2 char,
                  c4 smallint ENCODING (compresstype=none),
                  DEFAULT COLUMN ENCODING (compresstype=quicklz,
                                             blocksize=65536),
                  COLUMN c3 ENCODING (compresstype=RLE_TYPE)
                  ) 
   WITH (appendonly=true, orientation=column);
Example 5

This example creates an append-optimized, column-oriented table, T5. T5 has two partitions, p1 and p2, each of which has subpartitions. Each subpartition has ENCODING clauses:

  • The ENCODING clause for partition p1's subpartition sp1 defines column i's compression type as zlib and block size as 65536.
  • The ENCODING clauses for partition p2's subpartition sp1 define column i's compression type as rle_type with the default block size, and column k's block size as 8192 with the default compression type.
    CREATE TABLE T5(i int, j int, k int, l int) 
        WITH (appendonly=true, orientation=column)
        PARTITION BY range(i) SUBPARTITION BY range(j)
        (
           partition p1 start(1) end(2)
           ( subpartition sp1 start(1) end(2) 
             column i encoding(compresstype=zlib, blocksize=65536)
           ), 
           partition p2 start(2) end(3)
           ( subpartition sp1 start(1) end(2)
               column i encoding(compresstype=rle_type)
               column k encoding(blocksize=8192)
           )
        );

For an example showing how to add a compressed column to an existing table with the ALTER TABLE command, see Adding a Compressed Column to Table.

Adding Compression in a TYPE Command

You can define a compression type to simplify column compression statements. For example, the following CREATE TYPE command defines a compression type, comptype, that specifies quicklz compression:

CREATE TYPE comptype (
   internallength = 4,
   input = comptype_in,
   output = comptype_out,
   alignment = int4,
   default = 123,
   passedbyvalue,
   compresstype="quicklz",
   blocksize=65536,
   compresslevel=1
   );

You can then use comptype in a CREATE TABLE command to specify quicklz compression for a column:

CREATE TABLE t2 (c1 comptype)
    WITH (APPENDONLY=true, ORIENTATION=column);

For information about creating and adding compression parameters to a type, see CREATE TYPE. For information about changing compression specifications in a type, see ALTER TYPE.

Choosing Block Size

The blocksize is the size, in bytes, for each block in a table. Block sizes must be between 8192 and 2097152 bytes, and be a multiple of 8192. The default is 32768.

Specifying large block sizes can consume large amounts of memory. Block size determines buffering in the storage layer. Greenplum maintains a buffer per partition, and per column in column-oriented tables. Tables with many partitions or columns consume large amounts of memory.
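
For example, the following statement creates an append-optimized table with a larger block size (a sketch; the table name is illustrative):

=> CREATE TABLE bar2 (a int, b text)
   WITH (appendonly=true, blocksize=65536)
   DISTRIBUTED BY (a);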

Altering a Table

The ALTER TABLE command changes the definition of a table. Use ALTER TABLE to change table attributes such as column definitions, distribution policy, storage model, and partition structure (see also Maintaining Partitioned Tables). For example, to add a not-null constraint to a table column:

=> ALTER TABLE address ALTER COLUMN street SET NOT NULL;

Altering Table Distribution

ALTER TABLE provides options to change a table's distribution policy. When the table distribution options change, the table data is redistributed on disk, which can be resource intensive. You can also redistribute table data using the existing distribution policy.

Changing the Distribution Policy

For partitioned tables, changes to the distribution policy apply recursively to the child partitions. This operation preserves the ownership and all other attributes of the table. For example, the following command redistributes the table sales across all segments using the customer_id column as the distribution key:

ALTER TABLE sales SET DISTRIBUTED BY (customer_id); 

When you change the hash distribution of a table, table data is automatically redistributed. Changing the distribution policy to a random distribution does not cause the data to be redistributed. For example:

ALTER TABLE sales SET DISTRIBUTED RANDOMLY;

Redistributing Table Data

To redistribute table data for tables with a random distribution policy (or when the hash distribution policy has not changed) use REORGANIZE=TRUE. Reorganizing data may be necessary to correct a data skew problem, or when segment resources are added to the system. For example, the following command redistributes table data across all segments using the current distribution policy, including random distribution.

ALTER TABLE sales SET WITH (REORGANIZE=TRUE);

Altering the Table Storage Model

Table storage, compression, and orientation can be declared only at creation. To change the storage model, you must create a table with the correct storage options, load the original table data into the new table, drop the original table, and rename the new table with the original table's name. You must also re-grant any table permissions. For example:

CREATE TABLE sales2 (LIKE sales) 
WITH (appendonly=true, compresstype=quicklz, 
      compresslevel=1, orientation=column);
INSERT INTO sales2 SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales2 RENAME TO sales;
GRANT ALL PRIVILEGES ON sales TO admin;
GRANT SELECT ON sales TO guest;

See Exchanging a Partition to learn how to change the storage model of a partitioned table.

Adding a Compressed Column to Table

Use the ALTER TABLE command to add a compressed column to a table. All of the options and constraints for compressed columns described in Adding Column-level Compression apply to columns added with the ALTER TABLE command.

The following example shows how to add a column with zlib compression to a table, T1.

ALTER TABLE T1
      ADD COLUMN c4 int DEFAULT 0
      ENCODING (COMPRESSTYPE=zlib);
Inheritance of Compression Settings

A partition that is added to a table that has subpartitions with compression settings inherits the compression settings from the subpartition. The following example shows how to create a table with subpartition encodings, then alter it to add a partition.

CREATE TABLE ccddl (i int, j int, k int, l int)
  WITH
    (APPENDONLY = TRUE, ORIENTATION=COLUMN)
  PARTITION BY range(j)
  SUBPARTITION BY list (k)
  SUBPARTITION template(
    SUBPARTITION sp1 values(1, 2, 3, 4, 5),
    COLUMN i ENCODING(COMPRESSTYPE=ZLIB),
    COLUMN j ENCODING(COMPRESSTYPE=QUICKLZ),
    COLUMN k ENCODING(COMPRESSTYPE=ZLIB),
    COLUMN l ENCODING(COMPRESSTYPE=ZLIB))
  (PARTITION p1 START(1) END(10),
   PARTITION p2 START(10) END(20))
;

ALTER TABLE ccddl
  ADD PARTITION p3 START(20) END(30)
;

Running the ALTER TABLE command creates partitions of table ccddl named ccddl_1_prt_p3 and ccddl_1_prt_p3_2_prt_sp1. Partition ccddl_1_prt_p3 inherits the different compression encodings of subpartition sp1.

Dropping a Table

The DROP TABLE command removes tables from the database. For example:

DROP TABLE mytable;

To empty a table of rows without removing the table definition, use DELETE or TRUNCATE. For example:

DELETE FROM mytable;

TRUNCATE mytable;

DROP TABLE always removes any indexes, rules, triggers, and constraints that exist for the target table. Specify CASCADE to drop a table that is referenced by a view. CASCADE removes dependent views.
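
For example, to drop a table together with any views that reference it:

DROP TABLE mytable CASCADE;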

Partitioning Large Tables

Table partitioning enables supporting very large tables, such as fact tables, by logically dividing them into smaller, more manageable pieces. Partitioned tables can improve query performance by allowing the Greenplum Database query planner to scan only the data needed to satisfy a given query instead of scanning all the contents of a large table.

Partitioning does not change the physical distribution of table data across the segments. Table distribution is physical: Greenplum Database physically divides partitioned tables and non-partitioned tables across segments to enable parallel query processing. Table partitioning is logical: Greenplum Database logically divides big tables to improve query performance and facilitate data warehouse maintenance tasks, such as rolling old data out of the data warehouse.

Greenplum Database supports:

  • range partitioning: division of data based on a numerical range, such as date or price.
  • list partitioning: division of data based on a list of values, such as sales territory or product line.
  • A combination of both types.
Figure 1. Example Multi-level Partition Design

Table Partitioning in Greenplum Database

Greenplum Database divides tables into parts (also known as partitions) to enable massively parallel processing. Tables are partitioned during CREATE TABLE using the PARTITION BY (and optionally the SUBPARTITION BY) clause. When you partition a table in Greenplum Database, you create a top-level (or parent) table with one or more levels of sub-tables (or child tables). Internally, Greenplum Database creates an inheritance relationship between the top-level table and its underlying partitions, similar to the functionality of the INHERITS clause of PostgreSQL.

Greenplum uses the partition criteria defined during table creation to create each partition with a distinct CHECK constraint, which limits the data that table can contain. The query planner uses CHECK constraints to determine which table partitions to scan to satisfy a given query predicate.

The Greenplum system catalog stores partition hierarchy information so that rows inserted into the top-level parent table propagate correctly to the child table partitions. To change the partition design or table structure, alter the parent table using ALTER TABLE with the PARTITION clause.

Execution of INSERT, UPDATE and DELETE commands directly on a specific partition (child table) of a partitioned table is not supported. Instead, these commands must be executed on the root partitioned table, the table created with the CREATE TABLE command.
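
For example, rows inserted through the root table are routed automatically to the correct child partition (a sketch, assuming the sales table defined in Defining Date Range Table Partitions below):

=> INSERT INTO sales VALUES (1, '2008-04-20', 120.00);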

Deciding on a Table Partitioning Strategy

Not all tables are good candidates for partitioning. If the answer is yes to all or most of the following questions, table partitioning is a viable database design strategy for improving query performance. If the answer is no to most of the following questions, table partitioning is not the right solution for that table. Test your design strategy to ensure that query performance improves as expected.

  • Is the table large enough? Large fact tables are good candidates for table partitioning. If you have millions or billions of records in a table, you will see performance benefits from logically breaking that data up into smaller chunks. For smaller tables with only a few thousand rows or less, the administrative overhead of maintaining the partitions will outweigh any performance benefits you might see.
  • Are you experiencing unsatisfactory performance? As with any performance tuning initiative, a table should be partitioned only if queries against that table are producing slower response times than desired.
  • Do your query predicates have identifiable access patterns? Examine the WHERE clauses of your query workload and look for table columns that are consistently used to access data. For example, if most of your queries tend to look up records by date, then a monthly or weekly date-partitioning design might be beneficial. Or if you tend to access records by region, consider a list-partitioning design to divide the table by region.
  • Does your data warehouse maintain a window of historical data? Another consideration for partition design is your organization's business requirements for maintaining historical data. For example, your data warehouse may require that you keep data for the past twelve months. If the data is partitioned by month, you can easily drop the oldest monthly partition from the warehouse and load current data into the most recent monthly partition.
  • Can the data be divided into somewhat equal parts based on some defining criteria? Choose partitioning criteria that will divide your data as evenly as possible. If the partitions contain a relatively equal number of records, query performance improves based on the number of partitions created. For example, by dividing a large table into 10 partitions, a query will execute 10 times faster than it would against the unpartitioned table, provided that the partitions are designed to support the query's criteria.

Creating Partitioned Tables

You partition tables when you create them with CREATE TABLE. This topic provides examples of SQL syntax for creating a table with various partition designs.

To partition a table:

  1. Decide on the partition design: date range, numeric range, or list of values.
  2. Choose the column(s) on which to partition the table.
  3. Decide how many levels of partitions you want. For example, you can create a date range partition table by month and then subpartition the monthly partitions by sales region.

Defining Date Range Table Partitions

A date range partitioned table uses a single date or timestamp column as the partition key column. You can use the same partition key column to create subpartitions if necessary, for example, to partition by month and then subpartition by day. Consider partitioning by the most granular level. For example, for a table partitioned by date, you can partition by day and have 365 daily partitions, rather than partition by year then subpartition by month then subpartition by day. A multi-level design can reduce query planning time, but a flat partition design runs faster.

You can have Greenplum Database automatically generate partitions by giving a START value, an END value, and an EVERY clause that defines the partition increment value. By default, START values are always inclusive and END values are always exclusive. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
   END (date '2009-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 day') );

You can also declare and name each partition individually. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan08 START (date '2008-01-01') INCLUSIVE , 
  PARTITION Feb08 START (date '2008-02-01') INCLUSIVE ,
  PARTITION Mar08 START (date '2008-03-01') INCLUSIVE ,
  PARTITION Apr08 START (date '2008-04-01') INCLUSIVE ,
  PARTITION May08 START (date '2008-05-01') INCLUSIVE ,
  PARTITION Jun08 START (date '2008-06-01') INCLUSIVE ,
  PARTITION Jul08 START (date '2008-07-01') INCLUSIVE ,
  PARTITION Aug08 START (date '2008-08-01') INCLUSIVE ,
  PARTITION Sep08 START (date '2008-09-01') INCLUSIVE ,
  PARTITION Oct08 START (date '2008-10-01') INCLUSIVE ,
  PARTITION Nov08 START (date '2008-11-01') INCLUSIVE ,
  PARTITION Dec08 START (date '2008-12-01') INCLUSIVE 
                  END (date '2009-01-01') EXCLUSIVE );

You do not have to declare an END value for each partition, only the last one. In this example, Jan08 ends where Feb08 starts.

Defining Numeric Range Table Partitions

A numeric range partitioned table uses a single numeric data type column as the partition key column. For example:

CREATE TABLE rank (id int, rank int, year int, gender 
char(1), count int)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
( START (2001) END (2008) EVERY (1), 
  DEFAULT PARTITION extra ); 

For more information about default partitions, see Adding a Default Partition.

Defining List Table Partitions

A list partitioned table can use any data type column that allows equality comparisons as its partition key column. A list partition can also have a multi-column (composite) partition key, whereas a range partition only allows a single column as the partition key. For list partitions, you must declare a partition specification for every partition (list value) you want to create. For example:

CREATE TABLE rank (id int, rank int, year int, gender 
char(1), count int ) 
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'), 
  PARTITION boys VALUES ('M'), 
  DEFAULT PARTITION other );

For more information about default partitions, see Adding a Default Partition.

Defining Multi-level Partitions

You can create a multi-level partition design with subpartitions of partitions. Using a subpartition template ensures that every partition has the same subpartition design, including partitions that you add later. For example, the following SQL creates the two-level partition design shown in Figure 1:

CREATE TABLE sales (trans_id int, date date, amount 
decimal(9,2), region text) 
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'), 
  SUBPARTITION asia VALUES ('asia'), 
  SUBPARTITION europe VALUES ('europe'), 
  DEFAULT SUBPARTITION other_regions)
  (START (date '2011-01-01') INCLUSIVE
  END (date '2012-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month'), 
  DEFAULT PARTITION outlying_dates );

The following example shows a three-level partition design where the sales table is partitioned by year, then month, then region. The SUBPARTITION TEMPLATE clauses ensure that each yearly partition has the same subpartition structure. The example declares a DEFAULT partition at each level of the hierarchy.

CREATE TABLE p3_sales (id int, year int, month int, day int, 
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
    SUBPARTITION BY RANGE (month)
       SUBPARTITION TEMPLATE (
        START (1) END (13) EVERY (1), 
        DEFAULT SUBPARTITION other_months )
           SUBPARTITION BY LIST (region)
             SUBPARTITION TEMPLATE (
               SUBPARTITION usa VALUES ('usa'),
               SUBPARTITION europe VALUES ('europe'),
               SUBPARTITION asia VALUES ('asia'),
               DEFAULT SUBPARTITION other_regions )
( START (2002) END (2012) EVERY (1), 
  DEFAULT PARTITION outlying_years );

Partitioning an Existing Table

Tables can be partitioned only at creation. If you have a table that you want to partition, you must create a partitioned table, load the data from the original table into the new table, drop the original table, and rename the partitioned table with the original table's name. You must also re-grant any table permissions. For example:

CREATE TABLE sales2 (LIKE sales) 
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
   END (date '2009-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 month') );
INSERT INTO sales2 SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales2 RENAME TO sales;
GRANT ALL PRIVILEGES ON sales TO admin;
GRANT SELECT ON sales TO guest;

Limitations of Partitioned Tables

A primary key or unique constraint on a partitioned table must contain all the partitioning columns. A unique index can omit the partitioning columns; however, it is enforced only on the parts of the partitioned table, not on the partitioned table as a whole.
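
For example, the following sketch uses a hypothetical sales_pk table with the same columns as the earlier sales examples; the primary key includes both the distribution key (id) and the partition column (date), as this requirement demands:

CREATE TABLE sales_pk (id int, date date, amt decimal(10,2),
  PRIMARY KEY (id, date))        -- must include the partition column (date)
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
   END (date '2009-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 month') );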

Loading Partitioned Tables

After you create the partitioned table structure, top-level parent tables are empty. Data is routed to the bottom-level child table partitions. In a multi-level partition design, only the subpartitions at the bottom of the hierarchy can contain data.

Rows that cannot be mapped to a child table partition are rejected and the load fails. To avoid unmapped rows being rejected at load time, define your partition hierarchy with a DEFAULT partition. Any rows that do not match a partition's CHECK constraints load into the DEFAULT partition. See Adding a Default Partition.

At runtime, the query planner scans the entire table inheritance hierarchy and uses the CHECK table constraints to determine which of the child table partitions to scan to satisfy the query's conditions. The DEFAULT partition (if your hierarchy has one) is always scanned. DEFAULT partitions that contain data slow down the overall scan time.

When you use COPY or INSERT to load data into a parent table, the data is automatically rerouted to the correct partition, just like a regular table.
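
For example, assuming the monthly sales partition design shown earlier, the following inserts (with illustrative values) are routed to the appropriate child partitions automatically:

INSERT INTO sales VALUES (1, '2008-01-15', 20.00);   -- stored in the Jan08 partition
INSERT INTO sales VALUES (2, '2008-02-10', 35.50);   -- stored in the Feb08 partition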

Best practice for loading data into partitioned tables is to create an intermediate staging table, load it, and then exchange it into your partition design. See Exchanging a Partition.

Verifying Your Partition Strategy

If you partition a table based on your query predicates, you can use EXPLAIN to examine the query plan and verify that the query planner scans only the relevant data.

For example, suppose a sales table is date-range partitioned by month and subpartitioned by region as shown in Figure 1. For the following query:

EXPLAIN SELECT * FROM sales WHERE date='01-07-12' AND 
region='usa';

The query plan for this query should show a table scan of only the following tables:

  • the default partition returning 0-1 rows (if your partition design has one)
  • the January 2012 partition (sales_1_prt_1) returning 0-1 rows
  • the USA region subpartition (sales_1_2_prt_usa) returning some number of rows.

The following example shows the relevant portion of the query plan.

->  Seq Scan on sales_1_prt_1 sales (cost=0.00..0.00 rows=0 width=0)
      Filter: "date" = '01-07-12'::date AND region = 'usa'::text
->  Seq Scan on sales_1_2_prt_usa sales (cost=0.00..9.87 rows=20 width=40)

Ensure that the query planner does not scan unnecessary partitions or subpartitions (for example, scans of months or regions not specified in the query predicate), and that scans of the top-level tables return 0-1 rows.

Troubleshooting Selective Partition Scanning

The following limitations can result in a query plan that shows a non-selective scan of your partition hierarchy.

  • The query planner can selectively scan partitioned tables only when the query contains a direct and simple restriction of the table using immutable operators such as:

    =, <, <=, >, >=, and <>

  • Selective scanning recognizes STABLE and IMMUTABLE functions, but does not recognize VOLATILE functions within a query. For example, a WHERE clause such as date > CURRENT_DATE causes the query planner to selectively scan partitioned tables, but a clause such as time > TIMEOFDAY() does not, as in the sketch below.
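
The following sketch illustrates the difference against the sales table from the earlier examples, using random() as a stand-in volatile expression; the first predicate uses the STABLE expression CURRENT_DATE and can be planned selectively, while the second forces a scan of all partitions:

-- Selective: CURRENT_DATE is STABLE, so the planner can eliminate partitions
EXPLAIN SELECT * FROM sales WHERE date > CURRENT_DATE;

-- Non-selective: random() is VOLATILE, so all partitions are scanned
EXPLAIN SELECT * FROM sales WHERE date > date '2008-01-01' + (random() * 10)::int;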

Viewing Your Partition Design

You can look up information about your partition design using the pg_partitions view. For example, to see the partition design of the sales table:

SELECT partitionboundary, partitiontablename, partitionname, 
partitionlevel, partitionrank 
FROM pg_partitions 
WHERE tablename='sales';

The following table and views show information about partitioned tables.

  • pg_partition - Tracks partitioned tables and their inheritance level relationships.
  • pg_partition_templates - Shows the subpartitions created using a subpartition template.
  • pg_partition_columns - Shows the partition key columns used in a partition design.

For information about Greenplum Database system catalog tables and views, see the Greenplum Database Reference Guide.

Maintaining Partitioned Tables

To maintain a partitioned table, use the ALTER TABLE command against the top-level parent table. The most common scenario is to drop old partitions and add new ones to maintain a rolling window of data in a range partition design. You can convert (exchange) older partitions to the append-optimized compressed storage format to save space. If you have a default partition in your partition design, you add a partition by splitting the default partition.

Important: When defining and altering partition designs, use the given partition name, not the table object name. Although you can query and load any table (including partitioned tables) directly using SQL commands, you can only modify the structure of a partitioned table using the ALTER TABLE...PARTITION clauses.

Partitions are not required to have names. If a partition does not have a name, use one of the following expressions to specify a part: PARTITION FOR (value) or PARTITION FOR (RANK(number)).

Adding a Partition

You can add a partition to a partition design with the ALTER TABLE command. If the original partition design included subpartitions defined by a subpartition template, the newly added partition is subpartitioned according to that template. For example:

ALTER TABLE sales ADD PARTITION 
            START (date '2009-02-01') INCLUSIVE 
            END (date '2009-03-01') EXCLUSIVE;

If you did not use a subpartition template when you created the table, you define subpartitions when adding a partition:

ALTER TABLE sales ADD PARTITION 
            START (date '2009-02-01') INCLUSIVE 
            END (date '2009-03-01') EXCLUSIVE
      ( SUBPARTITION usa VALUES ('usa'), 
        SUBPARTITION asia VALUES ('asia'), 
        SUBPARTITION europe VALUES ('europe') );

When you add a subpartition to an existing partition, you can specify the partition to alter. For example:

ALTER TABLE sales ALTER PARTITION FOR (RANK(12))
      ADD PARTITION africa VALUES ('africa');
Note: You cannot add a partition to a partition design that has a default partition. You must split the default partition to add a partition. See Splitting a Partition.

Renaming a Partition

Partitioned tables use the following naming convention. Partitioned subtable names are subject to uniqueness requirements and length limitations.

<parentname>_<level>_prt_<partition_name>

For example:

sales_1_prt_jan08

For auto-generated range partitions, where a number is assigned when no name is given:

sales_1_prt_1

To rename a partitioned table, rename the top-level parent table. The <parentname> changes in the table names of all associated child table partitions. For example, the following command:

ALTER TABLE sales RENAME TO globalsales;

changes the associated table names:

globalsales_1_prt_1

You can change the name of a partition to make it easier to identify. For example, the following command:

ALTER TABLE sales RENAME PARTITION FOR ('2008-01-01') TO jan08;

changes the associated table name as follows:

sales_1_prt_jan08

When altering partitioned tables with the ALTER TABLE command, always refer to the tables by their partition name (jan08) and not their full table name (sales_1_prt_jan08).

Note: You cannot use a child partition table name in place of the table name in an ALTER TABLE statement. For example, ALTER TABLE sales... is correct; ALTER TABLE sales_1_prt_jan08... is not allowed.

Adding a Default Partition

You can add a default partition to a partition design with the ALTER TABLE command.

ALTER TABLE sales ADD DEFAULT PARTITION other;

If your partition design is multi-level, each level in the hierarchy must have a default partition. For example:

ALTER TABLE sales ALTER PARTITION FOR (RANK(1)) ADD DEFAULT 
PARTITION other;

ALTER TABLE sales ALTER PARTITION FOR (RANK(2)) ADD DEFAULT 
PARTITION other;

ALTER TABLE sales ALTER PARTITION FOR (RANK(3)) ADD DEFAULT 
PARTITION other;

If incoming data does not match a partition's CHECK constraint and there is no default partition, the data is rejected. Default partitions ensure that incoming data that does not match a partition is inserted into the default partition.

Dropping a Partition

You can drop a partition from your partition design using the ALTER TABLE command. When you drop a partition that has subpartitions, the subpartitions (and all data in them) are automatically dropped as well. For range partitions, it is common to drop the older partitions from the range as old data is rolled out of the data warehouse. For example:

ALTER TABLE sales DROP PARTITION FOR (RANK(1));

Truncating a Partition

You can truncate a partition using the ALTER TABLE command. When you truncate a partition that has subpartitions, the subpartitions are automatically truncated as well.

ALTER TABLE sales TRUNCATE PARTITION FOR (RANK(1));

Exchanging a Partition

You can exchange a partition using the ALTER TABLE command. Exchanging a partition swaps one table in place of an existing partition. You can exchange partitions only at the lowest level of your partition hierarchy (only partitions that contain data can be exchanged).

Partition exchange can be useful for data loading. For example, load a staging table and swap the loaded table into your partition design. You can use partition exchange to change the storage type of older partitions to append-optimized tables. For example:

CREATE TABLE jan12 (LIKE sales) WITH (appendonly=true);
INSERT INTO jan12 SELECT * FROM sales_1_prt_1 ;
ALTER TABLE sales EXCHANGE PARTITION FOR (DATE '2012-01-01') 
WITH TABLE jan12;
Note: This example refers to the single-level definition of the table sales, before partitions were added and altered in the previous examples.

Splitting a Partition

Splitting a partition divides a partition into two partitions. You can split a partition using the ALTER TABLE command. You can split partitions only at the lowest level of your partition hierarchy: only partitions that contain data can be split. The split value you specify goes into the latter partition.

For example, to split a monthly partition into two with the first partition containing dates January 1-15 and the second partition containing dates January 16-31:

ALTER TABLE sales SPLIT PARTITION FOR ('2008-01-01')
AT ('2008-01-16')
INTO (PARTITION jan081to15, PARTITION jan0816to31);

If your partition design has a default partition, you must split the default partition to add a partition.

When using the INTO clause, specify the current default partition as the second partition name. For example, to split a default range partition to add a new monthly partition for January 2009:

ALTER TABLE sales SPLIT DEFAULT PARTITION 
START ('2009-01-01') INCLUSIVE 
END ('2009-02-01') EXCLUSIVE 
INTO (PARTITION jan09, default partition);

Modifying a Subpartition Template

Use ALTER TABLE SET SUBPARTITION TEMPLATE to modify the subpartition template for an existing partition. Partitions added after you set a new subpartition template have the new partition design. Existing partitions are not modified.

For example, to modify the subpartition design shown in Figure 1:

ALTER TABLE sales SET SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'), 
  SUBPARTITION asia VALUES ('asia'), 
  SUBPARTITION europe VALUES ('europe'),
  SUBPARTITION africa VALUES ('africa'),
  DEFAULT SUBPARTITION other );

When you add a date-range partition of the table sales, it includes the new regional list subpartition for Africa. For example, the following command creates the subpartitions usa, asia, europe, africa, and a default partition named other:

ALTER TABLE sales ADD PARTITION sales_prt_3
  START ('2009-03-01') INCLUSIVE 
  END ('2009-04-01') EXCLUSIVE;

To remove a subpartition template, use SET SUBPARTITION TEMPLATE with empty parentheses. For example, to clear the sales table subpartition template:

ALTER TABLE sales SET SUBPARTITION TEMPLATE ();

Creating and Using Sequences

You can use sequences to auto-increment unique ID columns of a table whenever a record is added. Sequences are often used to assign unique identification numbers to rows added to a table. You can declare an identifier column of type SERIAL to implicitly create a sequence for use with a column.
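
For example, the following sketch (the table and column names are hypothetical) declares a SERIAL column; Greenplum Database implicitly creates a sequence and uses it to supply the column's default values:

CREATE TABLE vendors (
  vendor_id SERIAL,      -- implicitly creates a sequence used as the column default
  vendor_name text
) DISTRIBUTED BY (vendor_id);

INSERT INTO vendors (vendor_name) VALUES ('acme');   -- vendor_id is assigned automatically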

Creating a Sequence

The CREATE SEQUENCE command creates and initializes a special single-row sequence generator table with the given sequence name. The sequence name must be distinct from the name of any other sequence, table, index, or view in the same schema. For example:

CREATE SEQUENCE myserial START 101;

Using a Sequence

After you create a sequence generator table using CREATE SEQUENCE, you can use the nextval function to operate on the sequence. For example, to insert a row into a table that gets the next value of a sequence:

INSERT INTO vendors VALUES (nextval('myserial'), 'acme');

You can also use the setval function to reset a sequence's counter value. For example:

SELECT setval('myserial', 201);

A nextval operation is never rolled back. A fetched value is considered used, even if the transaction that performed the nextval fails. This means that failed transactions can leave unused holes in the sequence of assigned values. setval operations are also never rolled back.
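
For example, the following sketch shows how a rolled-back transaction leaves a gap in the assigned values (the specific numbers depend on the sequence's current state):

BEGIN;
SELECT nextval('myserial');   -- returns, for example, 102
ROLLBACK;                     -- the nextval is not undone
SELECT nextval('myserial');   -- returns 103; the value 102 is skipped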

Note that the nextval function is not allowed in UPDATE or DELETE statements if mirroring is enabled, and the currval and lastval functions are not supported in Greenplum Database.

To examine the current settings of a sequence, query the sequence table:

SELECT * FROM myserial;

Altering a Sequence

The ALTER SEQUENCE command changes the parameters of an existing sequence generator. For example:

ALTER SEQUENCE myserial RESTART WITH 105;

Any parameters not set in the ALTER SEQUENCE command retain their prior settings.

Dropping a Sequence

The DROP SEQUENCE command removes a sequence generator table. For example:

DROP SEQUENCE myserial;

Using Indexes in Greenplum Database

In most traditional databases, indexes can greatly improve data access times. However, in a distributed database such as Greenplum, indexes should be used more sparingly. Greenplum Database performs very fast sequential scans; indexes use a random seek pattern to locate records on disk. Greenplum data is distributed across the segments, so each segment scans a smaller portion of the overall data to get the result. With table partitioning, the total data to scan may be even smaller. Because business intelligence (BI) query workloads generally return very large data sets, using indexes is not efficient.

Greenplum recommends trying your query workload without adding indexes. Indexes are more likely to improve performance for OLTP workloads, where the query is returning a single record or a small subset of data. Indexes can also improve performance on compressed append-optimized tables for queries that return a targeted set of rows, as the optimizer can use an index access method rather than a full table scan when appropriate. For compressed data, an index access method means only the necessary rows are uncompressed.

Greenplum Database automatically creates PRIMARY KEY constraints for tables with primary keys.

To create an index on a partitioned table, create the index on the top-level partitioned table that you created. The index is propagated to all the child tables that Greenplum Database creates. Creating an index directly on a child table that Greenplum Database creates for a partitioned table is not supported.

Note that a UNIQUE CONSTRAINT (such as a PRIMARY KEY CONSTRAINT) implicitly creates a UNIQUE INDEX that must include all the columns of the distribution key and any partitioning key. The UNIQUE CONSTRAINT is enforced across the entire table, including all table partitions (if any).
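
For example, in the following sketch (the table and column names are hypothetical), the PRIMARY KEY constraint is valid because it includes the distribution key column:

CREATE TABLE products (
  product_id int,
  sku text,
  price decimal(10,2),
  PRIMARY KEY (product_id)      -- must contain the distribution key column(s)
) DISTRIBUTED BY (product_id);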

Indexes add some database overhead — they use storage space and must be maintained when the table is updated. Ensure that the query workload uses the indexes that you create, and check that the indexes you add improve query performance (as compared to a sequential scan of the table). To determine whether indexes are being used, examine the query EXPLAIN plans. See Query Profiling.

Consider the following points when you create indexes.

  • Your Query Workload. Indexes improve performance for workloads where queries return a single record or a very small data set, such as OLTP workloads.
  • Compressed Tables. Indexes can improve performance on compressed append-optimized tables for queries that return a targeted set of rows. For compressed data, an index access method means only the necessary rows are uncompressed.
  • Avoid indexes on frequently updated columns. Creating an index on a column that is frequently updated increases the number of writes required when the column is updated.
  • Create selective B-tree indexes. Index selectivity is a ratio of the number of distinct values a column has divided by the number of rows in a table. For example, if a table has 1000 rows and a column has 800 distinct values, the selectivity of the index is 0.8, which is considered good. Unique indexes always have a selectivity ratio of 1.0, which is the best possible. Greenplum Database allows unique indexes only on distribution key columns.
  • Use Bitmap indexes for low selectivity columns. The Greenplum Database Bitmap index type is not available in regular PostgreSQL. See About Bitmap Indexes.
  • Index columns used in joins. An index on a column used for frequent joins (such as a foreign key column) can improve join performance by enabling more join methods for the query planner to use.
  • Index columns frequently used in predicates. Columns that are frequently referenced in WHERE clauses are good candidates for indexes.
  • Avoid overlapping indexes. Indexes that have the same leading column are redundant.
  • Drop indexes for bulk loads. For mass loads of data into a table, consider dropping the indexes and re-creating them after the load completes. This is often faster than updating the indexes.
  • Consider a clustered index. Clustering an index means that the records are physically ordered on disk according to the index. If the records you need are distributed randomly on disk, the database has to seek across the disk to fetch the records requested. If the records are stored close together, the fetching operation is more efficient. For example, a clustered index on a date column where the data is ordered sequentially by date. A query against a specific date range results in an ordered fetch from the disk, which leverages fast sequential access.

To cluster an index in Greenplum Database

Using the CLUSTER command to physically reorder a table based on an index can take a long time with very large tables. To achieve the same results much faster, you can manually reorder the data on disk by creating an intermediate table and loading the data in the desired order. For example:

CREATE TABLE new_table AS 
       SELECT * FROM old_table ORDER BY myixcolumn;
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
CREATE INDEX myixcolumn_ix ON old_table (myixcolumn);
VACUUM ANALYZE old_table;

Index Types

Greenplum Database supports the Postgres index types B-tree and GiST. Hash and GIN indexes are not supported. Each index type uses a different algorithm that is best suited to different types of queries. B-tree indexes fit the most common situations and are the default index type. See Index Types in the PostgreSQL documentation for a description of these types.

Note: Greenplum Database allows unique indexes only if the columns of the index key are the same as (or a superset of) the Greenplum distribution key. Unique indexes are not supported on append-optimized tables. A unique index cannot be enforced across all child table partitions of a partitioned table; it is enforced only within each individual partition.

About Bitmap Indexes

Greenplum Database provides the Bitmap index type. Bitmap indexes are best suited to data warehousing applications and decision support systems with large amounts of data, many ad hoc queries, and few data modification (DML) transactions.

An index provides pointers to the rows in a table that contain a given key value. A regular index stores a list of tuple IDs for each key corresponding to the rows with that key value. Bitmap indexes store a bitmap for each key value. Regular indexes can be several times larger than the data in the table, but bitmap indexes provide the same functionality as a regular index and use a fraction of the size of the indexed data.

Each bit in the bitmap corresponds to a possible tuple ID. If the bit is set, the row with the corresponding tuple ID contains the key value. A mapping function converts the bit position to a tuple ID. Bitmaps are compressed for storage. If the number of distinct key values is small, bitmap indexes are much smaller, compress better, and save considerable space compared with a regular index. The size of a bitmap index is proportional to the number of rows in the table times the number of distinct values in the indexed column.

Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE clause. Rows that satisfy some, but not all, conditions are filtered out before the table is accessed. This improves response time, often dramatically.

When to Use Bitmap Indexes

Bitmap indexes are best suited to data warehousing applications where users query the data rather than update it. Bitmap indexes perform best for columns that have between 100 and 100,000 distinct values and when the indexed column is often queried in conjunction with other indexed columns. Columns with fewer than 100 distinct values, such as a gender column with two distinct values (male and female), usually do not benefit much from any type of index. On a column with more than 100,000 distinct values, the performance and space efficiency of a bitmap index decline.

Bitmap indexes can improve query performance for ad hoc queries. AND and OR conditions in the WHERE clause of a query can be resolved quickly by performing the corresponding Boolean operations directly on the bitmaps before converting the resulting bitmap to tuple IDs. If the resulting number of rows is small, the query can be answered quickly without resorting to a full table scan.

When Not to Use Bitmap Indexes

Do not use bitmap indexes for unique columns or columns with high cardinality data, such as customer names or phone numbers. The performance gains and disk space advantages of bitmap indexes start to diminish on columns with 100,000 or more unique values, regardless of the number of rows in the table.

Bitmap indexes are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data.

Use bitmap indexes sparingly. Test and compare query performance with and without an index. Add an index only if query performance improves with indexed columns.

Creating an Index

The CREATE INDEX command defines an index on a table. A B-tree index is the default index type. For example, to create a B-tree index on the column gender in the table employee:

CREATE INDEX gender_idx ON employee (gender);

To create a bitmap index on the column title in the table films:

CREATE INDEX title_bmp_idx ON films USING bitmap (title);

Examining Index Usage

Greenplum Database indexes do not require maintenance and tuning. You can check which indexes are used by the real-life query workload. Use the EXPLAIN command to examine index usage for a query.

The query plan shows the steps or plan nodes that the database will take to answer a query and time estimates for each plan node. To examine the use of indexes, look for the following query plan node types in your EXPLAIN output:

  • Index Scan - A scan of an index.
  • Bitmap Heap Scan - Retrieves all rows from the bitmap generated by BitmapAnd, BitmapOr, or BitmapIndexScan and accesses the heap to retrieve the relevant rows.
  • Bitmap Index Scan - Computes a bitmap by OR-ing all of the bitmaps that satisfy the query predicates from the underlying index.
  • BitmapAnd or BitmapOr - Takes the bitmaps generated from multiple BitmapIndexScan nodes, ANDs or ORs them together, and generates a new bitmap as its output.
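
For example, to check whether the bitmap index created in the earlier example is used, run EXPLAIN on a representative query (the value 'Alien' is illustrative) and look for a Bitmap Index Scan node, or an Index Scan for a B-tree index, in the plan output:

EXPLAIN SELECT * FROM films WHERE title = 'Alien';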

You have to experiment to determine the indexes to create. Consider the following points.

  • Run ANALYZE after you create or update an index. ANALYZE collects table statistics. The query planner uses table statistics to estimate the number of rows returned by a query and to assign realistic costs to each possible query plan.
  • Use real data for experimentation. Using test data for setting up indexes tells you what indexes you need for the test data, but that is all.
  • Do not use very small test data sets as the results can be unrealistic or skewed.
  • Be careful when developing test data. Values that are similar, completely random, or inserted in sorted order will skew the statistics away from the distribution that real data would have.
  • You can force the use of indexes for testing purposes by using run-time parameters to turn off specific plan types. For example, turn off sequential scans (enable_seqscan) and nested-loop joins (enable_nestloop), the most basic plans, to force the system to use a different plan. Time your query with and without indexes and use the EXPLAIN ANALYZE command to compare the results.
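
For example, the following session sketch turns off sequential scans and nested-loop joins for the current session before timing a query with EXPLAIN ANALYZE (the query itself is illustrative):

SET enable_seqscan = off;
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT * FROM films WHERE title = 'Alien';

-- Restore the defaults afterwards
SET enable_seqscan = on;
SET enable_nestloop = on;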

Managing Indexes

Use the REINDEX command to rebuild a poorly-performing index. REINDEX rebuilds an index using the data stored in the index's table, replacing the old copy of the index.

To rebuild all indexes on a table

REINDEX TABLE my_table;

To rebuild a particular index

REINDEX INDEX my_index;

Dropping an Index

The DROP INDEX command removes an index. For example:

DROP INDEX title_idx;

When loading data, it can be faster to drop all indexes, load, then recreate the indexes.
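
For example, a load sequence might look like the following sketch, which reuses the bitmap index from the earlier example; the data file path is hypothetical:

DROP INDEX title_bmp_idx;
COPY films FROM '/data/films.csv' WITH CSV;
CREATE INDEX title_bmp_idx ON films USING bitmap (title);
ANALYZE films;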

Creating and Managing Views

Views enable you to save frequently used or complex queries, then access them in a SELECT statement as if they were a table. A view is not physically materialized on disk: the query runs as a subquery when you access the view.

If a subquery is associated with a single query, consider using the WITH clause of the SELECT command instead of creating a seldom-used view.
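
For example, instead of defining a rarely used view of comedy films, the same subquery can be written inline with a WITH clause; this sketch assumes the films table used elsewhere in this section:

WITH comedies AS (
  SELECT * FROM films WHERE kind = 'comedy'
)
SELECT title FROM comedies ORDER BY title;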

Creating Views

The CREATE VIEW command defines a view of a query. For example:

CREATE VIEW comedies AS SELECT * FROM films WHERE kind = 'comedy';

Views ignore ORDER BY and SORT operations stored in the view.

Dropping Views

The DROP VIEW command removes a view. For example:

DROP VIEW topten;