Querying Data

This topic provides information about using SQL in Greenplum databases.

You enter SQL statements called queries to view, change, and analyze data in a database using the psql interactive SQL client and other client tools.

Defining Queries

This topic describes how to construct SQL queries in Greenplum Database.

SQL Lexicon

SQL is a standard language for accessing databases. The language consists of elements that enable data storage, retrieval, analysis, viewing, manipulation, and so on. You use SQL commands to construct queries and commands that the Greenplum Database engine understands. SQL queries consist of a sequence of commands. Commands consist of a sequence of valid tokens in correct syntax order, terminated by a semicolon (;). For more information about SQL commands, see the Greenplum Database Reference Guide.

Greenplum Database uses PostgreSQL's structure and syntax, with some exceptions. For more information about SQL rules and concepts in PostgreSQL, see "SQL Syntax" in the PostgreSQL documentation.

SQL Value Expressions

SQL value expressions consist of one or more values, symbols, operators, SQL functions, and data. The expressions compare data or perform calculations and return a value as the result. Calculations include logical, arithmetic, and set operations.

The following are value expressions:

  • An aggregate expression
  • An array constructor
  • A column reference
  • A constant or literal value
  • A correlated subquery
  • A field selection expression
  • A function call
  • A new column value in an INSERT or UPDATE
  • An operator invocation
  • A positional parameter reference, in the body of a function definition or prepared statement
  • A row constructor
  • A scalar subquery
  • A search condition in a WHERE clause
  • A target list of a SELECT command
  • A type cast
  • A value expression in parentheses, useful to group sub-expressions and override precedence
  • A window expression

SQL constructs such as functions and operators are expressions but do not follow any general syntax rules. For more information about these constructs, see Using Functions and Operators.

Column References

A column reference has the form:

correlation.columnname

Here, correlation is the name of a table (possibly qualified with a schema name) or an alias for a table defined with a FROM clause or one of the keywords NEW or OLD. NEW and OLD can appear only in rewrite rules, but you can use other correlation names in any SQL statement. If the column name is unique across all tables in the query, you can omit the "correlation." part of the column reference.
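
For example, the following minimal sketch uses a table alias as the correlation name; the sales table and its customer_id and amount columns are assumed for illustration:

SELECT s.customer_id, amount
FROM sales s
WHERE s.amount > 100;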

Positional Parameters

Positional parameters are arguments to SQL statements or functions that you reference by their positions in a series of arguments. For example, $1 refers to the first argument, $2 to the second argument, and so on. The values of positional parameters are set from arguments external to the SQL statement or supplied when SQL functions are invoked. Some client libraries support specifying data values separately from the SQL command, in which case parameters refer to the out-of-line data values. A parameter reference has the form:

$number

For example:

CREATE FUNCTION dept(text) RETURNS dept
    AS $$ SELECT * FROM dept WHERE name = $1 $$
    LANGUAGE SQL;

Here, the $1 references the value of the first function argument whenever the function is invoked.
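
Assuming the dept table referenced in the function body exists and has a name column, the following call binds the literal 'marketing' to $1:

SELECT * FROM dept('marketing');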

Subscripts

If an expression yields a value of an array type, you can extract a specific element of the array value as follows:

expression[subscript]

You can extract multiple adjacent elements, called an array slice, as follows (including the brackets):

expression[lower_subscript:upper_subscript]

Each subscript is an expression and yields an integer value.

Array expressions usually must be in parentheses, but you can omit the parentheses when the expression to be subscripted is a column reference or positional parameter. You can concatenate multiple subscripts when the original array is multidimensional. For example (including the parentheses):

mytable.arraycolumn[4]
mytable.two_d_column[17][34]
$1[10:42]
(arrayfunction(a,b))[42]
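
The following self-contained sketch, using a hypothetical schedules table, shows a single-element subscript and an array slice in a query:

CREATE TABLE schedules (id int, hours int[]);
INSERT INTO schedules VALUES (1, ARRAY[8,8,8,8,4]);
-- hours[1] returns one element; hours[2:4] returns a slice
SELECT hours[1], hours[2:4] FROM schedules;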

Field Selection

If an expression yields a value of a composite type (row type), you can extract a specific field of the row as follows:

expression.fieldname

The row expression usually must be in parentheses, but you can omit these parentheses when the expression to be selected from is a table reference or positional parameter. For example:

mytable.mycolumn
$1.somecolumn
(rowfunction(a,b)).col3

A qualified column reference is a special case of field selection syntax.

Operator Invocations

Operator invocations have the following possible syntaxes:

expression operator expression (binary infix operator)
operator expression (unary prefix operator)
expression operator (unary postfix operator)

Where operator is an operator token, one of the key words AND, OR, or NOT, or a qualified operator name in the form:

OPERATOR(schema.operatorname)

Available operators and whether they are unary or binary depends on the operators that the system or user defines. For more information about built-in operators, see Built-in Functions and Operators.
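
For example, the following statements use built-in operators to show each form; the last one qualifies the addition operator with its schema:

SELECT 2 + 3;                        -- binary infix operator
SELECT @ -5;                         -- unary prefix operator (absolute value)
SELECT 5 !;                          -- unary postfix operator (factorial)
SELECT 3 OPERATOR(pg_catalog.+) 4;   -- schema-qualified operator name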

Function Calls

The syntax for a function call is the name of a function (possibly qualified with a schema name), followed by its argument list enclosed in parentheses:

function ([expression [, expression ... ]])

For example, the following function call computes the square root of 2:

sqrt(2)

See the Greenplum Database Reference Guide for lists of the built-in functions by category. You can add custom functions, too.

Aggregate Expressions

An aggregate expression applies an aggregate function across the rows that a query selects. An aggregate function performs a calculation on a set of values and returns a single value, such as the sum or average of the set of values. The syntax of an aggregate expression is one of the following:

  • aggregate_name(expression [ , ... ] ) [FILTER (WHERE condition)] — operates across all input rows for which the expected result value is non-null. ALL is the default.
  • aggregate_name(ALL expression [ , ... ] ) [FILTER (WHERE condition)] — operates identically to the first form because ALL is the default
  • aggregate_name(DISTINCT expression [ , ... ] ) [FILTER (WHERE condition)] — operates across all distinct non-null values of input rows
  • aggregate_name(*) [FILTER (WHERE condition)] — operates on all rows with values both null and non-null. Generally, this form is most useful for the count(*) aggregate function.

Where aggregate_name is a previously defined aggregate (possibly schema-qualified) and expression is any value expression that does not contain an aggregate expression.

For example, count(*) yields the total number of input rows, count(f1) yields the number of input rows in which f1 is non-null, and count(distinct f1) yields the number of distinct non-null values of f1.

You can specify a condition with the FILTER clause to limit the input rows to the aggregate function. For example:

SELECT count(*) FILTER (WHERE gender='F') FROM employee;

The WHERE condition of the FILTER clause cannot contain a set-returning function, subquery, window function, or outer reference. If you use a user-defined aggregate function, declare the state transition function as STRICT (see CREATE AGGREGATE).

For predefined aggregate functions, see Built-in Functions and Operators. You can also add custom aggregate functions.

Greenplum Database provides the MEDIAN aggregate function, which returns the fiftieth percentile of the PERCENTILE_CONT result, and special aggregate expressions for inverse distribution functions, as follows:

PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY expression)
PERCENTILE_DISC(percentage) WITHIN GROUP (ORDER BY expression)

Currently you can use only these two expressions with the keyword WITHIN GROUP.
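
For example, the following sketch computes the median salary both ways, borrowing the employees table used in the aggregate examples later in this topic:

SELECT MEDIAN(salary) AS median_salary,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS percentile_salary
FROM employees;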

Limitations of Aggregate Expressions

The following are current limitations of the aggregate expressions:

  • Greenplum Database does not support the following keywords: ALL, DISTINCT, FILTER and OVER. See Table 5 for more details.
  • An aggregate expression can appear only in the result list or HAVING clause of a SELECT command. It is forbidden in other clauses, such as WHERE, because those clauses are logically evaluated before the results of aggregates are formed. This restriction applies to the query level to which the aggregate belongs.
  • When an aggregate expression appears in a subquery, the aggregate is normally evaluated over the rows of the subquery. If the aggregate's arguments contain only outer-level variables, the aggregate belongs to the nearest such outer level and evaluates over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery in which it appears, and the aggregate expression acts as a constant over any one evaluation of that subquery. See Scalar Subqueries and Built-in Functions and Operators.
  • Greenplum Database does not support DISTINCT with multiple input expressions.

Window Expressions

Window expressions allow application developers to more easily compose complex online analytical processing (OLAP) queries using standard SQL commands. For example, with window expressions, users can calculate moving averages or sums over various intervals, reset aggregations and ranks as selected column values change, and express complex ratios in simple terms.

A window expression represents the application of a window function applied to a window frame, which is defined in a special OVER() clause. A window partition is a set of rows that are grouped together to apply a window function. Unlike aggregate functions, which return a result value for each group of rows, window functions return a result value for every row, but that value is calculated with respect to the rows in a particular window partition. If no partition is specified, the window function is computed over the complete intermediate result set.

The syntax of a window expression is:

window_function ( [expression [, ...]] ) OVER ( window_specification )

Where window_function is one of the functions listed in Window Functions, expression is any value expression that does not contain a window expression, and window_specification is:

[window_name]
[PARTITION BY expression [, ...]]
[ORDER BY expression [ASC | DESC | USING operator] [, ...]
    [{RANGE | ROWS}
       { UNBOUNDED PRECEDING
       | expression PRECEDING
       | CURRENT ROW
       | BETWEEN window_frame_bound AND window_frame_bound }]]

where window_frame_bound can be one of:

    UNBOUNDED PRECEDING
    expression PRECEDING
    CURRENT ROW
    expression FOLLOWING
    UNBOUNDED FOLLOWING

A window expression can appear only in the select list of a SELECT command. For example:

SELECT count(*) OVER(PARTITION BY customer_id), * FROM sales;

The OVER clause differentiates window functions from other aggregate or reporting functions. The OVER clause defines the window_specification to which the window function is applied. A window specification has the following characteristics:

  • The PARTITION BY clause defines the window partitions to which the window function is applied. If omitted, the entire result set is treated as one partition.
  • The ORDER BY clause defines the expression(s) for sorting rows within a window partition. The ORDER BY clause of a window specification is separate and distinct from the ORDER BY clause of a regular query expression. The ORDER BY clause is required for the window functions that calculate rankings, as it identifies the measure(s) for the ranking values. For OLAP aggregations, the ORDER BY clause is required to use window frames (the ROWS | RANGE clause).
Note: Columns of data types without a coherent ordering, such as time, are not good candidates for use in the ORDER BY clause of a window specification. Time, with or without a specified time zone, lacks a coherent ordering because addition and subtraction do not have the expected effects. For example, the following is not generally true: x::time < x::time + '2 hour'::interval
  • The ROWS/RANGE clause defines a window frame for aggregate (non-ranking) window functions. A window frame defines a set of rows within a window partition. When a window frame is defined, the window function computes on the contents of this moving frame rather than the fixed contents of the entire window partition. Window frames are row-based (ROWS) or value-based (RANGE).
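
The following sketch combines these clauses; it assumes the sales table from the earlier example also has order_date and amount columns, and computes a three-row moving average of amount for each customer:

SELECT customer_id, order_date, amount,
       avg(amount) OVER (PARTITION BY customer_id
                         ORDER BY order_date
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;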

Type Casts

A type cast specifies a conversion from one data type to another. Greenplum Database accepts two equivalent syntaxes for type casts:

CAST ( expression AS type )
expression::type

The CAST syntax conforms to SQL; the syntax with :: is historical PostgreSQL usage.

A cast applied to a value expression of a known type is a run-time type conversion. The cast succeeds only if a suitable type conversion function is defined. This differs from the use of casts with constants. A cast applied to a string literal represents the initial assignment of a type to a literal constant value, so it succeeds for any type if the contents of the string literal are acceptable input syntax for the data type.

You can usually omit an explicit type cast if there is no ambiguity about the type a value expression must produce; for example, when it is assigned to a table column, the system automatically applies a type cast. The system applies automatic casting only to casts marked "OK to apply implicitly" in system catalogs. Other casts must be invoked with explicit casting syntax to prevent unexpected conversions from being applied without the user's knowledge.
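
For example, the first two expressions below apply the two cast syntaxes to a string literal, and the third converts a value of known type (an integer) to double precision before dividing:

SELECT CAST('2017-01-01' AS date) AS d1,
       '2017-01-01'::date         AS d2,
       2::float8 / 3              AS ratio;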

Scalar Subqueries

A scalar subquery is a SELECT query in parentheses that returns exactly one row with one column. Do not use a SELECT query that returns multiple rows or columns as a scalar subquery. The query runs and uses the returned value in the surrounding value expression. A correlated scalar subquery contains references to the outer query block.
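
As a minimal sketch, assuming the sales table with an amount column used elsewhere in this topic, the following query uses a scalar subquery to attach the overall maximum sale amount to every row:

SELECT customer_id, amount,
       (SELECT max(amount) FROM sales) AS max_amount
FROM sales;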

Correlated Subqueries

A correlated subquery (CSQ) is a SELECT query with a WHERE clause or target list that contains references to the parent outer clause. CSQs efficiently express results in terms of results of another query. Greenplum Database supports correlated subqueries that provide compatibility with many existing applications. A CSQ is a scalar or table subquery, depending on whether it returns one or multiple rows. Greenplum Database does not support correlated subqueries with skip-level correlations.

Correlated Subquery Examples

Example 1 – Scalar correlated subquery

SELECT * FROM t1 WHERE t1.x 
            > (SELECT MAX(t2.x) FROM t2 WHERE t2.y = t1.y);

Example 2 – Correlated EXISTS subquery

SELECT * FROM t1 WHERE 
EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x);

Greenplum Database uses one of the following methods to run CSQs:

  • Unnest the CSQ into join operations – This method is most efficient, and it is how Greenplum Database runs most CSQs, including queries from the TPC-H benchmark.
  • Run the CSQ on every row of the outer query – This method is relatively inefficient, and it is how Greenplum Database runs queries that contain CSQs in the SELECT list or are connected by OR conditions.

The following examples illustrate how to rewrite some of these types of queries to improve performance.

Example 3 - CSQ in the Select List

Original Query

SELECT T1.a,
      (SELECT COUNT(DISTINCT T2.z) FROM t2 WHERE t1.x = t2.y) dt2 
FROM t1;

Rewrite this query to perform an inner join with t1 first and then perform a left join with t1 again. The rewrite applies only when the correlated condition is an equijoin.

Rewritten Query

SELECT t1.a, dt2 FROM t1 
       LEFT JOIN 
        (SELECT t2.y AS csq_y, COUNT(DISTINCT t2.z) AS dt2 
              FROM t1, t2 WHERE t1.x = t2.y 
              GROUP BY t2.y) csq 
       ON (t1.x = csq_y);

Example 4 - CSQs connected by OR Clauses

Original Query

SELECT * FROM t1 
WHERE 
x > (SELECT COUNT(*) FROM t2 WHERE t1.x = t2.x) 
OR x < (SELECT COUNT(*) FROM t3 WHERE t1.y = t3.y)

Rewrite this query to separate it into two parts with a union on the OR conditions.

Rewritten Query

SELECT * FROM t1 
WHERE x > (SELECT count(*) FROM t2 WHERE t1.x = t2.x) 
UNION 
SELECT * FROM t1 
WHERE x < (SELECT count(*) FROM t3 WHERE t1.y = t3.y)

To view the query plan, use EXPLAIN SELECT or EXPLAIN ANALYZE SELECT. Subplan nodes in the query plan indicate that the query will run on every row of the outer query, and the query is a candidate for rewriting. For more information about these statements, see Query Profiling.

Advanced Table Functions

Greenplum Database supports table functions with TABLE value expressions. You can sort input rows for advanced table functions with an ORDER BY clause. You can redistribute them with a SCATTER BY clause, specifying one or more columns or an expression so that rows with the specified characteristics are available to the same process. This usage is similar to using a DISTRIBUTED BY clause when creating a table, but the redistribution occurs when the query runs.

The following command uses the TABLE function with the SCATTER BY clause in the GPText function gptext.index() to populate the index mytest.articles with data from the messages table:

SELECT * FROM gptext.index(TABLE(SELECT * FROM messages 
SCATTER BY distrib_id), 'mytest.articles');
Note: Based on the distribution of data, Greenplum Database automatically parallelizes table functions with TABLE value parameters over the nodes of the cluster.

For information about the function gptext.index(), see the Pivotal GPText documentation.

Array Constructors

An array constructor is an expression that builds an array value from values for its member elements. A simple array constructor consists of the key word ARRAY, a left square bracket [, one or more expressions separated by commas for the array element values, and a right square bracket ]. For example,

SELECT ARRAY[1,2,3+4];
  array
---------
 {1,2,7}

The array element type is the common type of its member expressions, determined using the same rules as for UNION or CASE constructs.

You can build multidimensional array values by nesting array constructors. In the inner constructors, you can omit the keyword ARRAY. For example, the following two SELECT statements produce the same result:

SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
SELECT ARRAY[[1,2],[3,4]];
     array
---------------
 {{1,2},{3,4}}

Since multidimensional arrays must be rectangular, inner constructors at the same level must produce sub-arrays of identical dimensions.

Multidimensional array constructor elements are not limited to a sub-ARRAY construct; they are anything that produces an array of the proper kind. For example:

CREATE TABLE arr(f1 int[], f2 int[]);
INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], 
ARRAY[[5,6],[7,8]]);
SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
                     array
------------------------------------------------
 {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}

You can construct an array from the results of a subquery. Write the array constructor with the keyword ARRAY followed by a subquery in parentheses. For example:

SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
                          ?column?
-----------------------------------------------------------
 {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31}

The subquery must return a single column. The resulting one-dimensional array has an element for each row in the subquery result, with an element type matching that of the subquery's output column. The subscripts of an array value built with ARRAY always begin with 1.

Row Constructors

A row constructor is an expression that builds a row value (also called a composite value) from values for its member fields. For example,

SELECT ROW(1,2.5,'this is a test');

A row constructor can include the syntax rowvalue.*, which expands to a list of the elements of the row value, just as when you use the .* syntax at the top level of a SELECT list. For example, if table t has columns f1 and f2, the following queries are the same:

SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;

By default, the value created by a ROW expression has an anonymous record type. If necessary, you can cast it to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS. An explicit cast might be needed to avoid ambiguity. For example:

CREATE TABLE mytable(f1 int, f2 float, f3 text);
CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' 
LANGUAGE SQL;

In the following query, you do not need to cast the value because there is only one getf1() function and therefore no ambiguity:

SELECT getf1(ROW(1,2.5,'this is a test'));
 getf1
-------
     1
CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);
CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT 
$1.f1' LANGUAGE SQL;

Now we need a cast to indicate which function to call:

SELECT getf1(ROW(1,2.5,'this is a test'));
ERROR:  function getf1(record) is not unique
SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
 getf1
-------
     1
SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS 
myrowtype));
 getf1
-------
    11

You can use row constructors to build composite values to be stored in a composite-type table column or to be passed to a function that accepts a composite parameter.

Expression Evaluation Rules

The order of evaluation of subexpressions is undefined. The inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

If you can determine the result of an expression by evaluating only some parts of the expression, then other subexpressions might not be evaluated at all. For example, in the following expression:

SELECT true OR somefunc();

somefunc() would probably not be called at all. The same is true in the following expression:

SELECT somefunc() OR true;

This is not the same as the left-to-right evaluation order that Boolean operators enforce in some programming languages.

Do not use functions with side effects as part of complex expressions, especially in WHERE and HAVING clauses, because those clauses are extensively reprocessed when developing an execution plan. Boolean expressions (AND/OR/NOT combinations) in those clauses can be reorganized in any manner that Boolean algebra laws allow.

Use a CASE construct to force evaluation order. The following example is an untrustworthy way to avoid division by zero in a WHERE clause:

SELECT ... WHERE x <> 0 AND y/x > 1.5;

The following example shows a trustworthy evaluation order:

SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false 
END;

This CASE construct usage defeats optimization attempts; use it only when necessary.

Using Functions and Operators

Using Functions in Greenplum Database

Table 1. Functions in Greenplum Database

IMMUTABLE
  Greenplum support: Yes
  Relies only on information directly in its argument list. Given the same argument values, always returns the same result.

STABLE
  Greenplum support: Yes, in most cases
  Within a single table scan, returns the same result for same argument values, but results change across SQL statements. Results depend on database lookups or parameter values. The current_timestamp family of functions is STABLE; values do not change within an execution.

VOLATILE
  Greenplum support: Restricted
  Function values can change within a single table scan. For example: random(), currval(), timeofday(). Any function with side effects is volatile, even if its result is predictable. For example: setval().

In Greenplum Database, data is divided up across segments — each segment is a distinct PostgreSQL database. To prevent inconsistent or unexpected results, do not execute functions classified as VOLATILE at the segment level if they contain SQL commands or modify the database in any way. For example, functions such as setval() are not allowed to execute on distributed data in Greenplum Database because they can cause inconsistent data between segment instances.

To ensure data consistency, you can safely use VOLATILE and STABLE functions in statements that are evaluated on and run from the master. For example, the following statements run on the master (statements without a FROM clause):

SELECT setval('myseq', 201);
SELECT foo();

If a statement has a FROM clause containing a distributed table and the function in the FROM clause returns a set of rows, the statement can run on the segments:

SELECT * from foo();

Greenplum Database does not support functions that return a table reference (rangeFuncs) or functions that use the refCursor datatype.

User-Defined Functions

Greenplum Database supports user-defined functions. See Extending SQL in the PostgreSQL documentation for more information.

Use the CREATE FUNCTION statement to register user-defined functions that are used as described in Using Functions in Greenplum Database. By default, user-defined functions are declared as VOLATILE, so if your user-defined function is IMMUTABLE or STABLE, you must specify the correct volatility level when you register your function.
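
For example, the following hypothetical SQL function depends only on its argument, so it can be declared IMMUTABLE when it is registered:

CREATE FUNCTION add_tax(numeric) RETURNS numeric AS
  'SELECT $1 * 1.07'  -- fixed rate, so the result depends only on the input value
LANGUAGE SQL IMMUTABLE;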

When you create user-defined functions, avoid using fatal errors or destructive calls. Greenplum Database may respond to such errors with a sudden shutdown or restart.

In Greenplum Database, the shared library files for user-created functions must reside in the same library path location on every host in the Greenplum Database array (masters, segments, and mirrors).

Built-in Functions and Operators

The following table lists the categories of built-in functions and operators supported by PostgreSQL. All functions and operators are supported in Greenplum Database as in PostgreSQL with the exception of STABLE and VOLATILE functions, which are subject to the restrictions noted in Using Functions in Greenplum Database. See the Functions and Operators section of the PostgreSQL documentation for more information about these built-in functions and operators.

Table 2. Built-in functions and operators

The following lists each function and operator category with its VOLATILE and STABLE functions; a category marked "none" contains neither.

Logical Operators: none
Comparison Operators: none
Mathematical Functions and Operators: VOLATILE: random, setseed
String Functions and Operators: VOLATILE: All built-in conversion functions; STABLE: convert, pg_client_encoding
Binary String Functions and Operators: none
Bit String Functions and Operators: none
Pattern Matching: none
Data Type Formatting Functions: STABLE: to_char, to_timestamp
Date/Time Functions and Operators: VOLATILE: timeofday; STABLE: age, current_date, current_time, current_timestamp, localtime, localtimestamp, now
Geometric Functions and Operators: none
Network Address Functions and Operators: none
Sequence Manipulation Functions: VOLATILE: currval, lastval, nextval, setval
Conditional Expressions: none
Array Functions and Operators: STABLE: All array functions
Aggregate Functions: none
Subquery Expressions: none
Row and Array Comparisons: none
Set Returning Functions: VOLATILE: generate_series
System Information Functions: STABLE: All session information functions, all access privilege inquiry functions, all schema visibility inquiry functions, all system catalog information functions, all comment information functions
System Administration Functions: VOLATILE: set_config, pg_cancel_backend, pg_reload_conf, pg_rotate_logfile, pg_start_backup, pg_stop_backup, pg_size_pretty, pg_ls_dir, pg_read_file, pg_stat_file; STABLE: current_setting, all database object size functions
XML Functions: STABLE: xmlagg(xml), xmlexists(text, xml), xml_is_well_formed(text), xml_is_well_formed_document(text), xml_is_well_formed_content(text), xpath(text, xml), xpath(text, xml, text[]), xpath_exists(text, xml), xpath_exists(text, xml, text[]), xml(text), text(xml), xmlcomment(xml), xmlconcat2(xml, xml)

Window Functions

The following built-in window functions are Greenplum extensions to the PostgreSQL database. All window functions are immutable. For more information about window functions, see Window Expressions.

Table 3. Window functions
Function Return Type Full Syntax Description
cume_dist() double precision CUME_DIST() OVER ( [PARTITION BY expr ] ORDER BY expr ) Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.
dense_rank() bigint DENSE_RANK () OVER ( [PARTITION BY expr ] ORDER BY expr ) Computes the rank of a row in an ordered group of rows without skipping rank values. Rows with equal values are given the same rank value.
first_value(expr) same as input expr type FIRST_VALUE( expr ) OVER ( [PARTITION BY expr ] ORDER BY expr [ROWS|RANGE frame_expr ] ) Returns the first value in an ordered set of values.
lag(expr [,offset] [,default]) same as input expr type LAG( expr [, offset ] [, default ]) OVER ( [PARTITION BY expr ] ORDER BY expr ) Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. The default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.
last_value(expr) same as input expr type LAST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] ) Returns the last value in an ordered set of values.
lead(expr [,offset] [,default]) same as input expr type LEAD(expr [,offset] [,default]) OVER ( [PARTITION BY expr] ORDER BY expr ) Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, lead provides access to a row at a given physical offset after that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.
ntile(expr) bigint NTILE(expr) OVER ( [PARTITION BY expr] ORDER BY expr ) Divides an ordered data set into a number of buckets (as defined by expr) and assigns a bucket number to each row.
percent_rank() double precision PERCENT_RANK () OVER ( [PARTITION BY expr] ORDER BY expr ) Calculates the rank of a hypothetical row R minus 1, divided by 1 less than the number of rows being evaluated (within a window partition).
rank() bigint RANK () OVER ( [PARTITION BY expr] ORDER BY expr ) Calculates the rank of a row in an ordered group of values. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows are added to the rank number to calculate the next rank value. Ranks may not be consecutive numbers in this case.
row_number() bigint ROW_NUMBER () OVER ( [PARTITION BY expr] ORDER BY expr ) Assigns a unique number to each row to which it is applied (either each row in a window partition or each row of the query).

Advanced Analytic Functions

The following built-in advanced analytic functions are Greenplum extensions of the PostgreSQL database. Analytic functions are immutable.

Table 4. Advanced Analytic Functions
Function Return Type Full Syntax Description
matrix_add(array[], array[]) smallint[], int[], bigint[], float[] matrix_add( array[[1,1],[2,2]], array[[3,4],[5,6]]) Adds two two-dimensional matrices. The matrices must be conformable.
matrix_multiply( array[], array[]) smallint[], int[], bigint[], float[] matrix_multiply( array[[2,0,0],[0,2,0],[0,0,2]], array[[3,0,3],[0,3,0],[0,0,3]] ) Multiplies two two-dimensional matrices. The matrices must be conformable.
matrix_multiply( array[], expr) int[], float[] matrix_multiply( array[[1,1,1], [2,2,2], [3,3,3]], 2) Multiplies a two-dimensional array and a scalar numeric value.
matrix_transpose( array[]) Same as input array type. matrix_transpose( array [[1,1,1],[2,2,2]]) Transposes a two-dimensional array.
pinv(array []) smallint[], int[], bigint[], float[] pinv(array[[2.5,0,0],[0,1,0],[0,0,.5]]) Calculates the Moore-Penrose pseudoinverse of a matrix.
unnest (array[]) set of anyelement unnest( array['one', 'row', 'per', 'item']) Transforms a one-dimensional array into rows. Returns a set of anyelement, a polymorphic pseudotype in PostgreSQL.
Table 5. Advanced Aggregate Functions
Function Return Type Full Syntax Description
MEDIAN (expr) timestamp, timestamptz, interval, float MEDIAN (expression)

Example:

SELECT department_id, MEDIAN(salary) 
FROM employees 
GROUP BY department_id; 
Can take a two-dimensional array as input. Treats such arrays as matrices.
PERCENTILE_CONT (expr) WITHIN GROUP (ORDER BY expr [DESC/ASC]) timestamp, timestamptz, interval, float PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY expression)

Example:

SELECT department_id,
PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY salary DESC)
"Median_cont" 
FROM employees GROUP BY department_id;
Performs an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification and returns the same datatype as the numeric datatype of the argument. This returned value is a computed result after performing linear interpolation. Nulls are ignored in this calculation.
PERCENTILE_DISC (expr) WITHIN GROUP (ORDER BY expr [DESC/ASC]) timestamp, timestamptz, interval, float PERCENTILE_DISC(percentage) WITHIN GROUP (ORDER BY expression)

Example:

SELECT department_id, 
PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY salary DESC)
"Median_desc" 
FROM employees GROUP BY department_id;
Performs an inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification. This returned value is an element from the set. Nulls are ignored in this calculation.
sum(array[]) smallint[], int[], bigint[], float[] sum(array[[1,2],[3,4]])

Example:

CREATE TABLE mymatrix (myvalue int[]);
INSERT INTO mymatrix VALUES (array[[1,2],[3,4]]);
INSERT INTO mymatrix VALUES (array[[0,1],[1,0]]);
SELECT sum(myvalue) FROM mymatrix;
 sum 
---------------
 {{1,3},{4,4}}
Performs matrix summation. Can take as input a two-dimensional array that is treated as a matrix.
pivot_sum (label[], label, expr) int[], bigint[], float[] pivot_sum( array['A1','A2'], attr, value) A pivot aggregation using sum to resolve duplicate entries.
mregr_coef(expr, array[]) float[] mregr_coef(y, array[1, x1, x2]) The four mregr_* aggregates perform linear regressions using the ordinary-least-squares method. mregr_coef calculates the regression coefficients. The size of the return array for mregr_coef is the same as the size of the input array of independent variables, since the return array contains the coefficient for each independent variable.
mregr_r2 (expr, array[]) float mregr_r2(y, array[1, x1, x2]) The four mregr_* aggregates perform linear regressions using the ordinary-least-squares method. mregr_r2 calculates the r-squared error value for the regression.
mregr_pvalues(expr, array[]) float[] mregr_pvalues(y, array[1, x1, x2]) The four mregr_* aggregates perform linear regressions using the ordinary-least-squares method. mregr_pvalues calculates the p-values for the regression.
mregr_tstats(expr, array[]) float[] mregr_tstats(y, array[1, x1, x2]) The four mregr_* aggregates perform linear regressions using the ordinary-least-squares method. mregr_tstats calculates the t-statistics for the regression.
nb_classify(text[], bigint, bigint[], bigint[]) text nb_classify(classes, attr_count, class_count, class_total) Classify rows using a Naive Bayes Classifier. This aggregate uses a baseline of training data to predict the classification of new rows and returns the class with the largest likelihood of appearing in the new rows.
nb_probabilities(text[], bigint, bigint[], bigint[]) text nb_probabilities(classes, attr_count, class_count, class_total) Determine probability for each class using a Naive Bayes Classifier. This aggregate uses a baseline of training data to predict the classification of new rows and returns the probabilities that each class will appear in new rows.

Advanced Analytic Function Examples

These examples illustrate selected advanced analytic functions in queries on simplified example data. They are for the multiple linear regression aggregate functions and for Naive Bayes Classification with nb_classify.

Linear Regression Aggregates Example

The following example uses the four linear regression aggregates mregr_coef, mregr_r2, mregr_pvalues, and mregr_tstats in a query on the example table regr_example. In this example query, all the aggregates take the dependent variable as the first parameter and an array of independent variables as the second parameter.

SELECT mregr_coef(y, array[1, x1, x2]), 
       mregr_r2(y, array[1, x1, x2]),
       mregr_pvalues(y, array[1, x1, x2]),
       mregr_tstats(y, array[1, x1, x2])
from regr_example;

Table regr_example:

 id | y  | x1 | x2
----+----+----+----
  1 |  5 |  2 |  1
  2 | 10 |  4 |  2
  3 |  6 |  3 |  1
  4 |  8 |  3 |  1
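
To reproduce the example, the table can be created and loaded with a sketch like the following (column types are assumed):

CREATE TABLE regr_example (id int, y int, x1 int, x2 int);
INSERT INTO regr_example VALUES
  (1,  5, 2, 1),
  (2, 10, 4, 2),
  (3,  6, 3, 1),
  (4,  8, 3, 1);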

Running the example query against this table yields one row of data with the following values:

mregr_coef:

{-7.105427357601e-15,2.00000000000003,0.999999999999943}

mregr_r2:

0.86440677966103

mregr_pvalues:

{0.999999999999999,0.454371051656992,0.783653104061216}

mregr_tstats:

{-2.24693341988919e-15,1.15470053837932,0.35355339059327}

Greenplum Database returns NaN (not a number) if the results of any of these aggregates are undefined. This can happen if there is a very small amount of data.

Note: The intercept is computed by setting one of the independent variables to 1, as shown in the preceding example.

Naive Bayes Classification Examples

The aggregates nb_classify and nb_probabilities are used within a larger four-step classification process that involves the creation of tables and views for training data. The following two examples show all the steps. The first example shows a small data set with arbitrary values, and the second example is the Greenplum implementation of a popular Naive Bayes example based on weather conditions.

Overview

The following describes the Naive Bayes classification procedure. In the examples, the value names become the values of the field attr:

  1. Unpivot the data.

    If the data is not denormalized, create a view with the identification and classification that unpivots all the values. If the data is already in denormalized form, you do not need to unpivot the data.

  2. Create a training table.

    The training table shifts the view of the data to the values of the field attr.

  3. Create a summary view of the training data.
  4. Aggregate the data with nb_classify, nb_probabilities, or both.
Naive Bayes Example 1 – Small Table

This example begins with the normalized data in the example table class_example and proceeds through four discrete steps:

Table class_example:

 id | class | a1 | a2 | a3 
----+-------+----+----+----
  1 | C1    |  1 |  2 |  3
  2 | C1    |  1 |  4 |  3
  3 | C2    |  0 |  2 |  2
  4 | C1    |  1 |  2 |  1
  5 | C2    |  1 |  2 |  2
  6 | C2    |  0 |  1 |  3
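
To follow along, the example table can be created and loaded with a sketch like the following (column types are assumed):

CREATE TABLE class_example (id int, class text, a1 int, a2 int, a3 int);
INSERT INTO class_example VALUES
  (1, 'C1', 1, 2, 3),
  (2, 'C1', 1, 4, 3),
  (3, 'C2', 0, 2, 2),
  (4, 'C1', 1, 2, 1),
  (5, 'C2', 1, 2, 2),
  (6, 'C2', 0, 1, 3);
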
  1. Unpivot the data.

    For use as training data, the data in class_example must be unpivoted because the data is in denormalized form. The terms in single quotation marks define the values to use for the new field attr. By convention, these values are the same as the field names in the normalized table. In this example, these values are capitalized to highlight where they are created in the command.

    CREATE view class_example_unpivot AS
    SELECT id, class, unnest(array['A1', 'A2', 'A3']) as attr, 
    unnest(array[a1,a2,a3]) as value FROM class_example; 
    

    The view contains the unpivoted data. It is not necessary to use this view directly. Run SELECT * FROM class_example_unpivot to see its contents:

     id | class | attr | value
    ----+-------+------+-------
      2 | C1    | A1   |     1
      2 | C1    | A2   |     2
      2 | C1    | A3   |     1
      4 | C2    | A1   |     1
      4 | C2    | A2   |     2
      4 | C2    | A3   |     2
      6 | C2    | A1   |     0
      6 | C2    | A2   |     1
      6 | C2    | A3   |     3
      1 | C1    | A1   |     1
      1 | C1    | A2   |     2
      1 | C1    | A3   |     3
      3 | C1    | A1   |     1
      3 | C1    | A2   |     4
      3 | C1    | A3   |     3
      5 | C2    | A1   |     0
      5 | C2    | A2   |     2
      5 | C2    | A3   |     2
    (18 rows)
    
  2. Create a training table from the unpivoted data.

    The terms in single quotation marks define the values to sum. The terms in the array passed into pivot_sum must match the number and names of classifications in the original data. In the example, C1 and C2:

    CREATE table class_example_nb_training AS
    SELECT attr, value, pivot_sum(array['C1', 'C2'], class, 1) 
    as class_count
    FROM   class_example_unpivot
    GROUP BY attr, value
    DISTRIBUTED by (attr); 
    

    The following is the resulting training table:

     attr | value | class_count
    ------+-------+-------------
     A3   |     1 | {1,0}
     A3   |     3 | {2,1}
     A1   |     1 | {3,1}
     A1   |     0 | {0,2}
     A3   |     2 | {0,2}
     A2   |     2 | {2,2}
     A2   |     4 | {1,0}
     A2   |     1 | {0,1}
    (8 rows)
    
  3. Create a summary view of the training data.
    CREATE VIEW class_example_nb_classify_functions AS
    SELECT attr, value, class_count, array['C1', 'C2'] as classes,
    sum(class_count) over (wa)::integer[] as class_total,
    count(distinct value) over (wa) as attr_count
    FROM class_example_nb_training
    WINDOW wa as (partition by attr);

    The following is the resulting summary view:

    attr| value | class_count| classes | class_total |attr_count
    -----+-------+------------+---------+-------------+---------
     A2  |     2 | {2,2}      | {C1,C2} | {3,3}       |         3
     A2  |     4 | {1,0}      | {C1,C2} | {3,3}       |         3
     A2  |     1 | {0,1}      | {C1,C2} | {3,3}       |         3
     A1  |     0 | {0,2}      | {C1,C2} | {3,3}       |         2
     A1  |     1 | {3,1}      | {C1,C2} | {3,3}       |         2
     A3  |     2 | {0,2}      | {C1,C2} | {3,3}       |         3
     A3  |     3 | {2,1}      | {C1,C2} | {3,3}       |         3
     A3  |     1 | {1,0}      | {C1,C2} | {3,3}       |         3
    (8 rows)
    
  4. Classify rows with nb_classify and display the probability with nb_probabilities.

    After you prepare the view, the training data is ready for use as a baseline for determining the class of incoming rows. The following query predicts whether rows are of class C1 or C2 by using the nb_classify aggregate:

    SELECT nb_classify(classes, attr_count, class_count, 
    class_total) as class
    FROM class_example_nb_classify_functions
    where (attr = 'A1' and value = 0) or (attr = 'A2' and value = 
    2) or (attr = 'A3' and value = 1);
    

    Running the example query against this simple table yields a single row classifying the selected attribute values as class C2:

    class 
    -------
    C2
    (1 row)
    

    Display the probabilities for each class with nb_probabilities.

    Once the view is prepared, the system can use the training data as a baseline for determining the class of incoming rows. The following query predicts whether rows are of class C1 or C2 by using the nb_probabilities aggregate:

    SELECT nb_probabilities(classes, attr_count, class_count, 
    class_total) as probability
    FROM class_example_nb_classify_functions
    where (attr = 'A1' and value = 0) or (attr = 'A2' and value = 
    2) or (attr = 'A3' and value = 1);
    

    Running the example query against this simple table yields a single row showing the probability for each class, the first value for C1 and the second for C2:

    probability
    -------------
     {0.4,0.6}
    (1 row)
    

    You can display the classification and the probabilities with the following query.

    SELECT nb_classify(classes, attr_count, class_count, 
    class_total) as class, nb_probabilities(classes, attr_count, 
    class_count, class_total) as probability 
    FROM class_example_nb_classify_functions 
    where (attr = 'A1' and value = 0) 
    or (attr = 'A2' and value = 2) or (attr = 'A3' and value = 1); 
    
    

    This query produces the following result:

     class | probability
    -------+-------------
        C2 | {0.4,0.6}
     (1 row)
    

Actual data in production scenarios is more extensive than this example data and yields better results. Accuracy of classification with nb_classify and nb_probabilities improves significantly with larger sets of training data.

Naive Bayes Example 2 – Weather and Outdoor Sports

This example calculates the probabilities of whether the user will play an outdoor sport, such as golf or tennis, based on weather conditions. The table weather_example contains the example values. The identification field for the table is day. There are two classifications held in the field play: Yes or No. There are four weather attributes, outlook, temperature, humidity, and wind. The data is normalized.

 day | play | outlook  | temperature | humidity | wind
-----+------+----------+-------------+----------+--------
 2   | No   | Sunny    | Hot         | High     | Strong
 4   | Yes  | Rain     | Mild        | High     | Weak
 6   | No   | Rain     | Cool        | Normal   | Strong
 8   | No   | Sunny    | Mild        | High     | Weak
10   | Yes  | Rain     | Mild        | Normal   | Weak
12   | Yes  | Overcast | Mild        | High     | Strong
14   | No   | Rain     | Mild        | High     | Strong
 1   | No   | Sunny    | Hot         | High     | Weak
 3   | Yes  | Overcast | Hot         | High     | Weak
 5   | Yes  | Rain     | Cool        | Normal   | Weak
 7   | Yes  | Overcast | Cool        | Normal   | Strong
 9   | Yes  | Sunny    | Cool        | Normal   | Weak
11   | Yes  | Sunny    | Mild        | Normal   | Strong
13   | Yes  | Overcast | Hot         | Normal   | Weak
 (14 rows)

Because this data is normalized, all four Naive Bayes steps are required.

  1. Unpivot the data.
    CREATE view weather_example_unpivot AS SELECT day, play, 
    unnest(array['outlook','temperature', 'humidity','wind']) as 
    attr, unnest(array[outlook,temperature,humidity,wind]) as 
    value FROM weather_example; 
    

    Note the use of quotation marks in the command.

    Running SELECT * FROM weather_example_unpivot displays the unpivoted data, which contains the following 56 rows.

     day | play | attr        | value
    -----+------+-------------+----------
       2 | No   | outlook     | Sunny
       2 | No   | temperature | Hot
       2 | No   | humidity    | High
       2 | No   | wind        | Strong
       4 | Yes  | outlook     | Rain
       4 | Yes  | temperature | Mild
       4 | Yes  | humidity    | High
       4 | Yes  | wind        | Weak
       6 | No   | outlook     | Rain
       6 | No   | temperature | Cool
       6 | No   | humidity    | Normal
       6 | No   | wind        | Strong
       8 | No   | outlook     | Sunny
       8 | No   | temperature | Mild
       8 | No   | humidity    | High
       8 | No   | wind        | Weak
      10 | Yes  | outlook     | Rain
      10 | Yes  | temperature | Mild
      10 | Yes  | humidity    | Normal
      10 | Yes  | wind        | Weak
      12 | Yes  | outlook     | Overcast
      12 | Yes  | temperature | Mild
      12 | Yes  | humidity    | High
      12 | Yes  | wind        | Strong
      14 | No   | outlook     | Rain
      14 | No   | temperature | Mild
      14 | No   | humidity    | High
      14 | No   | wind        | Strong
       1 | No   | outlook     | Sunny
       1 | No   | temperature | Hot
       1 | No   | humidity    | High
       1 | No   | wind        | Weak
       3 | Yes  | outlook     | Overcast
       3 | Yes  | temperature | Hot
       3 | Yes  | humidity    | High
       3 | Yes  | wind        | Weak
       5 | Yes  | outlook     | Rain
       5 | Yes  | temperature | Cool
       5 | Yes  | humidity    | Normal
       5 | Yes  | wind        | Weak
       7 | Yes  | outlook     | Overcast
       7 | Yes  | temperature | Cool
       7 | Yes  | humidity    | Normal
       7 | Yes  | wind        | Strong
       9 | Yes  | outlook     | Sunny
       9 | Yes  | temperature | Cool
       9 | Yes  | humidity    | Normal
       9 | Yes  | wind        | Weak
      11 | Yes  | outlook     | Sunny
      11 | Yes  | temperature | Mild
      11 | Yes  | humidity    | Normal
      11 | Yes  | wind        | Strong
      13 | Yes  | outlook     | Overcast
      13 | Yes  | temperature | Hot
      13 | Yes  | humidity    | Normal
      13 | Yes  | wind        | Weak
      (56 rows)
  2. Create a training table.
    CREATE table weather_example_nb_training AS SELECT attr, 
    value, pivot_sum(array['Yes','No'], play, 1) as class_count 
    FROM weather_example_unpivot GROUP BY attr, value 
    DISTRIBUTED by (attr); 
    

    Running SELECT * FROM weather_example_nb_training displays the training data, which contains the following 10 rows.

     attr        | value    | class_count
    -------------+----------+-------------
    outlook      | Rain     | {3,2}
    humidity     | High     | {3,4}
    outlook      | Overcast | {4,0}
    humidity     | Normal   | {6,1}
    outlook      | Sunny    | {2,3}
    wind         | Strong   | {3,3}
    temperature  | Hot      | {2,2}
    temperature  | Cool     | {3,1}
    temperature  | Mild     | {4,2}
    wind         | Weak     | {6,2}
    (10 rows)
  3. Create a summary view of the training data.
    CREATE VIEW weather_example_nb_classify_functions AS SELECT 
    attr, value, class_count, array['Yes','No'] as 
    classes,sum(class_count) over (wa)::integer[] as 
    class_total,count(distinct value) over (wa) as attr_count 
    FROM weather_example_nb_training WINDOW wa as (partition by attr);
    

    Running SELECT * FROM weather_example_nb_classify_functions displays the summary view of the training data, which contains the following 10 rows.

    attr        |  value  | class_count| classes | class_total| attr_count
    ------------+-------- +------------+---------+------------+-----------
    temperature | Mild    | {4,2}      | {Yes,No}| {9,5}      |         3
    temperature | Cool    | {3,1}      | {Yes,No}| {9,5}      |         3
    temperature | Hot     | {2,2}      | {Yes,No}| {9,5}      |         3
    wind        | Weak    | {6,2}      | {Yes,No}| {9,5}      |         2
    wind        | Strong  | {3,3}      | {Yes,No}| {9,5}      |         2
    humidity    | High    | {3,4}      | {Yes,No}| {9,5}      |         2
    humidity    | Normal  | {6,1}      | {Yes,No}| {9,5}      |         2
    outlook     | Sunny   | {2,3}      | {Yes,No}| {9,5}      |         3
    outlook     | Overcast| {4,0}      | {Yes,No}| {9,5}      |         3
    outlook     | Rain    | {3,2}      | {Yes,No}| {9,5}      |         3
    (10 rows)
  4. Aggregate the data with nb_classify, nb_probabilities, or both.

    Decide what to classify. To classify only one record with the following values:

    temperature | wind | humidity | outlook
    ------------+------+----------+---------
    Cool        | Weak | High     | Overcast
    

    Use the following command to aggregate the data. The result gives the classification Yes or No and the probability of playing outdoor sports under this particular set of conditions.

    SELECT nb_classify(classes, attr_count, class_count, 
    class_total) as class,
           nb_probabilities(classes, attr_count, class_count, 
    class_total) as probability
    FROM weather_example_nb_classify_functions where
      (attr = 'temperature' and value = 'Cool') or
      (attr = 'wind'        and value = 'Weak') or
      (attr = 'humidity'    and value = 'High') or
      (attr = 'outlook'     and value = 'Overcast');
    

    The result is a single row.

    class  |              probability
    -------+---------------------------------------
     Yes   | {0.858103353920726,0.141896646079274}
    (1 row)
    

    To classify a group of records, load them into a table. In this example, the table t1 contains the following records:

     day | outlook  | temperature | humidity |  wind
    -----+----------+-------------+----------+--------
      15 | Sunny    | Mild        | High     | Strong
      16 | Rain     | Cool        | Normal   | Strong
      17 | Overcast | Hot         | Normal   | Weak
      18 | Rain     | Hot         | High     | Weak
    (4 rows)
    

    The following command aggregates the data against this table. The result gives the classification Yes or No and the probability of playing outdoor sports for each set of conditions in the table t1. Both the nb_classify and nb_probabilities aggregates are used.

    SELECT t1.day, 
           t1.temperature, t1.wind, t1.humidity, t1.outlook,
           nb_classify(classes, attr_count, class_count, 
    class_total) as class,
           nb_probabilities(classes, attr_count, class_count, 
    class_total) as probability
    FROM t1, weather_example_nb_classify_functions
    WHERE
      (attr = 'temperature' and value = t1.temperature) or
      (attr = 'wind'        and value = t1.wind) or
      (attr = 'humidity'    and value = t1.humidity) or
      (attr = 'outlook'     and value = t1.outlook)
    GROUP BY t1.day, t1.temperature, t1.wind, t1.humidity, 
    t1.outlook;
    

    The result is four rows, one for each record in t1.

    day| temp| wind   | humidity | outlook  | class | probability
    ---+-----+--------+----------+----------+-------+--------------
    15 | Mild| Strong | High     | Sunny    | No    | {0.244694132334582,0.755305867665418}
    16 | Cool| Strong | Normal   | Rain     | Yes   | {0.751471997809119,0.248528002190881}
    18 | Hot | Weak   | High     | Rain     | No    | {0.446387538890131,0.553612461109869}
    17 | Hot | Weak   | Normal   | Overcast | Yes   | {0.9297192642788,0.0702807357212004}
    (4 rows)
    

Query Performance

Greenplum Database dynamically eliminates irrelevant partitions in a table and optimally allocates memory for different operators in a query. These enhancements scan less data for a query, accelerate query processing, and support more concurrency.

  • Dynamic Partition Elimination

    In Greenplum Database, values available only when a query runs are used to dynamically prune partitions, which improves query processing speed. Enable or disable dynamic partition elimination by setting the server configuration parameter gp_dynamic_partition_pruning to ON or OFF; it is ON by default.

  • Memory Optimizations

    Greenplum Database allocates memory optimally for different operators in a query and frees and re-allocates memory during the stages of processing a query.
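
As an illustration of the first item above, and assuming the parameter may be changed at the session level, you can check or toggle dynamic partition elimination as follows:

SHOW gp_dynamic_partition_pruning;
SET gp_dynamic_partition_pruning = off;  -- disable for the current session
SET gp_dynamic_partition_pruning = on;   -- re-enable (the default)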

Managing Spill Files Generated by Queries

Greenplum Database creates spill files, also known as workfiles, on disk if it does not have sufficient memory to execute an SQL query in memory. The default value of 100,000 spill files is sufficient for the majority of queries. However, if a query creates more than the specified number of spill files, Greenplum Database returns this error:

ERROR: number of workfiles per query limit exceeded

Reasons that cause a large number of spill files to be generated include:
  • Data skew is present in the queried data.
  • The amount of memory allocated for the query is too low.

You might be able to run the query successfully by changing the query, changing the data distribution, or changing the system memory configuration. You can use the gp_workfile_* views to see spill file usage information. You can control the maximum amount of memory that a query can use with the Greenplum Database server configuration parameters max_statement_mem and statement_mem, or through resource queues.

Monitoring a Greenplum System contains the following information:
  • Information about skew and how to check for data skew
  • Information about creating and using the gp_workfile_* views

For information about server configuration parameters, see the Greenplum Database Reference Guide. For information about resource queues, see Managing Workload and Resources.

If you have determined that the query must create more spill files than allowed by the value of server configuration parameter gp_workfile_limit_files_per_query, you can increase the value of the parameter.
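
For example, the following sketch checks spill file usage and raises the per-query limit; it assumes the gp_workfile_* views are installed in the gp_toolkit schema and that the parameter can be set at the session level:

SELECT * FROM gp_toolkit.gp_workfile_usage_per_query;  -- current spill file usage per query
SET gp_workfile_limit_files_per_query = 200000;        -- raise the limit for this session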

Query Profiling

Greenplum Database devises a query plan for each query. Choosing the right query plan to match the query and data structure is necessary for good performance. A query plan defines how Greenplum Database will run the query in the parallel execution environment. Examine the query plans of poorly performing queries to identify possible performance tuning opportunities.

The query planner uses data statistics maintained by the database to choose a query plan with the lowest possible cost. Cost is measured in disk I/O, shown as units of disk page fetches. The goal is to minimize the total execution cost for the plan.

View the plan for a given query with the EXPLAIN command. EXPLAIN shows the query planner's estimated cost for the query plan. For example:

EXPLAIN SELECT * FROM names WHERE id=22;

EXPLAIN ANALYZE runs the statement in addition to displaying its plan. This is useful for determining how close the planner's estimates are to reality. For example:

EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;

Reading EXPLAIN Output

A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort.

Read plans from the bottom to the top: each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations: sequential, index, or bitmap index scans. If the query requires joins, aggregations, sorts, or other operations on the rows, there are additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually Greenplum Database motion nodes: redistribute, explicit redistribute, broadcast, or gather motions. These operations move rows between segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree and shows the basic node type and the following execution cost estimates for that plan node:

  • cost —Measured in units of disk page fetches. 1.0 equals one sequential disk page read. The first estimate is the start-up cost of getting the first row, and the second is the total cost of getting all rows. The total cost assumes all rows will be retrieved, which is not always true; for example, if the query uses LIMIT, not all rows are retrieved.
  • rows —The total number of rows output by this plan node. This number is usually less than the number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally, the estimate for the topmost node approximates the number of rows that the query actually returns, updates, or deletes.
  • width —The total bytes of all the rows that this plan node outputs.

Note the following:

  • The cost of a node includes the cost of its child nodes. The topmost plan node has the estimated total execution cost for the plan. This is the number the planner intends to minimize.
  • The cost reflects only the aspects of plan execution that the query planner takes into consideration. For example, the cost does not reflect time spent transmitting result rows to the client.

EXPLAIN Example

The following example describes how to read an EXPLAIN query plan for a query:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
   -> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
         Filter: name = 'Joelle'::text

Read the plan from the bottom to the top. To start, the query planner sequentially scans the names table. Notice the WHERE clause is applied as a filter condition. This means the scan operation checks the condition for each row it scans and outputs only the rows that satisfy the condition.

The results of the scan operation are passed to a gather motion operation. In Greenplum Database, a gather motion occurs when segments send rows to the master. In this example, two segment instances send rows to one master instance. This operation works on slice1 of the parallel query execution plan. A query plan is divided into slices so the segments can work on portions of the query plan in parallel.

The estimated start-up cost for this plan is 0.00 (no cost), and the total cost is 20.88 disk page fetches. The planner estimates this query will return one row.

Reading EXPLAIN ANALYZE Output

EXPLAIN ANALYZE plans and runs the statement. The EXPLAIN ANALYZE plan shows the actual execution cost along with the planner's estimates. This allows you to see if the planner's estimates are close to reality. EXPLAIN ANALYZE also shows the following:

  • The total elapsed time (in milliseconds) that it took to run the query.
  • The memory used by each slice of the query plan, as well as the memory reserved for the whole query statement.
  • The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
  • The maximum number of rows returned by the segment that produced the most rows for the operation. If multiple segments produce an equal number of rows, EXPLAIN ANALYZE shows the segment with the longest <time> to end.
  • The segment id of the segment that produced the most rows for an operation.
  • For relevant operations, the amount of memory (work_mem) used by the operation. If the work_mem was insufficient to perform the operation in memory, the plan shows the amount of data spilled to disk for the lowest-performing segment. For example:
    Work_mem used: 64K bytes avg, 64K bytes max (seg0).
    Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to lessen workfile I/O affecting 2 workers.
    
  • The time (in milliseconds) in which the segment that produced the most rows retrieved the first row, and the time taken for that segment to retrieve all rows. The result may omit <time> to first row if it is the same as the <time> to end.

EXPLAIN ANALYZE Example

This example describes how to read an EXPLAIN ANALYZE query plan for the same query. Note the actual timing and rows returned for each plan node, as well as the memory and time statistics for the whole query statement.

EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13)
    Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.
        -> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
             Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.
                 Filter: name = 'Joelle'::text
Slice statistics:
    (slice0) Executor memory: 135K bytes.
    (slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).
Statement statistics:
    Memory used: 128000K bytes
    Total runtime: 22.548 ms

Read the plan from the bottom to the top. The total elapsed time to run this query was 22.548 milliseconds.

The sequential scan operation had only one segment (seg0) that returned rows, and it returned just 1 row. It took 0.255 milliseconds to find the first row and 0.486 milliseconds to scan all rows. This result is close to the planner's estimate: the query planner estimated that this query would return one row. The gather motion (segments sending data to the master) received 1 row. The total elapsed time for this operation was 0.537 milliseconds.

Examining Query Plans to Solve Problems

If a query performs poorly, examine its query plan and ask the following questions:

  • Do operations in the plan take an exceptionally long time? Look for an operation that consumes the majority of query processing time. For example, if an index scan takes longer than expected, the index could be out-of-date and need to be reindexed. Or, adjust enable_<operator> parameters to see if you can force the planner to choose a different plan by disabling a particular query plan operator for that query.
  • Are the planner's estimates close to reality? Run EXPLAIN ANALYZE and see if the number of rows the planner estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, collect more statistics on the relevant columns. See the Greenplum Database Reference Guide for more information on the EXPLAIN ANALYZE and ANALYZE commands.
  • Are selective predicates applied early in the plan? Apply the most selective filters early in the plan so fewer rows move up the plan tree. If the query plan does not correctly estimate query predicate selectivity, collect more statistics on the relevant columns. See the ANALYZE command in the Greenplum Database Reference Guide for more information about collecting statistics. You can also try reordering the WHERE clause of your SQL statement.
  • Does the planner choose the best join order? When you have a query that joins multiple tables, make sure that the planner chooses the most selective join order. Joins that eliminate the largest number of rows should be done earlier in the plan so fewer rows move up the plan tree.

    If the plan is not choosing the optimal join order, set join_collapse_limit=1 and use explicit JOIN syntax in your SQL statement to force the planner to use the specified join order. You can also collect more statistics on the relevant join columns. See the ANALYZE command in the Greenplum Database Reference Guide for more information about collecting statistics. A combined sketch of these settings appears at the end of this list.

  • Does the planner selectively scan partitioned tables? If you use table partitioning, is the planner selectively scanning only the child tables required to satisfy the query predicates? Scans of the parent tables should return 0 rows since the parent tables do not contain any data. See Verifying Your Partition Strategy for an example of a query plan that shows a selective partition scan.
  • Does the planner choose hash aggregate and hash join operations where applicable? Hash operations are typically much faster than other types of joins or aggregations. Row comparisons and sorting are done in memory rather than reading from and writing to disk. To enable the query planner to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Try increasing work memory to improve performance for a query. If possible, run EXPLAIN ANALYZE for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example:

    Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.

    The "bytes wanted" message from EXPLAIN ANALYZE is based on the amount of data written to work files and is not exact. The minimum work_mem needed can differ from the suggested value.