Greenplum Database Data Types

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 5.x documentation.

Greenplum Database Data Types

Greenplum Database has a rich set of native data types available to users. Users may also define new data types using the CREATE TYPE command. This reference shows all of the built-in data types. In addition to the types listed here, there are also some internally used data types, such as oid (object identifier), but those are not documented in this guide.

Optional modules in the contrib directory may also install new data types. The hstore module, for example, introduces a new data type and associated functions for working with key-value pairs. See hstore Functions. The citext module adds a case-insensitive text data type. See citext Data Type.

The following data types are specified by SQL: bit, bit varying, boolean, character varying, varchar, character, char, date, double precision, integer, interval, numeric, decimal, real, smallint, time (with or without time zone), and timestamp (with or without time zone).

Each data type has an external representation determined by its input and output functions. Many of the built-in types have obvious external formats. However, several types are either unique to PostgreSQL (and Greenplum Database), such as geometric paths, or have several possibilities for formats, such as the date and time types. Some of the input and output functions are not invertible. That is, the result of an output function may lose accuracy when compared to the original input.

Table 1. Greenplum Database Built-in Data Types
Name Alias Size Range Description
bigint int8 8 bytes -922337203​6854775808 to 922337203​6854775807 large range integer
bigserial serial8 8 bytes 1 to 922337203​6854775807 large autoincrementing integer
bit [ (n) ]   n bits bit string constant fixed-length bit string
bit varying [ (n) ]1 varbit actual number of bits bit string constant variable-length bit string
boolean bool 1 byte true/false, t/f, yes/no, y/n, 1/0 logical boolean (true/false)
box   32 bytes ((x1,y1),(x2,y2)) rectangular box in the plane - not allowed in distribution key columns.
bytea1   1 byte + binary string sequence of octets variable-length binary string
character [ (n) ]1 char [ (n) ] 1 byte + n strings up to n characters in length fixed-length, blank padded
character varying [ (n) ]1 varchar [ (n) ] 1 byte + string size strings up to n characters in length variable-length with limit
cidr   12 or 24 bytes   IPv4 and IPv6 networks
circle   24 bytes <(x,y),r> (center and radius) circle in the plane - not allowed in distribution key columns.
date   4 bytes 4713 BC - 294,277 AD calendar date (year, month, day)
decimal [ (p, s) ]1 numeric [ (p, s) ] variable no limit user-specified precision, exact
double precision float8


8 bytes 15 decimal digits precision variable-precision, inexact
inet   12 or 24 bytes   IPv4 and IPv6 hosts and networks
integer int, int4 4 bytes -2147483648 to +2147483647 usual choice for integer
interval [ (p) ]   12 bytes -178000000 years - 178000000 years time span
json   1 byte + json size json of any length variable unlimited length
lseg   32 bytes ((x1,y1),(x2,y2)) line segment in the plane - not allowed in distribution key columns.
macaddr   6 bytes   MAC addresses
money   8 bytes -92233720368547758.08 to +92233720368547758.07 currency amount
path1   16+16n bytes [(x1,y1),...] geometric path in the plane - not allowed in distribution key columns.
point   16 bytes (x,y) geometric point in the plane - not allowed in distribution key columns.
polygon   40+16n bytes ((x1,y1),...) closed geometric path in the plane - not allowed in distribution key columns.
real float4 4 bytes 6 decimal digits precision variable-precision, inexact
serial serial4 4 bytes 1 to 2147483647 autoincrementing integer
smallint int2 2 bytes -32768 to +32767 small range integer
text1   1 byte + string size strings of any length variable unlimited length
time [ (p) ] [ without time zone ]   8 bytes 00:00:00[.000000] - 24:00:00[.000000] time of day only
time [ (p) ] with time zone timetz 12 bytes 00:00:00+1359 - 24:00:00-1359 time of day only, with time zone
timestamp [ (p) ] [ without time zone ]   8 bytes 4713 BC - 294,277 AD both date and time
timestamp [ (p) ] with time zone timestamptz 8 bytes 4713 BC - 294,277 AD both date and time, with time zone
uuid   16 bytes   Universally Unique Identifiers according to RFC 4122, ISO/IEC 9834-8:2005
xml1   1 byte + xml size xml of any length variable unlimited length
txid_snapshot       user-level transaction ID snapshot


Greenplum Database supports special-purpose data type entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type.

Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. The procedural languages all forbid use of a pseudo-type as an argument type, and allow only void and record as a result type.

A function with the pseudo-type record as a return data type returns an unspecified row type. The record represents an array of possibly-anonymous composite types. Since composite datums carry their own type identification, no extra knowledge is needed at the array level.

The pseudo-type void indicates that a function returns no value.

Note: Greenplum Database does not support triggers and the pseudo-type trigger.

The types anyelement, anyarray, anynonarray, and anyenum are pseudo-types called polymorphic types. Some procedural languages also support polymorphic functions using the types anyarray, anyelement, anyenum, and anynonarray.

The pseudo-type anytable is a Greenplum Database type that specifies a table expression—an expression that computes a table. Greenplum Database allows this type only as an argument to a user-defined function. See Table Value Expressions for more about the anytable pseudo-type.

For more information about pseudo-types, see the Postgres documentation about Pseudo-Types.

Polymorphic Types

Four pseudo-types of special interest are anyelement, anyarray, anynonarray, and anyenum, which are collectively called polymorphic types. Any function declared using these types is said to be a polymorphic function. A polymorphic function can operate on many different data types, with the specific data types being determined by the data types actually passed to it at runtime.

Polymorphic arguments and results are tied to each other and are resolved to a specific data type when a query calling a polymorphic function is parsed. Each position (either argument or return value) declared as anyelement is allowed to have any specific actual data type, but in any given call they must all be the same actual type. Each position declared as anyarray can have any array data type, but similarly they must all be the same type. If there are positions declared anyarray and others declared anyelement, the actual array type in the anyarray positions must be an array whose elements are the same type appearing in the anyelement positions. anynonarray is treated exactly the same as anyelement, but adds the additional constraint that the actual type must not be an array type. anyenum is treated exactly the same as anyelement, but adds the additional constraint that the actual type must be an enum type.

When more than one argument position is declared with a polymorphic type, the net effect is that only certain combinations of actual argument types are allowed. For example, a function declared as equal(anyelement, anyelement) takes any two input values, so long as they are of the same data type.

When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also polymorphic, and the actual data type supplied as the argument determines the actual result type for that call. For example, if there were not already an array subscripting mechanism, one could define a function that implements subscripting as subscript(anyarray, integer) returns anyelement. This declaration constrains the actual first argument to be an array type, and allows the parser to infer the correct result type from the actual first argument's type. Another example is that a function declared as myfunc(anyarray) returns anyenum will only accept arrays of enum types.

Note that anynonarray and anyenum do not represent separate type variables; they are the same type as anyelement, just with an additional constraint. For example, declaring a function as myfunc(anyelement, anyenum) is equivalent to declaring it as myfunc(anyenum, anyenum): both actual arguments must be the same enum type.

A variadic function (one taking a variable number of arguments) is polymorphic when its last parameter is declared as VARIADIC anyarray. For purposes of argument matching and determining the actual result type, such a function behaves the same as if you had declared the appropriate number of anynonarray parameters.

For more information about polymorphic types, see the Postgres documentation about Polymorphic Arguments and Return Types.

Table Value Expressions

The anytable pseudo-type declares a function argument that is a table value expression. The notation for a table value expression is a SELECT statement enclosed in a TABLE() function. You can specify a distribution policy for the table by adding SCATTER RANDOMLY, or a SCATTER BY clause with a column list to specify the distribution key.

The SELECT statement is executed when the function is called and the result rows are distributed to segments so that each segment executes the function with a subset of the result table.

For example, this table expression selects three columns from a table named customer and sets the distribution key to the first column:

TABLE(SELECT cust_key, name, address FROM customer SCATTER BY 1)

The SELECT statement may include joins on multiple base tables, WHERE clauses, aggregates, and any other valid query syntax.

The anytable type is only permitted in functions implemented in the C or C++ languages. The body of the function can access the table using the Greenplum Database Server Programming Interface (SPI) or the Greenplum Partner Connector (GPPC) API.

The anytable type is used in some user-defined functions in the Pivotal GPText API. The following GPText example uses the TABLE function with the SCATTER BY clause in the GPText function gptext.index() to populate the index mydb.mytest.articles with data from the messages table:
SELECT * FROM gptext.index(TABLE(SELECT * FROM mytest.messages 
          SCATTER BY distrib_id), 'mydb.mytest.messages');

For information about the function gptext.index(), see the Pivotal GPText documentation.

1 For variable length data types, if the data is greater than or equal to 127 bytes, the storage overhead is 4 bytes instead of 1.