The gp_toolkit Administrative Schema

The gp_toolkit Administrative Schema

Greenplum Database provides an administrative schema called gp_toolkit that you can use to query the system catalogs, log files, and operating environment for system status information. The gp_toolkit schema contains a number of views that you can access using SQL commands. The gp_toolkit schema is accessible to all database users, although some objects may require superuser permissions. For convenience, you may want to add the gp_toolkit schema to your schema search path. For example:

=> ALTER ROLE myrole SET search_path TO myschema,gp_toolkit;

This documentation describes the most useful views in gp_toolkit. You may notice other objects (views, functions, and external tables) within the gp_toolkit schema that are not described in this documentation (these are supporting objects to the views described in this section).

Warning: Do not change database objects in the gp_toolkit schema. Do not create database objects in the schema. Changes to objects in the schema might affect the accuracy of administrative information returned by schema objects. Any changes made in the gp_toolkit schema are lost when the database is backed up and then restored with the gpcrondump and gpdbrestore utilities.

Checking for Tables that Need Routine Maintenance

The following views can help identify tables that need routine table maintenance (VACUUM and/or ANALYZE).

The VACUUM or VACUUM FULL command reclaims disk space occupied by deleted or obsolete rows. Because of the MVCC transaction concurrency model used in Greenplum Database, data rows that are deleted or updated still occupy physical space on disk even though they are not visible to any new transactions. Expired rows increase table size on disk and eventually slow down scans of the table.

The ANALYZE command collects column-level statistics needed by the query optimizer. Greenplum Database uses a cost-based query optimizer that relies on database statistics. Accurate statistics allow the query optimizer to better estimate selectivity and the number of rows retrieved by a query operation in order to choose the most efficient query plan.

gp_bloat_diag

This view shows regular heap-storage tables that have bloat (the actual number of pages on disk exceeds the expected number of pages given the table statistics). Tables that are bloated require a VACUUM or a VACUUM FULL in order to reclaim disk space occupied by deleted or obsolete rows. This view is accessible to all users, however non-superusers will only be able to see the tables that they have permission to access.
Note: For diagnostic functions that return append-optimized table information, see Checking Append-Optimized Tables.
Table 1. gp_bloat_diag view
Column Description
bdirelid Table object id.
bdinspname Schema name.
bdirelname Table name.
bdirelpages Actual number of pages on disk.
bdiexppages Expected number of pages given the table data.
bdidiag Bloat diagnostic message.

gp_stats_missing

This view shows tables that do not have statistics and therefore may require an ANALYZE be run on the table.

Table 2. gp_stats_missing view
Column Description
smischema Schema name.
smitable Table name.
smisize Does this table have statistics? False if the table does not have row count and row sizing statistics recorded in the system catalog, which may indicate that the table needs to be analyzed. This will also be false if the table does not contain any rows. For example, the parent tables of partitioned tables are always empty and will always return a false result.
smicols Number of columns in the table.
smirecs Number of rows in the table.

Checking for Locks

When a transaction accesses a relation (such as a table), it acquires a lock. Depending on the type of lock acquired, subsequent transactions may have to wait before they can access the same relation. For more information on the types of locks, see "Managing Data" in the Greenplum Database Administrator Guide. Greenplum Database resource queues (used for workload management) also use locks to control the admission of queries into the system.

The gp_locks_* family of views can help diagnose queries and sessions that are waiting to access an object due to a lock.

gp_locks_on_relation

This view shows any locks currently being held on a relation, and the associated session information about the query associated with the lock. For more information on the types of locks, see "Managing Data" in the Greenplum Database Administrator Guide. This view is accessible to all users, however non-superusers will only be able to see the locks for relations that they have permission to access.

Table 3. gp_locks_on_relation view
Column Description
lorlocktype Type of the lockable object: relation, extend, page, tuple, transactionid, object, userlock, resource queue, or advisory
lordatabase Object ID of the database in which the object exists, zero if the object is a shared object.
lorrelname The name of the relation.
lorrelation The object ID of the relation.
lortransaction The transaction ID that is affected by the lock.
lorpid Process ID of the server process holding or awaiting this lock. NULL if the lock is held by a prepared transaction.
lormode Name of the lock mode held or desired by this process.
lorgranted Displays whether the lock is granted (true) or not granted (false).
lorcurrentquery The current query in the session.

gp_locks_on_resqueue

This view shows any locks currently being held on a resource queue, and the associated session information about the query associated with the lock. This view is accessible to all users, however non-superusers will only be able to see the locks associated with their own sessions.

Table 4. gp_locks_on_resqueue view
Column Description
lorusename Name of the user executing the session.
lorrsqname The resource queue name.
lorlocktype Type of the lockable object: resource queue
lorobjid The ID of the locked transaction.
lortransaction The ID of the transaction that is affected by the lock.
lorpid The process ID of the transaction that is affected by the lock.
lormode The name of the lock mode held or desired by this process.
lorgranted Displays whether the lock is granted (true) or not granted (false).
lorwaiting Displays whether or not the session is waiting.

Checking Append-Optimized Tables

The gp_toolkit schema includes a set of diagnostic functions you can use to investigate the state of append-optimized tables.

When an append-optimized table (or column-oriented append-optimized table) is created, another table is implicitly created, containing metadata about the current state of the table. The metadata includes information such as the number of records in each of the table's segments.

Append-optimized tables may have non-visible rows—rows that have been updated or deleted, but remain in storage until the table is compacted using VACUUM. The hidden rows are tracked using an auxiliary visibility map table, or visimap.

The following functions let you access the metadata for append-optimized and column-oriented tables and view non-visible rows. Some of the functions have two versions: one that takes the oid of the table, and one that takes the name of the table. The latter version has "_name" appended to the function name.

__gp_aovisimap_compaction_info(oid)

This function displays compaction information for an append-optimized table. The information is for the on-disk data files on Greenplum Database segments that store the table data. You can use the information to determine the data files that will be compacted by a VACUUM operation on an append-optimized table.

Note: Until a VACUUM operation deletes the row from the data file, deleted or updated data rows occupy physical space on disk even though they are hidden to new transactions. The configuration parameter gp_appendonly_compaction controls the functionality of the VACUUM command.

This table describes the __gp_aovisimap_compaction_info function output table.

Table 5. __gp_aovisimap_compaction_info output table
Column Description
content Greenplum Database segment ID.
datafile ID of the data file on the segment.
compaction_possible The value is either t or f. The value t indicates that the data in data file be compacted when a VACUUM operation is performed.

The server configuration parameter gp_appendonly_compaction_threshold affects this value.

hidden_tupcount In the data file, the number of hidden (deleted or updated) rows.
total_tupcount In the data file, the total number of rows.
percent_hidden In the data file, the ratio (as a percentage) of hidden (deleted or updated) rows to total rows.
Note: If you upgraded your cluster to Greenplum Database 4.3.5.0 or later, you can create the __gp_aovisimap_compaction_info function in an existing Greenplum database by running run the script $GPHOME/share/postgresql/compaction_info.sql once for each database. For example, to install the functions in database testdb, use this command:
$ psql -d testdb -f $GPHOME/share/postgresql/compaction_info.sql

If you created the database with Greenplum Database 4.3.5.0 or later, this function is automatically created in the database.

__gp_aoseg_name('table_name')

This function returns metadata information contained in the append-optimized table's on-disk segment file.

Table 6. __gp_aoseg_name output table
Column Description
segno The file segment number.
eof The effective end of file for this file segment.
tupcount The total number of tuples in the segment, including invisible tuples.
varblockcount The total number of varblocks in the file segment.
eof_uncompressed The end of file if the file segment were uncompressed.
modcount The number of data modification operations.
state The state of the file segment. Indicates if the segment is active or ready to be dropped after compaction.

__gp_aoseg_history(oid)

This function returns metadata information contained in the append-optimized table's on-disk segment file. It displays all different versions (heap tuples) of the aoseg meta information. The data is complex, but users with a deep understanding of the system may find it usefulfor debugging.

The input argument is the oid of the append-optimized table.

Call __gp_aoseg_history_name('table_name') to get the same result with the table name as an argument.

Table 7. __gp_aoseg_history output table
Column Description
gp_tid The id of the tuple.
gp_xmin The id of the earliest transaction.
gp_xmin_status Status of the gp_xmin transaction.
gp_xmin_commit_ The commit distribution id of the gp_xmin transaction.
gp_xmax The id of the latest transaction.
gp_xmax_status The status of the latest transaction.
gp_xmax_commit_ The commit distribution id of the gp_xmax transaction.
gp_command_id The id of the query command.
gp_infomask A bitmap containing state information.
gp_update_tid The ID of the newer tuple if the row is updated.
gp_visibility The tuple visibility status.
segno The number of the segment in the segment file.
tupcount The number of tuples, including hidden tuples.
eof The effective end of file for the segment.
eof_uncompressed The end of file for the segment if data were uncompressed.
modcount A count of data modifications.
state The status of the segment.

__gp_aocsseg(oid)

This function returns metadata information contained in a column-oriented append-optimized table's on-disk segment file, excluding non-visible rows. Each row describes a segment for a column in the table.

The input argument is the oid of a column-oriented append-optimized table. Call as __gp_aocsseg_name('table_name') to get the same result with the table name as an argument.

Table 8. __gp_aocsseg(oid) output table
Column Description
gp_tid The table id.
segno The segment number.
column_num The column number.
physical_segno The number of the segment in the segment file.
tupcount The number of rows in the segment, excluding hidden tuples.
eof The effective end of file for the segment.
eof_uncompressed The end of file for the segment if the data were uncompressed.
modcount A count of data modification operations for the segment.
state The status of the segment.

__gp_aocsseg_history(oid)

This function returns metadata information contained in a column-oriented append-optimized table's on-disk segment file. Each row describes a segment for a column in the table.The data is complex, but users with a deep understanding of the system may find it useful for debugging.

The input argument is the oid of a column-oriented append-optimized table. Call as __gp_aocsseg_history_name('table_name') to get the same result with the table name as argument.

Table 9. __gp_aocsseg_history output table
Column Description
gp_tid The oid of the tuple.
gp_xmin The earliest transaction.
gp_xmin_status The status of the gp_xmin transaction.
gp_xmin_ Text representation of gp_xmin.
gp_xmax The latest transaction.
gp_xmax_status The status of the gp_xmax transaction.
gp_xmax_ Text representation of gp_max.
gp_command_id ID of the command operating on the tuple.
gp_infomask A bitmap containing state information.
gp_update_tid The ID of the newer tuple if the row is updated.
gp_visibility The tuple visibility status.
segno The segment number in the segment file.
column_num The column number.
physical_segno The segment containing data for the column.
tupcount The total number of tuples in the segment.
eof The effective end of file for the segment.
eof_uncompressed The end of file for the segment if the data were uncompressed.
modcount A count of the data modification operations.
state The state of the segment.

__gp_aovisimap(oid)

This function returns the tuple id, the segment file, and the row number of each non-visible tuple according to the visibility map.

The input argument is the oid of an append-optimized table.

Use __gp_aovisimap_name('table_name') to get the same result with the table name as argument.

Column Description
tid The tuple id.
segno The number of the segment file.
row_num The row number of a row that has been deleted or updated.

__gp_aovisimap_hidden_info(oid)

This function returns the numbers of hidden and visible tuples in the segment files for an append-optimized table.

The input argument is the oid of the append-optimized table.

Call __gp_aovisimap_hidden_info_name('table_name') to get the same result with a table name argument.

Column Description
segno The number of the segment file.
hidden_tupcount The number of hidden tuples in the segment file.
total_tupcount The total number of tuples in the segment file.

__gp_aovisimap_entry(oid)

This function returns information about each visibility map entry for the table.

The input argument is the oid of an append-optimized table.

Call __gp_aovisimap_entry_name('table_name') to get the same result with a table name argument.

Table 10. __gp_aovisimap_entry output table
Column Description
segno Segment number of the visibility map entry.
first_row_num The first row number of the entry.
hidden_tupcount The number of hidden tuples in the entry.
bitmap A text representation of the visibility bitmap.

Viewing Greenplum Database Server Log Files

Each component of a Greenplum Database system (master, standby master, primary segments, and mirror segments) keeps its own server log files. The gp_log_* family of views allows you to issue SQL queries against the server log files to find particular entries of interest. The use of these views require superuser permissions.

gp_log_command_timings

This view uses an external table to read the log files on the master and report the execution time of SQL commands executed in a database session. The use of this view requires superuser permissions.

Table 11. gp_log_command_timings view
Column Description
logsession The session identifier (prefixed with "con").
logcmdcount The command number within a session (prefixed with "cmd").
logdatabase The name of the database.
loguser The name of the database user.
logpid The process id (prefixed with "p").
logtimemin The time of the first log message for this command.
logtimemax The time of the last log message for this command.
logduration Statement duration from start to end time.

gp_log_database

This view uses an external table to read the server log files of the entire Greenplum system (master, segments, and mirrors) and lists log entries associated with the current database. Associated log entries can be identified by the session id (logsession) and command id (logcmdcount). The use of this view requires superuser permissions.

Table 12. gp_log_database view
Column Description
logtime The timestamp of the log message.
loguser The name of the database user.
logdatabase The name of the database.
logpid The associated process id (prefixed with "p").
logthread The associated thread count (prefixed with "th").
loghost The segment or master host name.
logport The segment or master port.
logsessiontime Time session connection was opened.
logtransaction Global transaction id.
logsession The session identifier (prefixed with "con").
logcmdcount The command number within a session (prefixed with "cmd").
logsegment The segment content identifier (prefixed with "seg" for primary or "mir" for mirror. The master always has a content id of -1).
logslice The slice id (portion of the query plan being executed).
logdistxact Distributed transaction id.
loglocalxact Local transaction id.
logsubxact Subtransaction id.
logseverity LOG, ERROR, FATAL, PANIC, DEBUG1 or DEBUG2.
logstate SQL state code associated with the log message.
logmessage Log or error message text.
logdetail Detail message text associated with an error message.
loghint Hint message text associated with an error message.
logquery The internally-generated query text.
logquerypos The cursor index into the internally-generated query text.
logcontext The context in which this message gets generated.
logdebug Query string with full detail for debugging.
logcursorpos The cursor index into the query string.
logfunction The function in which this message is generated.
logfile The log file in which this message is generated.
logline The line in the log file in which this message is generated.
logstack Full text of the stack trace associated with this message.

gp_log_master_concise

This view uses an external table to read a subset of the log fields from the master log file. The use of this view requires superuser permissions.

Table 13. gp_log_master_concise view
Column Description
logtime The timestamp of the log message.
logdatabase The name of the database.
logsession The session identifier (prefixed with "con").
logcmdcount The command number within a session (prefixed with "cmd").
logmessage Log or error message text.

gp_log_system

This view uses an external table to read the server log files of the entire Greenplum system (master, segments, and mirrors) and lists all log entries. Associated log entries can be identified by the session id (logsession) and command id (logcmdcount). The use of this view requires superuser permissions.

Table 14. gp_log_system view
Column Description
logtime The timestamp of the log message.
loguser The name of the database user.
logdatabase The name of the database.
logpid The associated process id (prefixed with "p").
logthread The associated thread count (prefixed with "th").
loghost The segment or master host name.
logport The segment or master port.
logsessiontime Time session connection was opened.
logtransaction Global transaction id.
logsession The session identifier (prefixed with "con").
logcmdcount The command number within a session (prefixed with "cmd").
logsegment The segment content identifier (prefixed with "seg" for primary or "mir" for mirror. The master always has a content id of -1).
logslice The slice id (portion of the query plan being executed).
logdistxact Distributed transaction id.
loglocalxact Local transaction id.
logsubxact Subtransaction id.
logseverity LOG, ERROR, FATAL, PANIC, DEBUG1 or DEBUG2.
logstate SQL state code associated with the log message.
logmessage Log or error message text.
logdetail Detail message text associated with an error message.
loghint Hint message text associated with an error message.
logquery The internally-generated query text.
logquerypos The cursor index into the internally-generated query text.
logcontext The context in which this message gets generated.
logdebug Query string with full detail for debugging.
logcursorpos The cursor index into the query string.
logfunction The function in which this message is generated.
logfile The log file in which this message is generated.
logline The line in the log file in which this message is generated.
logstack Full text of the stack trace associated with this message.

Checking Server Configuration Files

Each component of a Greenplum Database system (master, standby master, primary segments, and mirror segments) has its own server configuration file (postgresql.conf). The following gp_toolkit objects can be used to check parameter settings across all primary postgresql.conf files in the system:

gp_param_setting('parameter_name')

This function takes the name of a server configuration parameter and returns the postgresql.conf value for the master and each active segment. This function is accessible to all users.

Table 15. gp_param_setting('parameter_name') function
Column Description
paramsegment The segment content id (only active segments are shown). The master content id is always -1.
paramname The name of the parameter.
paramvalue The value of the parameter.

Example:

SELECT * FROM gp_param_setting('max_connections');

gp_param_settings_seg_value_diffs

Server configuration parameters that are classified as local parameters (meaning each segment gets the parameter value from its own postgresql.conf file), should be set identically on all segments. This view shows local parameter settings that are inconsistent. Parameters that are supposed to have different values (such as port) are not included. This view is accessible to all users.

Table 16. gp_param_settings_seg_value_diffs view
Column Description
psdname The name of the parameter.
psdvalue The value of the parameter.
psdcount The number of segments that have this value.

Checking for Failed Segments

The gp_pgdatabase_invalid view can be used to check for down segments.

gp_pgdatabase_invalid

This view shows information about segments that are marked as down in the system catalog. This view is accessible to all users.

Table 17. gp_pgdatabase_invalid view
Column Description
pgdbidbid The segment dbid. Every segment has a unique dbid.
pgdbiisprimary Is the segment currently acting as the primary (active) segment? (t or f)
pgdbicontent The content id of this segment. A primary and mirror will have the same content id.
pgdbivalid Is this segment up and valid? (t or f)
pgdbidefinedprimary Was this segment assigned the role of primary at system initialization time? (t or f)

Checking Resource Queue Activity and Status

The purpose of resource queues is to limit the number of active queries in the system at any given time in order to avoid exhausting system resources such as memory, CPU, and disk I/O. All database users are assigned to a resource queue, and every statement submitted by a user is first evaluated against the resource queue limits before it can run. The gp_resq_* family of views can be used to check the status of statements currently submitted to the system through their respective resource queue. Note that statements issued by superusers are exempt from resource queuing.

gp_resq_activity

For the resource queues that have active workload, this view shows one row for each active statement submitted through a resource queue. This view is accessible to all users.

Table 18. gp_resq_activity view
Column Description
resqprocpid Process ID assigned to this statement (on the master).
resqrole User name.
resqoid Resource queue object id.
resqname Resource queue name.
resqstart Time statement was issued to the system.
resqstatus Status of statement: running, waiting or cancelled.

gp_resq_activity_by_queue

For the resource queues that have active workload, this view shows a summary of queue activity. This view is accessible to all users.

Table 19. gp_resq_activity_by_queue Column
Column Description
resqoid Resource queue object id.
resqname Resource queue name.
resqlast Time of the last statement issued to the queue.
resqstatus Status of last statement: running, waiting or cancelled.
resqtotal Total statements in this queue.

gp_resq_priority_statement

This view shows the resource queue priority, session ID, and other information for all statements currently running in the Greenplum Database system. This view is accessible to all users.

Table 20. gp_resq_priority_statement view
Column Description
rqpdatname The database name that the session is connected to.
rqpusename The user who issued the statement.
rqpsession The session ID.
rqpcommand The number of the statement within this session (the command id and session id uniquely identify a statement).
rqppriority The resource queue priority for this statement (MAX, HIGH, MEDIUM, LOW).
rqpweight An integer value associated with the priority of this statement.
rqpquery The query text of the statement.

gp_resq_role

This view shows the resource queues associated with a role. This view is accessible to all users.

Table 21. gp_resq_role view
Column Description
rrrolname Role (user) name.
rrrsqname The resource queue name assigned to this role. If a role has not been explicitly assigned to a resource queue, it will be in the default resource queue (pg_default).

gp_resqueue_status

This view allows administrators to see status and activity for a workload management resource queue. It shows how many queries are waiting to run and how many queries are currently active in the system from a particular resource queue.

Table 22. gp_resqueue_status view
Column Description
queueid The ID of the resource queue.
rsqname The name of the resource queue.
rsqcountlimit The active query threshold of the resource queue. A value of -1 means no limit.
rsqcountvalue The number of active query slots currently being used in the resource queue.
rsqcostlimit The query cost threshold of the resource queue. A value of -1 means no limit.
rsqcostvalue The total cost of all statements currently in the resource queue.
rsqmemorylimit The memory limit for the resource queue.
rsqmemoryvalue The total memory used by all statements currently in the resource queue.
rsqwaiters The number of statements currently waiting in the resource queue.
rsqholders The number of statements currently running on the system from this resource queue.

Checking Query Disk Spill Space Usage

The gp_workfile_* views show information about all the queries that are currently using disk spill space. Greenplum Database creates work files on disk if it does not have sufficient memory to execute the query in memory. This information can be used for troubleshooting and tuning queries. The information in the views can also be used to specify the values for the Greenplum Database configuration parameters gp_workfile_limit_per_query and gp_workfile_limit_per_segment.

gp_workfile_entries

This view contains one row for each operator using disk space for workfiles on a segment at the current time. The view is accessible to all users, however non-superusers only to see information for the databases that they have permission to access.

Table 23. gp_workfile_entries
Column Type References Description
command_cnt integer   Command ID of the query.
content smallint   The content identifier for a segment instance.
current_query text   Current query that the process is running.
datname name   Greenplum database name.
directory text   Path to the work file.
optype text   The query operator type that created the work file.
procpid integer   Process ID of the server process.
sess_id integer   Session ID.
size bigint   The size of the work file in bytes.
numfiles bigint   The number of files created.
slice smallint   The query plan slice. The portion of the query plan that is being executed.
state text   The state of the query that created the work file.
usename name   Role name.
workmem integer   The amount of memory allocated to the operator in KB.

gp_workfile_usage_per_query

This view contains one row for each query using disk space for workfiles on a segment at the current time. The view is accessible to all users, however non-superusers only to see information for the databases that they have permission to access.

Table 24. gp_workfile_usage_per_query
Column Type References Description
command_cnt integer   Command ID of the query.
content smallint   The content identifier for a segment instance.
current_query text   Current query that the process is running.
datname name   Greenplum database name.
procpid integer   Process ID of the server process.
sess_id integer   Session ID.
size bigint   The size of the work file in bytes.
numfiles bigint   The number of files created.
state text   The state of the query that created the work file.
usename name   Role name.

gp_workfile_usage_per_segment

This view contains one row for each segment. Each row displays the total amount of disk space used for workfiles on the segment at the current time. The view is accessible to all users, however non-superusers only to see information for the databases that they have permission to access.

Table 25. gp_workfile_usage_per_segment
Column Type References Description
content smallint   The content identifier for a segment instance.
size bigint   The total size of the work files on a segment.
numfiles bigint   The number of files created.

Viewing Users and Groups (Roles)

It is frequently convenient to group users (roles) together to ease management of object privileges: that way, privileges can be granted to, or revoked from, a group as a whole. In Greenplum Database this is done by creating a role that represents the group, and then granting membership in the group role to individual user roles.

The gp_roles_assigned view can be used to see all of the roles in the system, and their assigned members (if the role is also a group role).

gp_roles_assigned

This view shows all of the roles in the system, and their assigned members (if the role is also a group role). This view is accessible to all users.

Table 26. gp_roles_assigned view
Column Description
raroleid The role object ID. If this role has members (users), it is considered a group role.
rarolename The role (user or group) name.
ramemberid The role object ID of the role that is a member of this role.
ramembername Name of the role that is a member of this role.

Checking Database Object Sizes and Disk Space

The gp_size_* family of views can be used to determine the disk space usage for a distributed Greenplum Database, schema, table, or index. The following views calculate the total size of an object across all primary segments (mirrors are not included in the size calculations).

The table and index sizing views list the relation by object ID (not by name). To check the size of a table or index by name, you must look up the relation name (relname) in the pg_class table. For example:

SELECT relname as name, sotdsize as size, sotdtoastsize as 
toast, sotdadditionalsize as other 
FROM gp_size_of_table_disk as sotd, pg_class 
WHERE sotd.sotdoid=pg_class.oid ORDER BY relname;

gp_size_of_all_table_indexes

This view shows the total size of all indexes for a table. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.

Table 27. gp_size_of_all_table_indexes view
Column Description
soatioid The object ID of the table
soatisize The total size of all table indexes in bytes
soatischemaname The schema name
soatitablename The table name

gp_size_of_database

This view shows the total size of a database. This view is accessible to all users, however non-superusers will only be able to see databases that they have permission to access.

Table 28. gp_size_of_database view
Column Description
sodddatname The name of the database
sodddatsize The size of the database in bytes

gp_size_of_index

This view shows the total size of an index. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.

Table 29. gp_size_of_index view
Column Description
soioid The object ID of the index
soitableoid The object ID of the table to which the index belongs
soisize The size of the index in bytes
soiindexschemaname The name of the index schema
soiindexname The name of the index
soitableschemaname The name of the table schema
soitablename The name of the table

gp_size_of_partition_and_indexes_disk

This view shows the size on disk of partitioned child tables and their indexes. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access..

Table 30. gp_size_of_partition_and_indexes_disk view
Column Description
sopaidparentoid The object ID of the parent table
sopaidpartitionoid The object ID of the partition table
sopaidpartitiontablesize The partition table size in bytes
sopaidpartitionindexessize The total size of all indexes on this partition
Sopaidparentschemaname The name of the parent schema
Sopaidparenttablename The name of the parent table
Sopaidpartitionschemaname The name of the partition schema
sopaidpartitiontablename The name of the partition table

gp_size_of_schema_disk

This view shows schema sizes for the public schema and the user-created schemas in the current database. This view is accessible to all users, however non-superusers will be able to see only the schemas that they have permission to access.

Table 31. gp_size_of_schema_disk view
Column Description
sosdnsp The name of the schema
sosdschematablesize The total size of tables in the schema in bytes
sosdschemaidxsize The total size of indexes in the schema in bytes

gp_size_of_table_and_indexes_disk

This view shows the size on disk of tables and their indexes. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.

Table 32. gp_size_of_table_and_indexes_disk view
Column Description
sotaidoid The object ID of the parent table
sotaidtablesize The disk size of the table
sotaididxsize The total size of all indexes on the table
sotaidschemaname The name of the schema
sotaidtablename The name of the table

gp_size_of_table_and_indexes_licensing

This view shows the total size of tables and their indexes for licensing purposes. The use of this view requires superuser permissions.

Table 33. gp_size_of_table_and_indexes_licensing view
Column Description
sotailoid The object ID of the table
sotailtablesizedisk The total disk size of the table
sotailtablesizeuncompressed If the table is a compressed append-optimized table, shows the uncompressed table size in bytes.
sotailindexessize The total size of all indexes in the table
sotailschemaname The schema name
sotailtablename The table name

gp_size_of_table_disk

This view shows the size of a table on disk. This view is accessible to all users, however non-superusers will only be able to see tables that they have permission to access

Table 34. gp_size_of_table_disk view
Column Description
sotdoid The object ID of the table
sotdsize The size of the table in bytes. The size is only the main table size. The size does not include auxiliary objects such as oversized (toast) attributes, or additional storage objects for AO tables.
sotdtoastsize The size of the TOAST table (oversized attribute storage), if there is one.
sotdadditionalsize Reflects the segment and block directory table sizes for append-optimized (AO) tables.
sotdschemaname The schema name
sotdtablename The table name

gp_size_of_table_uncompressed

This view shows the uncompressed table size for append-optimized (AO) tables. Otherwise, the table size on disk is shown. The use of this view requires superuser permissions.

Table 35. gp_size_of_table_uncompressed view
Column Description
sotuoid The object ID of the table
sotusize The uncomressed size of the table in bytes if it is a compressed AO table. Otherwise, the table size on disk.
sotuschemaname The schema name
sotutablename The table name

gp_disk_free

This external table runs the df (disk free) command on the active segment hosts and reports back the results. Inactive mirrors are not included in the calculation. The use of this external table requires superuser permissions.

Table 36. gp_disk_free external table
Column Description
dfsegment The content id of the segment (only active segments are shown)
dfhostname The hostname of the segment host
dfdevice The device name
dfspace Free disk space in the segment file system in kilobytes

Checking for Uneven Data Distribution

All tables in Greenplum Database are distributed, meaning their data is divided across all of the segments in the system. If the data is not distributed evenly, then query processing performance may suffer. The following views can help diagnose if a table has uneven data distribution:

gp_skew_coefficients

This view shows data distribution skew by calculating the coefficient of variation (CV) for the data stored on each segment. This view is accessible to all users, however non-superusers will only be able to see tables that they have permission to access

Table 37. gp_skew_coefficients view
Column Description
skcoid The object id of the table.
skcnamespace The namespace where the table is defined.
skcrelname The table name.
skccoeff The coefficient of variation (CV) is calculated as the standard deviation divided by the average. It takes into account both the average and variability around the average of a data series. The lower the value, the better. Higher values indicate greater data skew.

gp_skew_idle_fractions

This view shows data distribution skew by calculating the percentage of the system that is idle during a table scan, which is an indicator of processing data skew. This view is accessible to all users, however non-superusers will only be able to see tables that they have permission to access

Table 38. gp_skew_idle_fractions view
Column Description
sifoid The object id of the table.
sifnamespace The namespace where the table is defined.
sifrelname The table name.
siffraction The percentage of the system that is idle during a table scan, which is an indicator of uneven data distribution or query processing skew. For example, a value of 0.1 indicates 10% skew, a value of 0.5 indicates 50% skew, and so on. Tables that have more than 10% skew should have their distribution policies evaluated.