Managing Spill Files Generated by Queries

Managing Spill Files Generated by Queries

Greenplum Database creates spill files, also known as workfiles, on disk if it does not have sufficient memory to execute an SQL query in memory.
The maximum number of spill files for a given query is governed by the gp_workfile_limit_files_per_query server configuration parameter setting. The default value of 100,000 spill files is sufficient for the majority of queries.

If a query creates more than the configured number of spill files, Greenplum Database returns this error:

ERROR: number of workfiles per query limit exceeded
Greenplum Database may generate a large number of spill files when:
  • Data skew is present in the queried data. To check for data skew, see Checking for Data Distribution Skew.
  • The amount of memory allocated for the query is too low. You control the maximum amount of memory that can be used by a query with the Greenplum Database server configuration parameters max_statement_mem and statement_mem, or through resource group or resource queue configuration.

You might be able to run the query successfully by changing the query, changing the data distribution, or changing the system memory configuration. The gp_toolkit gp_workfile_* views display spill file usage information. You can use this information to troubleshoot and tune queries. The gp_workfile_* views are described in Checking Query Disk Spill Space Usage.

Additional documentation resources: