About PXF Filter Pushdown
A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 6.x documentation.
PXF supports filter pushdown. When filter pushdown is enabled, the constraints from the WHERE
clause of a SELECT
query can be extracted and passed to the external data source for filtering. This process can improve query performance, and can also reduce the amount of data that is transferred to Greenplum Database.
You enable or disable filter pushdown for all external table protocols, including pxf
, by setting the gp_external_enable_filter_pushdown
server configuration parameter. The default value of this configuration parameter is on
; set it to off
to disable filter pushdown. For example:
SHOW gp_external_enable_filter_pushdown;
SET gp_external_enable_filter_pushdown TO 'on';
Note: Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter push-down for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database).
PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF connectors support filter pushdown:
- Hive Connector, all profiles
- HBase Connector
- JDBC Connector
- S3 Connector using the Amazon S3 Select service to access CSV and Parquet data
PXF filter pushdown can be used with these data types (connector-specific):
INT2
,INT4
,INT8
CHAR
,TEXT
FLOAT
NUMERIC
(not available with the S3 connector when using S3 Select)BOOL
DATE
,TIMESTAMP
(available only with the JDBC connector and the S3 connector when using S3 Select)
You can use PXF filter pushdown with these operators:
<
,<=
,>=
,>
<>
,=
AND
,OR
,NOT
IN
operator on arrays ofINT
andTEXT
(JDBC connector only)LIKE
(TEXT
fields, JDBC connector only)
To summarize, all of the following criteria must be met for filter pushdown to occur:
- You enable external table filter pushdown by setting the
gp_external_enable_filter_pushdown
server configuration parameter to'on'
. - The Greenplum Database protocol that you use to access external data source must support filter pushdown. The
pxf
external table protocol supports pushdown. - The external data source that you are accessing must support pushdown. For example, HBase and Hive support pushdown.
For queries on external tables that you create with the
pxf
protocol, the underlying PXF connector must also support filter pushdown. For example, the PXF Hive, HBase, and JDBC connectors support pushdown.- Refer to Hive Partition Filter Pushdown for more information about Hive support for this feature.