Pivotal Query Optimizer Features and Enhancements
Pivotal Query Optimizer Features and Enhancements
- Improved join ordering
- Join-Aggregate reordering
- Sort order optimization
- Data skew estimates included in query optimization
Queries Against Partitioned Tables
- Partition elimination is improved.
- Query plan can contain the Partition selector operator.
- Partitions are not enumerated in EXPLAIN plans. For queries that involve static partition selection where the partitioning key is compared to a constant, the Pivotal Query Optimizer lists the number of partitions to be scanned in the EXPLAIN output under the Partition Selector operator. This example Partition Selector operator shows the filter and number of partitions selected:
Partition Selector for Part_Table (dynamic scan id: 1) Filter: a > 10 Partitions selected: 1 (out of 3)
For queries that involve dynamic partition selection where the partitioning key is compared to a variable, the number of partitions that are scanned will be known only during query execution. The partitions selected are not shown in the EXPLAIN output.
- Plan size is independent of number of partitions.
- Out of memory errors caused by number of partitions are reduced.
This example CREATE TABLE command creates a range partitioned table.
CREATE TABLE sales(order_id int, item_id int, amount numeric(15,2), date date, yr_qtr int) range partitioned by yr_qtr;
Pivotal Query Optimizer improves on these types of queries against partitioned tables:
- Full table scan. Partitions are not enumerated in
SELECT * FROM sales;
- Query with a constant filter predicate. Partition elimination is
SELECT * FROM sales WHERE yr_qtr = 201201;
- Range selection. Partition elimination is
SELECT * FROM sales WHERE yr_qtr BETWEEN 201301 AND 201404 ;
- Joins involving partitioned tables. In this example, the partitioned dimension table
date_dim is joined with fact table
SELECT * FROM catalog_sales WHERE date_id IN (SELECT id FROM date_dim WHERE month=12);
Queries that Contain Subqueries
SELECT * FROM part WHERE price > (SELECT avg(price) FROM part);
SELECT * FROM part p1 WHERE price > (SELECT avg(price) FROM part p2 WHERE p2.brand = p1.brand);
- CSQ in the SELECT
SELECT *, (SELECT min(price) FROM part p2 WHERE p1.brand = p2.brand) AS foo FROM part p1;
- CSQ in disjunctive (OR) filters.
SELECT FROM part p1 WHERE p_size > 40 OR p_retailprice > (SELECT avg(p_retailprice) FROM part p2 WHERE p2.p_brand = p1.p_brand)
- Nested CSQ with skip level
SELECT * FROM part p1 WHERE p1.p_partkey IN (SELECT p_partkey FROM part p2 WHERE p2.p_retailprice = (SELECT min(p_retailprice) FROM part p3 WHERE p3.p_brand = p1.p_brand) );Note: Nested CSQ with skip level correlations are not supported by the legacy query optimizer.
- CSQ with aggregate and inequality. This example contains a CSQ with an
SELECT * FROM part p1 WHERE p1.p_retailprice = (SELECT min(p_retailprice) FROM part p2 WHERE p2.p_brand <> p1.p_brand);
- CSQ that must return one
SELECT p_partkey, (SELECT p_retailprice FROM part p2 WHERE p2.p_brand = p1.p_brand ) FROM part p1;
Queries that Contain Common Table Expressions
WITH v AS (SELECT a, sum(b) as s FROM T where c < 10 GROUP BY a) SELECT *FROM v AS v1 , v AS v2 WHERE v1.a <> v2.a AND v1.s < v2.s;
WITH v AS (SELECT a, sum(b) as s FROM T GROUP BY a) SELECT * FROM v as v1, v as v2, v as v3 WHERE v1.a < v2.a AND v1.s < v3.s AND v1.a = 10 AND v2.a = 20 AND v3.a = 30;
- CTE that defines one or multiple tables. In this query, the CTE defines two
WITH cte1 AS (SELECT a, sum(b) as s FROM T where c < 10 GROUP BY a), cte2 AS (SELECT a, s FROM cte1 where s > 1000) SELECT * FROM cte1 as v1, cte2 as v2, cte2 as v3 WHERE v1.a < v2.a AND v1.s < v3.s;
- Nested CTEs.
WITH v AS (WITH w AS (SELECT a, b FROM foo WHERE b < 5) SELECT w1.a, w2.b FROM w AS w1, w AS w2 WHERE w1.a = w2.a AND w1.a > 2) SELECT v1.a, v2.a, v2.b FROM v as v1, v as v2 WHERE v1.a < v2.a;
DML Operation Enhancements with Pivotal Query Optimizer
- A DML node in a query plan is a query plan operator.
- Can appear anywhere in the plan, as a regular node (top slice only for now)
- Can have consumers
UPDATE operations use the query plan operator
Split and supports these operations:
This example plan shows the Split operator.
- UPDATE operations on the table distribution key columns.
- UPDATE operations on the table on the partition key column.
QUERY PLAN -------------------------------------------------------------- Update (cost=0.00..5.46 rows=1 width=1) -> Redistribute Motion 2:2 (slice1; segments: 2) Hash Key: a -> Result (cost=0.00..3.23 rows=1 width=48) -> Split (cost=0.00..2.13 rows=1 width=40) -> Result (cost=0.00..1.05 rows=1 width=40) -> Table Scan on dmltest
- New query plan operator Assert is used for constraints
checking.This example plan shows the Assert operator.
QUERY PLAN ------------------------------------------------------------ Insert (cost=0.00..4.61 rows=3 width=8) -> Assert (cost=0.00..3.37 rows=3 width=24) Assert Cond: (dmlsource.a > 2) IS DISTINCT FROM false -> Assert (cost=0.00..2.25 rows=3 width=24) Assert Cond: NOT dmlsource.b IS NULL -> Result (cost=0.00..1.14 rows=3 width=24) -> Table Scan on dmlsource