Revisit of correlated subquery as a select list item


By:
Overview

This section revisits the earlier overage of uncorrelated and correlated subqueries to provide additional perspectives.

Revisit of correlated subquery as a select list item

The introduction to Subqueries section included coverage of how to use uncorrelated and correlated subqueries as select list items.  This section revisits that earlier application of uncorrelated and correlated subqueries from three perspectives.

  • First, this section reviews the initial application and presents an alternative approach based on joins instead of subqueries.
  • Second, the use of the intersect operator is demonstrated to confirm that the subquery-based and join-based approaches return identical results.
  • Third, execution plans for the subquery-based and join-based approaches are contrasted with one another.

Here’s a previously presented code block that demonstrates the use of an uncorrelated subquery and a correlated subquery as items in the select list of a query.

  • The first subquery with an alias of total_sub_cat_count is an uncorrelated subquery that returns the count of subcategory values in the join of the ProductCategory and ProductSubcategory tables from the AdventureWorks2014 database. This initial subquery returns the same value for each row of the outer query.
  • The other subquery labeled sub_cat_count is a correlated subquery; the return value of this second query depends on the category name value of the outer query.  The subquery computes the count of subcategory values within the category for the current row of the outer query.  Because there are four product category names in the AdventureWorks2014 database, the correlated subquery returns one of four count values.
-- select list items with uncorrelated and correlated subqueries
-- select list items with uncorrelated and correlated subqueries
SELECT
  outer_cat.ProductCategoryID cat_id,
  outer_cat.Name outer_cat_name,
  subcat.Name subcat_name,
  (SELECT
     COUNT(ProductSubcategoryID) subcat_id_count
   FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat
   INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
     ON outer_cat.ProductCategoryID = subcat.ProductCategoryID
  ) total_sub_cat_count,
  (SELECT
     COUNT(ProductSubcategoryID) subcat_id_count
   FROM [AdventureWorks2014].[Production].[ProductCategory] cat
   INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
     ON cat.ProductCategoryID = subcat.ProductCategoryID
   GROUP BY cat.name
   HAVING cat.name = outer_cat.Name
  ) sub_cat_count
FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat
INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
  ON outer_cat.ProductCategoryID = subcat.ProductCategoryID

It is often possible to reformulate T-SQL queries based on subqueries with joins and where clause criteria.  This redesign of a query can eliminate the need for correlated subqueries.  Some believe that this reformulation can improve the performance of a query.  A good presentation of the issues for and against replacing correlated subqueries with joins appears in this blog (be sure to read comments too).  To help evaluate the impact of replacing subqueries with joins, the preceding script sample is re-designed to use joins instead of subqueries.  See the next code block for a join-based implementation of the preceding query.

  • The first three columns of the query below are from the join of the ProductCategory and ProductSubcategory tables.  This code is identical to the preceding query.
  • A cross join adds the count of subcategories across categories.  When you need to add an identical set values to all rows in another query, then a cross join is a good tool to use.  In the example below, the result set from the subcat_id_count subquery is added to the rowset from the join of the ProductCategory and ProductSubcategory tables.  The subcat_id_count subquery is used as a derived table.
  • A left join adds the result set from the  cat_id_count_by_cat_id subquery.  The left join through its on clause allows you to specify which result set values from a derived table to apply to which values from the join of the ProductCategory and ProductSubcategory tables.
-- with joins and derived tables
SELECT
  cat.ProductCategoryID cat_id,
  cat.Name cat_name,
  subcat.Name subcat_name,
  subcat_id_count total_sub_cat_count,
  cat_id_count_by_cat_id.cat_id_count_by_cat_id sub_cat_count
FROM [AdventureWorks2014].[Production].[ProductCategory] cat
INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
  ON cat.ProductCategoryID = subcat.ProductCategoryID
CROSS JOIN 
   (-- count of subcategories across categories
    SELECT
      COUNT(ProductSubcategoryID) subcat_id_count
    FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat
   ) subcat_id_count
LEFT JOIN 
  (-- count of subcategories within categories
   SELECT
      ProductCategoryID,
      COUNT(*) cat_id_count_by_cat_id
   FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat
   GROUP BY ProductCategoryID
  ) cat_id_count_by_cat_id
    ON cat.ProductCategoryID = cat_id_count_by_cat_id.ProductCategoryID

Here are excerpts from the result set with uncorrelated subqueries and correlated subqueries versus the result set for the join-based approach with derived tables. 

  • The first eleven rows from the subquery approach appear in the top pane.
  • Also, the first eleven rows from the join-based approach with derived tables appear in the second pane.
  • Each of the two result sets contains thirty-seven rows, but the order of the rows is different, so it is not easy to determine by visual inspection if the two result sets are the same.
Temporary Data Store Tutorial Part 4_fig01

The intersect operator is a useful tool for comparing two result sets no matter what the order of rows is in each one.  An intersect operator between the result sets for the two approaches returns thirty-seven rows.  In other words, the thirty-seven-row result set from each query is identical.  The following code block shows how to use the intersect operator to make the comparison.

-- all 37 rows from each query intersect with one another
-- select list items with uncorrelated and correlated subqueries
SELECT
  outer_cat.ProductCategoryID cat_id,
  outer_cat.Name outer_cat_name,
  subcat.Name subcat_name,
  (SELECT
     COUNT(ProductSubcategoryID) subcat_id_count
   FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat
   INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
     ON outer_cat.ProductCategoryID = subcat.ProductCategoryID
  ) total_sub_cat_count,
  (SELECT
     COUNT(ProductSubcategoryID) subcat_id_count
   FROM [AdventureWorks2014].[Production].[ProductCategory] cat
   INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
     ON cat.ProductCategoryID = subcat.ProductCategoryID
   GROUP BY cat.name
   HAVING cat.name = outer_cat.Name
  ) sub_cat_count
FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat
INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
  ON outer_cat.ProductCategoryID = subcat.ProductCategoryID

INTERSECT

-- with joins and derived tables
SELECT
  cat.ProductCategoryID cat_id,
  cat.Name cat_name,
  subcat.Name subcat_name,
  subcat_id_count total_sub_cat_count,
  cat_id_count_by_cat_id.cat_id_count_by_cat_id sub_cat_count
FROM [AdventureWorks2014].[Production].[ProductCategory] cat
INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat
  ON cat.ProductCategoryID = subcat.ProductCategoryID
CROSS JOIN 
   (-- count of subcategories
    SELECT
      COUNT(ProductSubcategoryID) subcat_id_count
    FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat
   ) subcat_id_count
LEFT JOIN 
  (-- count of subcategories within categories
   SELECT
     ProductCategoryID,
     COUNT(*) cat_id_count_by_cat_id
   FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat
   GROUP BY ProductCategoryID
  ) cat_id_count_by_cat_id
    ON cat.ProductCategoryID = cat_id_count_by_cat_id.ProductCategoryID

The next screen shot displays the thirty-seven rows returned by the intersect operator between the result sets for the two approaches.

  • The intersect operator returns a result set which confirms that all thirty-seven rows from each formulation are the same.
  • The order of the rows is different, but the intersect operator programmatically matches rows with corresponding column values.
  • This correspondence confirms that the subquery-based formulation and the join-based formulation produce identical results.
Temporary Data Store Tutorial Part 4_fig02

The next screen shot shows the execution plan comparison of the subquery-based and join-based queries.

  • The subquery-based execution plan appears above the join-based execution plan.
  • The first point to observe is that both query plans have the same execution cost.  The two costs are highlighted in the screen shot below.
    • In other words, for these two query designs there is no difference in execution cost between the two alternative approaches.  Both queries take fifty percent of the relative cost.
    • You may be able to discover other subquery-based execution plans with greater costs than join-based execution plans for different source data or slightly different query designs, but for the alternative approaches reviewed in this tutorial section there is no difference.
    • In any event, you should consider comparing execution plans whenever there is a concern that one query approach takes longer to run than another approach for the same source data.
  • As you can see, the arrangement of operations within each execution plan do vary.  Therefore, SQL Server did not perform identical steps to obtain the result set for each query design.
Temporary Data Store Tutorial Part 4_fig03





Comments For This Article

















get free sql tips
agree to terms