By: Aaron Bertrand | Updated: 2023-11-08 | Comments (1) | Related: > TSQL
Problem
I've written about UNPIVOT
before –
see
Use SQL Server's UNPIVOT operator to help normalize output and
Use SQL Server's UNPIVOT operator to dynamically normalize output. It's
a powerful language feature that lets you flip results sideways, sort of like the
opposite of PIVOT
. This is great if you have columns
like Phone1
and Phone2
but want to collapse them into one column. One of the challenges is that if you
have other columns you also want to collapse similarly, like
Email1
and Email2
, you
need to add an additional UNPIVOT
operator or use
a different approach. Is there a way to do this in a less complicated way?
Solution
A heavily underused feature in SQL Server is CROSS APPLY
.
One of the more common uses for CROSS APPLY
is to
simulate a correlated subquery, for example, to get the latest answer from an author's
posts. For example:
-- source: www.mssqltips.com SELECT a.AuthorID, a.AuthorName, p.PostID FROM dbo.Authors AS a CROSS APPLY ( SELECT PostID = MAX(PostID) FROM dbo.Posts WHERE AuthorID = a.AuthorID ) AS p;
A less common use is to build row constructors. Let's say we have a denormalized table that stores a user's three favorite colors:
-- source: www.mssqltips.com CREATE TABLE dbo.UserColors ( UserID int PRIMARY KEY, FavoriteColor1 varchar(12) , FavoriteColor2 varchar(12) , FavoriteColor3 varchar(12) );
Selecting a user's favorite colors is simple:
-- source: www.mssqltips.com SELECT UserID, FavoriteColor1, FavoriteColor2, FavoriteColor3 FROM dbo.UserColors WHERE UserID = 1;
If the requirement is to show a user's favorite colors in a single column rather than a single row, then it gets a little more complicated:
One long-standing method to do this is to use UNION ALL
:
-- source: www.mssqltips.com SELECT UserID, FavoriteColor = FavoriteColor1 FROM dbo.UserColors WHERE UserID = 1 UNION ALL SELECT UserID, FavoriteColor = FavoriteColor2 FROM dbo.UserColors WHERE UserID = 1 UNION ALL SELECT UserID, FavoriteColor = FavoriteColor3 FROM dbo.UserColors WHERE UserID = 1;
Which works, of course:
But it is a lot of redundant code, and it won't be the most efficient method in many cases. In this case, it will require three clustered index seeks, which isn't too bad, but picture cases where the filters are more complicated, aren't supported by an index, or both.
To do this with UNPIVOT
, we can make a single pass
at the data. One thing I don't like about this is that you must come up with
a virtual column name to make the syntax work, e.g., in the following example,
FavoriteColors
came out of nowhere:
-- source: www.mssqltips.com SELECT favs.UserID, favs.FavoriteColor FROM dbo.UserColors AS uc UNPIVOT ( FavoriteColor FOR FavoriteColors -- fake column! IN (uc.FavoriteColor1, uc.FavoriteColor2, uc.FavoriteColor3) ) AS favs WHERE UserID = 1;
It's also not intuitive why UserID
comes
from favs
and not uc
.
I won't get into the quirkiness of the syntax here, but here are the results:
Here is the plan that shows a single table access:
CROSS APPLY
can help in a similar way – still
making only a single pass of the data but with fewer quirky syntax elements:
-- source: www.mssqltips.com SELECT u.UserID, favs.FavoriteColor FROM dbo.UserColors AS u CROSS APPLY ( VALUES(u.FavoriteColor1), (u.FavoriteColor2), (u.FavoriteColor3) ) AS favs(FavoriteColor) WHERE u.UserID = 1;
The row constructor essentially treats the three referenced columns as a union and, like UNPIVOT, allows us to flip those columns into rows. Here are the results:
And here is the (slightly simpler) plan:
As suggested earlier, things get more interesting if you have multiple sets of columns to unpivot. Let's say the table is really denormalized and contains the user's favorite songs, too.
-- source: www.mssqltips.com CREATE TABLE dbo.UserFavorites ( UserID int NOT NULL, FavoriteColor1 varchar(12), FavoriteColor2 varchar(12), FavoriteColor3 varchar(12), FavoriteSong1 varchar(64), FavoriteArtist1 varchar(64), FavoriteSong2 varchar(64), FavoriteArtist2 varchar(64), FavoriteSong3 varchar(64), FavoriteArtist3 varchar(64), CONSTRAINT PK_UC PRIMARY KEY (UserID) ); -- let's just insert one row, to keep it simple INSERT dbo.UserFavorites VALUES (1, 'green', 'blue', 'black', 'Heartache Tonight', 'The Eagles', 'Comfortably Numb', 'Pink Floyd', 'Bobcaygeon', 'The Tragically Hip' );
Now, if I want a result like this:
I can still use a UNION
:
-- source: www.mssqltips.com SELECT UserID, FavoriteColor = FavoriteColor1, FavoriteSong = FavoriteSong1, FavoriteArtist = FavoriteArtist1 FROM dbo.UserColors WHERE UserID = 1 UNION ALL SELECT UserID, FavoriteColor = FavoriteColor2, FavoriteSong = FavoriteSong2, FavoriteArtist = FavoriteArtist2 FROM dbo.UserColors WHERE UserID = 1 UNION ALL SELECT UserID, FavoriteColor = FavoriteColor3, FavoriteSong = FavoriteSong3, FavoriteArtist = FavoriteArtist3 FROM dbo.UserColors WHERE UserID = 1;
The results are as expected, and the plan looks like the previous union variation. But you can see how, as the query gets more complicated, it's harder to maintain and prone to human error.
An UNPIVOT, you would think, would have to look like this:
-- source: www.mssqltips.com SELECT UserID, FavoriteColor, FavoriteSong, FavoriteArtist FROM UserFavorites AS uf UNPIVOT ( FavoriteColor FOR FavoriteColors IN (FavoriteColor1, FavoriteColor2, FavoriteColor3) ) AS favColors UNPIVOT ( FavoriteSong FOR FavoriteSongs IN (FavoriteSong1, FavoriteSong2, FavoriteSong3) ) AS favSongs UNPIVOT ( FavoriteArtist FOR FavoriteArtists IN (FavoriteArtist1, FavoriteArtist2, FavoriteArtist3) ) AS favArtists WHERE UserID = 1;
That's ugly, too; worse, it produces a '"cartesian unpivot!'"
One way to resolve this is to add these clauses, which take advantage of the
convenient fact that we want Song1
and
Artist1
to be related, for example. But that's
even more gross:
AND RIGHT(FavoriteColors,1) = RIGHT(FavoriteSongs, 1) AND RIGHT(FavoriteSongs, 1) = RIGHT(FavoriteArtists,1);
And the plan isn't great, either:
CROSS APPLY
, on the other hand, gives us a little
more flexibility here:
-- source: www.mssqltips.com SELECT uf.UserID, favs.FavoriteColor, favs.FavoriteSong, favs.FavoriteArtist FROM dbo.UserFavorites AS uf CROSS APPLY ( VALUES(FavoriteColor1,FavoriteSong1,FavoriteArtist1), (FavoriteColor2,FavoriteSong2,FavoriteArtist2), (FavoriteColor3,FavoriteSong3,FavoriteArtist3) ) AS favs (FavoriteColor, FavoriteSong, FavoriteArtist) WHERE UserID = 1;
The syntax is way simpler, and the plan is a thing of beauty, too:
I know which syntax I'll be using from now on.
Next Steps
Look at your codebase and see if there are places where you're compensating
for bad normalization by using UNION
. There may be
cases where you can improve the performance and maintainability of your code by
using CROSS APPLY
(or OUTER
APPLY
) instead.
See these tips and other resources:
- Use SQL Server's UNPIVOT operator to help normalize output
- Use SQL Server's UNPIVOT operator to dynamically normalize output
- SQL Server CROSS APPLY and OUTER APPLY
- Return TOP (N) Rows using APPLY or ROW_NUMBER() in SQL Server
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-11-08