By: Fikrat Azizov | Updated: 2018-04-25 | Comments | Related: > Azure SQL Database
Problem
Indexes play an important role in improving performance of queries, by allowing SQL Server to scan a limited number of rows. When queries involve tables from the same database, SQL Server can choose the best indexes available. Can SQL Server take advantage of indexes when querying multiple databases? In this tip, I'll explore the performance of cross-database queries in Azure.
Solution
Provisioning test environment
For the purpose of this exercise, I'll provision two Azure SQL databases on the same Azure logical server. To provision the first database I've downloaded and created a database from the sample Contoso data warehouse from here. Then I deployed the Contoso database to Azure using 'Deploy Database to SQL Azure Wizard', described in this tip. Finally, I created a copy of the Azure SQL database. I named the Azure databases ContosoRetailDW and ContosoRetailDW2 and assigned them the 'S1 Standard' price tier.
Through this tip I'll refer to the ContosoRetailDW database as the main database and ContosoRetailDW2 as the external database.
Using the below script, I've created an Extended Event to capture query text and execution plans from the external database:
CREATE EVENT SESSION [SqlText_ExecPlanExt] ON DATABASE ADD EVENT sqlserver.query_post_execution_showplan( ACTION(sqlserver.database_name,sqlserver.sql_text,sqlserver.tsql_stack) WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%FactOnlineSales%'))), ADD EVENT sqlserver.sql_statement_completed( ACTION(sqlserver.database_name,sqlserver.sql_text,sqlserver.tsql_stack) WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%FactOnlineSales%'))) ADD TARGET package0.ring_buffer WITH (STARTUP_STATE=OFF) GO ALTER EVENT SESSION [SqlText_ExecPlanExt] ON DATABASE STATE = START GO
The output from this trace will be stored in memory (ring buffer) in XML format and I'll use the following query to extract the query text and execution plans from the XML data:
DECLARE @extevents XML SELECT @extevents = CAST([t].[target_data] AS XML) FROM sys.[dm_xe_database_sessions] AS s JOIN [sys].[dm_xe_database_session_targets] AS t ON [t].[event_session_address] = [s].[address] WHERE [s].[name] = N'SqlText_ExecPlanExt' AND [t].[target_name] = N'ring_buffer' ; SELECT DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), n.value('(@timestamp)[1]', 'datetime2')) AS [timestamp], n.value('(data[@name="duration"]/value)[1]', 'bigint') as duration, n.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') as sql_text, replace(replace(cast(n.query('(data[@name="showplan_xml"]/value/.)[1]') as varchar(max)),'<value>','<?xml version="1.0" encoding="utf-16"?>'),'</value>','') as exec_plan2 FROM @extevents.nodes('RingBufferTarget/event[@name=''query_post_execution_showplan'']') AS q(n)
Performance test exploring local queries
Let's start with a simple query running locally within the main database:
--Q1 SELECT [StoreKey],[SalesAmount] FROM [dbo].[FactSales] WHERE storeKey = 223
This query took around 9 seconds and if we check the execution plan, we'll notice SQL has scanned the entire clustered index to get the results.
Now, let's create an index including the required fields and re-run the query again:
CREATE INDEX IDX_FactOnlineSales ON [dbo].[FactOnlineSales] (StoreKey,[SalesAmount])
As expected, the new execution was much faster (1 ms) with the following plan:
Let's make the query more complex by joining a table:
--Q2 SELECT F.[StoreKey],F.[SalesAmount],S.StoreDescription FROM [dbo].[FactSales] F JOIN DimStore S ON F.StoreKey=S.StoreKey WHERE F.StoreKey = 223
The new query is still fast (49 ms) and is using the index we created:
Finally, let's try an aggregate query:
--Q3 SELECT S.[StoreKey], SUM(F.[SalesAmount]) FROM [dbo].[FactSales] F JOIN DimStore S ON F.StoreKey=S.StoreKey GROUP BY S.StoreKey
It took 10906 ms and produced the following plan:
Elastic queries
It's time to investigate how Azure SQL would behave when the query involves external tables. To refer to tables in other Azure SQL databases we need to use the Azure elastic queries feature. This feature allows you to split your data with vertical or horizontal partitions (sharding) (see more about it here).
I'll focus on the vertical partitions feature, for the purpose of this exercise.
Let's create a master key, credential, data source and external table using the below script inside main database:
CREATE MASTER KEY ; CREATE DATABASE SCOPED credential ext_cred with identity ='fazizov' ,secret='Password201)'; CREATE EXTERNAL DATA SOURCE ExtDS WITH ( TYPE=RDBMS, LOCATION='fasqlsrv.database.windows.net', DATABASE_NAME='ContosoRetailDW2', CREDENTIAL= ext_cred); CREATE EXTERNAL TABLE FactSales_Ext ( [SalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [channelKey] [int] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [UnitCost] [money] NOT NULL, [UnitPrice] [money] NOT NULL, [SalesQuantity] [int] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [SalesAmount] [money] NOT NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL) WITH ( DATA_SOURCE = ExtDS, SCHEMA_NAME='dbo', OBJECT_NAME = N'FactSales'); GO
Please note, this external table points to the same FactSales table located in the external database.
Since we've already seen benefit of the index, let's create the index IDX_FactOnlineSales inside the external database as well:
CREATE INDEX IDX_FactOnlineSales ON [dbo].[FactOnlineSales] (StoreKey,[SalesAmount])
Now, let's query the external table using a query similar to the one we've used earlier (Q1) and execute it inside the main database:
SELECT [StoreKey],[SalesAmount] FROM [dbo].[FactSales_Ext] WHERE storeKey = 223
Although we were are querying same table with the same index, it took much longer (74 ms), when called from the external database, which could be explained by overhead related to the bridge between the databases.
Examining the execution plan from main database doesn't allow us to confirm whether or not the index has been used. We can see this is shows as a Remote Query:
Let's switch to external database and use the Extended Events related to the query from above to get the SQL queries and execution plans from the ring buffer. Here's the query text and execution plan that we find in the Extended Events:
SELECT [T1_1].[StoreKey] AS [StoreKey],[T1_1].[SalesAmount] AS [SalesAmount] FROM (SELECT [T2_1].[StoreKey] AS [StoreKey],[T2_1].[SalesAmount] AS [SalesAmount] FROM [dbo].[FactSales] AS T2_1 WHERE ([T2_1].[StoreKey] = CAST ((223) AS INT))) AS T1_1
As you can see, the SQL has parameterized the filter value and used the index.
Now, let's run a cross-database version of the join query we've ran earlier (Q2):
SELECT F.[StoreKey],F.[SalesAmount],S.StoreDescription FROM [dbo].[FactSales_ext] F JOIN DimStore S ON F.StoreKey=S.StoreKey WHERE S.storeKey =223
It took 81 ms with the following execution plan from the main and external databases accordingly:
Although our filtering parameter was linked to the main table, amazingly SQL was smart enough to push the parameter to the external table. Here is the query captured from external database:
SELECT [T1_1].[StoreKey] AS [StoreKey],[T1_1].[SalesAmount] AS [SalesAmount] FROM (SELECT [T2_1].[StoreKey] AS [StoreKey],[T2_1].[SalesAmount] AS [SalesAmount] FROM [dbo].[FactSales] AS T2_1 WHERE ([T2_1].[StoreKey] = CAST ((223) AS INT))) AS T1_1
Encouraged by this result, I ran a cross-database version of the aggregate query from above (Q3):
SELECT S.storeKey,SUM(F.[SalesAmount]) FROM [dbo].[FactSales_ext] F JOIN DimStore S ON F.StoreKey=S.StoreKey GROUP BY S.storeKey
The execution time was the same as its local equivalent and as you already guessed, it was also fast because SQL intelligently re-shaped the external database related part of it as follows:
SELECT [T1_1].[StoreKey] AS [StoreKey],[T1_1].[col] AS [col] FROM (SELECT SUM([T2_1].[col]) AS [col],[T2_1].[StoreKey] AS [StoreKey] FROM (SELECT SUM([T3_1].[SalesAmount]) AS [col], [T3_1].[StoreKey] AS [StoreKey] FROM [dbo].[FactSales] AS T3_1 GROUP BY [T3_1].[StoreKey]) AS T2_1 GROUP BY [T2_1].[StoreKey]) AS T1_1
To confirm the index usage, let's examine execution plan from external database:
Conclusion
Azure SQL Database can intelligently optimize performance of cross-database queries. However, care must be taken when allocating tables into different Azure databases, as in some cases overhead of merging data from different databases could be significant.
Next Steps
- Read more about Azure SQL Database from Azure SQL Database documentation
- Read more about elastic queries here
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2018-04-25