By: Artemakis Artemiou | Updated: 2017-08-10 | Comments (2) | Related: > In Memory OLTP
Problem
SQL Server In-Memory OLTP is a very powerful feature which is fully integrated into SQL Server's database engine. In previous tips we saw ways of assessing the SQL Server In-Memory OLTP performance benefits as well as a simple experiment where In-Memory OLTP processing was 79 times faster over disk-based. With this tip, we start a series of tips which will be featuring different examples of disk-based workloads, which could be migrated to Memory-Optimized processing and achieve performance improvements.
In this tip we will talk about a simple ETL process, see how we can migrate it to In-Memory OLTP processing in SQL Server, and check the performance improvement.
Solution
The scenario of the ETL process in this example is the following:
A source SQL Server instance, has one disk-based table named “dSrcSampleCustomers” that has 10M records. The contents of this table are transferred into two tables on another destination SQL Server instance (step 1). One of the tables on the destination SQL Server instance is disk-based and the other one is memory-optimized. After the data is fetched from the source instance, an INSERT INTO statement runs against each table and based on some WHERE conditions, it stores the results in a new disk-based and memory-optimized tables (step 2).
All the above execution times are being measured and we examine how much faster the process runs when using memory-optimized tables on the destination SQL Server instance. Note that the source SQL Server instance's table is disk-based. We don't modify anything on the source SQL Server instance, but rather we focus on optimizing the destination SQL Server tables using In-Memory OLTP.
The below diagram illustrates the ETL process:
Table Sizes
In this example, the table's size at the source instance is:
- dSrcSampleCustomers: 10M records
Similarly, since the data will be imported from the source to the destination SQL Server instance, the table sizes on the destination SQL Server instance will be:
- dSampleCustomers (disk-based): 10M records
- mSampleCustomers (memory-optimized): 10M records
Moreover, on the destination SQL Server instance, there are another two tables which will host the results of the queries that will be executed against the previous two tables:
- dProcessResult (disk-based)
- mProcessResult (memory-optimized)
Sample Database
In this tip, I'm using an In-Memory OLTP-enabled sample database which I created
and named "IMOLTPSampleDB".
Linked Server
The below screenshot shows the linked server used in this tip:
Disk-Based Table Definitions
-- --Disk-Based Tables -- USE [IMOLTPSampleDB] GO --Disk-Based Table - Data CREATE TABLE [dbo].[dSampleCustomers]( [cusID] [INT] NOT NULL PRIMARY KEY, [firstName] [VARCHAR](100) NOT NULL, [lastName] [VARCHAR](100) NOT NULL, [address1] [VARCHAR](200) NOT NULL, [address2] [VARCHAR](200) NULL, [countryCode] [CHAR](3) NOT NULL, [regDate] [DATE] NOT NULL, [discount] [FLOAT] NULL ); GO --Disk-Based Table - Process Result CREATE TABLE [dbo].[dProcessResult] ( [cusID] [INT] NOT NULL PRIMARY KEY, [firstName] [VARCHAR](100), [lastName] [VARCHAR](100), [address1] [VARCHAR](200), [address2] [VARCHAR](200), [countryCode] [CHAR](3), [regDate] [DATE] NOT NULL, [discount] [FLOAT] NULL ); GO
Memory-Optimized Table Definitions
-- --Memory-Optimized Tables -- USE [IMOLTPSampleDB] GO --Memory-Optimized Table - Data CREATE TABLE [dbo].[mSampleCustomers] ( [cusID] [INT] NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 10000000), [firstName] [VARCHAR](100) NOT NULL, [lastName] [VARCHAR](100) NOT NULL, [address1] [VARCHAR](200) NOT NULL, [address2] [VARCHAR](200) NULL, [countryCode] [CHAR](3) NOT NULL, [regDate] [DATE] NOT NULL, [discount] [FLOAT] NULL )WITH (MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA); GO --Memory-Optimized Table - Process Result CREATE TABLE [dbo].[mProcessResult] ( [cusID] [INT] NOT NULL PRIMARY KEY NONCLUSTERED, [firstName] [VARCHAR](100), [lastName] [VARCHAR](100), [address1] [VARCHAR](200), [address2] [VARCHAR](200), [countryCode] [CHAR](3), [regDate] [DATE] NOT NULL, [discount] [FLOAT] NULL )WITH (MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA); GO
As you can see above, in this tip's example, the memory-optimized tables are durable, that is why I have set DURABILITY = SCHEMA_AND_DATA.
Note that besides the primary keys in each table, for this example, we are not investigating the behavior of the process when using different indexes as this not within the scope of this tip.
Below you can find the T-SQL script that fetches the data from the linked server into the disk-based and memory-optimized tables on the destination SQL Server instance and measures the execution times.
Fetch Data from Linked Server Into Disk-Based and Memory-Optimized Tables
USE IMOLTPSampleDB GO --Initialize tables TRUNCATE TABLE dbo.[dSampleCustomers] GO DELETE dbo.[mSampleCustomers] GO --Initialize DECLARE @starttime DATETIME2; DECLARE @timeDiff INT; --Clear buffers CHECKPOINT; DBCC DROPCLEANBUFFERS; --Fetch data into disk-based table SET @starttime=SYSDATETIME(); INSERT INTO dbo.[dSampleCustomers] SELECT * FROM [SQLSourceServer].[SampleDB].dbo.dSrcSampleCustomers; SET @timeDiff = datediff(ms, @starttime, sysdatetime()); PRINT 'Total Time to Fetch 10M Records to Disk-Based Table: ' + CAST(@timeDiff as VARCHAR(10)) + ' ms'; --Clear buffers CHECKPOINT; DBCC DROPCLEANBUFFERS; --Fetch data into memory-optimized table SET @starttime=SYSDATETIME(); INSERT INTO dbo.[mSampleCustomers] SELECT * FROM [SQLSourceServer].[SampleDB].dbo.dSrcSampleCustomers; SET @timeDiff = datediff(ms, @starttime, sysdatetime()); PRINT 'Total Time to Fetch 10M Records to Memory-Optimized Table: ' + CAST(@timeDiff as VARCHAR(10)) + ' ms';
Below you can find the T-SQL scripts for the corresponding processes for each execution mode:
Disk-Based Table Processing
USE IMOLTPSampleDB; GO TRUNCATE TABLE [dbo].[dProcessResult]; GO CHECKPOINT; DBCC DROPCLEANBUFFERS; GO DECLARE @starttime DATETIME2; DECLARE @timeDiff INT; --Process Disk-Based Table SET @starttime = SYSDATETIME(); INSERT INTO [dbo].[dProcessResult] SELECT cusID , firstName , lastName , address1 , address2 , countryCode , regDate , discount FROM dbo.dSampleCustomers WHERE DATEDIFF(YEAR, regDate, GETDATE()) > 10 AND discount < 20; SET @timeDiff = DATEDIFF(ms, @starttime, SYSDATETIME()); PRINT 'Total Time for Disk-Based Table: ' + CAST(@timeDiff AS VARCHAR(10)) + ' ms';
Memory-Optimized Table Processing
USE IMOLTPSampleDB; GO DELETE [dbo].[mProcessResult]; GO CHECKPOINT; DBCC DROPCLEANBUFFERS; GO DECLARE @starttime DATETIME2; DECLARE @timeDiff INT; --Process Memory-Optimized Table SET @starttime = SYSDATETIME(); INSERT INTO [dbo].[mProcessResult] SELECT cusID , firstName , lastName , address1 , address2 , countryCode , regDate , discount FROM dbo.mSampleCustomers WHERE DATEDIFF(YEAR, regDate, GETDATE()) > 10 AND discount < 20; SET @timeDiff = DATEDIFF(ms, @starttime, SYSDATETIME()); PRINT 'Total Time for Memory-Optimized Table: ' + CAST(@timeDiff AS VARCHAR(10)) + ' ms';
The below images illustrate the output of all steps.
Fetch Data from Linked Server - Disk-Based and Memory-Optimized Execution
Disk-Based Table Processing
Memory-Optimized Processing
Analysis
The below table summarizes the execution times for all the above:
Execution Mode | Step 1: Fetch Data from Remote Server | Step 1 Speedups | Step 2: Processing | Step 2 Speedups | Total Time | Total Speedup |
---|---|---|---|---|---|---|
Disk-Based | 191552 ms | 1x (baseline) | 33498 ms | 1x (baseline) | 225050 ms | 1 (baseline) |
Memory-Optimized | 42815 ms | 4.47x | 15253 ms | 2.20x | 58068 ms | 3.88x |
As you can see in the above summary table, in all the steps of the ETL process, the Memory-Optimized execution mode is faster when compared to Disk-Based execution with an overall speedup of 3.88 times faster over disk-based execution. More analytically, in Step 1, which was the process of fetching data from the linked server, the memory-optimized execution was 4.47 times faster over disk-based execution. In Step 2, the memory-optimized execution was 2.20 times faster over disk-based execution.
The above suggests that if you have a similar ETL process like the one examined in this tip and migrate to memory-optimized tables, there are good chances to achieve performance improvement, or at least, it is a possibility which you could investigate.
The below graphs illustrate the performance improvement observations.
Conclusion
SQL Server In-Memory OLTP is a very powerful technology which can help you achieve significant performance improvements over the "traditional" disk-based architecture. It however depends on the specific workload type you want to optimize using this technology. In this tip, we optimized a simple ETL process and, with memory-optimized tables, we achieved an overall speedup of 3.88 times faster over disk-based processing. There are other, even more suitable workload types, where you can achieve much larger speedups. For example, in a previous tip I wrote, I have achieved a speedup of 79x using memory-optimized tables.
Workload types that is proven that they can be significantly optimized using SQL Server In-Memory OLTP are those that have the following characteristics:
- High data insert rate
- Read performance
- Heavy data processing
- Low-latency
- Session state management
Independently of whether your workload type falls under the above types or not, you should still examine the possibility of migrating it to In-Memory OLTP, as you could still achieve major performance improvements. Towards this goal, you can perform analysis and run simulations in order to check if In-Memory OLTP can be beneficial for your workload type.
Next Steps
Review the following tips and other resources:
- MSSQLTips Article: Assessing the SQL Server In-Memory OLTP Performance Benefits
- MSSQLTips Article: Simple Experiment with SQL Server In-Memory OLTP is 79 Times Faster
- Microsoft Docs: In-Memory OLTP
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2017-08-10