ETL Peformance Gains Using SQL Server In-Memory OLTP

By:   |   Updated: 2017-08-10   |   Comments (2)   |   Related: > In Memory OLTP


Problem

SQL Server In-Memory OLTP is a very powerful feature which is fully integrated into SQL Server's database engine. In previous tips we saw ways of assessing the SQL Server In-Memory OLTP performance benefits as well as a simple experiment where In-Memory OLTP processing was 79 times faster over disk-based. With this tip, we start a series of tips which will be featuring different examples of disk-based workloads, which could be migrated to Memory-Optimized processing and achieve performance improvements.

In this tip we will talk about a simple ETL process, see how we can migrate it to In-Memory OLTP processing in SQL Server, and check the performance improvement.

Solution

The scenario of the ETL process in this example is the following:

A source SQL Server instance, has one disk-based table named “dSrcSampleCustomers” that has 10M records. The contents of this table are transferred into two tables on another destination SQL Server instance (step 1). One of the tables on the destination SQL Server instance is disk-based and the other one is memory-optimized. After the data is fetched from the source instance, an INSERT INTO statement runs against each table and based on some WHERE conditions, it stores the results in a new disk-based and memory-optimized tables (step 2).

All the above execution times are being measured and we examine how much faster the process runs when using memory-optimized tables on the destination SQL Server instance. Note that the source SQL Server instance's table is disk-based. We don't modify anything on the source SQL Server instance, but rather we focus on optimizing the destination SQL Server tables using In-Memory OLTP.

The below diagram illustrates the ETL process:

Sample ETL Process Diagram

Table Sizes

In this example, the table's size at the source instance is:

  • dSrcSampleCustomers: 10M records

Similarly, since the data will be imported from the source to the destination SQL Server instance, the table sizes on the destination SQL Server instance will be:

  • dSampleCustomers (disk-based): 10M records
  • mSampleCustomers (memory-optimized): 10M records

Moreover, on the destination SQL Server instance, there are another two tables which will host the results of the queries that will be executed against the previous two tables:

  • dProcessResult (disk-based)
  • mProcessResult (memory-optimized)

Sample Database

In this tip, I'm using an In-Memory OLTP-enabled sample database which I created and named "IMOLTPSampleDB".

Linked Server

The below screenshot shows the linked server used in this tip:

Sample Linked Server

Disk-Based Table Definitions

--
--Disk-Based Tables
--
USE [IMOLTPSampleDB]
GO

--Disk-Based Table - Data
CREATE TABLE [dbo].[dSampleCustomers](
 [cusID] [INT] NOT NULL PRIMARY KEY,
 [firstName] [VARCHAR](100) NOT NULL,
 [lastName] [VARCHAR](100) NOT NULL,
 [address1] [VARCHAR](200) NOT NULL,
 [address2] [VARCHAR](200) NULL,
 [countryCode] [CHAR](3) NOT NULL,
 [regDate] [DATE] NOT NULL,
 [discount] [FLOAT] NULL
);
GO

--Disk-Based Table - Process Result
CREATE TABLE [dbo].[dProcessResult]
(
 [cusID] [INT] NOT NULL PRIMARY KEY,
 [firstName] [VARCHAR](100),
 [lastName] [VARCHAR](100),
 [address1] [VARCHAR](200),
 [address2] [VARCHAR](200),
 [countryCode] [CHAR](3),
 [regDate] [DATE] NOT NULL,
 [discount] [FLOAT] NULL
);
GO

Memory-Optimized Table Definitions

--
--Memory-Optimized Tables
--
USE [IMOLTPSampleDB]
GO

--Memory-Optimized Table - Data
CREATE TABLE [dbo].[mSampleCustomers]
(
 [cusID] [INT] NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 10000000),
 [firstName] [VARCHAR](100) NOT NULL,
 [lastName] [VARCHAR](100) NOT NULL,
 [address1] [VARCHAR](200) NOT NULL,
 [address2] [VARCHAR](200) NULL,
 [countryCode] [CHAR](3) NOT NULL,
 [regDate] [DATE] NOT NULL,
 [discount] [FLOAT] NULL 
)WITH (MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA);
GO

--Memory-Optimized Table - Process Result
CREATE TABLE [dbo].[mProcessResult]
(
 [cusID] [INT] NOT NULL PRIMARY KEY NONCLUSTERED,
 [firstName] [VARCHAR](100),
 [lastName] [VARCHAR](100),
 [address1] [VARCHAR](200),
 [address2] [VARCHAR](200),
 [countryCode] [CHAR](3),
 [regDate] [DATE] NOT NULL,
 [discount] [FLOAT] NULL
)WITH (MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA);
GO

As you can see above, in this tip's example, the memory-optimized tables are durable, that is why I have set DURABILITY = SCHEMA_AND_DATA.

Note that besides the primary keys in each table, for this example, we are not investigating the behavior of the process when using different indexes as this not within the scope of this tip.

Below you can find the T-SQL script that fetches the data from the linked server into the disk-based and memory-optimized tables on the destination SQL Server instance and measures the execution times.

Fetch Data from Linked Server Into Disk-Based and Memory-Optimized Tables

USE IMOLTPSampleDB
GO

--Initialize tables
TRUNCATE TABLE dbo.[dSampleCustomers]
GO
DELETE dbo.[mSampleCustomers]
GO

--Initialize 
DECLARE @starttime DATETIME2;  
DECLARE @timeDiff INT;  

--Clear buffers
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

--Fetch data into disk-based table
SET @starttime=SYSDATETIME();
INSERT INTO dbo.[dSampleCustomers]
SELECT * FROM [SQLSourceServer].[SampleDB].dbo.dSrcSampleCustomers;
SET @timeDiff = datediff(ms, @starttime, sysdatetime());  
PRINT 'Total Time to Fetch 10M Records to Disk-Based Table: ' + CAST(@timeDiff as VARCHAR(10)) + ' ms';  

--Clear buffers
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

--Fetch data into memory-optimized table
SET @starttime=SYSDATETIME();
INSERT INTO dbo.[mSampleCustomers]
SELECT * FROM [SQLSourceServer].[SampleDB].dbo.dSrcSampleCustomers;
SET @timeDiff = datediff(ms, @starttime, sysdatetime());  
PRINT 'Total Time to Fetch 10M Records to Memory-Optimized Table: ' + CAST(@timeDiff as VARCHAR(10)) + ' ms';  

Below you can find the T-SQL scripts for the corresponding processes for each execution mode:

Disk-Based Table Processing

USE IMOLTPSampleDB;
GO

TRUNCATE TABLE [dbo].[dProcessResult];
GO

CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

DECLARE @starttime DATETIME2;
DECLARE @timeDiff INT;

--Process Disk-Based Table
SET @starttime = SYSDATETIME();
INSERT INTO [dbo].[dProcessResult]
            SELECT cusID ,
                   firstName ,
                   lastName ,
                   address1 ,
                   address2 ,
                   countryCode ,
                   regDate ,
                   discount
            FROM   dbo.dSampleCustomers
            WHERE  DATEDIFF(YEAR, regDate, GETDATE()) > 10
                   AND discount < 20;
SET @timeDiff = DATEDIFF(ms, @starttime, SYSDATETIME());
PRINT 'Total Time for Disk-Based Table: ' + CAST(@timeDiff AS VARCHAR(10)) + ' ms';

Memory-Optimized Table Processing

USE IMOLTPSampleDB;
GO

DELETE [dbo].[mProcessResult];
GO

CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

DECLARE @starttime DATETIME2;
DECLARE @timeDiff INT;

--Process Memory-Optimized Table
SET @starttime = SYSDATETIME();
INSERT INTO [dbo].[mProcessResult]
            SELECT cusID ,
                   firstName ,
                   lastName ,
                   address1 ,
                   address2 ,
                   countryCode ,
                   regDate ,
                   discount
            FROM   dbo.mSampleCustomers
            WHERE  DATEDIFF(YEAR, regDate, GETDATE()) > 10
                   AND discount < 20;
SET @timeDiff = DATEDIFF(ms, @starttime, SYSDATETIME());
PRINT 'Total Time for Memory-Optimized Table: ' + CAST(@timeDiff AS VARCHAR(10)) + ' ms';

The below images illustrate the output of all steps.

Fetch Data from Linked Server - Disk-Based and Memory-Optimized Execution

Fetch Data From Linked Server

Disk-Based Table Processing

Disk-Based Table Processing

Memory-Optimized Processing

Memory-Optimized Processing

Analysis

The below table summarizes the execution times for all the above:

Execution Mode Step 1: Fetch Data from Remote Server Step 1 Speedups Step 2: Processing Step 2 Speedups Total Time Total Speedup
Disk-Based 191552 ms 1x (baseline) 33498 ms 1x (baseline) 225050 ms 1 (baseline)
Memory-Optimized 42815 ms 4.47x 15253 ms 2.20x 58068 ms 3.88x

As you can see in the above summary table, in all the steps of the ETL process, the Memory-Optimized execution mode is faster when compared to Disk-Based execution with an overall speedup of 3.88 times faster over disk-based execution. More analytically, in Step 1, which was the process of fetching data from the linked server, the memory-optimized execution was 4.47 times faster over disk-based execution. In Step 2, the memory-optimized execution was 2.20 times faster over disk-based execution.

The above suggests that if you have a similar ETL process like the one examined in this tip and migrate to memory-optimized tables, there are good chances to achieve performance improvement, or at least, it is a possibility which you could investigate.

The below graphs illustrate the performance improvement observations.

Memory-Optimized Speedup for Sample ETL Process - Step 1
Memory-Optimized Speedup for Sample ETL Process - Step 2
Memory-Optimized Speedup for Sample ETL Process - Overall

Conclusion

SQL Server In-Memory OLTP is a very powerful technology which can help you achieve significant performance improvements over the "traditional" disk-based architecture. It however depends on the specific workload type you want to optimize using this technology. In this tip, we optimized a simple ETL process and, with memory-optimized tables, we achieved an overall speedup of 3.88 times faster over disk-based processing. There are other, even more suitable workload types, where you can achieve much larger speedups. For example, in a previous tip I wrote, I have achieved a speedup of 79x using memory-optimized tables.

Workload types that is proven that they can be significantly optimized using SQL Server In-Memory OLTP are those that have the following characteristics:

  • High data insert rate
  • Read performance
  • Heavy data processing
  • Low-latency
  • Session state management

Independently of whether your workload type falls under the above types or not, you should still examine the possibility of migrating it to In-Memory OLTP, as you could still achieve major performance improvements. Towards this goal, you can perform analysis and run simulations in order to check if In-Memory OLTP can be beneficial for your workload type.

Next Steps

Review the following tips and other resources:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Artemakis Artemiou Artemakis Artemiou is a Senior SQL Server and Software Architect, Author, and a former Microsoft Data Platform MVP (2009-2018).

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-08-10

Comments For This Article




Tuesday, January 23, 2018 - 5:53:07 PM - Artemakis Artemiou Back To Top (75024)

Hi Craig,

Thank you for your comment. Yes, all my disk-based tables (and the entire database) were hosted on SSD disk.

 

Kind Regards,

Artemakis


Tuesday, January 23, 2018 - 3:26:27 PM - Craig Back To Top (75022)

 Were the disks SSDs?

 















get free sql tips
agree to terms