Options to Improve SQL Server Bulk Load Performance

By: Ben Snaidero   |   Updated: 2014-12-31   |   Comments (6)   |   Related: More > Import and Export


Problem

As a DBA I am always looking for ways to improve SQL Server performance. Bulk loading data into SQL Server seems like a pretty straightforward task, but there are several options that can both improve the speed at which data is loaded and decrease the total amount of resources used to perform the load. This tip will look at a few of those options.

Solution

In this tip we are going to focus our performance test on options using the BULK INSERT T-SQL command. More information on other methods of doing bulk loads can be found in this tip on Minimally logging bulk load insert into SQL Server.

Sample Table Setup for Bulk Insert Testing

In order to perform this test we need some sample data to load, so let's first create a simple table with a few indexes and populate it with sample data. The T-SQL code to perform these tasks is below.

-- Create sample table and indexes
CREATE TABLE testtable ([col1] [int] NOT NULL primary key clustered,
                        [col2] [int] NULL,
                        [col3] [int] NULL,
                        [col4] [varchar](50) NULL); 

CREATE INDEX idx_testtable_col2 on testtable (col2 asc);
CREATE INDEX idx_testtable_col3 on testtable (col3 asc);

-- Load sample data into table
DECLARE @val INT
SELECT @val=1
WHILE @val < 5000000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4) 
       VALUES (@val,@val % 10,@val,'TEST' + CAST(@val AS VARCHAR))
   SELECT @val=@val+1
END
GO
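
As a side note, the WHILE loop above inserts rows one at a time, so generating the roughly 5 million rows can take a while. If you just want the sample data created more quickly, a set-based alternative such as the following sketch produces the same values (it assumes sys.all_objects has enough rows to drive the cross join, which it will on any normal instance).

-- Optional: set-based version of the sample data load (sketch, same values as the loop)
;WITH nums AS (
   SELECT TOP (4999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
   FROM sys.all_objects a CROSS JOIN sys.all_objects b CROSS JOIN sys.all_objects c
)
INSERT INTO testtable (col1, col2, col3, col4)
SELECT n, n % 10, n, 'TEST' + CAST(n AS VARCHAR)
FROM nums;
GO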

Since in most bulk load cases you are pulling in data from a flat file, let's use the bcp command to move the data we created above out to the file system. We will also need a format file for the load. The commands to do these steps are as follows.

-- Create a format file
bcp testdb.dbo.testtable format nul -c -t, -f c:\testtable_formatfile.fmt -T -S .

-- Create sample data file
bcp testdb.dbo.testtable out c:\testtable.dat -c -t, -f c:\testtable_formatfile.fmt -T -S .
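
As a side note, bcp can also generate the format file in XML form using the -x switch together with -f, which some people find easier to read and edit; the file name below is just illustrative.

-- Optional: create an XML format file instead (illustrative file name)
bcp testdb.dbo.testtable format nul -c -t, -x -f c:\testtable_formatfile.xml -T -S .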

Now we just have one last step to complete our preparation: emptying the table so we can reload data into it, which we will do with the TRUNCATE command. Note: You will need to perform this step before testing each load scenario.

TRUNCATE TABLE testdb.dbo.testtable

Test Scenarios for SQL Server Bulk Insert

For this performance test we will look at the following 4 scenarios. Each scenario builds on the previous by adding a new option which will hopefully speed up performance.

  1. BULK load
  2. BULK load with tablock
  3. BULK load with tablock and drop/recreate indexes
  4. BULK load with tablock and drop/recreate indexes and change recovery model

The following is the code to perform each scenario. Remember as noted above that you will need to TRUNCATE the target table before executing each scenario.

--BULK load
BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt');


--BULK load with tablock
BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt', TABLOCK);


--BULK load with tablock and drop/recreate indexes
DROP INDEX testtable.idx_testtable_col2;
DROP INDEX testtable.idx_testtable_col3;

BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt', TABLOCK);

CREATE INDEX idx_testtable_col2 on testtable (col2 asc);
CREATE INDEX idx_testtable_col3 on testtable (col3 asc);


--BULK load with tablock and drop/recreate indexes and change recovery model
ALTER DATABASE testdb SET RECOVERY SIMPLE;

DROP INDEX testtable.idx_testtable_col2;
DROP INDEX testtable.idx_testtable_col3;

BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt', TABLOCK);

CREATE INDEX idx_testtable_col2 on testtable (col2 asc);
CREATE INDEX idx_testtable_col3 on testtable (col3 asc);
 
ALTER DATABASE testdb SET RECOVERY FULL;
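
If you want to verify how much the recovery model change actually reduces transaction log activity, one coarse but simple check is to compare log space used before and after the load. This is just a sketch; run it on the instance hosting testdb.

-- Optional: rough check of transaction log usage around a load (sketch)
DBCC SQLPERF(LOGSPACE);  -- note the "Log Space Used (%)" value for testdb before the load

-- ... run one of the BULK INSERT scenarios here ...

DBCC SQLPERF(LOGSPACE);  -- compare the testdb value after the load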

Test Results

Let's take a look at the results after running each scenario. You can capture your statistics using either SQL Profiler or SET STATISTICS. I used SQL Profiler in this case.
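If you prefer SET STATISTICS over SQL Profiler, a minimal way to capture the I/O and timing numbers for a single scenario looks something like this sketch.

-- Capture I/O and timing statistics for one scenario (sketch)
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt', TABLOCK);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;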

Test Scenario   CPU (ms)   Reads        Writes   Duration (ms)
1               52291      75138341     57704    142978
2               30763      15165138     57572    96844
3               20763      195589       39629    67510
4               53006      195916       39651    62766

Looking at the results above, the performance improves with each scenario, as you would expect. The one interesting thing to note (and I tested this multiple times to make sure it was accurate) is that although the last scenario completed faster than any of the other options, it used just as much CPU as the original option and more than options 2 and 3.


About the author
Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.



Article Last Updated: 2014-12-31

Comments For This Article




Thursday, January 31, 2019 - 9:38:06 AM - David Plaut

I have found it helpful to disable and rebuild indexes rather than drop/recreate.
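
(For reference, the disable/rebuild approach described above would look roughly like the following sketch, using this tip's table and index names; disabling a nonclustered index deallocates its data, so the rebuild afterwards is required.)

-- Sketch of the disable/rebuild alternative using this tip's index names
ALTER INDEX idx_testtable_col2 ON testtable DISABLE;
ALTER INDEX idx_testtable_col3 ON testtable DISABLE;

BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt', TABLOCK);

ALTER INDEX idx_testtable_col2 ON testtable REBUILD;
ALTER INDEX idx_testtable_col3 ON testtable REBUILD;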


Thursday, January 15, 2015 - 3:41:37 AM - Chris Vandekerkhove

Hi,

When working with big tables you might want to modify the "batchsize" parameter: a bigger value will improve your performance. The default value is 1000 but I often set it to 100000.

E.g.:

bcp mydatabase..mytable in myfile.txt -f myformatfile.fmt -T -S myserver -b 100000
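
(The equivalent for the BULK INSERT command used in this tip is the BATCHSIZE option; a sketch with the same illustrative value is below. Note that splitting the load into multiple batches can affect how much of it is minimally logged.)

-- Sketch: BULK INSERT with an explicit batch size
BULK INSERT testdb.dbo.testtable
FROM 'C:\testtable.dat'
WITH (FORMATFILE = 'C:\testtable_formatfile.fmt', TABLOCK, BATCHSIZE = 100000);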


Wednesday, January 14, 2015 - 5:17:06 PM - Ben Snaidero

To answer some of the questions posed.

I was only capturing file reads/writes through profiler, so not transaction log writes.

I chose simple since there was no other activity during this test and it would yield the same results as bulk-logged anyway.  Real world you would probably want to use bulk-logged if there is other activity going on at the same time that you need logged.

BATCHSIZE was not specified as I assume this would only slow it down if we broke it up into multiple transactions.

Truncate was used between each scenario but no checkpoints.

Thanks for reading

Ben.


Wednesday, January 14, 2015 - 12:56:18 PM - Eric Russell

It's interesting how scenario #4 resulted in 2.5x the CPU utilization and slightly more reads and writes while offering only a slight decrease in duration.

Are you setting BATCHSIZE parameter? Only the first batch is minimally logged.

Are you truncating the table between each test?

Also, between each test, are you performing CHECKPOINT to write all pending I/O to disk and DBCC FREESYSTEMCACHE to clear page cache buffer?


Wednesday, January 14, 2015 - 12:41:32 PM - JeremyH

Any theory as to why the writes didn't really go down for test scenario 4? I would expect a lot fewer writes to the t-log since it should have been minimally logged. Were you only capturing data file writes/reads?


Wednesday, January 14, 2015 - 11:05:53 AM - logitestus

I noticed that you set the recovery model to SIMPLE why not to BULK_LOGGED? 














