Data Import Performance Comparison T-SQL vs SSIS for large import

By:   |   Updated: 2016-04-12   |   Comments (7)   |   Related: 1 | 2 | More > Import and Export


Problem

I need to import an Excel file to SQL Server that has 1 million rows. Do you recommend using T-SQL or SSIS? Can you provide a comparison of the options? Which option is easier to code? Which option performs better?

Solution

We are going to try both options (T-SQL and SSIS) and you will be able to compare the results!

Requirements

  1. SQL Server relational database engine installed
  2. SSIS installed
  3. SQL Server Data Tools Installed (SSDT) or BIDS (Business Intelligence Development Studio) installed.
  4. You can use SQL Server 2005 or later versions. In this example, we are using SQL Server 2014.
  5. Microsoft Excel 2013 installed.

Import 1,000,000 Rows with SQL Server Integration Services

  1. In this fist example, we are going to import a million rows from Excel to SQL Server.
  2. In the SSDT, create a new SSIS project.
  3. Drag and drop the Data Flow Task to the design pane.


  4. Drag and drop the Data Flow Task to the design pane


  5. Double click on the task and drag and drop the Excel source and the SQL Server Destination and join the tasks.


  6. Double click on the task and drag and drop the Excel source and the SQL Server Destination and join the tasks


  7. In the Excel Source add a new connection to the Excel file.


  8. In the Excel Source add a new connection to the Excel file


  9. Double click on the SQL Destination Task and select a SQL connection and in the use a table or view, press the new button to create a new table. Click on the Mappings page to match columns.


  10. Double click on the SQL Destination Task and select a SQL connection and in the use a table or view, press the new button to create a new table


  11. If you run the package, you will notice that only 65,535 rows were copied.


  12. If you run the package, you will notice that only 65,535 rows were copied


  13. It took just 5 seconds, but we need to import a million rows!


  14. It took just 5 seconds, but we need to import a million rows


  15. The easiest way to solve this problem is to save the Excel file as a CSV file.


  16. The easiest way to solve this problem is to save the Excel file as a CSV file


  17. Let's remove the other tasks and add the Flat File Source and a SQL Server Destination.


  18. Let's remove the other tasks and add the Flat File Source and a SQL Server Destination


  19. Press the browse button and select the CSV file.


  20. Press the browse button and select the csv file


  21. Go to the columns page to match the columns.


  22. Go to the columns page to match the columns


  23. If you run the package, you will see that the million rows were imported to SQL Server successfully.


  24. If you run the package, you will see that the million rows were imported to SQL Server successfully


  25. If we check the time in the progress, we will notice that it took 15.959 seconds.


  26. If we check the time in the progress, we will notice that it took 15 seconds almost 16 seconds


Import 1,000,000 Rows with T-SQL

  1. Let's try the same with T-SQL. We are going to import the data using Linked Servers.
  2. If you are not familiar with Linked servers, we strongly recommend you to check our tip about Linked Servers for Microsoft Excel.
  3. The following code will create the Linked Server:

  4. EXEC sp_addlinkedserver 
    @server = N'ExcelDataSource', 
    @srvproduct=N'ExcelData', 
    @provider=N'Microsoft.ACE.OLEDB.12.0', 
    @datasrc=N'C:\scripts\Excelfile.xlsx',
    @provstr=N'EXCEL 12.0' ;

  5. Try to expand the tables of the Linked Server:


  6. Try to expand the tables of the Linked Server


  7. A typical error is this: The OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "ExcelDataSource" reported and error. Access denied. Cannot obtain the required interface ("IID_IDBSchemaRowset") from OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "ExcelDataSource". (Microsoft SQL Server, Error: 7399)


  8. The OLE DB provider Microsoft.ACE.OLEDB.12.0 for linked server ExcelDataSource reported and error. Access denied. Cannot obtain the required interface


  9. One of the solutions is to install the following driver for MS Office. By default, this driver is not installed. You will need to restart the machine after installing this driver.
  10. If that does not work, run the following commands. The AllowInProcess is used to run the process as a SQL Server Process and the DynamicParameter option is used to run parameterized queries.

      USE[master]
      GO
      EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',N'AllowInProcess',1
      GO
      EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',N'DynamicParameters',1
      GO
     

  11. Also, make sure that the query is running under an account with access to the Excel file.
  12. Once the Linked Server is working, run the following query to create a SQL Server table named MylinkedExcel.


  13. SELECT [BusinessEntityID]
    ,[PersonType]
    ,[NameStyle]
    ,[Title]
    ,[FirstName]
    ,[MiddleName]
    ,[LastName]
    INTO MylinkedExcel
    FROM [ExcelDataSource]...[Sheet1$]

  14. If we measure the time, we will notice that it takes more than 7 minutes and 47 seconds!


  15. If we measure the time, we will notice that it takes more than 7 minutes and 47 seconds


  16. OK, now we know that using a Linked Server to import data from Excel is very slow. We are going to use the CSV we created above and then use the Bulk Insert command.
  17. To use the Bulk Insert command we will need to first create the table manually:


  18. CREATE TABLE [dbo].[excelcustomer2](	[BusinessEntityID] [varchar](50) NULL,
    	[PersonType] [varchar](50) NULL,
    	[NameStyle] [varchar](50) NULL,
    	[Title] [varchar](50) NULL,
    	[FirstName] [varchar](50) NULL,
    	[MiddleName] [varchar](50) NULL,
    	[LastName] [varchar](50) NULL,
    	[Column 7] [varchar](50) NULL,
    	[Column 8] [varchar](50) NULL,
    	[Column 9] [varchar](50) NULL
    ) ON [PRIMARY]
    
    GO;
  19. This code will import the data from the CSV file to the table we just created.


  20. BULK
    INSERT excelcustomer2
    FROM 'c:\scripts\excelfile.csv'
    WITH
    (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
    )
    GO;
  21. If we check the time, we will notice that it took 31 seconds to import the data.


  22. Executing the package

Conclusion

To summarize, SSIS has an Excel limitation of 65535 rows. We had to work with the flat file component and convert from Excel to a CSV file to import a million rows.

On the other hand, T-SQL had several problems to access the Linked Server file from Excel, but finally when it was working it took more than 7 minutes to import the data. We then tried again with the CSV file and used the Bulk Insert command to improve the performance.

Who is the winner?....the table summarizes the results:

Solution Time
SSIS using the Excel component 5 seconds, but it is limited to only 65535 rows
SSIS converting the Excel to CSV and using the flat file component 16 seconds
Linked Server to Excel 7 minutes, 47 seconds (but it worked)
The Bulk Insert statement converting the Excel file to a CSV file 31 seconds

As you can see, SSIS has very fast tools to import and export files. It also has the advantage that the destination table is created automatically while the Bulk Insert statement requires creating the table manually. In this example SSIS is the winner. Using SSIS can be hard at the beginning and you will find some limitations, but T-SQL has also limitations, so we recommend studying both options to find the best solution for your problem.

Next Steps

Here are several links that will be useful to you:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Daniel Calbimonte Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and 6-time Microsoft Certified IT Professional. Daniel started his career in 2001 and has worked with SQL Server 6.0 to 2022. Daniel is a DBA as well as specializes in Business Intelligence (SSIS, SSAS, SSRS) technologies.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2016-04-12

Comments For This Article




Tuesday, August 13, 2019 - 8:42:49 AM - Gintaras Back To Top (82047)

Hi,

what about the CPU used by SSIS vs Linked server?

During my testing, SSIS used ~1.5 CPU core, when Linked server used 1 CPU. 

So in terms of CPU used per row transferred, Linked server is is quite close to SSIS. And since we're paying for SQL license per CPU core, Linked servers might be more cost-efficient.

The accuracy of my testing was very approximate though, so would be good if somebody did a more accurate testing.


Wednesday, September 12, 2018 - 6:08:30 AM - rmpbklyn Back To Top (77516)

Can you provide instructions for visual studio communty ssis install? 


Friday, May 27, 2016 - 5:30:18 PM - Igor Back To Top (41577)

Hi

Microsoft.ACE.OLEDB.12.0 has big problems with the types. It decides which type to use using first 8 rows every time when you run package.  
There are potential data loss, because if convertation is failed, driver will load it as null.

I think better to use custom loading using OpenXmlReader (SAX method).
Better not use some libraries like ClosedXML and so on because of problems with memory.

Sunday, April 17, 2016 - 7:31:24 AM - Daniel Back To Top (41245)

 

Thanks, I will definitibly talk about BCP in future articles.


Sunday, April 17, 2016 - 6:05:47 AM - Laslo Back To Top (41244)

65535 rows is an old Excel limitation. New formats (xlsx) can store a million rows on a sheet. 
It would be worth adding BCP option though to complete this article as this is a freqently reocurring task. 

 


Tuesday, April 12, 2016 - 4:13:04 PM - Jon Morisi Back To Top (41204)

My first thougth would have been to save as csv and bcp it in.  How does that compare with your dataset?

https://msdn.microsoft.com/en-us/library/ms162802.aspx


Tuesday, April 12, 2016 - 7:41:40 AM - Koen Verbeeck Back To Top (41200)

Hi Daniel,

how did you configure you excel connection manager?

I just tested on a .xlsx file with a Excel 2007 excel connection manager, and I have no trouble reading one million rows.

Regarding SSIS: you probably can make it faster by using the fastparse options on the flat file (depending on the columns) and by setting a bigger buffer size.

Regarding SQL: using linked servers is obviously slow. Using bcp (which is not really SQL though) is probably much faster, like the BULK INSERT command.















get free sql tips
agree to terms