Importing Data From Excel Using SSIS - Part 2

By:   |   Updated: 2012-10-02   |   Comments (24)   |   Related: More > Integration Services Excel


Problem

In my last tip, "Importing Data From Excel Using SSIS - Part 1" we found out that the SSIS Excel Connection Manager determines the data type of each column of the worksheet on the basis of data for that particular column from the first 8 rows. This is the default behavior and connection manager uses a value in the registry key to consider the number of rows for data type determination. We also saw that the data import fails if the data in others rows (other than first 8 rows) is not compatible or has a longer length than the data in the first 8 rows. I talked about different ways to handle this to make the data import succeed. I also talked about the easiest solution which is to change the registry key value, but this has its own implications and hence the question; what other options are there without creating too much additional overhead?

Solution

In my last tip, the easiest solution was to make a change in the registry but in many scenarios you might not have control to make this change for varous reasons.  Even if you have control in changing this setting, this change might cause your other SSIS packages to perform poorly (based on the amount of data you have in an Excel worksheet) and it may impact other systems as well wherever this registry key is being referenced. So I have another solution for this issue.

The basic idea of this solution is to convert the Excel worksheet to CSV file and use the Flat File Connection Manager/Source adapter to import data from the CSV file. With the Flat File Connection Manager/Source Adapter we have more control to define the data type and length of each column.

These are questions that come to mind when thinking about converting an Excel worksheet to a CSV file:

  • When saving an Excel worksheet using a csv extension does this make it a CSV file?
  • A CSV file uses a comma as column separator, but what if there are commas in the data itself?
  • What is the impact of converting an Excel worksheet to CSV file?

Well, simply saving the Excel worksheet using a CSV extension will not make it CSV file as the storage format is different for both of these file types. Rather we can use the Excel Object Library to save the Excel worksheet as a CSV file using the Script Task in SSIS and then we can import the data directly from the CSV file as shown below. 

Importing Data From Excel Using SSIS

In the Script Task, when writing code for conversion, first of all you need to add a reference to Microsoft.Office.Interop.Excel under .NET components as shown below: 

you need to add a reference to Microsoft.Office.Interop.Excel under .NET component

After adding the required reference, the References node in the Solution Explorer will look like this:

the References node in the Solution Explorer will look like this

Once you have added the required reference, you can add these lines of code. The complete list of code for converting Excel worksheet to CSV file is provided below. You need to provide the location and name of the Excel worksheet along with the name of the worksheet itself and then the location and name for the CSV file which will be created:

/*
   Microsoft SQL Server Integration Services Script Task
   Write scripts using Microsoft Visual C# 2008.
   The ScriptMain is the entry point class of the script.
*/
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using Excel = Microsoft.Office.Interop.Excel;
using Microsoft.Office.Interop.Excel; 
namespace ST_6dc747ff29cf41c6ac11b7c0bca33d19.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
        /*
  The execution engine calls this method when the task executes.
  To access the object model, use the Dts property. Connections, variables, events,
  and logging features are available as members of the Dts property as shown in the following examples.
  To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
  To post a log entry, call Dts.Log("This is my log text", 999, null);
  To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);
  To use the connections collection use something like the following:
  ConnectionManager cm = Dts.Connections.Add("OLEDB");
  cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";
  Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
  
  To open Help, press F1.
 */
        private static Workbook mWorkBook;
        private static Sheets mWorkSheets;
        private static Worksheet mWSheet1;
        private static Excel.Application oXL;
        private static string ErrorMessage = string.Empty;
        public void Main()
        {
            try
            {
                string sourceExcelPathAndName = @"D:\Excel Import\Excel Import.xls";
                string targetCSVPathAndName = @"D:\Excel Import\Excel Import.csv";
                string excelSheetName = @"Sheet1";
                string columnDelimeter = @"|#|";
                int headerRowsToSkip = 0;
                if (ConvertExcelToCSV(sourceExcelPathAndName, targetCSVPathAndName, excelSheetName, columnDelimeter, headerRowsToSkip) == true)
                {
                    Dts.TaskResult = (int)ScriptResults.Success;
                }
                else
                {
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }
            }
            catch (Exception ex)
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
        public static bool ConvertExcelToCSV(string sourceExcelPathAndName, string targetCSVPathAndName, string excelSheetName, string columnDelimeter, int headerRowsToSkip)
        {
            try
            {
                oXL = new Excel.Application();
                oXL.Visible = false;
                oXL.DisplayAlerts = false;
                Excel.Workbooks workbooks = oXL.Workbooks;
                mWorkBook = workbooks.Open(sourceExcelPathAndName, 0, false, 5, "", "", false, XlPlatform.xlWindows, "", true, false, 0, true, false, false);
                //Get all the sheets in the workbook
                mWorkSheets = mWorkBook.Worksheets;
                //Get the specified sheet
                mWSheet1 = (Worksheet)mWorkSheets.get_Item(excelSheetName);
                Excel.Range range = mWSheet1.UsedRange;
                //deleting the specified number of rows from the top
                Excel.Range rngCurrentRow;
                for (int i = 0; i < headerRowsToSkip; i++)
                {
                    rngCurrentRow = range.get_Range("A1", Type.Missing).EntireRow;
                    rngCurrentRow.Delete(XlDeleteShiftDirection.xlShiftUp);
                }
                //replacing ENTER with a space
                range.Replace("\n", " ", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
                //replacing COMMA with the column delimeter
                range.Replace(",", columnDelimeter, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
                mWorkBook.SaveAs(targetCSVPathAndName, XlFileFormat.xlCSVMSDOS,
                Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlExclusive,
                Type.Missing, Type.Missing, Type.Missing,
                Type.Missing, false);
                return true;
            }
            catch (Exception ex)
            {
                ErrorMessage = ex.ToString();
                return false;
            }
            finally
            {
                if (mWSheet1 != null) mWSheet1 = null;
                if (mWorkBook != null) mWorkBook.Close(Type.Missing, Type.Missing, Type.Missing);
                if (mWorkBook != null) mWorkBook = null;
                if (oXL != null) oXL.Quit();
                System.Runtime.InteropServices.Marshal.ReleaseComObject(oXL);
                if (oXL != null) oXL = null;
                GC.WaitForPendingFinalizers();
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
            }
        }
    }
}

Below you can see the Excel worksheet which I am using as the source file for conversion.  You will notice row number 15 has a long string and also the description of this row contains commas in it as well.

you can see the Excel worksheet which I have used as source file for conversion

After the conversion, the CSV file will look like this. You will notice where there were commas we now have some special characters "|#|":

After the conversion, the CSV file will look like this

The idea behind this is, before conversion replace all the commas with some special characters and after importing the data from the CSV file update the special characters back to a comma.  This can be done using code such as this.

SELECT * FROM [dbo].[ProductInformation]
UPDATE [dbo].[ProductInformation]
SET [Description] = REPLACE([Description], '|#|', ',')
SELECT * FROM [dbo].[ProductInformation] 

In order to use this approach, we need to have extra storage space for having both a CSV file along with an Excel file. Apart from that, the Excel object library will take a few seconds to save the file as a CSV file. I haven't tried it on very large file, though I think this should not take much longer.  Once you have the data loaded you can do anything else you need to at that point.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Arshad Ali Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2012-10-02

Comments For This Article




Tuesday, April 12, 2016 - 6:35:14 PM - Bruce Hendry Back To Top (41207)

To Pablo and anyone else who is using the code in the article and getting the "one or more types required to compile a dynamic expression cannot be found" error...add a reference to the Microsoft.CSharp .NET assembly in project references.

Well written article and good quality code sample, thanks!

 

 


Sunday, February 14, 2016 - 11:53:51 PM - Douglas Back To Top (40680)

 

 Found the probelm I was having. I did a manual find and replace in the file to remove commas and Excel said, 'Are you trying to change a formula?' 

Formulas were at the tail end of my second row. So I got rid of the formulas and have no more problems with this process. 

 

Thank you so much for this code.


Sunday, February 14, 2016 - 7:59:50 PM - Douglas Back To Top (40677)

Thanks fo the code.

In some files, range.replace of ',' is not finding all commas. It is consistantly only doing the replace in the first two rows. As a debug I check the used range immediatly after the replace step and find that it is covering 2,000 rows as I expect.

This does work for many other files of the same format. Has anyone else had this problem? I can't identify anything wrong with the file. 

 

 


Tuesday, April 14, 2015 - 5:50:21 PM - Pablo Guereca Back To Top (36936)

Hi, I am having an error with this code in the part:

 

   mWSheet1 = (Worksheet)mWorkSheets.get_Item(excelSheetName);

 

one or more types required to compile a dynamic expression cannot be found. are you missing"

 

I am using SSIS 2012

 

 


Sunday, April 12, 2015 - 2:37:45 AM - SQL2008 Back To Top (36908)

hi Arshad, thanks for the post on converting xl to csv dynamically, I am working on a similar concept of loading xl file with multiple sheets into SQL table using c#, I want to replace the connection ["Excel"] to be passed from a variable:

 // Get connectionstring from Excel Connection Manager and use it to connect through OLE DB

                OleDbConnection excelConnection = new OleDbConnection(Dts.Connections["Excel"].ConnectionString);

                excelConnection.Open();

can you direct any reference on this type of scenario, appretiate your response,

thanks


Wednesday, October 29, 2014 - 12:30:11 PM - Hussaini Back To Top (35112)

Hi Arshad,

Thanks Arshad for very clear and detail explanation. I know my question is not related this topic but more similar I want to Import data from MS Access and I tried few times but was not successful, and do you have any topics or notes that could help us.

 

Thanks,

Hussaini. 


Monday, March 31, 2014 - 7:21:24 PM - rick Back To Top (29931)

How do we run this from a SQL Server Agent Job like Curly said? It doesn't want to run...


Wednesday, December 18, 2013 - 2:23:45 PM - Scott Coleman Back To Top (27833)

If you're going to use Excel interop to load the spreadsheet, why don't you just move the code from a Script Task to a Script Source Component?  Just add a loop to copy the cell values row by row into the row buffer.  Then you have complete control over how every column is interpreted.


Monday, October 14, 2013 - 2:45:50 PM - Robert Mitchell Back To Top (27146)

Stuart,

 

It would seem that adding "IMEX=1" would solve everything, but in reality it doesn't. Even with IMEX set to 1, if a column has data that Excel's ISAM driver interprets as numeric in the first eight rows, it will still set the column datatype as "numeric" whether IMEX=1 or not, and will still return a NULL if a subsequent row contains a string. My example is a customer purchase order number field where the first fifteen entries are numeric, like 65088976, and the sixteenth entry is C65098787. The Excel Source returns this cell, and any others like it, as NULL. Increasing the number of rows to use for the guess is out of the question, as I cannot meddle with registry settings on a production server. (I tried it on my local workstation to see if it would work, and it didn't, anyway, so it's still not a solution.) I tried inserting data conversion tasks, derived column tasks, etc, but the dropping of the actual cell value happens at the moment the Excel source reads the row. 


Monday, July 29, 2013 - 4:39:19 AM - Abhi Desai Back To Top (26040)

Hi Arshad,

Thanks for such great article and your simple language.

Your Part 1 and Part 2 both are really helpful.

Ragards,
Abhi Desai


Tuesday, June 4, 2013 - 11:25:42 AM - Arshad Back To Top (25280)

Hi Stuart and Izhar,

Please have a look at this.

http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/78b87712-8ffe-4c72-914b-f1c031ba6c75


Monday, June 3, 2013 - 10:32:55 AM - Stuart Steedman Back To Top (25254)

This came up as a featured article today.  Yet, the reasons for it and the solution to the problem are both poorly conceived.

 

The solution to the problem is to set the connection string with the extra flag: IMEX=1

(As mentioned by Izhar Azati in the comments to Part 1)

This renders the whole purpose of this article obsolete.

 

Secondly, converting a multi-worksheet workbook into a CSV is not going to work and even if it were just a single worksheet, this is a massive overhead for no good reason.


Sunday, June 2, 2013 - 11:40:26 AM - Faran Back To Top (25241)

Hi Guys 

I was trying it. Have excel installed as well. The problem is that I can't add the reference Microsoft.Office.Interop.Excel under .NET components  in the Script Task,  because it is not available in the list. Any help will be appreciated?

 

regards

 

Faran


Friday, May 24, 2013 - 2:29:44 AM - ashish Back To Top (25116)

 

hi

i have created excel file from sql table but for varchar columns

i'm getting single quotes before text like for example

('Myname)

why is it so ?

i have used ado.net source and excel destination..


Monday, April 15, 2013 - 11:47:27 AM - KevH Back To Top (23356)

Does this handle speach marks, commas and blank columns?


Thursday, March 21, 2013 - 10:36:03 AM - Kashif Back To Top (22935)

Hi Arshad,

I am trying to load a pipe '|' delimited file using SSIS, I am getting the following error related to target code page. I appreciate if you can provide some help.

Error: 0xC02020A1 at Data Flow Task 1, Source - Account_txt [1]: Data conversion failed. The data conversion for column "FORMULA0,Account" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
Error: 0xC020902A at Data Flow Task 1, Source - Account_txt [1]: The "output column "FORMULA0,Account" (26)" failed because truncation occurred, and the truncation row disposition on "output column "FORMULA0,Account" (26)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.


Tuesday, January 8, 2013 - 9:17:56 AM - Imane Back To Top (21314)

Hi,

I'm trying to find a way to change from Excel to text file using SQL code.

It is somehow like a converter from Excel to .txt file with the rules I can add to that txt file in order to have the structure that I want.

Can anyone help please?

 

Thanks!

 


Wednesday, December 19, 2012 - 7:20:37 AM - curly Back To Top (21026)

Chris, You do have to install Exel on the server, nothing else works. I tried to add Microsoft.Office.Interop.Excel.dll to the windows\asembly 10 diffrent ways with no luck. Installing Excel will put the dll in the asembly so you can use it. The package will pickup and build the script fine, but will not execute correctly if the dll is not in the assembly.

My problme now is that my package wont run the script from a job. it does run correctly in debug mode though. I find it difficult to debug because Arshad's code dont have any descriptive error output with it. I'm a SQL guy, not a c# programmer. Can anybody help out with some code for error debugging in this script?

Arshad - Great Job with this solution. This helped me out big time.


Tuesday, November 27, 2012 - 12:20:22 PM - Chris Back To Top (20567)

Hi, Arshad.  Thanks for posting the solution described above.  I have a question.  Does using the Microsoft.Office.Interop.Excel dll require Office to be installed on the server?  I thought that I saw a post about this.  Thanks.

Thanks...Chris


Monday, October 15, 2012 - 8:39:39 AM - jcidge Back To Top (19926)

Does anyone know, in a SSIS loop to read multiple Excel files, does the excel connection manager scan the first N rows of EACH file it opens to set the data types for each pass OR does it  use only on the first file it finds ?


Monday, October 8, 2012 - 2:17:40 PM - Megan Brooks Back To Top (19823)

I have been having to deal with these issues for a number of years. As I blogged back in 2009, my solution has been to hand-convert each Excel file to tab-delimited text and then read the data using a flat-file data source with suitable metadata settings. I avoided CSV because of the potential for incorrectly handling embedded commas.  CSV can work if the data is properly enclosed in quotation marks and the double-quote is specified as the text delimiter in the flat-file source, but as I recall Excel doesn't always write or read the data correctly. I don't remember the details now, but it seems like Import/Export operations were broken in this regard. Or maybe it was Open/Save As. Always test.

I hadn't thought about doing the Excel-to-text conversion right in the SSIS package. I don't have enough data volume for it to be a problem right now (most of my Excel data comes in once or twice a year, and there aren't that many different files), but if there is more in the future then I will look into it. Thank you!


Monday, October 8, 2012 - 6:12:23 AM - Marce Back To Top (19818)

Hey good man thank you a lot for this information. As i have seen you quite dominate rendering excel files in SSIS id like to ask you if you know how to password protect spreadsheets once created. The flow would be like this: grab the xls from an specific folder, open it, password protect it, close it. I have seen some ssis extensions suites but as far as i know there should be a way to put some code in the mpackage so that the processed file can be automized. i would endless appreciate this, Cheers!


Tuesday, October 2, 2012 - 12:33:49 PM - Russ Back To Top (19765)

I've been working with importing Excel files using SSIS and the interop .dll

Have you had any issues with hanging EXCEL.exe processes and if so, how have you handled them?

 

Thanks!


Tuesday, October 2, 2012 - 10:19:22 AM - Bill Back To Top (19763)

Good stuff, a reasonable work around that is better than changing a registry setting.















get free sql tips
agree to terms