SSIS in One Hour: Hands-On Tutorial

By:   |   Updated: 2024-07-17   |   Comments   |   Related: More > Integration Services Development


Problem

Microsoft SQL Server Integration Services (SSIS) is a very important Extract, Transform, and Load (ETL) tool widely used in corporate data management and business intelligence. This ETL tool provides a powerful platform for coordinating and automating data workflows, including operational tasks, such as logging and scheduling, fault-tolerant error handling, support for big data sets scaling, and user-friendliness through its intuitive graphical interface. Built specifically for corporate data pipelines and analytics, SSIS gives data scientists the power to create, run, and monitor complicated processes of integrating information into one chain effectively.

Solution

The goal of this tutorial is to enable ETL developers to obtain practical knowledge to exploit SSIS in transforming and combining data through hands-on exploration to be ready for real-world challenges in managing information.

We'll examine and discuss some of the most essential components and functionalities. We will begin by reviewing the Microsoft SSIS interface and ways to navigate it. We will then address connection managers and set up connections to various data sources. Also, we will explore what makes control flow tasks different from data flow tasks and how they fit into data workflows.

In this demonstration, we'll import a CSV file into SQL Server, including derived column creation, conditional splitting, and the need for data conversion. Immediately after, we'll look at variables and expressions in SSIS for dynamic data processing. Additionally, we will have another demonstration where multiple CSV files are looped over until they reach SQL Server to show how well it handles iterative copies of data via SSIS.

Creating an SSIS Project

First, we must create a new "Integration Services Project" using Visual Studio. Make sure you have already installed the SSIS extension from Visual Studio Market Place:

Create an SSIS Project

Next, we will configure the project name and directory.

Configure Your Project

SSIS Development Interface

The integration services development interface will appear once the SSIS project is created using Visual Studio. As shown in the screenshot below, it is mainly composed of four areas:

  1. SSIS Toolbox
  2. Package Designer (five tabs)
    1. Control Flow
    2. Data Flow
    3. Parameters
    4. Event Handlers
    5. Package Explorer
  3. Connection Managers
  4. Properties

More tools and options are available. However, they are outside the scope of this beginner-level tutorial.

SSIS development interface

Connection Managers

Since data is the main component of any data integration and analysis solution, the first thing to do in the ETL package is to define all data connections using connection managers. SSIS supports a wide variety of connection managers. For this tutorial, we will use an OLE DB connection to connect to an SQL Server database and a Flat File connection to connect to CSV files.

We will start by adding a Flat File connection to read from a source CSV file. Right-click in the connection managers panel and click "New Flat File Connection…"

Side Note: If you are interested in checking all available connection managers, click on the "New Connection" menu item to select from an extended list.

Connection manager

The Flat File connection manager editor will appear. Let's first change the connection manager name to Source File.

flat file connection manager editor

Now, we need to specify the source file path by clicking on "Browse..." Once we select a file, SSIS tries to detect the file metadata: encoding, text qualifier, row delimiter, etc. Sometimes, we need to edit those values manually as SSIS may not detect them accurately. After selecting the file, a warning appears mentioning that "Columns are not defined…"

flat file connection manager editor

Click on the Columns tab to let the connection manager detect the existing columns tab based on the first rows in the flat file.

flat file connection manager editor

If these columns are not defined accurately or their data types need to be changed, they can be edited manually in the Advanced tab.

flat file connection manager editor

After defining the first data connection, click OK.

Now, let's define the SQL Server data connection. We need to create an OLE DB connection manager.

OLE DB connection

In the Connection Manager editor, specify the SQL Server instance, the authentication parameters, and the database name. Other configurations are available in the All tab.

connection manager

To rename the OLE DB connection manager, right-click on its icon in the connection manager area and click Rename.

rename source file

Adding Tasks from the SSIS Toolbox

After adding and configuring the connection managers, we need to design our package. SSIS supports a wide variety of tasks, which can be found in the SSIS toolbox along with a brief description of each task.

Let's begin by adding an "Execute SQL Task," which executes SQL statements using a connection manager that points to a relational data source.

SSIS toolbox

Double-click on the Execute SQL Task icon or drag and drop it within the control flow area. Next, double-click on the Execute SQL Task component in the control flow to open the editor.

Execute SQL Task Editor

In the Execute SQL Task Editor, set the connection type to "OLE DB" since we are using an OLE DB connection manager. Then, select the connection manager we previously created.

Next, add our SQL statement in the "SQLStatement" property, as shown below.

CREATE TABLE [InternationalSales] (
   [ID] INT IDENTITY(1,1) PRIMARY KEY,
   [ProductID] INT,
   [Date] Date,
   [Zip] varchar(50),
   [Units] INT,
   [Revenue] NUMERIC(18,6),
   [Country] varchar(50),
   [SourceFileName] nvarchar(4000))
Enter SQL Query

Click OK.

Execute SQL Task Editor

Click OK to finish configuring the task.

Now, to test this task, right-click on it and click "Execute Task" to execute it separately.

A green icon appears on the upper-right corner of the task to show that it was executed successfully.

task executed successfully

More details about the task execution can be seen under the Progress tab that appears after executing the task.

Progress of the task

Now, click the Stop button to exit debug mode.

Stop debugging

We can rename the task by right-clicking on it and clicking the "Rename" menu item.

rename task

Also, a task can be disabled by clicking on the "Disable" menu item.

disable task

Let's try to execute the SQL task for a second time. It fails, as shown below.

Failed task

To check the cause of the failure, you can navigate to the Progress bar and read the package log.

Progress bar to investigate cause of failure

Since the error message is long and truncated, right-click on the error line and click on "Copy Message Text."

Copy Message Text

Now, we can paste the error message into a new Notepad window to read the full error message.

Paste error message into Notepad

It looks like the task failed because the table was already created. To avoid this error, we can edit the SQL statement to handle this situation, i.e., create the table if it does not exist, and truncate it if it exists.

IF NOT EXISTS ( SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'InternationalSales')
 CREATE TABLE [InternationalSales] (
    [ID] INT IDENTITY(1,1) PRIMARY KEY,
    [ProductID] INT,
    [Date] Date,
    [Zip] varchar(50),
    [Units] INT,
    [Revenue] NUMERIC(18,6),
    [Country] varchar(50),
    [SourceFileName] nvarchar(4000)
)
ELSE
 TRUNCATE TABLE [dbo].[InternationalSales]
Correction SQL statement

Working with Data Flows

Control Flow manages the workflow and execution order of tasks within an SSIS package. It supports executing command line operators, web services, XML tasks, file system operations, and more tasks. A data flow handles the extraction, transformation, and loading of data within an SSIS package. A package has only one control flow and can have multiple data flows.

For instance, we can create a destination table in the control flow and use a data flow to load the data into it.

Let's add a "Data Flow Task" from the SSIS toolbox.

Add Data Flow Task

Click on the "Execute SQL Task" and drag the arrow into the new "Data Flow Task."

Execute SQL task

This will make both tasks connected and execute sequentially. This connector is called a "Precedence Constraint." We can configure it by double-clicking on the connector.

Precedence constraint editor

For example, let's change the precedence constraint to completion. This will tell the package to execute the data flow task even if the precedent "Execute SQL Task" fails.

Precedence constraint editor

Connectors example with "completion" label.

Precedence constraint editor

Designing the Data Flow

After adding the data flow, double-click on the "Data Flow Task" component to navigate to the "Data Flow" area. Note that the SSIS toolbox changes at this level.

SSIS toolbox

To read data from the Flat File connection manager, we need to add a "Flat File Source" component and select the connection manager.

Flat File source editor

Then, click on the "Columns" tab to read and define the component metadata as well as configure the output columns; we can ignore some columns or assign others their aliases.

Flat File source editor

Moreover, we can add a column that contains the source file path. After closing the editor, right-click on the "Flat File Source" and click on "Show Advanced Editor…"

Show Advanced Editor

Navigate to the "Component Properties" tab. Assign a name to the "FileNameColumnName" property. In this tutorial, we will name it "SourceFileName."

Advanced Editor for Flat File Source

Now, when we recheck the input columns, we can see that a new column has been added.

Flat File Source Editor

Data Transformation

After reading the data from the flat file, we need to convert the columns' data types from text to the appropriate types. The search bar located at the top can be used to search SSIS toolbox items. First, add a "Data Conversion" component.

SSIS toolbox

Now, let's connect the "Flat File" component to it.

Connecting the flat file source to the data conversion component

Double-click to open the data conversion editor and check three columns:

  • ProductID (Output Alias: INT_ProductID, Type: DT_I4)
  • Units (Output Alias: INT_Units, Type: DT_I4)
  • Revenue (Output Alias: NUM_Revenue, Type: DT_NUMERIC, Precision: 18, Scale: 6)
Data Conversation Transformation Editor

Now, let's add a "Derived Column" component to apply some expression to the data.

SSIS Toolbox: Derived Column

Let's connect the output of the Data Conversion to the Derived Column component.

Connecting Data Conversion output to the Derived Column component

We need to add two derived columns transformation:

  1. Add a new column DT_DATE using the following expression: (DT_DBDATE)[Date]
  2. Replace the "SourceFileName" column with the following expression: RIGHT(SourceFileName,FINDSTRING(REVERSE(SourceFileName),"\\",1) - 1)
Add Derived Column

Now, let's add a "Conditional Split" component to ignore all records having a date before "01 January 2020."

SSIS Toolbox: Conditional Split

We need to connect the derived column output to the conditional split component input.

Connecting derived column output to conditional split component input

Double-click to open the conditional split editor. We need to define our filter condition: YEAR([DT_DATE]) < 2020.

Define the filter condition

We need to edit the output names. The filter we added will be named "Old records," and the default output name will be changed to "New records" since it will output the records that do not match the filters defined in the conditional split component.

Finally, let's add an "OLE DB Destination" component to insert the data into the destination table.

SSIS Toolbox: OLE DB Destination

Once we connect the conditional split output to the OLE DB destination, a prompt will appear to select the component output that will be linked to the OLE DB destination component. In this case, select "New records."

Input Output Selection

Note that after connecting the components, we can still use the other output, "Old records," and link it to another component, if needed.

Can Use Old records

Double-click the OLE DB destination component to select the connection manager and destination table, as well as configure the data load options.

OLE DB Destination Editor

Next, configure the column mapping. Navigate to the "Mappings" tab and map the input columns to the SQL destination columns, as shown in the image below.

OLE DB Destination Editor

Adding Data Viewers

To monitor the data during the package execution, double-click on any data flow path (connectors) and enable the data viewer to select the columns to be monitored.

Data Viewer

After executing the package, a window appears showing the data. Click on the Play button to continue fetching the data or the Detach button to stop monitoring the data and continue the package execution.

Data Viewer
Package execution

After executing the package, check the "Progress" tab to read the package execution log.

Package execution log

Inserting Multiple Flat Files into the Same Table

Consider that we need to loop over several flat files stored within a directory with the same structure. Go back to the control flow and add a "ForEach Loop Container." Move the data flow task into the container (as shown in the screenshot below) and connect the execute SQL task to the container instead of the data flow task.

Inserting Multiple Flat Files into the Same Table

Working with Variables and Expressions

Right-click in the control flow area and click on "Variables."

Working with Variables and Expressions

SSIS variables store values that can be used and updated during the execution of an SSIS package, allowing for dynamic control and configuration of tasks and data flows. In the Variables windows, click on "Add variable."

Add variable

Add a new string variable named SourceFilePath and set the default value to the "Australia.csv" file.

Add variable

Now, double-click on the ForEach Loop container. Change the Enumerator type to "Foreach File Enumerator." Select the Source directory and change the Files filter to *.csv to only read the CSV files. Make sure that file names are retrieved in a fully qualified format (full path).

ssis

Next, go to the Variable Mappings tab and map the file name retrieved by each loop to the "SourceFilePath" variable we created previously. This allows the full path of each file to be read using this variable on each iteration.

Foreach Loop Editor

Now, we should configure the flat file connection manager to read the input files. Click on the "Source File" connection manager. In the Properties tab, click on the button in the Expressions property.

ssis

Add a new expression to the "ConnectionString" property as follows: @[User::SourceFilePath]

Property Expression Editor

This will make the flat file connection manager link to the file path stored within the SourceFilePath variable.

Let's try to execute the package. Note that the data flow task is executed several times.

Package execution

After the package execution finishes, go to the Progress tab and note that several files were manipulated.

Package execution

Checking the Package Structure

To check how the SSIS package is structured, click on the "Package Explorer" tab to see the package tasks and components tree.

Package Explorer

Finally, we can run the following query from SSMS to check how many records were imported from each file into the destination table.

SELECT [SourceFileName], COUNT(*) as RecordCount
  FROM [Test].[dbo].[InternationalSales]
  GROUP BY [SourceFileName]

We can note that six files were mentioned in the table.

Results of query in SSMS
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Hadi Fadlallah Hadi Fadlallah is an SQL Server professional with more than 10 years of experience. His main expertise is in data integration. He's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com. He holds a Ph.D. in data science focusing on context-aware big data quality and two master's degrees in computer science and business computing.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2024-07-17

Comments For This Article

















get free sql tips
agree to terms