Transformations in SQL Server Integration Services SSIS


By:
Overview

In this section, we build further upon the data flow created in the previous section. We will add an extra column using the Derived Column transformation and fetch extra data using the Lookup component.

SSIS Derived Column Transformation

The Derived Column transformation has the ability to either modify existing columns or add new columns to the buffer. You can open the editor by double-clicking the component.

derived column transformation in SSIS

In the editor, you can drag variables, parameters and existing columns to the expression. At the right, you have also a library of functions available. You can also choose if you want to replace existing columns or if you want to add a new column:

Derived Column Transformation Editor

Let’s add a new column that contains the date of today (using the GETDATE() function) and trim the existing CityName column. You can drag the column name and the Trim function to the editor:

create expression in SSIS

The Derived Column transformation is very powerful, but the one-line expression editor can be frustrating. Let’s add a multicast and a data viewer to inspect the results:

Dervived Column Output data Viewer

SSIS Lookup Component

With a lookup, we can match our current columns against a reference dataset and retrieve one or more columns if a match has been found. In our dataset, we have the StateProvinceID column. We are going to fetch the full name of the State/Province. Drag a Lookup transformation to the canvas and open the editor. In the first pane, there are a couple properties to set:

SSIS Lookup Component General Tab
  • Cache mode. This defines how the reference dataset is loaded. With full cache, the entire dataset is loaded into memory at the start of the data flow. This allows for very quick matching between the datasets. With partial cache, only a part of the dataset is loaded into memory. If there’s a cache miss, the data will be fetched from the database and put in the cache, possibly evicting older data. With no cache, nothing is loaded into memory and for every row a query needs to be sent to the database, which is quite slow.
  • Connection type. You can either choose a cache connection manager for when you when to pre-load your reference datasets and use them in multiple packages or data flows, or a regular OLE DB connection manager. The default is a standard OLE DB connection manager. Notice there’s no option to use ADO.NET or ODBC.
  • No match behavior. Here you specify how the lookup component should behave if no match was found for a row.
    • Ignore failure. The row is passed to the Match Output and the columns from the reference dataset get NULL values.
    • Redirect rows to error output. All rows without a match are sent to the error output (the red arrow).
    • Fail Component. The default, but a bit drastic. The data flow and package will stop if no match is found.
    • Redirect row to no match output. A new output is created where all rows without a match are sent to.

Let’s set this option to “Redirect row to no match output”. In the next tab, you need to define the reference dataset. You can use the dropdown box to select a table, but just as with the source, a SQL statement is preferred. Select only the columns you need to make the match and of course the columns you want to return. We can use this T-SQL statement:

SELECT
[StateProvinceID]
,[StateProvinceName]
FROM [Application].[StateProvinces];
SSIS Lookup Component Connection Tab

The last tab we need to edit is the Columns tab, where we specify how the matching will take place and which columns we want to return. You need to drag the key columns from the input columns to the key columns from the lookup columns. Then you can check each column from the lookup columns which you want returned; in our case the StateProvinceName.

SSIS Lookup Component Columns Tab

Close the editor. When we now attach a multicast to the lookup component, we can choose which output we want:

SSIS Lookup Component Input Output Selection

Let’s attach multicasts on both outputs, combined with data viewers so we can test the lookup component:

State Province Name highlighted in the Data Viewer

As you can see, the StateProvinceName column has been added to the buffer and there were no rows sent to the no match output.


Last Update: 8/31/2017




Comments For This Article




Friday, January 22, 2021 - 10:57:51 AM - soumya Back To Top (88084)
Thank you

Friday, January 22, 2021 - 9:40:28 AM - Koen Verbeeck Back To Top (88083)
Hi soumya,
yes, you can use the multicast for debugging. It can act as a sort of "destination", meaning you don't have to write rows to some system. And then you can add a data viewer to inspect the data.
But its original purpose is to duplicate streams :)
Koen

Thursday, January 21, 2021 - 6:07:05 AM - soumya Back To Top (88078)
I want to ask is this Multicast transformation that can be used for debugging and also does not allow data to follow to Destination.

Wednesday, November 22, 2017 - 10:15:12 AM - SLowry Back To Top (70093)

"Connection type. You can either choose a cache connection manager for when you when to pre-load your reference datasets and use them in multiple packages or data flows."

This needs rewording, as it doesn't make any sense at the moment, specifically "for when you when to pre-load" and it says "You can either choose" but only has 1 option.















get free sql tips
agree to terms