By: Nat Sundar | Updated: 2016-11-08 | Comments | Related: More > Integration Services Development
Problem
I read the previous tip to implement metadata driven SSIS development, but how do I extend the functionality to implement SSIS data flow transformations?
Solution
During a transformation, a set of business/technical rules are applied to the data, so that we can load the data to the target tables. Data cleansing is a key step in the transformation process.
Some examples of transformations are:
- Excluding NULL values for key columns
- Lookup against dimension tables to get surrogate keys
- Derive new column values based on business rules
- Aggregating data based on columns
- Deduplication, ordering, data enrichment, etc.
In this tip, we will learn and practice the concept of metadata management for data transformations.
There are many transformations which exists in SSIS to solve various business problems. We are going to take two transformations as an example to understand the concept. Once you are familiar with the concept you can extend the design to accommodate other transformations. The two methods we will look at are:
- Derived Column Transformations
- Lookup Transformations
Building an Example Model
In this example we are going to add a derived column transformation to the data flow task. Lets have a quick look at the database design to accommodate this transformation.
Prerequisites
It is assumed that you have already reviewed the previous tip about building a metadata model for a data warehouse. The database script from the previous tip becomes the basis for this tip.
Database Design and Stored Procedures
The below script will enhance the metadata model by creating tables for the data transformation.
USE [MetadataModel] GO create proc [dbo].[ListLookupHeaderForCustomer] as Select LTH.TransformationShortName, LTrim(RTrim(DP.Name)) as ConnectionName, LTH.SourceQuery from [dbo].[LookupTransformationHeader] LTH INNER JOIN dbo.DataPackage DP on DP.DataPackageID = LTH.SourceDataPackageID GO create proc [dbo].[ListLookupInputColumnDetails] as Select LTH.TransformationShortName,LTD.ColumnName from [dbo].[LookupTransformationHeader] LTH INNER JOIN [dbo].[LookupTransformationDetail] LTD on LTH.LookupTransformationHeaderID = LTD.LookupTransformationHeaderID and LTD.Type='IN' GO create proc [dbo].[ListLookupOutputColumnDetails] as Select LTH.TransformationShortName,LTD.ColumnName from [dbo].[LookupTransformationHeader] LTH INNER JOIN [dbo].[LookupTransformationDetail] LTD on LTH.LookupTransformationHeaderID = LTD.LookupTransformationHeaderID and LTD.Type='OUT' GO Create proc [dbo].[ListTransformationForCustomerCopy] as SELECT DTH.TransformationShortName,DTD.DerivedColumnName,DTD.Expression, DTD.DataType,DTD.[DataLength],DTD.ReplaceExisting FROM [dbo].[TransformationStep] TS INNER JOIN [dbo].[DerivedColumnTransformationHeader] DTH on TS.TransformationStepID = DTH.DerivedTransformationHeaderID INNER JOIN [dbo].[DerivedColumnTransformationDetail] DTD on DTH.DerivedTransformationHeaderID = DTD.DerivedColumnTransformationHeaderID GO CREATE TABLE [dbo].[DerivedColumnTransformationDetail]( [DerivedColumnTransformationHeaderID] [int] NOT NULL, [DerivedColumnTransformationDetailID] [int] NOT NULL, [DerivedColumnName] [nvarchar](250) NOT NULL, [DerivedColumnType] [nvarchar](250) NOT NULL, [Expression] [nvarchar](200) NOT NULL, [DataType] [nvarchar](50) NOT NULL, [DataLength] [nvarchar](50) NOT NULL, [Precision] [nvarchar](50) NOT NULL, [Scale] [nvarchar](50) NOT NULL, [CodePage] [nvarchar](50) NOT NULL, [ReplaceExisting] [nvarchar](50) NULL, CONSTRAINT [PK_TransformationDerivedColumn] PRIMARY KEY CLUSTERED ( [DerivedColumnTransformationDetailID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[DerivedColumnTransformationHeader]( [DerivedTransformationHeaderID] [int] IDENTITY(1,1) NOT NULL, [TransformationShortName] [nvarchar](50) NOT NULL, [TransformationComments] [nvarchar](200) NULL, CONSTRAINT [PK_DerivedColumnTransformationHeader] PRIMARY KEY CLUSTERED ( [DerivedTransformationHeaderID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[LookupTransformationDetail]( [LookupTransformationDetailID] [int] NOT NULL, [LookupTransformationHeaderID] [int] NOT NULL, [ColumnName] [nvarchar](50) NOT NULL, [DataType] [nvarchar](50) NOT NULL, [DataLength] [nvarchar](50) NOT NULL, [Type] [nchar](10) NOT NULL, CONSTRAINT [PK_LookupTransformationDetail] PRIMARY KEY CLUSTERED ( [LookupTransformationDetailID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[LookupTransformationHeader]( [LookupTransformationHeaderID] [int] NOT NULL, [TransformationShortName] [nvarchar](250) NOT NULL, [SourceDataPackageID] [int] NOT NULL, [NoMatchConfiguration] [nvarchar](250) NOT NULL, [SourceQuery] [nvarchar](4000) NOT NULL, CONSTRAINT [PK_LookupTransformationHeader] PRIMARY KEY CLUSTERED ( [LookupTransformationHeaderID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[TransformationStep]( [MappingID] [int] NOT NULL, [TransformationStepName] [nvarchar](50) NOT NULL, [TransformationType] [nvarchar](50) NOT NULL, [TransformationSequenceID] [int] NULL, [TransformationStepID] [int] IDENTITY(1,1) NOT NULL, [TransformationID] [int] NULL, CONSTRAINT [PK_TransformationStep_1] PRIMARY KEY CLUSTERED ( [TransformationStepID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[DerivedColumnTransformationDetail] WITH CHECK ADD CONSTRAINT [FK_DerivedColumnTransformationDetail_DerivedColumnTransformationHeader] FOREIGN KEY([DerivedColumnTransformationHeaderID]) REFERENCES [dbo].[DerivedColumnTransformationHeader] ([DerivedTransformationHeaderID]) GO ALTER TABLE [dbo].[DerivedColumnTransformationDetail] CHECK CONSTRAINT [FK_DerivedColumnTransformationDetail_DerivedColumnTransformationHeader] GO
Relationships between transformation tables can be seen below.
Table Explanation
The above script should have created the transformation tables. Usage of these tables is provided below.
TransformationStep Table
This table stores the transformation details. If there are "n" transformations between the source and target, it is expected to have all "n" transformations in this table. Only a basic definition of the transformation will be available in this table. The detailed definition of the transformation will be stored in transformation specific tables.
This table stores information about:
- MappingID Unique key to identify the data flow
- TransformationType It defines the type of transformation (example DER for derived column transformation)
- TransformationSequenceID It helps to arrange multiple transformations in numeric order
DerivedColumnTransformationHeader Table
This is a DerivedColumnTransformation specific table and it stores high level information.
This table stores information about:
- TransformationShortName Transformation name that can be used within the data flow
- TransformationComments Detailed comments about the transformation
- TransformationHeaderID Surrogate key to uniquely identify a transformation
DerivedColumnTransformationDetail Table
This table stores column level details for the transformation.
This table stores information about:
- TransformationHeaderID Help to link the transformationHeader
- TransformationDetaiID Surrogate key
- DerivedColumnName
- Expression Actual expression used in the Derived column transformation
- DataType, DataLength Data type related details
- ReplaceExisting Flag to indicate whether the given expression is to replace existing column
ListTransformationForCustomerCopy Stored Procedure
This stored procedure will join the TransformationStep, DerivedColumnTransformationHeader and DerivedColumnTransformationDetail tables to return the dataset to build the derived column transformation.
Load Sample Data
Let's insert some data into the tables.
USE [MetadataModel] GO INSERT INTO [dbo].[TransformationStep] ([MappingID] ,[TransformationStepName] ,[TransformationType] ,[TransformationSequenceID] ,[TransformationID]) VALUES (1 ,'Derive New Column' ,'DER' ,1 ,1) GO INSERT INTO [dbo].[DerivedColumnTransformationHeader] ([TransformationShortName] ,[TransformationComments]) VALUES ('DE - Upper Case' ,'Convert Name to Upper Case') GO INSERT INTO [dbo].[DerivedColumnTransformationDetail] ([DerivedColumnTransformationHeaderID] ,[DerivedColumnTransformationDetailID] ,[DerivedColumnName] ,[DerivedColumnType] ,[Expression] ,[DataType] ,[DataLength] ,[Precision] ,[Scale] ,[CodePage] ,[ReplaceExisting]) VALUES (1 ,1 ,'Name' ,'DER' ,'UPPER(NAME)' ,'String' ,'100' ,'NA' ,'NA' ,'NA' ,'true') GO INSERT INTO [dbo].[LookupTransformationHeader] ([LookupTransformationHeaderID] ,[TransformationShortName] ,[SourceDataPackageID] ,[NoMatchConfiguration] ,[SourceQuery]) VALUES (1 ,'LKP - Phonenumber' ,1 ,'NA' ,'Select Name,PhoneNumber from dbo.CustomerLookup') GO INSERT INTO [dbo].[LookupTransformationDetail] ([LookupTransformationDetailID] ,[LookupTransformationHeaderID] ,[ColumnName] ,[DataType] ,[DataLength] ,[Type]) VALUES (1 ,1 ,'Name' ,'String' ,'250' ,'IN'), (2 ,1 ,'PhoneNumber' ,'String' ,'250' ,'OUT') GO
Derived Column Transformation
The stored procedure ListTransformationForCustomerCopy will be called from the BIMLScript to get the Derived Column Transformation details for the CustomerCopy data flow task. The returned dataset will be stored in a DataTable. As we have only one derived column expression, we can make use of the first retuned row to build the transformation.
The below BIML script will help us generate an SSIS package based on the configuration in the Metadata Repository.
<#@ import namespace="System.Data" #> <#@ import namespace="System.Data.SqlClient" #> <#@ template language="C#" tier="1" #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <# string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; string connectionString =""; DataTable dtConnection = new DataTable(); SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); sqldaStaging.Fill(dtConnection); DataRow rwConnection = dtConnection.Rows[0]; DataTable dtTableMapping = new DataTable(); SqlDataAdapter sqldaTableMapping = new SqlDataAdapter("exec [dbo].[ListCustomerCopyMapping];", strConnection); sqldaTableMapping.Fill(dtTableMapping); DataRow rwTableMapping = dtTableMapping.Rows[0]; DataTable dtTransformation = new DataTable(); SqlDataAdapter sqldaTransformation = new SqlDataAdapter("exec [dbo].[ListTransformationForCustomerCopy];", strConnection); sqldaTransformation.Fill(dtTransformation); DataRow rwTransformation = dtTransformation.Rows[0]; #> <Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" /> </Connections> <Packages> <Package Name="staging" ConstraintMode="Linear"> <Tasks> <ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> <DirectInput> Truncate table dbo.customer_copy </DirectInput> </ExecuteSQL> <Dataflow Name="DFT Copy Customer Data"> <Transformations> <OleDbSource Name="Source" ConnectionName="<#=rwTableMapping["ConnectionName"]#>" > <DirectInput>SELECT * FROM <#=rwTableMapping["SourceTableName"]#></DirectInput> </OleDbSource> <DerivedColumns Name="<#=rwTransformation["TransformationShortName"]#>"> <Columns> <Column ReplaceExisting="<#=rwTransformation["ReplaceExisting"]#>" Name="<#=rwTransformation["DerivedColumnName"]#>" DataType="<#=rwTransformation["DataType"]#>" Length="<#=rwTransformation["DataLength"]#>" > <#=rwTransformation["Expression"]#> </Column> </Columns> </DerivedColumns> <OleDbDestination Name="Target" ConnectionName="<#=rwTableMapping["ConnectionName"]#>"> <ExternalTableOutput Table="<#=rwTableMapping["DestinationTableName"]#>"/> </OleDbDestination> </Transformations> </Dataflow> </Tasks> </Package> </Packages> </Biml> </Biml>
Similar design approaches can be followed for other row by row transformations.
Lookup Transformation
Now we can apply the same logic to generate a Lookup transformation in SSIS. The lookup transformation is different from other transformations in that we need additional tables to store information associated with the lookup transformation.
LookupTransformationHeader Table
This table stores high level information about lookup transformation.
This table stores information about:
- LookupTransformationHeaderID Surrogate key for the transformation header
- TransformationShortName Name used in the data flow
- SourceDataPackageID Used to derive the connection manager
- NoMatchConfiguration Value to define in case of no matching records returned
- Source Query SQL Query used to get the lookup data
LookupTransformationDetail Table
This table stores column level details for the transformation.
This table stores information about:
- LookupTransformationDetailID Surrogate Key for the detail table
- LookupTransformationHeaderID Helps to link the header table
- Column Name
- DataType
- DataLength
- Type Input or Output column
The below BIML script will help us build the transformation components from the Metadata Repository.
<#@ import namespace="System.Data" #> <#@ import namespace="System.Data.SqlClient" #> <#@ template language="C#" tier="1" #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <# string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; string connectionString =""; DataTable dtConnection = new DataTable(); SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); sqldaStaging.Fill(dtConnection); DataRow rwConnection = dtConnection.Rows[0]; DataTable dtTableMapping = new DataTable(); SqlDataAdapter sqldaTableMapping = new SqlDataAdapter("exec [dbo].[ListCustomerCopyMapping];", strConnection); sqldaTableMapping.Fill(dtTableMapping); DataRow rwTableMapping = dtTableMapping.Rows[0]; DataTable dtTransformation = new DataTable(); SqlDataAdapter sqldaTransformation = new SqlDataAdapter("exec [dbo].[ListLookupHeaderForCustomer];", strConnection); sqldaTransformation.Fill(dtTransformation); DataRow rwTransformation = dtTransformation.Rows[0]; DataTable dtLKPTransformationInput = new DataTable(); SqlDataAdapter sqldaLKPTransformationInput = new SqlDataAdapter("exec [dbo].[ListLookupInputColumnDetails];", strConnection); sqldaLKPTransformationInput.Fill(dtLKPTransformationInput); DataRow rwLKPTransformationInput = dtLKPTransformationInput.Rows[0]; DataTable dtLKPTransformationOutput = new DataTable(); SqlDataAdapter sqldaLKPTransformationOutput = new SqlDataAdapter("exec [dbo].[ListLookupOutputColumnDetails];", strConnection); sqldaLKPTransformationOutput.Fill(dtLKPTransformationOutput); DataRow rwLKPTransformationOutput = dtLKPTransformationOutput.Rows[0]; #> <Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" /> </Connections> <Packages> <Package Name="staging" ConstraintMode="Linear"> <Tasks> <ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> <DirectInput> Truncate table dbo.customer_copy </DirectInput> </ExecuteSQL> <Dataflow Name="DFT Copy Customer Data"> <Transformations> <OleDbSource Name="Source" ConnectionName="<#=rwTableMapping["ConnectionName"]#>" > <DirectInput>SELECT * FROM <#=rwTableMapping["SourceTableName"]#></DirectInput> </OleDbSource> <Lookup Name="<#=rwTransformation["TransformationShortName"]#>" OleDbConnectionName="<#=rwTransformation["ConnectionName"]#>" > <DirectInput> <#=rwTransformation["SourceQuery"]#> </DirectInput> <Inputs> <Column SourceColumn="<#=rwLKPTransformationInput["ColumnName"]#>" /> </Inputs> <Outputs> <Column SourceColumn="<#=rwLKPTransformationOutput["ColumnName"]#>" /> </Outputs> </Lookup> <OleDbDestination Name="Target" ConnectionName="<#=rwTableMapping["ConnectionName"]#>"> <ExternalTableOutput Table="<#=rwTableMapping["DestinationTableName"]#>"/> </OleDbDestination> </Transformations> </Dataflow> </Tasks> </Package> </Packages> </Biml>
Scope For Further Improvement
In this example, we have seen only one transformation for a data flow task, but in business scenarios we need multiple transformations to load data to target tables. The TransformationStep table plays a pivotal role by connecting the Mapping (Data flow) to all the transformations. The sequenceID in the TransformationStep table will help to sort the transformations based on the need.
A new stored procedure will need to be created to list out all the transformations for a given data flow task. This stored procedure will be called from the BIML script to generate the BIML code snippet based on the transformation.
Summary
This metadata model for transformation logic can be extended for other transformations such as Union All, Merge Join, Merge, Sort, etc. Based on the above examples we can see the metadata driven development can be accelerated with the help of BIMLScript.
Next Steps
- Read more about metadata here
- Read metadata from Kimball book here
- Read Data Warehouse Metadata here
- Metadata Driven SQL Server Integration Services ETL Development Using BIML
- Check out all of the SQL Server Business Intelligence resources on MSSQLTips.com
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2016-11-08