Metadata Driven SQL Server Integration Services ETL Development Using BIML

By:   |   Updated: 2016-10-26   |   Comments   |   Related: More > Integration Services Development


Problem

BIML is a great way to generate SQL Server Integration Services (SSIS) packages from code.  How can I use my technical metadata repository to generate SSIS packages using BIML?

Solution

Metadata means "data about data" and describes the relevant aspect of the data.  Metadata can be classified in three ways:

  • Business Metadata
  • Operational Metadata
  • Technical Metadata

These are high level benefits for using metadata for SSIS package creation for an enterprise:

  • No need to develop a SSIS package. The SSIS package can be generated based on the configuration.
  • Highly transparent business rules. Rules can be modified easily by changing the values in the metadata tables.
  • Less dependency on the development team.

Technical Metadata

Technical metadata stores information about ETL processes. It contains details about data mapping and transformations from source to target in a data system.

This metadata is used by applications to generate ETL solutions dynamically. In addition, it helps the ETL Modelers, Developers and Analysts understand the control flow and data flow for the data load.

Most commercial ETL applications provide a metadata repository with an integrated metadata management system to manage the ETL process definition. The definition of technical metadata is usually more complex than the business metadata and it sometimes involves multiple dependencies.

Technical metadata contains the definition of the following:

  • Source Database / Source System Definition -  Can be a relational database, third party system, file system, etc.
  • Target Database - Usually a Data Warehouse Database
  • Source Tables
  • Source Columns
  • Target Tables
  • Target Columns
  • Transformation

In this tip, we will learn and practice using technical metadata for ETL development.

Setup Metadata for Examples

The below script will help us setup a simple metadata model for ETL development.

USE [master];
GO

IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Staging')
DROP DATABASE Staging;
GO
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = N'Staging')
CREATE DATABASE Staging;
GO
USE Staging;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Customer]') 
AND type IN (N'U'))
Drop table [dbo].[Customer]

CREATE TABLE [dbo].[Customer]( [Customer_ID] [int] NOT NULL, [Name] [nvarchar](250) NOT NULL, 
[Address] [nvarchar](250) NOT NULL, 
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED ( [Customer_ID] ASC )) 
Go

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Customer_Copy]') 
AND type IN (N'U'))
Drop table [dbo].[Customer_Copy]
CREATE TABLE [dbo].[Customer_Copy]( [Customer_ID] [int] NOT NULL, [Name] [nvarchar](250) NOT NULL, 
[Address] [nvarchar](250) NOT NULL, CONSTRAINT [PK_Customer_Copy] 
PRIMARY KEY CLUSTERED ( [Customer_ID] ASC )) 
GO


IF EXISTS (SELECT name FROM sys.databases WHERE name = N'MetadataModel')
DROP DATABASE MetadataModel;
GO
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = N'MetadataModel')
CREATE DATABASE MetadataModel;
GO
USE MetadataModel;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ListConnectionDetails]') 
AND type IN (N'P', N'PC'))
Drop Procedure dbo.[ListConnectionDetails]
go

CREATE procedure [dbo].[ListConnectionDetails] as 
Select RDP.ServerName,RDP.DatabaseName,
LTrim(RTrim(DP.Name)) as ConnectionName, 
'Data Source=' + RDP.ServerName + 
';Persist Security Info=true;Integrated Security=SSPI;Provider=SQLNCLI11.1;Initial Catalog=' 
+RDP.DatabaseName as ConnectionString 
from DataPackage DP 
INNER JOIN RelationalDataPackage RDP 
on DP.DataPackageID = RDP.DataPackageID 
Where [Type]='REL' 
GO 

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ListCustomerCopyMapping]') 
AND type IN (N'P', N'PC'))
Drop Procedure dbo.[ListCustomerCopyMapping]
go

CREATE procedure [dbo].[ListCustomerCopyMapping] as 
Select 
MAP.MappingComments as DataFlowName, 
RDGS.TableName as SourceTableName, 
RDGD. TableName as DestinationTableName, 
RTrim(LTrim(DP.Name)) as ConnectionName 
from [dbo].[SourceToTargetMapping] MAP 
INNER JOIN dbo.DataGroup DGS 
on MAP.SourceDataGroupID = DGS.DataGroupIDI and 
Upper(DGS.[Type])='TABLE' 
INNER JOIN dbo.RelationalDataGroup RDGS 
on DGS.DataGroupID = RDGS.DataGroupID 
INNER JOIN dbo.DataGroup DGD 
on MAP.DestinationDataGrouplD = DGD.DataGroupID and 
Upper(DGD.[Type])='TABLE' 
INNER JOIN dbo.RelationalDataGroup RDGD 
on DGD.DataGroupID = RDGD.DataGroupID 
INNER JOIN dbo.DataPackage DP 
on DGS.ContainedDataPackageID = DP.DataPackageID 
Where MAP.[MappingName] ='CustomerDataCopy' 
GO 


IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DataGroup]') 
AND type IN (N'U'))

Drop table [dbo].[DataGroup]
CREATE TABLE [dbo].[DataGroup]( [DataGroupID] [int] IDENTITY(1,1) NOT NULL, 
[Type] [nvarchar](50) NOT NULL, [Description] [nvarchar](50) NOT NULL, 
[ContainedDataPackageID] [int] NOT NULL, 
CONSTRAINT [PK_DataGroup] PRIMARY KEY CLUSTERED ( [DataGroupID] ASC )) ON [PRIMARY] 
GO 

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DataPackage]') 
AND type IN (N'U'))
Drop table [dbo].[DataPackage]
CREATE TABLE DataPackage ([DataPackageID] [int] IDENTITY(1,1) NOT NULL, 
[Type] [nvarchar](50) NOT NULL, [Description] [nvarchar](50) NOT NULL,[Name] [nvarchar](50) NULL, 
CONSTRAINT [PK_DataPackage] PRIMARY KEY CLUSTERED ([DataPackageID] ASC )) ON [PRIMARY] 
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[RelationalDataGroup]') 
AND type IN (N'U'))
Drop table [dbo].[RelationalDataGroup]
CREATE TABLE [dbo].[RelationalDataGroup]( [TableName] [nvarchar](50) NOT NULL, 
[DataGroupID] [int] NOT NULL, [Usage] [nvarchar](50) NULL ) ON [PRIMARY] 
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[RelationalDataPackage]') 
AND type IN (N'U'))
Drop table [dbo].[RelationalDataPackage]
CREATE TABLE [dbo].[RelationalDataPackage]( [DataPackageID] [int] NOT NULL, 
[ServerName] [nvarchar](50) NOT NULL, [DatabaseName] [nvarchar](50) NOT NULL ) 
ON [PRIMARY] 
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SourceToTargetMapping]') 
AND type IN (N'U'))
Drop table [dbo].[SourceToTargetMapping]
CREATE TABLE [dbo].[SourceToTargetMapping]( [SourceDataGrouplD] [int] NOT NULL, 
[DestinationDataGroupID] [int] NOT NULL, [MappingComments] [nvarchar](200) NOT NULL, 
[MappingName] [nvarchar](50) NULL, 
CONSTRAINT [PK_SourceToTargetMapping] PRIMARY KEY CLUSTERED 
([SourceDataGrouplD] ASC, [DestinationDataGroupID] ASC )) ON [PRIMARY] 
GO

ALTER TABLE [dbo].[DataGroup] WITH CHECK ADD CONSTRAINT [FK_DataGroup_DataPackage] 
FOREIGN KEY([ContainedDataPackageID]) REFERENCES [dbo].[DataPackage] ([DataPackageID]) 
GO

ALTER TABLE [dbo].[DataGroup] CHECK CONSTRAINT [FK_DataGroup_DataPackage] 
GO

ALTER TABLE [dbo].[RelationalDataGroup] WITH CHECK ADD CONSTRAINT [FK_RelationalDataGroup_DataGroup]
FOREIGN KEY([DataGroupID]) REFERENCES [dbo].[DataGroup] ([DataGroupID]) 
GO


ALTER TABLE [dbo].[RelationalDataGroup] CHECK CONSTRAINT [FK_RelationalDataGroup_DataGroup] 
GO

ALTER TABLE [dbo].[RelationalDataPackage] WITH CHECK 
ADD CONSTRAINT [FK_RelationalDataPackage_DataPackage] FOREIGN KEY(DataPackageID) REFERENCES 
[dbo].[DataPackage] ([DataPackageID]) 
GO

ALTER TABLE [dbo].[RelationalDataPackage] CHECK CONSTRAINT [FK_RelationalDataPackage_DataPackage] 
GO

The relationships between the tables can be seen below.

Meta data tables ERD

Let's insert some data into the tables.

USE [MetadataModel]
GO

INSERT INTO [dbo].[DataGroup]
([Type]
,[Description]
,[ContainedDataPackageID])
VALUES
('Table',
'Customer Staging Table',
1),
('Table',
'Customer Copy Staging Table',
1)

GO
INSERT INTO [dbo].[RelationalDataGroup]
([TableName]
,[DataGroupID]
,[Usage])
VALUES
('dbo.Customer'
,1
,'SRC'),
('dbo.Customer_Copy'
,2
,'DST')
GO
INSERT INTO [dbo].[DataPackage]
([Type]
,[Description]
,[Name])
VALUES
('REL'
,'Development Server'
,'STAGING_SERVER')
GO

INSERT INTO [dbo].[RelationalDataPackage]
([DataPackageID]
,[ServerName]
,[DatabaseName])
VALUES
(1
,'localhost\SQL2012'
,'Staging')
GO

INSERT INTO [dbo].[SourceToTargetMapping]
([SourceDataGroupID]
,[DestinationDataGroupID]
,[MappingComments]
,[MappingName])
VALUES
(1
,2
,'Copy Customer Data'
,'CustomerDataCopy')

go

Understanding the Metadata Tables

Below are the tables that were created and their usage.

DataPackage

This table stores the information about the Source and Destination systems at a very high level. It corresponds to systems such as relational database, flat files, spreadsheets and messaging. The type column differentiates the nature of the system. The type can be relational (REL), flat file, Excel or a third party system.

RelationalDataPackage

This table holds the details about the relational database for a data package. Server Name and Database details will help us create connection managers.

DataGroup

The data group is the lowest level grouping of data and corresponds to the relational table. The relationship between data package and data group is one to many. This hierarchy (Data Package/Data Group) allows the organization to process business transactions. The data group can also represent a file or a sheet in a spreadsheet. The type column differentiates types of data groups available within a data package.

RelationalDataGroup

The relational data group table stores information about the element of data group if the type is relational. This table contains the names of the actual tables for a data group.

SourceToTargetMapping

This table stores the information about the source to destination mapping. This mapping details will be used within the Data Flow.

Metadata Configuration

To understand the effective usage of metadata, let’s say we need to copy the data from a source to a target table. For this purpose, I have created two tables Customer and Customer_Copy. We have the data in the Customer table and would like to copy the data to Customer_Copy table.


To enable us to achieve this requirement, we have loaded configuration data for these tables (DataPackage, RelationalDataPackage, DataGroup, RelationalDataGroup and SourceToTargetMapping).

In addition, two stored procedures have been created to return the dataset to the BIML script.

ListConnectionDetails

This stored procedure will combine the information from the DataPackage and RelationalDataPackage tables. The resultant dataset will be used to create the connection managers dynamically.

ListCustomerCopyMapping

This stored procedure will return the source to destination table mapping details for a specific mapping name. This will be used within the Dataflow task to map the source to destination tables.

Building ETL Package from Metadata

As we store all the information about the ETL in the metadata database, we can make use of the details to generate a SSIS package.

Connection Managers

The stored procedure ListConnectionDetails will be called from the BIML script to get the list of systems configured in the metadata repository. The returned dataset will be stored in a DataTable. As we have only one connection manager, we can make use of the first retuned row to build the connection manager. If we have more than one DataPackage, then we can make use of the foreach loop to build multiple connection managers. The below mentioned BIML script will help us generate a SSIS package based on the configuration in the metadata repository.

<#@ import namespace="System.Data" #> 
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
 string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; 
 string connectionString =""; 
 DataTable dtConnection = new DataTable(); 
 SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); 
 sqldaStaging.Fill(dtConnection); 
 DataRow rwConnection = dtConnection.Rows[0];
#>
 <Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" />
</Connections>
<Packages>
 <Package Name="staging" ConstraintMode="Linear">
  <Tasks>
   <ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> 
   <DirectInput> Truncate table dbo.customer_copy </DirectInput> 
   </ExecuteSQL>
  </Tasks>
 </Package>
</Packages>
</Biml>

This is the SSIS package that is generated.

BIML generated Connection Manager and Control Flow in SSIS

Generating Data Flow Based on Metadata Repository

Now we can apply the same logic to generate the Data Flow. Let’s make use of the DataGroup, RelationalDataGroup and Mapping tables to generate the SSIS package for the Data Flow.

The stored procedure ListCustomerCopyMapping will be called from the BIML script to get the source and target tables for the table mapping. The returned dataset will be stored in a DataTable.

These details will be collected from the first row of the datatable:

  • ConnectionName
  • DataFlowName
  • SourceTable
  • DestinationTable

The below BIML script will help us build the data flow components from the metadata repository.

<#@ import namespace="System.Data" #> 
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
 string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; 
 string connectionString =""; 
 DataTable dtConnection = new DataTable(); 
 SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); 
 sqldaStaging.Fill(dtConnection); 
 DataRow rwConnection = dtConnection.Rows[0];

 DataTable dtTableMapping = new DataTable();
 SqlDataAdapter sqldaTableMapping = new SqlDataAdapter("exec [dbo].[ListCustomerCopyMapping];", strConnection);
 sqldaTableMapping.Fill(dtTableMapping);
 DataRow rwTableMapping = dtTableMapping.Rows[0];
#>
 <Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" />
</Connections>
<Packages>
 <Package Name="staging" ConstraintMode="Linear">
  <Tasks>
   <ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> 
    <DirectInput> Truncate table dbo.customer_copy </DirectInput> 
   </ExecuteSQL>
   <Dataflow Name="DFT Copy Customer Data">
    <Transformations> 
     <OleDbSource Name="Source" ConnectionName="<#=rwTableMapping["ConnectionName"]#>" >
      <DirectInput>SELECT * FROM <#=rwTableMapping["SourceTableName"]#></DirectInput> 
     </OleDbSource> 
     <OleDbDestination Name="Target" ConnectionName="<#=rwTableMapping["ConnectionName"]#>"> 
      <ExternalTableOutput Table="<#=rwTableMapping["DestinationTableName"]#>"/> 
     </OleDbDestination> 
    </Transformations> 
   </Dataflow>
  </Tasks>
 </Package>
</Packages>
</Biml>

Here is what is generated:

BIML generated DataFlow Tasks in SSIS Package

Summary

This metadata model can be extended further for other technical systems such as flat files, spreadsheets and other relational DBMS. There is no doubt metadata driven ETL development can succeed with the help of BIMLScript.

Next Steps
  • Stay tuned to learn about building transformation for Metadata Driven Development in the next tip.
  • Read more about metadata here
  • Read metadata from Kimball book here
  • Read Data Warehouse Metadata here


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Nat Sundar Nat Sundar is working as an independent SQL BI consultant in the UK with a Bachelors Degree in Engineering.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2016-10-26

Comments For This Article

















get free sql tips
agree to terms