By: Nat Sundar | Updated: 2016-10-26 | Comments | Related: More > Integration Services Development
Problem
BIML is a great way to generate SQL Server Integration Services (SSIS) packages from code. How can I use my technical metadata repository to generate SSIS packages using BIML?
Solution
Metadata means "data about data" and describes the relevant aspect of the data. Metadata can be classified in three ways:
- Business Metadata
- Operational Metadata
- Technical Metadata
These are high level benefits for using metadata for SSIS package creation for an enterprise:
- No need to develop a SSIS package. The SSIS package can be generated based on the configuration.
- Highly transparent business rules. Rules can be modified easily by changing the values in the metadata tables.
- Less dependency on the development team.
Technical Metadata
Technical metadata stores information about ETL processes. It contains details about data mapping and transformations from source to target in a data system.
This metadata is used by applications to generate ETL solutions dynamically. In addition, it helps the ETL Modelers, Developers and Analysts understand the control flow and data flow for the data load.
Most commercial ETL applications provide a metadata repository with an integrated metadata management system to manage the ETL process definition. The definition of technical metadata is usually more complex than the business metadata and it sometimes involves multiple dependencies.
Technical metadata contains the definition of the following:
- Source Database / Source System Definition - Can be a relational database, third party system, file system, etc.
- Target Database - Usually a Data Warehouse Database
- Source Tables
- Source Columns
- Target Tables
- Target Columns
- Transformation
In this tip, we will learn and practice using technical metadata for ETL development.
Setup Metadata for Examples
The below script will help us setup a simple metadata model for ETL development.
USE [master]; GO IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Staging') DROP DATABASE Staging; GO IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = N'Staging') CREATE DATABASE Staging; GO USE Staging; GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Customer]') AND type IN (N'U')) Drop table [dbo].[Customer] CREATE TABLE [dbo].[Customer]( [Customer_ID] [int] NOT NULL, [Name] [nvarchar](250) NOT NULL, [Address] [nvarchar](250) NOT NULL, CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED ( [Customer_ID] ASC )) Go IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Customer_Copy]') AND type IN (N'U')) Drop table [dbo].[Customer_Copy] CREATE TABLE [dbo].[Customer_Copy]( [Customer_ID] [int] NOT NULL, [Name] [nvarchar](250) NOT NULL, [Address] [nvarchar](250) NOT NULL, CONSTRAINT [PK_Customer_Copy] PRIMARY KEY CLUSTERED ( [Customer_ID] ASC )) GO IF EXISTS (SELECT name FROM sys.databases WHERE name = N'MetadataModel') DROP DATABASE MetadataModel; GO IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = N'MetadataModel') CREATE DATABASE MetadataModel; GO USE MetadataModel; GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ListConnectionDetails]') AND type IN (N'P', N'PC')) Drop Procedure dbo.[ListConnectionDetails] go CREATE procedure [dbo].[ListConnectionDetails] as Select RDP.ServerName,RDP.DatabaseName, LTrim(RTrim(DP.Name)) as ConnectionName, 'Data Source=' + RDP.ServerName + ';Persist Security Info=true;Integrated Security=SSPI;Provider=SQLNCLI11.1;Initial Catalog=' +RDP.DatabaseName as ConnectionString from DataPackage DP INNER JOIN RelationalDataPackage RDP on DP.DataPackageID = RDP.DataPackageID Where [Type]='REL' GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ListCustomerCopyMapping]') AND type IN (N'P', N'PC')) Drop Procedure dbo.[ListCustomerCopyMapping] go CREATE procedure [dbo].[ListCustomerCopyMapping] as Select MAP.MappingComments as DataFlowName, RDGS.TableName as SourceTableName, RDGD. TableName as DestinationTableName, RTrim(LTrim(DP.Name)) as ConnectionName from [dbo].[SourceToTargetMapping] MAP INNER JOIN dbo.DataGroup DGS on MAP.SourceDataGroupID = DGS.DataGroupIDI and Upper(DGS.[Type])='TABLE' INNER JOIN dbo.RelationalDataGroup RDGS on DGS.DataGroupID = RDGS.DataGroupID INNER JOIN dbo.DataGroup DGD on MAP.DestinationDataGrouplD = DGD.DataGroupID and Upper(DGD.[Type])='TABLE' INNER JOIN dbo.RelationalDataGroup RDGD on DGD.DataGroupID = RDGD.DataGroupID INNER JOIN dbo.DataPackage DP on DGS.ContainedDataPackageID = DP.DataPackageID Where MAP.[MappingName] ='CustomerDataCopy' GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DataGroup]') AND type IN (N'U')) Drop table [dbo].[DataGroup] CREATE TABLE [dbo].[DataGroup]( [DataGroupID] [int] IDENTITY(1,1) NOT NULL, [Type] [nvarchar](50) NOT NULL, [Description] [nvarchar](50) NOT NULL, [ContainedDataPackageID] [int] NOT NULL, CONSTRAINT [PK_DataGroup] PRIMARY KEY CLUSTERED ( [DataGroupID] ASC )) ON [PRIMARY] GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DataPackage]') AND type IN (N'U')) Drop table [dbo].[DataPackage] CREATE TABLE DataPackage ([DataPackageID] [int] IDENTITY(1,1) NOT NULL, [Type] [nvarchar](50) NOT NULL, [Description] [nvarchar](50) NOT NULL,[Name] [nvarchar](50) NULL, CONSTRAINT [PK_DataPackage] PRIMARY KEY CLUSTERED ([DataPackageID] ASC )) ON [PRIMARY] GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[RelationalDataGroup]') AND type IN (N'U')) Drop table [dbo].[RelationalDataGroup] CREATE TABLE [dbo].[RelationalDataGroup]( [TableName] [nvarchar](50) NOT NULL, [DataGroupID] [int] NOT NULL, [Usage] [nvarchar](50) NULL ) ON [PRIMARY] GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[RelationalDataPackage]') AND type IN (N'U')) Drop table [dbo].[RelationalDataPackage] CREATE TABLE [dbo].[RelationalDataPackage]( [DataPackageID] [int] NOT NULL, [ServerName] [nvarchar](50) NOT NULL, [DatabaseName] [nvarchar](50) NOT NULL ) ON [PRIMARY] GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SourceToTargetMapping]') AND type IN (N'U')) Drop table [dbo].[SourceToTargetMapping] CREATE TABLE [dbo].[SourceToTargetMapping]( [SourceDataGrouplD] [int] NOT NULL, [DestinationDataGroupID] [int] NOT NULL, [MappingComments] [nvarchar](200) NOT NULL, [MappingName] [nvarchar](50) NULL, CONSTRAINT [PK_SourceToTargetMapping] PRIMARY KEY CLUSTERED ([SourceDataGrouplD] ASC, [DestinationDataGroupID] ASC )) ON [PRIMARY] GO ALTER TABLE [dbo].[DataGroup] WITH CHECK ADD CONSTRAINT [FK_DataGroup_DataPackage] FOREIGN KEY([ContainedDataPackageID]) REFERENCES [dbo].[DataPackage] ([DataPackageID]) GO ALTER TABLE [dbo].[DataGroup] CHECK CONSTRAINT [FK_DataGroup_DataPackage] GO ALTER TABLE [dbo].[RelationalDataGroup] WITH CHECK ADD CONSTRAINT [FK_RelationalDataGroup_DataGroup] FOREIGN KEY([DataGroupID]) REFERENCES [dbo].[DataGroup] ([DataGroupID]) GO ALTER TABLE [dbo].[RelationalDataGroup] CHECK CONSTRAINT [FK_RelationalDataGroup_DataGroup] GO ALTER TABLE [dbo].[RelationalDataPackage] WITH CHECK ADD CONSTRAINT [FK_RelationalDataPackage_DataPackage] FOREIGN KEY(DataPackageID) REFERENCES [dbo].[DataPackage] ([DataPackageID]) GO ALTER TABLE [dbo].[RelationalDataPackage] CHECK CONSTRAINT [FK_RelationalDataPackage_DataPackage] GO
The relationships between the tables can be seen below.
Let's insert some data into the tables.
USE [MetadataModel] GO INSERT INTO [dbo].[DataGroup] ([Type] ,[Description] ,[ContainedDataPackageID]) VALUES ('Table', 'Customer Staging Table', 1), ('Table', 'Customer Copy Staging Table', 1) GO INSERT INTO [dbo].[RelationalDataGroup] ([TableName] ,[DataGroupID] ,[Usage]) VALUES ('dbo.Customer' ,1 ,'SRC'), ('dbo.Customer_Copy' ,2 ,'DST') GO INSERT INTO [dbo].[DataPackage] ([Type] ,[Description] ,[Name]) VALUES ('REL' ,'Development Server' ,'STAGING_SERVER') GO INSERT INTO [dbo].[RelationalDataPackage] ([DataPackageID] ,[ServerName] ,[DatabaseName]) VALUES (1 ,'localhost\SQL2012' ,'Staging') GO INSERT INTO [dbo].[SourceToTargetMapping] ([SourceDataGroupID] ,[DestinationDataGroupID] ,[MappingComments] ,[MappingName]) VALUES (1 ,2 ,'Copy Customer Data' ,'CustomerDataCopy') go
Understanding the Metadata Tables
Below are the tables that were created and their usage.
DataPackage
This table stores the information about the Source and Destination systems at a very high level. It corresponds to systems such as relational database, flat files, spreadsheets and messaging. The type column differentiates the nature of the system. The type can be relational (REL), flat file, Excel or a third party system.
RelationalDataPackage
This table holds the details about the relational database for a data package. Server Name and Database details will help us create connection managers.
DataGroup
The data group is the lowest level grouping of data and corresponds to the relational table. The relationship between data package and data group is one to many. This hierarchy (Data Package/Data Group) allows the organization to process business transactions. The data group can also represent a file or a sheet in a spreadsheet. The type column differentiates types of data groups available within a data package.
RelationalDataGroup
The relational data group table stores information about the element of data group if the type is relational. This table contains the names of the actual tables for a data group.
SourceToTargetMapping
This table stores the information about the source to destination mapping. This mapping details will be used within the Data Flow.
Metadata Configuration
To understand the effective usage of metadata, let’s say we need to copy the data from a source to a target table. For this purpose, I have created two tables Customer and Customer_Copy. We have the data in the Customer table and would like to copy the data to Customer_Copy table.
To enable us to achieve this requirement, we have loaded configuration data for these tables (DataPackage, RelationalDataPackage, DataGroup, RelationalDataGroup and SourceToTargetMapping).
In addition, two stored procedures have been created to return the dataset to the BIML script.
ListConnectionDetails
This stored procedure will combine the information from the DataPackage and RelationalDataPackage tables. The resultant dataset will be used to create the connection managers dynamically.
ListCustomerCopyMapping
This stored procedure will return the source to destination table mapping details for a specific mapping name. This will be used within the Dataflow task to map the source to destination tables.
Building ETL Package from Metadata
As we store all the information about the ETL in the metadata database, we can make use of the details to generate a SSIS package.
Connection Managers
The stored procedure ListConnectionDetails will be called from the BIML script to get the list of systems configured in the metadata repository. The returned dataset will be stored in a DataTable. As we have only one connection manager, we can make use of the first retuned row to build the connection manager. If we have more than one DataPackage, then we can make use of the foreach loop to build multiple connection managers. The below mentioned BIML script will help us generate a SSIS package based on the configuration in the metadata repository.
<#@ import namespace="System.Data" #> <#@ import namespace="System.Data.SqlClient" #> <#@ template language="C#" tier="1" #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <# string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; string connectionString =""; DataTable dtConnection = new DataTable(); SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); sqldaStaging.Fill(dtConnection); DataRow rwConnection = dtConnection.Rows[0]; #> <Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" /> </Connections> <Packages> <Package Name="staging" ConstraintMode="Linear"> <Tasks> <ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> <DirectInput> Truncate table dbo.customer_copy </DirectInput> </ExecuteSQL> </Tasks> </Package> </Packages> </Biml>
This is the SSIS package that is generated.
Generating Data Flow Based on Metadata Repository
Now we can apply the same logic to generate the Data Flow. Let’s make use of the DataGroup, RelationalDataGroup and Mapping tables to generate the SSIS package for the Data Flow.
The stored procedure ListCustomerCopyMapping will be called from the BIML script to get the source and target tables for the table mapping. The returned dataset will be stored in a DataTable.
These details will be collected from the first row of the datatable:
- ConnectionName
- DataFlowName
- SourceTable
- DestinationTable
The below BIML script will help us build the data flow components from the metadata repository.
<#@ import namespace="System.Data" #> <#@ import namespace="System.Data.SqlClient" #> <#@ template language="C#" tier="1" #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <# string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; string connectionString =""; DataTable dtConnection = new DataTable(); SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); sqldaStaging.Fill(dtConnection); DataRow rwConnection = dtConnection.Rows[0]; DataTable dtTableMapping = new DataTable(); SqlDataAdapter sqldaTableMapping = new SqlDataAdapter("exec [dbo].[ListCustomerCopyMapping];", strConnection); sqldaTableMapping.Fill(dtTableMapping); DataRow rwTableMapping = dtTableMapping.Rows[0]; #> <Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" /> </Connections> <Packages> <Package Name="staging" ConstraintMode="Linear"> <Tasks> <ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> <DirectInput> Truncate table dbo.customer_copy </DirectInput> </ExecuteSQL> <Dataflow Name="DFT Copy Customer Data"> <Transformations> <OleDbSource Name="Source" ConnectionName="<#=rwTableMapping["ConnectionName"]#>" > <DirectInput>SELECT * FROM <#=rwTableMapping["SourceTableName"]#></DirectInput> </OleDbSource> <OleDbDestination Name="Target" ConnectionName="<#=rwTableMapping["ConnectionName"]#>"> <ExternalTableOutput Table="<#=rwTableMapping["DestinationTableName"]#>"/> </OleDbDestination> </Transformations> </Dataflow> </Tasks> </Package> </Packages> </Biml>
Here is what is generated:
Summary
This metadata model can be extended further for other technical systems such as flat files, spreadsheets and other relational DBMS. There is no doubt metadata driven ETL development can succeed with the help of BIMLScript.
Next Steps
- Stay tuned to learn about building transformation for Metadata Driven Development in the next tip.
- Read more about metadata here
- Read metadata from Kimball book here
- Read Data Warehouse Metadata here
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2016-10-26