Azure Data Factory Stored Procedure Activity Transformation Activities

By:   |   Updated: 2019-07-12   |   Comments (3)   |   Related: > Azure Data Factory


Problem

In these series of posts, I am going to explore Azure Data Factory (ADF), compare its features against SQL Server Integration Services (SSIS) and show how to use it towards real-life data integration problems. In previous posts, we have built pipelines with copy activities. Many data integration scenarios require data transformation and processing capabilities, in addition to copy tasks. In this post, we will discuss transformation related pipeline activities.

Solution

Azure Data Factory Transformation Activities

Transformation activities transform and process data in different computing environments such as SQL Server, Azure HDInsight cluster or an Azure Batch. In other words, ADF pipelines play an orchestrator role, initiating tasks within different engines. The following transformation types are available:

  • Stored Procedure Activity
  • HDInsight Hive activity
  • HDInsight Pig activity
  • HDInsight MapReduce activity
  • HDInsight Streaming activity
  • HDInsight Spark activity
  • Machine Learning activities
  • Data Lake Analytics U-SQL activity
  • Databricks Notebook activity
  • Databricks Jar activity
  • Databricks Python activity
  • Custom activity

In this post, we will be focusing on using Stored Procedure Activity.

Stored Procedure Activity can be used to invoke a stored procedure in one of the following data stores in your enterprise or on an Azure virtual machine (VM):

  • Azure SQL Database
  • Azure SQL Data Warehouse
  • SQL Server Database on on-premises machine on Azure VM. In this case, you will need to use Self Hosted Integrated Service, which we discussed in earlier posts.

Creating Stored Procedure Activity in Azure Data Factory

Stored Procedure Activity could be used to run regular batch processes, to log pipeline execution progress or exceptions. We will create a simple stored procedure in the DstDb database to store pipeline name, pipeline run ID and sample text. Let's run the following queries to create ExceptionLogs table and Usp_ExceptionLog stored procedure, using below script under DstDb database context:

CREATE TABLE [dbo].[ExceptionLogs](
 [PipelineName] [varchar](100) NULL,

 [RunId] [varchar](100) NULL,
 [TableName] [varchar](100) NULL
)
GO

CREATE PROCEDURE Usp_ExceptionLog
(@PipelineName varchar(100), @runid varchar(100),@TableName varchar(100)) 
AS
BEGIN
    INSERT INTO ExceptionLogs VALUES(@PipelineName,@runid,@TableName) 
END
GO

Most of activities we created in previous tips had two datasets - source and sink (destination) datasets. In contrary, like most transformation activities, Stored Procedure Activity has no datasets at all. However, we still need to have a linked service pointing to the server or service hosting the database, where the stored procedure is located.

I have created pipeline ExploreSQLSP_PL for the purpose of this exercise and will use SqlServerLS_Dst linked service, which is pointing to the DstDb database.

Creating Stored Procedure Activity is a straightforward process and here are the required steps:

  • Select the pipeline that will host this activity, expand the General category on the Activities panel, drag-drop it to the central panel and assign the name (SP_AC in this example):
stored procedure
  • Switch to SQL Account tab and select the required linked service (SqlServerLS_Dst in this case):
stored procedure
  • Switch to the Stored Procedure tab and select the required procedure (Usp_ExceptionLog in this case):
usp exception
  • Click the Import parameter button to fetch the procedure parameters:
import parameter
  • Now, let's assign a system variable to the first parameter, by using the Add dynamic content link:
pipeline name
  • The Add Dynamic Content window allows you to use a combination of system variables and ADF functions to build complex expressions interactively. Select Pipeline Name under the System Variables group and ensure that ADF has added the new expression to the expression box:
add dynamic content
  • Follow similar steps to assign Pipeline run ID system variable to runid parameter:
add dynamic content
  • Assign static string DimCustomer to the TableName parameter. Here' how your screen should look at the end of this exercise:
import parameter

Here is JSON script behind this pipeline:

{
    "name": "ExploreSQLSP_PL",
    "properties": {
        "activities": [
            {
                "name": "SP_AC",
                "type": "SqlServerStoredProcedure",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "typeProperties": {
                    "storedProcedureName": "[dbo].[Usp_ExceptionLog]",
                    "storedProcedureParameters": {
                        "PipelineName": {
                            "value": {
                                "value": "@pipeline().Pipeline",
                                "type": "Expression"
                            },
                            "type": "String"
                        },
                        "runid": {
                            "value": {
                                "value": "@pipeline().RunId",
                                "type": "Expression"
                            },
                            "type": "String"
                        },
                        "TableName": {
                            "value": "DimCustomer",
                            "type": "String"
                        }
                    }
                },
                "linkedServiceName": {
                    "referenceName": "SqlServerLS_Dst",
                    "type": "LinkedServiceReference"
                }
            }
        ]
    }
}

Finally, let's publish all changes and start this pipeline manually. As you can see from ADF monitoring page's screenshot, the execution was successful:

pipeline name

Let's validate the results by querying the target table:

sql query

Conclusion

Stored Procedure Activity in ADF could be used for running batch processes and storing a pipeline's intermediate results in a way that resembles SSIS's OLE DB Command functionality. The other types of transformations I mentioned require knowledge of specific technology and we will look at them in future posts. We will be exploring another type of activity Control Flow activities in the next few posts.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002 and has earned two MCSE certifications. He’s currently working as a Solutions Architect at Slalom Canada.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2019-07-12

Comments For This Article




Monday, December 13, 2021 - 11:01:58 AM - akhtar Back To Top (89573)
i have a ADF solution which is metadata driven, so it passes connection string and source and sink as parameters.. my concern is that i also have SQL Logging steps within pipelines and child pipelines and now for a simple Azure DB table copy into ADSL Parquet it is bottlenecked by the logging steps and child pipelines... i noticed that each step (mainly logging steps) take around 3-6 seconds? is there any way of reducing the time to execute logging steps??? i have tried upgrading the Config Database from basic to S1, i have tried changing the ADF's integration runtime to 32 core count, i have changed the TTL to 20 Mins, and i have also checked the quick cache.. nothing seems to reduce the time to run these audit steps...

the audit step is a StoredProc which you pass in a load of parameters. this proc run in split seconds in SSMS so the proc isnt the issue

Monday, August 17, 2020 - 6:37:11 PM - Fikrat Back To Top (86316)
Yes, you can create a pipeline parameter and use it as an expression when supplying proc name, like this:
pipeline().parameters.ProcNamePar

Monday, August 17, 2020 - 4:53:01 PM - ABhijit Back To Top (86314)
In Activity Store Procedure; can I pass Procedure name as Parameter.
The StorProc to be executed by StoreProcedure Activity is passed as Parameter.
Possible?
Then what is alternative?














get free sql tips
agree to terms