By: Koen Verbeeck | Updated: 2019-01-18 | Comments (2) | Related: > Azure Integration Services
Problem
We have created an Azure Data Factory environment and configured an Azure-SSIS Integration Runtime to run our Integration Services packages in the cloud. Everything works fine, but running the Azure-SSIS IR cluster the entire time leads to significant hosting costs. We can save costs by turning off the IR when we don't need it. Is there a way to automate this, so we don't have to turn it down manually?
Solution
Like most resources in the cloud, running something costs money. Some resources are pay-per-use, others cost money by just having them, such as storage where you pay per terabyte. Certain types of resources cost money for each second, they are turned on, such as virtual machines for example. The Azure-SSIS IR environment is a cluster of virtual machines which run an SSIS scale-out configuration. Having the IR turned on means you are paying for those virtual machines for each second that passes by. You can find information about those costs in the official pricing page.
There are three easy way to cut costs:
- Go for a smaller cluster. Less nodes with less RAM and cores. If you have typical ELT scenarios (where most of the work is done by a database engine or something similar), you can scale down your cluster. You don't need a powerful cluster since most of the work is offloaded.
- Choose standard edition for SSIS. You only need enterprise edition for certain specific scenarios, such as fuzzy matching or specific connectors. You can find an overview of the enterprise features here. If you already have a SQL Server license, you can save extra money with the hybrid benefit.
- Turn off your IR when you don't need it. Spin up the cluster at the start of your ETL load, turn it off when you're done. If your ETL load takes 3 hours for example, you can turn the IR off for the other 21 hours. This almost saves you 90% of the cost when the IR would be running full-time.
In this tip, we'll explore the last option. We'll see how you can automatically start your IR at the beginning of an ETL flow (using an Azure Data Factory pipeline) and how to turn it off at the end.
Create an Azure Automation Account & Runbook
The key to automate your Azure-SSIS IR environment is using Azure Automation. With this "serverless" service, you can automate and configure your cloud resources. One tool is runbooks, which allow you to run PowerShell scripts. First, we have to create an automation account. Click on "Create a Resource" in the Azure Portal and search for automation.
Choose a name, subscription, resource group and region to configure your automation account. When you choose to create the Azure Run As account, you need permissions in Azure Active Directory to do so.
After a few minutes, the Azure Automation account is created and some sample Runbooks are provided as well.
Before we create our runbook, we need to import some Data Factory modules into the Automation account. In the Shared Resources section, go to Modules.
Here we need the latest version of the AzureRM.Profile module and the AzureRM.DataFactoryV2 module.
When you following the links to the PowerShell gallery, you can choose to deploy the module directly to Azure Automation.
Clicking on Deploy to Azure Automation will take you to the Azure Portal, where you can choose your Automation account.
The module is then deployed to the Azure Automation account, which might take some time. We can repeat the same steps for the other module.
The next part is to create our PowerShell runbook. Go to the Automation Account and select Runbooks.
Then click on Add a new Runbook.
Choose to create a new runbook, enter a name, an optional description and select PowerShell from the dropdown list.
There are other options as well, but in this tip we'll focus on PowerShell.
After creating the Runbook, you'll be automatically redirected to the edit environment. Here we can write our PowerShell code.
First, we need to create some parameters to make the runbook more dynamic. We're going to pass the names of the resource group under which the Azure Data Factory is located, the Azure Data Factory and the Azure-SSIS IR. We're also passing a command: are we starting or stopping the Azure-SSIS IR?
Next, we're using the Azure Run As account to create a connection to Azure.
If the command parameter is equal to START or start, we're going to start the Azure-SSIS IR, using the Start-AzureRmDataFactoryV2IntegrationRuntime cmdlet.
Using the -Force parameter will start the IR without asking for confirmation, which is what we want in an automated environment. The PowerShell script will keep running until the IR is actually provisioned and started. If the IR is already running when the runbook is launched, no problem, it script will just finish successfully.
If the command parameter is equal to STOP or stop, the similar cmdlet Stop-AzureRmDataFactoryV2IntegrationRuntime is used to stop the Azure-SSIS IR.
The entire script looks like this:
Param ( [Parameter (Mandatory= $true)] [String] $ResourceGroup, [Parameter (Mandatory= $true)] [String] $DataFactory, [Parameter (Mandatory= $true)] [String] $AzureSSIS, [Parameter (Mandatory= $true)] [String] $Command ) $connectionName = "AzureRunAsConnection" try { # Get the connection "AzureRunAsConnection " $servicePrincipalConnection=Get-AutomationConnection -Name $connectionName "Logging in to Azure..." Connect-AzureRmAccount ` -ServicePrincipal ` -TenantId $servicePrincipalConnection.TenantId ` -ApplicationId $servicePrincipalConnection.ApplicationId ` -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint } catch { if (!$servicePrincipalConnection) { $ErrorMessage = "Connection $connectionName not found." throw $ErrorMessage } else{ Write-Error -Message $_.Exception throw $_.Exception } } if($Command -eq "START" -or $Command -eq "start") { "##### Starting the Azure-SSIS IR #####" Start-AzureRmDataFactoryV2IntegrationRuntime ` -ResourceGroupName $ResourceGroup ` -DataFactoryName $DataFactory ` -Name $AzureSSIS ` -Force } elseif($Command -eq "STOP" -or $Command -eq "stop") { "##### Stopping the Azure-SSIS IR #####" Stop-AzureRmDataFactoryV2IntegrationRuntime ` -DataFactoryName $DataFactory ` -Name $AzureSSIS ` -ResourceGroup $ResourceGroup ` -Force } "##### Script Finished #####"
Keep in mind this PowerShell script can also be used to start other types of Integration Runtimes, not just the Azure-SSIS IR.
When the script is finished, you can publish it:
By clicking on Test pane, you can test the script by specifying values for the parameters and running the script.
If the script runs successfully, you'll get the output of the script displayed on the right side:
Unfortunately, I had to take the output of a similar script in another environment. Ironically, I forgot to shut down my Azure-SSIS IR one day and it kept running for a couple of days, eating away all of my free Azure credits. Which is exactly the reason why I decided to write this tip, so it would never happen again.
As a last step, you can schedule the runbook. You just need to provide a schedule and supply the parameters.
In the next part of the tip, we are going to explain how we can trigger the runbook from an Azure Data Factory pipeline, using webhooks.
Next Steps
- Try it out yourself! To follow along with this tip, you need a valid Azure subscription (free trials are available) and a local machine to host the node for the self-hosted IR.
- More tips about Azure Data Factory and the Azure-SSIS IR:
- Connect to On-premises Data in Azure Data Factory with the Self-hosted Integration Runtime – Part 1
- Executing Integration Services Packages in the Azure-SSIS Integration Runtime
- Parallel package execution in Azure-SSIS Runtime
- Configure an Azure SQL Server Integration Services Integration Runtime
- Customized Setup for the Azure-SSIS Integration Runtime
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2019-01-18