Using Azure DevOps CI/CD to Deploy Azure Data Factory Environments

By: Ron L'Esteve   |   Updated: 2020-08-04


Problem

There are many methods of deploying Azure Data Factory environments using Azure DevOps CI/CD. The options span a variety of source control repos and architectures, which can be complex and challenging to set up and configure. What is a good way to get started with deploying Azure Data Factory environments with Azure DevOps CI/CD?

Solution

There are a few methods of deploying Azure Data Factory environments with Azure DevOps CI/CD. Source control repository options range from GitHub to Azure DevOps Git, and implementation architectures range from using adf_publish branches to using working and master branches instead. In this demo, I will walk through an end-to-end process for building a multi-environment Azure Data Factory CI/CD deployment that uses a GitHub repo for source control, synced to working and master branches. Azure DevOps Build and Release pipelines will handle CI/CD, and a custom, PowerShell-based, code-free Data Factory publish task will deploy the Data Factory resources, all within the same Resource Group.

Solution Architecture diagram

Pre-requisites

1) GitHub Account: For more information on creating a GitHub Account, see How to Create an Account on GitHub.

2) Azure Data Factory V2: For more information on creating an Azure Data Factory V2, see Quickstart: Create an Azure data factory using the Azure Data Factory UI.

3) Azure DevOps: For more information on creating a new DevOps account, see Sign up, sign in to Azure DevOps.

Create a GitHub Repository

After the GitHub account has been created from the pre-requisites section, a new Git repository will also need to be created.

To get started, navigate to Repositories in the newly created GitHub account, and click New.

Create GitHub Repo

Next, enter the new Repository name, select a Visibility option (Public or Private), enable 'Initialize this repository with a README', and click Create Repository.

Create Repo Detailed steps

Once the repository has been created, the Readme file will be viewable and the master branch will be associated with the repo.

Image of Created GitHub Repo

Create a Data Factory

While creating the new Data Factory from the pre-requisites section, ensure that Git is enabled.

Enter the necessary details related to the GIT account and Repo.

Click Create to provision the DEV Data Factory.

Steps to enable github in create adf UI

Navigate to the newly created DEV Data Factory in the desired resource group.

Click Author & Monitor to launch the Data Factory authoring UI.

ADF Created in RG image

Log into GitHub from Data Factory

When the DEV Data Factory is launched, click Log into GitHub to connect to the GitHub Account.

ADF step to login to GitHub

Next, click Git repo settings.

Image of how to find github repo settings

Click Open management hub. See 'Management Hub in Azure Data Factory' for more information on working with this hub.

Step to get to ADF Management Hub

Next, click the Git configuration section under Connections, where the Git repository can be edited, disconnected, or verified. The Git repo connection details can be checked from this tab.

Steps to get to ADF Git configuration section

Finally, we can also see that the GitHub master branch has been selected in the top left corner of the Data Factory UI.

Image of ADF github master branch selected

Create a Test Data Factory Pipeline

Now that the Data Factory has been connected to the GitHub Repo, let's create a test pipeline.

To create a pipeline, click the pencil icon, next click the plus icon, and finally click Pipeline from the list of options.

ADF Create a test pipeline in UI

Select an activity from the list of options. To keep this demo simple, I have selected a Wait activity.

Click Save and Publish to check the pipeline in to the master branch on GitHub.

Step to create an adf Wait Activity, demo.

Once the ADF pipeline has been checked in, navigate back to the GitHub account repo to ensure that the pipeline has been committed.

Git reflecting adf pipeline commit
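
For reference, the file that Data Factory commits for a pipeline like this is a small JSON document. The following Python sketch builds an approximation of that document for a pipeline with a single Wait activity. The pipeline name demopipeline is taken from this article; the activity name Wait1 and the one-second wait are assumptions, and the exact set of fields that Data Factory writes may differ from what is shown here.

```python
import json


def build_wait_pipeline(pipeline_name, activity_name="Wait1", wait_seconds=1):
    """Return a dict approximating the JSON stored for a pipeline
    that contains a single Wait activity (field set is an approximation)."""
    return {
        "name": pipeline_name,
        "properties": {
            "activities": [
                {
                    "name": activity_name,  # assumed activity name
                    "type": "Wait",
                    "dependsOn": [],
                    "typeProperties": {"waitTimeInSeconds": wait_seconds},
                }
            ],
            "annotations": [],
        },
    }


pipeline = build_wait_pipeline("demopipeline")
pipeline_json = json.dumps(pipeline, indent=4)
print(pipeline_json)
```

Comparing this shape against the file in the GitHub repo is a quick way to confirm that the commit contains the pipeline definition itself rather than only the Readme.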

Create a DevOps Project

After creating an Azure DevOps Account from the pre-requisites section, we'll need to create a DevOps project along with a Build and Release pipeline.

Let's get started by creating a new project with the following details.

Steps to create a new ADO project

Create a DevOps Build Pipeline

Now it's time to create a DevOps Build Pipeline. This can be done by navigating to the Pipelines tab of the project.

Select Pipelines from the list of options.

Steps to create a new ADF Build Pipeline

Create your first project pipeline by clicking Create Pipeline.

Step to create ADO pipeline

When prompted to select where your code is, click Use the classic editor toward the bottom.

ADO image depicting where to find GitHub Code

Select GitHub as the source, enter the connection name, and click Authorize using OAuth.

Steps to select GitRepo in ADO and Authorize

Clicking Authorize using OAuth displays a UI for further verification. Once the authorization verification process is complete, click Authorize Azure Pipelines.

Step to authorize azure pipelines

After Azure Pipelines has been authorized using OAuth, the authorized connection, the repo, and the default branch are listed as follows and can be changed by clicking the icon.

Click Continue to proceed.

Image of details on GitHub Authorization and management

When prompted to choose a template, select Empty job.

Choose an empty template in ADO

The Build Pipeline tab will contain the following details.

Configure and select the Name, Agent pool and Agent Specification.

Image showing steps to create ADO pipeline tasks

Click the + icon by Agent job 1 to add a task to the job.

Step to add task to agent job.

Search for 'publish build artifacts' and add the task to the Build pipeline.

Step to add publish build artifact task

In the Publish build artifacts UI, enter the following details.

Also browse and select the path to publish.

Step to configure publish build artifacts details

Click OK after the path is selected.

Step to select build artifacts path.

Click Save & queue to prepare the pipeline to run.

Step to save and Queue pipeline

Finally, run the Build pipeline by clicking Save and run.

Step to save and run the pipeline in ADO

Note the pipeline run summary which indicates the repo, run date/times, and validation that the pipeline has been successfully published.

Click the following published icon for more detail.

Step showing pipeline summary and published artifacts.

Notice that the demopipeline has been published in JSON format, which confirms that the build pipeline has successfully been published and is ready for release.

Image showing published artifacts.
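
The same check can be done outside the DevOps UI against a downloaded copy of the artifact. The sketch below is a minimal example, assuming the artifact folder keeps the layout used by Data Factory's Git integration, with pipeline definitions stored as JSON files in a pipeline subfolder. The helper name list_pipelines and the temporary demo folder are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path


def list_pipelines(artifact_root):
    """Return the sorted names of pipelines found under <artifact_root>/pipeline."""
    pipeline_dir = Path(artifact_root) / "pipeline"
    if not pipeline_dir.is_dir():
        return []
    names = []
    for json_file in pipeline_dir.glob("*.json"):
        with json_file.open(encoding="utf-8") as handle:
            definition = json.load(handle)
        # Fall back to the file name when the JSON has no "name" field.
        names.append(definition.get("name", json_file.stem))
    return sorted(names)


# Demo against a throwaway folder that mimics the assumed artifact layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "pipeline"
    demo_dir.mkdir()
    (demo_dir / "demopipeline.json").write_text(
        json.dumps({"name": "demopipeline", "properties": {"activities": []}}),
        encoding="utf-8",
    )
    found = list_pipelines(tmp)
    missing = list_pipelines(Path(tmp) / "does-not-exist")

print(found)
```

An empty list from a real artifact folder would suggest that the wrong path was selected in the Publish build artifacts task.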

Create a DevOps Release Pipeline

Now that the Build Pipeline has been created and published, we are ready to create the Release pipeline next.

To do this, click the pipelines icon, and select Releases.

Steps to create ADO Release Pipeline.

Azure DevOps will let you know that there are no release pipelines yet.

Go ahead and click New Pipeline.

Step to create a new ADO Release Pipeline.

When prompted to select a template, click Empty job.

Step to create an empty job for release pipeline.

Enter a Stage name and verify the Stage Owner. For my scenario, I will name the stage PROD.

Step to add ADO release pipeline stage name.

Next, let's go ahead and Add an artifact.

Step to add ADO release Artifact.

Ensure that the source type is Build and that the correct Source (build pipeline) is selected.

Click Add to add the selected artifact.

Step to add artifact details

In the Stages section where we have the PROD stage which was created earlier, notice that there is 1 job and no tasks associated with it yet.

Click to view the stage tasks.

Steps to view stage tasks.

Search for adf and click Get it free to download the Deploy Azure Data Factory task.

Step to find ADF Deploy task

You'll be re-directed to the Visual Studio marketplace.

Once again, click Get it free to download the task.

For more information on this Deploy Azure Data Factory task, see this PowerShell module which it is based on, azure.datafactory.tools.

Step to get the ADF Deploy task for free.

Select the Azure DevOps organization and click Install.

Steps to install the ADF Deploy Task

When the download succeeds, navigate back to the DevOps Release pipeline configuration process.

Image showing that the install of the task is successful.

Add the newly downloaded Publish Azure Data Factory task to the release pipeline.

Steps to add the ADF Publish task to the release pipeline.

The Publish Azure Data Factory task will contain the following details that will need to be selected and configured.

For a list of subscription connection options, select Manage.

Click the icon to select the Azure Data Factory Path.

Steps to configure publish adf task

After the file or folder is selected, click OK.

Select ADF Publish artifact folder.

As you scroll through the task, ensure the additional selection details are configured accurately.

Additional Details for the ADF Publish Task

Also ensure that the release pipeline is named appropriately, then click Save.

Steps to save the ADO Release Pipeline

Click Create release.

Steps to create the ADO Release pipeline.

Finally, click Create one last time.

Steps to create a new Release.

Once the release has been created, click the Release-1 link.

Image Confirming that new ADO release pipeline is created.

View the adf release pipeline details and note that the PROD Stage has been successfully published.

ADO-ADF Pipeline release succeeded to prod.

Verify the New Data Factory Environment

After navigating back to the Portal, select the resource group containing the original dev Data Factory.

Notice that there is now an additional Data Factory containing the prod instance.

Image verifying that the prod adf resource is created.

As expected, notice that the prod instance of the data factory also contains the same demopipeline with the Wait activity.

Image showing that the Prod ADF Pipeline also contains the same demo wait activity as the DEV ADF.
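
This visual comparison can also be expressed as a small script. The sketch below assumes the pipeline definitions from each environment have already been loaded into Python dictionaries keyed by pipeline name, for example from the JSON files in the repo and from an export of the PROD factory. The function name compare_pipelines and the sample data are hypothetical.

```python
def compare_pipelines(dev_pipelines, prod_pipelines):
    """Compare two {name: definition} mappings and report the differences."""
    dev_names = set(dev_pipelines)
    prod_names = set(prod_pipelines)
    shared = dev_names & prod_names
    return {
        "missing_in_prod": sorted(dev_names - prod_names),
        "extra_in_prod": sorted(prod_names - dev_names),
        "changed": sorted(
            name for name in shared
            if dev_pipelines[name] != prod_pipelines[name]
        ),
    }


# Sample data: the activity name Wait1 and its settings are assumptions.
wait_activity = {
    "name": "Wait1",
    "type": "Wait",
    "typeProperties": {"waitTimeInSeconds": 1},
}
dev = {"demopipeline": {"activities": [wait_activity]}}
prod_ok = {"demopipeline": {"activities": [wait_activity]}}
prod_empty = {}

print(compare_pipelines(dev, prod_ok))
print(compare_pipelines(dev, prod_empty))
```

A non-empty missing_in_prod list corresponds to the situation where the PROD Data Factory is created but no pipelines appear inside it, in which case the folder selected in the publish task is worth rechecking.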

Summary

In this article, I demonstrated how to create a new Azure Data Factory environment (PROD) from an existing one (DEV), using a GitHub repo for source control and Azure DevOps Build and Release pipelines. The result is a streamlined CI/CD process for creating and managing multiple Data Factory environments within the same Resource Group.




About the author
Ron L'Esteve is a trusted information technology thought leader and professional Author residing in Illinois. He brings over 20 years of IT experience and is well-known for his impactful books and article publications on Data & AI Architecture, Engineering, and Cloud Leadership. Ron completed his Master's in Business Administration and Finance from Loyola University in Chicago. Ron brings deep tec

This author pledges the content of this article is based on professional experience and not AI generated.



Article Last Updated: 2020-08-04

Comments For This Article




Wednesday, January 27, 2021 - 6:01:33 PM - DataBI
The article is very well explained. I am able to create the azure data factory instance but no pipeline has been created inside the newly created data factory instance. Any idea what's the cause of this ?

Tuesday, January 5, 2021 - 6:25:05 AM - Debbie Edwards
I have gone through the process but at the other end when I check my data factory in Production, nothing has been added. I am getting a warning "Both Az and AzureRM modules were detected on this machine. Az and AzureRM modules cannot be imported in the same session or used in the same script or runbook. If you are running PowerShell in an environment you control you can use the 'Uninstall-AzureRm' cmdlet to remove all AzureRm modules from your machine. If you are running in Azure Automation, take care that none of your runbooks import both Az and AzureRM modules" I dont know if this has anything to do with it. Also when you get to the Agent Job, I added my production Subscription and resource group information which your post didn't specify so i dont fully know if I did this correctly

Sunday, August 30, 2020 - 5:25:13 AM - Sourav Ghosh
Hi Ron,
Thanks for a detail explanation.

Currently I have a build and release pipeline that is working as expected, where developers are creating pull request to merge changes to master branch and after validation we are merging and deploying to higher environment.

Now, I want to enhance this CI part in such way so that, we can select particular pipelines, link services and dataset which need to deployed on to higher environment.

As developers are working on multiple release in their feature branch and we want to ignore pipelines those are for future release and then merge it to master branch for deployment.

is there a way to achieve this? or do you have any alternate suggestion here?

Thanks in advance.

Best Regards,
Sourav













