Correlation Analysis using Correlation Plot in Power BI Desktop

By:   |   Updated: 2017-11-27   |   Comments (8)   |   Related: > Power BI Charts


Problem

Correlation Analysis is a fundamental method of exploratory data analysis to find a relationship between different attributes in a dataset. Statistically, correlation can be quantified by means of a correlation co-efficient, typically referred as Pearson’s co-efficient which is always in the range of -1 to +1. A value of -1 indicates a total negative relationship and +1 indicates a total positive relationship. Any number closer to zero represents very low or no relationship at all. There is a statistical calculation involved to find this co-efficient and using this you can identify the correlation between two attributes with numerical data. If interested, you can read more about it here. It can be a very statistically intensive process if the task is to identify correlation between many numeric variables. Correlation plots can be used to quickly calculate the correlation coefficients without dealing with a lot of statistics, effectively helping to identify correlations in a dataset.

Solution

Power BI provides correlation plot visualization in the Power BI Visuals Gallery to create Correlation Plots for correlation analysis.

In this tip we will create a correlation plot in Power BI Desktop using a sample dataset of car performance. It is assumed that Power BI Desktop is already installed on your development machine. So please follow the steps as mentioned below.

1) The first step is to download the correlation plot from here, as it is not available by default in Power BI Desktop. This visualization makes using of the R corrplot package. The same plot can be generated using the R Script visualization and some code. Instead this visualization eliminates the need for coding and provides parameters to configure the visualization.

Download Correlation Plot for Power BI Desktop - Description: Download Correlation Plot

2) After downloading the file, open Power BI Desktop. You can click on the ellipsis in the visualization tab and select the “Import from file” menu option. This will open a dialog box to select the visualization package file, to add the visualization in Power BI. Select the downloaded file and it should add the correlation plot to Power BI Desktop visualizations gallery. Click on the correlation plot icon in the gallery and you should be prompted with a warning as shown below to enable script visuals. Click on Enable which will make the correlation plot operational.

Enable Script Visuals in Power BI Desktop - Description: Install

3) After the correlation plot is added to the report layout, enlarge it to occupy the entire available area on the report. After you have done this, the interface should look as shown below.

Add visual to the Power BI Desktop interface - Description: Add visual

4) Now that we have the visualization, it is time to populate it with some data on which correlation analysis can be performed. We need a dataset with many numerical attributes. You can download a sample dataset from here that contains data on car performance with metrics like miles per gallon, horsepower, transmission, acceleration, cylinder, displacement, weight, gears, etc. Click on the Get Data menu and select CSV since we have the data in a csv file format.

Add CSV data to Power BI Desktop - Description: Add CSV

5) This will open a dialog box to select the file. Navigate to the downloaded file and select it. This will read a few records from the file and show a data preview as shown below.

Data Preview in Power BI Desktop - Description: Preview

6) As you can see, the column headers are in the first row. So, click on the edit button to indicate this before importing the dataset. Click on the “Use First Row as Headers” so that we get the column names properly. Also, you can rename the Car Names column and name it Model.

Modify Columns in Power BI Desktop - Description: Modify Columns

7) After you apply the setting, the column names should look as shown below. Click on the Close and Apply button to complete the import process.

Modified Columns in Power BI Desktop - Description: Modified Columns

8) After the data is imported in the Power BI Desktop, the model should look as shown below.

Model in Power BI Desktop after the import - Description: Model

9) Now it is time to select the fields and add them to the visualization. Click on the visualization in the report layout, and add all the fields from the model except the model field which is a categorical / textual field. The visualization would look as shown below.

Correlation Plot in Power BI Desktop - Description: Correlation Plot

10) The below points should be considered while reading this plot:

  • The dark blue circles in a diagonal line from top left to bottom right shows correlation of an attribute with itself, which is always the strongest or 1. So this should not be read as correlation, but just as a separator line.
  • The more the circle has a dark blue color, it signifies stronger positive correlation. The darker the red color, it signifies a negative correlation. Lighter or white colors signifies weak or no correlation.
  • The scale can be used to estimate the correlation coefficient value.

11) We need to make a few modifications in this plot to make it visually analyzable. Click on the Format option, in the Labels section and increase the font size, so that the field labels are clearly visible as shown below. As you can see, weight (wt) has a strong positive correlation with displacement (disp) and miles per gallon (mpg) has a strong negative correlation with weight (wt). The data is shown in a matrix format and there are many positive and negative correlation spreads in the plot.

Increase the Font Size for the Labels in Power BI Desktop - Description: Labels

12) It would be easier to analyze correlation if attributes with the same type of correlation are clustered together. To do so, select the correlation plot parameters and set the “Draw clusters” property to “Auto”. This will cluster and reorganize the attributes as shown below.

Clusters displayed in Power BI Desktop - Description: Cluster

13) The strength of the correlation is still shown by the depth of the color. It would be easier to analyze the data if it is shown by a number indicating this strength – i.e. correlation coefficient. To do so, switch On the Correlation Coefficients section and increase the font size, so that you can see the coefficient clearly. Using the values as a reference, you can easily find out the strongest and weakest correlation in the entire dataset.

Coefficients Displayed in Power BI Desktop - Description: Coefficients

14) There are other sections for formatting the data, but those are mostly related to cosmetic aspects of the plot like title, background, transparency, title, etc. You can try to modify those settings and make the plot more suitable to the theme of the report.

With Power BI, without digging into any coding or complex statistical calculations, one can derive correlation analysis from the data by using the correlation plot in Power BI Desktop.

Next Steps
  • Try modifying the plot with the correlation parameters formatting options, so that only upper half of the plot remains visible and you do not need to see the correlation circles between attributes twice.
  • Check out other Power BI tips.

Learn more about Power BI in this 3 hour training course.


Click here to start the Power BI course






sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Siddharth Mehta Siddharth Mehta is an Associate Manager with Accenture in the Avanade Division focusing on Business Intelligence.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-11-27

Comments For This Article




Tuesday, June 13, 2023 - 11:58:25 PM - correlation Back To Top (91282)
Do you know the alternative for correlation plot? Since I cannot find it on microsoft webstore any more..

Wednesday, March 15, 2023 - 5:30:43 AM - DP Back To Top (91011)
I cannot see the correlation plot under Apps

Wednesday, November 30, 2022 - 6:54:04 AM - Mario Stuck Back To Top (90729)
Hi, any idea, why the correlation plot visual is gone from the microsoft visuals store?

No way to find the visual. Thanks for any help.

Thursday, March 25, 2021 - 8:56:19 AM - Steve Back To Top (88459)
My analysis is based on correlation of ETFs based on dates. Not all ETFs dates correspond. How does this BI Desktop visual, "PowerBI-visuals-corrplot.1.0.1.0", handle data where some ETFs have data for a specific date and other ETFs has a blank or null value for the data of the same date.

Thank you in advance to this visual. This tool is very helpful

Thursday, February 28, 2019 - 3:43:33 PM - delia Back To Top (79163)

 Hello, 

As I follow exact steps as you instruct, but it gives me blank visualization. Could you please help me to solve the issue?

Thank you,

Delia


Monday, February 4, 2019 - 9:46:22 PM - Clint Back To Top (78963)

Hi I downloaded this correlation plot and followed the instructions on this page, and all I got was a blank visualization.  The same data is working fine in the standard visualizations. Where can I get troubleshooting support for this add in?

Clint 


Tuesday, March 20, 2018 - 3:02:58 PM - Julie Back To Top (75486)

 This was really helpful in creating my first correlation plot.  I never would have found some of these options otherwise. Thanks so much


Wednesday, December 6, 2017 - 7:38:38 PM - Tom Stricker Back To Top (73702)

 The directions work correctly as far as that goes; problem is that coorelation feature is only available for PowerBI Pro installs (at least, that is the message I received when I tried to add the feature as a visualization tool.

 















get free sql tips
agree to terms