Data Mining Clustering Example in SQL Server Analysis Services SSAS

By:   |   Updated: 2013-07-05   |   Comments (5)   |   Related: > Analysis Services Development


Problem

Where can one find a simple example utilizing the data mining clustering capabilities in SQL Server Analysis Services?

Solution

The solution presented here creates a two dimensional data table with clearly observable clusters. Next, this data is read into the clustering algorithm in SSAS where the clusters can be determined and then displayed.

The first step is to create a table and load it with data using the T-SQL sample below.

CREATE TABLE [dbo].[tblClusterExample](
    pkClusterExample int identity (1,1) primary key,
 [AttributeX] [decimal](18, 4) NOT NULL,
 [AttributeY] [decimal](18, 4) NOT NULL
) ON [PRIMARY]
GO
insert into dbo.tblClusterExample values (rand()/10 +0.8, rand()/10 +0.8)
insert into dbo.tblClusterExample values (rand()/10 +0.2, rand()/10 +0.8)
insert into dbo.tblClusterExample values (rand()/10 +0.2, rand()/10 +0.2)
insert into dbo.tblClusterExample values (rand()/10 +0.8, rand()/10 +0.2)
go 10000
select top 10 * from dbo.tblClusterExample

The output from the T-SQL code is shown below. This data will be used as the input for the data mining algorithm.

The output from the T-SQL code i

The image below is a plot made from sampling 100 points generated by the T-SQL code above. The four clusters are clearly defined in the corners of the plot.

a plot made from sampling 100 points generated by the T-SQL code

In Visual Studio, create a new Analysis Services Multidimensional and Data Mining Project. In this tip, we will name the project ClusteringExample. Click on OK when finished with the New Project window.

 create a new Analysis Services Multidimensional and Data Mining Project

In the Solution Explorer window, right-click on the Data Sources folder and choose "New Data Source..." to initiate the Data Source Wizard. Click on "Next".

right-click on the Data Sources folder and choose "New Data Source..."

Choose your data connection, if one exists. If a Data connection does not exist, click on "New..." to create a new data connection. In this example, we are using a connection to the Tips database on the localhost. Click on "Next".

If a Data connection does not exist, click on "New..."

On the Impersonation Information screen, click on "Use a specific Windows user name and password." Enter your username and password. Click on "Next".

Enter your username and password. Click on "Next ".

On the Completing the Wizard screen, the data source name can be changed if desired. Click on "Finish".

the data source name can be changed if desired.

In the Solution Explorer window, right-click on the Data Source Views folder and choose "New Data Source View..." to launch the Data Source View Wizard. Click on "Next".

choose "New Data Source View..."

On the Select a Data Source page in the Relational data sources window, select the data source we created in the above step. Click on "Next".

select the data source we created

On the Name Matching page, check the box for "Create logical relationships by matching columns." In the Foreign key matches box, press the "Same name as primary key" button. Click on "Next".

In the Foreign key matches box, press the "Same name as primary key" button.

On the Select Tables and Views page, move tblClusterExample from the Available Objects box to the Included object box by selecting tblClusterExample in the Available objects box and then clicking on the ">" box. Click on "Next".

On the Select Tables and Views page, move tblClusterExample from the Available Objects box

On the Completing the Wizard page, give the Data Source View a name and click on "Finish".

give the Data Source View a name and click on "Finish"

The Solution Explorer should appear as it does below with one Data Source and one Data Source View defined.

The Solution Explorer should appear as it does below with one Data Source and one Data Source View defined.

In the Solution Explorer window, right-click on the Mining Structures folder and choose "New Mining Structure..." to launch the Data Mining Wizard. Click on "Next".

In the Solution Explorer window, right-click on the Mining Structures folder and choose "New Mining Structure..."

On the Select the Definition Method page, press the radio button labeled "From existing relational database or data warehouse". Click on "Next".

On the Select the Definition Method page, press the radio button labeled "From existing relational database or data warehouse".

On the Create the Data Mining Structure page, press the radio button labeled "Create mining structure with a mining model". Choose the "Microsoft Clustering" data mining technique from the drop-down box.

Choose the "Microsoft Clustering" data mining technique from the drop-down box.

On the Select Data Source View page, choose "Tips" from the Available data source views. Please note this is the data source view we created earlier. Click on "Next".

choose "Tips" from the Available data source views

On the Specify Table Types page, make sure the Case box is checked and the Nested box is unchecked for the table named tblClusterExample.

On the Specify Table Types page, make sure the Case box is checked and the Nested box is unchecked for the table named tblClusterExample.

On the Specify the Training Data page, in the Key column check the box that corresponds to the pkClusterExample column. In the Input column, check the boxes for the AttributeX and AttributeY columns. These two columns will be used as input to the clustering algorithm. Because clustering is an example of unsupervised learning where the prediction of a class label is not of concern, there will be no columns checked in the Predictable column. Click on "Next".

the Specify the Training Data page

On the Specify Columns' Content and Data Type page, we see the columns to be used in the mining model structure, along with their content and data types. Click on "Next".

the Specify Columns' Content and Data Type page

On the Create Testing Set page, we will set the "Percentage of data for testing" and "Maximum number of cases in testing data set" to zero for this example. Please note that there needs to be a set of data reserved for testing or use 10-fold cross validation to prevent over fitting the data mining model to the training data. Click on "Next".

the Create Testing Set page, we will set the "Percentage of data for testing" and "Maximum number of cases in testing data set" to zero

On the Completing the Wizard page, the name of the mining structure and model can be changed. Click on "Finish".

Completing the Wizard

The image below shows the Visual Studio Solution Explorer window and the Mining Structure tab of the data mining model that was created in the steps above.

the Visual Studio Solution Explorer window

The next image from Visual Studio shows the Mining Models tab of the data mining model that was created in the steps above.

the Mining Models tab

Under the Mining Model tab, right click on the box labeled "Microsoft_Clustering" to bring up the menu shown below and select "Set Algorithm Parameters..." to display the Algorithm Parameters window.

right click on the box labeled "Microsoft_Clustering"

In the Algorithm Parameters window, set CLUSTER_COUNT to 0 and MINIMUM_SUPPORT to 5000. Setting the CLUSTER_COUNT to zero will force the algorithm to determine the optimum number of clusters which can be beneficial when exploring new sets of data. If the MINIMUM_SUPPORT value is set too low or to the default, then the algorithm will generate more than the four clusters desired for our example dataset. The description box explains the meanings of the parameters and their values.

The description box explains the meanings of the parameters and their values.

The next few images show the process of building, deploying, and processing the data mining model on the SQL Server Analysis Services server from the Visual Studio project. In the Solution Explorer window in the Mining Structures folder, right-click on the ClusteringExample.dmm mining structure and choose "Process..."

the process of building, deploying, and processing the data mining model on the SQL Server Analysis Services server

When asked "Would you like to build and deploy the project first?", choose "Yes". If prompted to enter the password for your data source, please do so.

"Yes". If prompted to enter the password for your data source, please do so.

In the Process Mining Structure window, leave the default values and click on "Run...".

 leave the default values and click on "Run...".

The Process Progress window should appear while the cluster algorithm is running and the data mining structures are built and deployed. When the status is "Process succeeded", click on "Close" to go back to the Process Mining Structure window. Click on "Close" again to return to the Visual Studio window.

MSSQLTips.com Sample Image

Click on the Mining Model Viewer tab and click on the Cluster Diagram tab. The clusters will display on the screen. Hovering the mouse pointer over the rectangular cluster objects will pop-up an informational box showing the population of the cluster.

Click on the Mining Model Viewer tab and click on the Cluster Diagram tab.

Clicking on the Cluster Profiles tab of the Mining Model Viewer reveals the population of each cluster and the overall population.

Clicking on the Cluster Profiles tab of the Mining Model Viewer reveals the population of each cluster and the overall population.

Clicking on the Cluster Characteristics tab of the Mining Model Viewer shows the properties of one cluster or all of the clusters.

Clicking on the Cluster Characteristics tab of the Mining Model Viewer shows the properties of one cluster or all of the clusters.

Clicking on the Cluster Discrimination tab of the Mining Model Viewer allows for the comparison of the composition of any two clusters.

Clicking on the Cluster Discrimination tab of the Mining Model Viewer allows for the comparison of the composition of any two clusters.

The image below is taken from the Object Explorer window for Analysis Services in SQL Server Management Studio after deployment and processing. Notice how the objects displayed here match the objects seen in the Solution Explorer window within Visual Studio.

The image below is taken from the Object Explorer window for Analysis Services in SQL Server Management Studio after deployment and processing.
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2013-07-05

Comments For This Article




Monday, December 8, 2014 - 3:52:26 AM - nervin Back To Top (35542)
 

 I have some continuous and discrete data that i want cluster them, when I clustered these data the range numbers of state in shading variable of cluster diagram don't show correct range of my data, for example when I have range data for an attribute min=1 and max=718 but after cluster show out of this range in cluster diagram, I do not know what to do to fix this problem.


Wednesday, March 19, 2014 - 12:19:44 AM - james Back To Top (29807)

How to export output to table in Ms.SQL ?


Monday, July 8, 2013 - 11:05:19 PM - mona Back To Top (25753)

I have a project on the Application of data mining in SQL SEVERPapers or a good book in this area know me to introduce
I'm very grateful that can help me


Saturday, July 6, 2013 - 8:54:39 AM - Dallas Snider Back To Top (25735)

SQL Server 2012 which uses Visual Studio 2010


Friday, July 5, 2013 - 11:57:51 AM - John Back To Top (25729)

What version of Visual Studio/SQL Server BIDS are you using to create the "Analysis Services Multidimensional and Data Mining Project" project ?















get free sql tips
agree to terms