By: Dallas Snider | Updated: 2013-06-04 | Comments (5) | Related: > Analysis Services Development
Problem
Many Data Mining or Machine Learning students have trouble making the transition from a Data Mining tool such as WEKA [1] to the data mining functionality in SQL Server Analysis Services.
Solution
The solution presented here takes a classic example from Data Mining and Machine Learning seen in differing variations in textbooks by Quinlan [2], Mitchell [3], Han, Kamber and Pei [4], and the WEKA application.
The first step is to create a table and load it with data using the T-SQL sample below.
IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblDecisionTreeExample') DROP TABLE dbo.tblDecisionTreeExample; create table dbo.tblDecisionTreeExample ( PKey integer identity(1,1) Primary Key, Outlook varchar(8) not null, Temperature varchar(4) not null, Humidity varchar(6) not null, Windy varchar(5) not null, Play varchar(3) not null ) insert into dbo.tblDecisionTreeExample values ('sunny','hot','high','FALSE','no') insert into dbo.tblDecisionTreeExample values ('sunny','hot','high','TRUE','no') insert into dbo.tblDecisionTreeExample values ('overcast','hot','high','FALSE','yes') insert into dbo.tblDecisionTreeExample values ('rainy','mild','high','FALSE','yes') insert into dbo.tblDecisionTreeExample values ('rainy','cool','normal','FALSE','yes') insert into dbo.tblDecisionTreeExample values ('rainy','cool','normal','TRUE','no') insert into dbo.tblDecisionTreeExample values ('overcast','cool','normal','TRUE','yes') insert into dbo.tblDecisionTreeExample values ('sunny','mild','high','FALSE','no') insert into dbo.tblDecisionTreeExample values ('sunny','cool','normal','FALSE','yes') insert into dbo.tblDecisionTreeExample values ('rainy','mild','normal','FALSE','yes') insert into dbo.tblDecisionTreeExample values ('sunny','mild','normal','TRUE','yes') insert into dbo.tblDecisionTreeExample values ('overcast','mild','high','TRUE','yes') insert into dbo.tblDecisionTreeExample values ('overcast','hot','normal','FALSE','yes') insert into dbo.tblDecisionTreeExample values ('rainy','mild','high','TRUE','no') select * from dbo.tblDecisionTreeExample
The output from the T-SQL code is shown below. This data will be used as the input for the data mining algorithm.
In Visual Studio, create a new Analysis Services Multidimensional and Data Mining Project. In this tip, we will name the project DecisionTreeExample. Click on OK when finished with the New Project window.
In the Solution Explorer window, right-click on the Data Sources folder and choose "New Data Source..." to initiate the Data Source Wizard. Click on "Next >".
Choose your data connection, if one exists. If a Data connection does not exist, click on "New..." to create a new data connection. In this example, we are using a connection to the Tips database on the localhost. Click on "Next >".
On the Impersonation Information screen, click on "Use a specific Windows user name and password." Enter your username and password. Click on "Next >".
On the Completing the Wizard screen, the data source name can be changed if desired. Click on "Finish".
In the Solution Explorer window, right-click on the Data Source Views folder and choose "New Data Source View..." to launch the Data Source View Wizard. Click on "Next >".
On the Select a Data Source page in the Relational data sources window, select the data source we created in the above step. Click on "Next >".
On the Name Matching page, check the box for "Create logical relationships by matching columns." In the Foreign key matches box, press the "Same name as primary key" button. Click on "Next >".
On the Select Tables and Views page, move tblDecisionTreeExample from the Available Objects box to the Included object box by selecting tblDecisionTreeExample in the Available objects box and then clicking on the ">" box. Click on "Next >".
On the Completing the Wizard page, give the Data Source View a name and click on "Finish".
The Solution Explorer should appear as it does below with one Data Source and one Data Source View defined.
In the Solution Explorer window, right-click on the Mining Structures folder and choose "New Mining Structure..." to launch the Data Mining Wizard. Click on "Next >".
On the Select the Definition Method page, press the radio button labeled "From existing relational database or data warehouse". Click on "Next >".
On the Create the Data Mining Structure page, press the radio button labeled "Create mining structure with a mining model". Choose the "Microsoft Decision Trees" data mining technique from the drop-down box.
On the Select Data Source View page, choose "Tips" from the Available data source views. Please note this is the data source view we created earlier. Click on "Next >".
On the Specify Table Types page, make sure the Case box is checked and the Nested box is unchecked for the table named tblDecisionTreeExample.
On the Specify the Training Data page, in the Key column check the box that corresponds to the PKey column. In the Input column, check the boxes for the Humidity, Outlook, Temperature, and Windy columns. These input columns will be used as input to the decision tree algorithm. In the Predictable column, check the box for the Play column which is the column the decision tree algorithm will attempt to predict. Click on "Next >".
On the Specify Columns' Content and Data Type page, we see the columns to be used in the mining model structure, along with their content and data types. Click on "Next >".
On the Create Testing Set page, we will set the "Percentage of data for testing" and "Maximum number of cases in testing data set" to zero for this example. Please note that there needs to be a set of data reserved for testing or use 10-fold cross validation to prevent overfitting the data mining model to the training data. Click on "Next >".
On the Completing the Wizard page, the name of the mining structure and model can be changed. Click on "Finish".
The image below shows the Visual Studio Solution Explorer window and the Mining Structure tab of the data mining model that was created in the steps above.
The next image from Visual Studio shows the Mining Model tab of the data mining model that was created in the steps above.
Under the Mining Model tab, right click on the box labeled "Microsoft_Decision_Trees" to bring up the menu shown below and select "Set Algorithm Parameters..." to display the Algorithm Parameters window.
In the Algorithm Parameters window, remove the COMPLEXITY_PENALTY and FORCE_REGRESSOR parameters. Set MINIMUM_SUPPORT to 1, SCORE_METHOD to 1, and SPLIT_METHOD to 2. The description box explains the meanings the parameters and their values.
The parameters and their values should now appear as shown below. Click on "OK".
The next few images show the process of building, deploying, and processing the data mining model on the SQL Server Analysis Services server from the Visual Studio project. In the Solution Explorer window in the Mining Structures folder, right-click on the DecisionTreeExample.dmm mining structure and choose "Process..."
When asked "Would you like to build and deploy the project first?", choose "Yes".
Next, the user might be given a warning about the amount of time it could possibly take to process the model. When asked "Do you wish to continue?", choose "Yes". There are only 14 records so it should only take a few seconds.
The Process Mining Model window is displayed next. In this example, we will "Process Full" and leave all other settings to their default values. Click on "Run...".
The Process Progress window is displayed while processing is occurring. Upon successful completion, click "Close".
Click on the Mining Model Viewer tab. The decision tree model will display on the screen.
Hovering the mouse pointer over the rectangular objects will pop-up an informational box showing the population of the split.
The image below is taken from the Object Explorer window for Analysis Services in SQL Server Management Studio after deployment and processing. Notice how the objects displayed here match the objects seen in the Solution Explorer window within Visual Studio.
Next Steps
- After completing the above tip, adjust the parameters and their values to see how the decision tree changes.
- Please refer to the following tips for further assistance with SQL Server Analysis Services:
References
- [1] Witten, I.H., E. Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2005.
- [2] Quinlan, R., C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
- [3] Mitchell, T., Machine Learning, McGraw Hill, 1997.
- [4] Han, J., M. Kamber and J. Pei, Data Mining: Concepts and Techniques, Elsevier, 2012.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2013-06-04