Connecting WEKA to SQL Server 2016

By:   |   Updated: 2017-01-09   |   Comments (3)   |   Related: > Analysis Services Development


Problem

I need to use SQL Server 2016 data as an input to WEKA. How can I connect WEKA to SQL Server 2016?

Solution

WEKA is an open source data mining application, written in Java, that provides a variety of machine learning algorithms. Unlike SQL Server Analysis Services, WEKA generates Receiver Operating Characteristic (ROC) curves, calculates the area under the ROC curve, and calculates precision and recall. In this tip we will step through the process of connecting WEKA 3.8.0 to SQL Server 2016 on Windows 10. The data in this tip is from a banking example provided by Bamshad Mobasher at DePaul University.

After installing WEKA 3.8.0 with the default settings, download and install the Microsoft JDBC 4 driver. Please remember the path and file name of the driver as it is needed in the next step. In this tip, the installation program was run with Administrator rights and the driver was installed in the directory C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server.

Install the JDBC driver

Next we need to add the full path and file name of the Microsoft JDBC 4 driver to the CLASSPATH system environment variable. In this example, the path and file name is C:\Program Files\Microsoft JDBC Driver 6.0 for SQL Server\sqljdbc_6.0\enu\sqljdbc4.jar.

  • Open the Control Panel.
  • Click on System.
  • Click on Advanced System Settings.
Advanced System Settings

Click on Environment Variables...

System Properties

Look for a CLASSPATH variable in the System Variables box.

Edit Existing Classpath

If a CLASSPATH variable exists, click on CLASSPATH and then Edit. In the Variable Value box, append a semicolon and the full path to the JDBC jar file to the existing CLASSPATH value. Make sure to include the file name. Click on OK to close the Edit System Variable window.

Edit Existing Classpath Value

If CLASSPATH does not exist click on New.

Classpath Is Not Defined

In the Variable Value box enter the full path and file name to the JDBC jar file. Click on OK to close the New System Variable window.

New classpath entered

Now that the CLASSPATH is pointing to the JDBC driver, click on OK to close the Environment Variable window.

Classpath changes finished

Click on OK to close the System Properties window.

With the CLASSPATH set we need to make modifications to the WEKA database utilities property file. You will need to have administrator rights to do the next series of steps. Go to the WEKA home directory. In this example it is C:\Program Files\Weka-3-8. Locate the weka.jar file.

Control Panel Administrative Tools ODBC Data Sources (64-bit)

From the weka.jar file, we want to extract the directory structure and files. The goal is to have the directory .\weka\experiment created and its files extracted to this directory. In this example the full path is C:\Program Files\Weka-3-8\weka\experiment. Notice the database utility property files at the bottom of the following image.

WEKA Experiment folder and database utility property files

We need to make several changes to the DatabaseUtils.props file. Make a backup copy of DatabaseUtils.props. Open the file DatabaseUtils.props.mssqlserver2005 with a text editor and save as DatabaseUtils.props to overwrite the existing DatabaseUtils.props file. Make sure your entries for JDBC driver and database URL are similar to what are shown below. In this example the server is localhost, the port is 1433, and the database is named MSSQLTips. The # symbol denotes a comment.

# JDBC driver (comma-separated list)
jdbcDriver=com.microsoft.sqlserver.jdbc.SQLServerDriver

# database URL
jdbcURL=jdbc:sqlserver://localhost:1433;databasename=MSSQLTips

Below the database URL entry there is a section that configures the data type mappings between WEKA and SQL Server. In the example below I have added comments to show the default mappings and I have added some more mappings.

# specific data types
# string, getString() = 0;    --> nominal
# boolean, getBoolean() = 1;  --> nominal
# double, getDouble() = 2;    --> numeric
# byte, getByte() = 3;        --> numeric
# short, getByte()= 4;        --> numeric
# int, getInteger() = 5;      --> numeric
# long, getLong() = 6;        --> numeric
# float, getFloat() = 7;      --> numeric
# date, getDate() = 8;        --> date
# text, getString() = 9;      --> string
# time, getTime() = 10;       --> date
# timestamp, getTime() = 11;  --> date

# map SQL Server data type to WEKA data type
# default mappings
varchar=0
float=2
tinyint=3
int=5
# values added manually
string=0
bigint=6
nvarchar=9
decimal=2
bit=1

Save the changes to this file.

Next, we will use the T-SQL examples below to create and then populate our example table. By default, WEKA will use the last column as the label used for the classification algorithms. In this example, the column DidBuyAPersonalEquityPlan will be the class label in WEKA.

create table dbo.tblBankData 
(
  UniqueID int identity(1,1) Primary Key,
  AgeRange varchar(8) not null,
  Gender varchar(6) not null,
  Region varchar(10) not null,
  Income varchar(8) not null,
  Married varchar(3) not null,
  NumberOfChildren int not null,
  IsCarOwner varchar(3) not null,
  HasSavingsAccount varchar(3) not null,
  HasCurrentAccount varchar(3) not null,
  HasMortgage varchar(3) not null,
  DidBuyAPersonalEquityPlan varchar(3) not null
)

Click here for the T-SQL code to insert 600 rows into the table.

Now let's start WEKA 3.8 (with console).

Start WEKA 3.8 with console

In the WEKA GUI Chooser window, select Explorer.

Select Explorer in the WEKA GUI Chooser window

Click on Open DB...

Click on Open DB...

Click on the button highlighted below to display the Database Connection Parameters window.

Click to set database user and password

Enter the database login and password. Click on OK. Please note that I was only able to connect using SQL Server Authentication.

Enter database login and password

Click on the connect to the database button.

Click on the Connect to Database button

A successful database connection will appear as shown below with a warning message in the Info box. Any connection errors will also display in this box.

Successful database connection

Enter a query and click on Execute.

Enter a query and click on Execute

The results will display in the Results box. If satisfied with the results click on OK.

Examine query resutls

We now see that our SQL Server database data has been loaded into WEKA.

Data loaded into WEKA
Next Steps

Now that the data is loaded into WEKA, you can take advantage of all of the available machine learning algorithms available to classify and cluster your data. Also, please check out these tips on data mining in SQL Server Analysis Services on MSSQLTips.com.



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-01-09

Comments For This Article




Monday, July 15, 2019 - 2:09:18 PM - consuelo Back To Top (81752)

I can't download Weka Jar files.


Thursday, October 18, 2018 - 4:54:22 AM - Maria Back To Top (77975)

Hi,

I'm trying to connect my instance of SQL Server to Weka by your articol but I have this result:

exception: java.sql.SQLException: Unable to find a suitable driver for jdbc:sqlserver://localhost:1433;databasename=GestioneDipendenti

I set SQL Server service to connect by TCP/IP protocol and pipeline in addition.

And I'm using Microsoft SQL Server 2017 and I set DriverSQLServer\sqljdbc_7.0.0.0_ita\sqljdbc_7.0\ita\mssql-jdbc-7.0.0.jre8.jar in CLASSPATH variable environment. 

 

I cannot understand because not run this connection string. Can you help me, please?

 


Sunday, July 23, 2017 - 3:37:15 AM - Umesh Back To Top (63098)

 

 Showing error unable to find a suitable drive 















get free sql tips
agree to terms