By: Dallas Snider | Updated: 2014-12-23 | Comments | Related: > TSQL
Problem
In data mining, we often need to perform min-max normalization on numeric data type columns to prevent one column from skewing or dominating the models produced by the machine learning algorithms.
Solution
In this tip, we will demonstrate how to use T-SQL to perform a min-max normalization from 0.0 to 1.0 inclusive, which can be represented as [0.0, 1.0]. The normalized value equals (the original column value minus the minimum column value) divided by the (maximum column value minus the minimum column value).
We will create a table that contains two data columns, AttributeX and AttributeY.
IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblExample') DROP TABLE dbo.tblExample go create table dbo.tblExample ( pKey integer identity(1,1) PRIMARY KEY, AttributeX decimal(7,3), AttributeY decimal(7,3) )
Next, we will populate this table with 27 rows of data. AttributeY will be 10 times the value of AttributeX.
insert into dbo.tblExample values (24, 240) insert into dbo.tblExample values (26, 260) insert into dbo.tblExample values (27, 270) insert into dbo.tblExample values (27, 270) insert into dbo.tblExample values (30, 300) insert into dbo.tblExample values (31, 310) insert into dbo.tblExample values (31, 310) insert into dbo.tblExample values (32, 320) insert into dbo.tblExample values (33, 330) insert into dbo.tblExample values (33, 330) insert into dbo.tblExample values (36, 360) insert into dbo.tblExample values (36, 360) insert into dbo.tblExample values (36, 360) insert into dbo.tblExample values (36, 360) insert into dbo.tblExample values (41, 410) insert into dbo.tblExample values (44, 440) insert into dbo.tblExample values (44, 440) insert into dbo.tblExample values (46, 460) insert into dbo.tblExample values (46, 460) insert into dbo.tblExample values (46, 460) insert into dbo.tblExample values (46, 460) insert into dbo.tblExample values (47, 470) insert into dbo.tblExample values (51, 510) insert into dbo.tblExample values (56, 560) insert into dbo.tblExample values (57, 570) insert into dbo.tblExample values (63, 630) insert into dbo.tblExample values (81, 810)
After inserting the rows, we will select the row count and the minimum and maximum values from our data columns.
select count(*) as RowQuantity, min(AttributeX) as MinimumX, max(AttributeX) as MaximumX, min(AttributeY) as MinimumY, max(AttributeY) as MaximumY from dbo.tblExample
T-SQL Code for Min Max Normalization in SQL Server
Now we will use the T-SQL code below to perform min-max normalizations on AttributeX and AttributeY.
--Declare the variables declare @minX as decimal(7,3) declare @maxX as decimal(7,3) declare @minY as decimal(7,3) declare @maxY as decimal(7,3) --Set the variables set @minX=(select min(AttributeX) from dbo.tblExample) set @maxX=(select max(AttributeX) from dbo.tblExample) set @minY=(select min(AttributeY) from dbo.tblExample) set @maxY=(select max(AttributeY) from dbo.tblExample) --Perform the normalization select AttributeX, (AttributeX - @minX)/(@maxX - @minX) as NormalizedX, AttributeY, (AttributeY - @minY)/(@maxY - @minY) as NormalizedY from dbo.tblExample
The results from the previous query are shown below. Notice how the normalized values for AttributeX and AttributeY are equal on each row, while the original values differ.
Next Steps
Normalized values can be inserted into a table and then read into data mining algorithms. You can read more about SQL Server data mining in the following tips from MSSQLTips.com.
- SQL Server 2012 Analysis Services Association Rules Data Mining Example
- Explaining the Calculations of Probability and Importance for Complex Association Rules in SQL Server 2012 Analysis Services
- Classic Machine Learning Example In SQL Server Analysis Services
- Microsoft Naïve Bayes Data Mining Model in SQL Server Analysis Services
- Data Mining Clustering Example in SQL Server Analysis Services SSAS
- SQL Server Analysis Services Glossary
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2014-12-23