By: Tim Cullen | Updated: 2010-11-16 | Comments (4) | Related: > Functions System
Problem
My client wants to standardize address information for existing and future addresses collected for their customers, particularly the street suffixes. The application used to enter and collect address information has the street suffix separated from the address field, but it is a textbox instead of a drop down list therefore things are not standardized. I know there are some options out there to standarize data, but they would like a less expensive alternative. Are there any functions in SQL Server that I can use to standardized data?
Solution
SQL Server offers two functions that can be used to compare string values: The SOUNDEX and DIFFERENCE functions.
The SOUNDEX function converts a phrase to a four-character code. The first character is the first letter of the phrase. Vowels are omitted unless the first letter of the phrase is a vowel and the other three characters represent the rest of the phrase. Below is a comparison between the SOUNDEX of "MSSQLTIPS" and "MSSQLTPS":
The DIFFERENCE function evaluates two expressions and assigns a value between 0 and 4, with 0 being little to no similarity and 4 representing the same or very similar phrases. This value is derived from the number of characters in the SOUNDEX of each phrase that are the same. Below is an example of the DIFFERENCE function used in two scenarios:
For the example of standardizing street suffixes, we can create a new table named ref_RoadTypes with the following structure:
(
RoadTypeID TINYINT NOT NULL
CONSTRAINT PK_refRoadTypes_RoadTypeID PRIMARY KEY
, RoadTypeAbbr VARCHAR(20) NOT NULL
, RoadType VARCHAR(50) NOT NULL
, CreatedDt DATETIME NOT NULL
CONSTRAINT dft_refRoadTypes_CreatedDt DEFAULT(CURRENT_TIMESTAMP)
, CreatedBy VARCHAR(50) NOT NULL
CONSTRAINT dft_refRoadTypes_CreatedBy DEFAULT(SYSTEM_USER)
, ModifiedDt DATETIME NOT NULL
CONSTRAINT dft_refRoadTypes_ModifiedDt DEFAULT(CURRENT_TIMESTAMP)
, ModifiedBy VARCHAR(50) NOT NULL
CONSTRAINT dft_refRoadTypes_ModifiedBy DEFAULT(SYSTEM_USER)
)
Use this sample script RoadTypeInsert.txt to create a test table and test data. Once the values are in the table you can use either function to retrieve the best match for the street suffix. Below are samples of each function being used to retrieve the best match:
FUNCTION | QUERY |
---|---|
SOUNDEX | DECLARE @Suffix VARCHAR(20) SELECT @Suffix = 'Bolevard' --(intentionally misspelled) SELECT RoadTypeAbbr FROM dbo.ref_RoadTypes WHERE SOUNDEX(RoadType) = SOUNDEX(@Suffix)Results |
DIFFERENCE | DECLARE @Phrase VARCHAR(20) SELECT @Phrase = 'Str' SELECT TOP 1 RoadTypeAbbr FROM dbo.ref_RoadTypes ORDER BY DIFFERENCE(@Phrase, RoadType) DESCResults |
Next Steps
- Read more about the SOUNDEX function
- Read more about the DIFFERENCE function
- Compare the SOUNDEX and DIFFERENCE functions
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2010-11-16