SQL Server function to validate Social Security Numbers SSN

By:   |   Updated: 2011-01-10   |   Comments (5)   |   Related: 1 | 2 | > Functions User Defined UDF


Problem

Most of us have to deal with storing SSN information in databases, and how do we know if the SSN entered is valid or not. In this tip, the High Group list issued by the Social Security Administration is used to check if a particular SSN is valid.

Solution

SSN is broken into three parts xxx-xx-xxxx, the first three represent the AreaCode (geographical location), the next two represent the GroupCode and the last four represent the sequence. SSA website publishes the list of High Group data that is used to generate the next set of SSNs, and the list can be downloaded from the following link http://www.ssa.gov/employer/ssns/HGDec0110.txt. For a particular AreaCode, the GroupCode is issued in a "Odd", "Even", "Even", "Odd" pattern as described below:

1. Odd numbered GroupCodes "01", "03", "05", "07", "09" are issued first for any areacode
2. Even numbered GroupCodes between "10" and "98" are issued next
3. Even numbered GroupCodes "02", "04", "06", "08" are issued next
4. Finally odd numbered GroupCodes between "11" and "99" are issued.

To setup this test we will do the following:

  1. Create a table to store the High Group information
  2. Populate this table using the data from SSA website
  3. Run the T-SQL code to validate the SSN

Step 1

Create a table to store the High Group information downloaded from the above link.

CREATE TABLE [dbo].[SSN_RangeList]
(
    [AreaCode] [smallint] NOT NULL,
    [GroupCode] [tinyint] NOT NULL
) ON [PRIMARY] 

Step 2

Load the data from the SSA website.  You will need to remove the '*' in the data.  To make this simpler for this test you can download this data that has been extracted already from the December 2010 file.

Step 3

The SELECT statement below uses the High Group Data and the pattern described above to validate SSNs. Sub select is used to break the given SSN into the three parts (Area, HighGroup and Sequence) which are used to compare against the SSN_RangeList table (HighGroup data from SSA)

DECLARE @ssn VARCHAR(15)
SET @ssn = '900-00-0000'

SELECT 
        @ssn SSN,
        CASE 
        WHEN ISNUMERIC(REPLACE(@SSN, '-', '')) <> 1 OR Sequence = 0 OR HighGroup = 0 THEN 'Invalid SSN' 
        ELSE
            CASE WHEN Groupcode % 2 = 1 THEN --odd groupcode 
                CASE WHEN GroupCode <10 THEN 
                    CASE WHEN HighGroup % 2 = 1 AND HighGroup <= GroupCode THEN 'Valid SSN' 
                    ELSE 'Invalid SSN' 
                    END
                ELSE
                    CASE WHEN HighGroup % 2 = 0 THEN 'Valid SSN' 
                    WHEN HighGroup % 2 = 1 THEN 
                        CASE WHEN HighGroup <= GroupCode THEN 'Valid SSN' ELSE 'Invalid SSN' END
                    END
                END
            ELSE --even groupcode
                CASE WHEN GroupCode >= 10 THEN
                    CASE WHEN HighGroup %2 =1 THEN
                        CASE WHEN HighGroup <=9 THEN 'Valid SSN' ELSE 'Invalid SSN' END
                    WHEN HighGroup %2 = 0 THEN
                        CASE WHEN HighGroup <= GroupCode THEN 'Valid SSN' ELSE 'Invalid SSN' END
                    END
                ELSE
                    CASE WHEN HighGroup %2 =1 THEN
                        CASE WHEN HighGroup <=9 THEN 'Valid SSN' ELSE 'Invalid SSN' END
                    WHEN HighGroup %2 = 0 THEN 
                        CASE WHEN HighGroup >=10 THEN 'Valid SSN' 
                        WHEN HighGroup < 10 THEN 
                            CASE WHEN HighGroup <= GroupCode THEN 'Valid SSN' ELSE 'Invalid SSN' END
                        END
                    END
                END
            END
        END [ValidSSN]
FROM 
(
SELECT 
    LEFT(REPLICATE('0', 9 - LEN(LTRIM(REPLACE(@SSN, '-', '')))) + REPLACE(@SSN, '-', ''), 3) Area,
    SUBSTRING(REPLICATE('0', 9 - LEN(LTRIM(REPLACE(@SSN, '-', '')))) + REPLACE(@SSN, '-', ''), 4, 2) HighGroup,
    RIGHT(REPLICATE('0', 9 - LEN(REPLACE(@SSN, '-', ''))) + REPLACE(@SSN, '-', ''), 4) Sequence,
    REPLICATE('0', 9 - LEN(REPLACE(@SSN, '-', ''))) + REPLACE(@SSN, '-', '') SSN
) v
LEFT JOIN SSN_RangeList sr
    ON v.Area = CAST(sr.AreaCode AS VARCHAR(3))
Next Steps
  • This SELECT statement can be made into a view and can be used to join against any table containing SSNs.
  • You can also create a function using this code to test and validate.
  • Additional checks can be added on the sub select to remove non numeric characters based on the input SSN data.
  • Note: The data that was used for this sample was from December 2010.  New lists are generated each month. You can download the latest SSA High Group data from this website.
  • Note: As of June 25, 2011 the SSA will begin a new process for this, read more here.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Vamsi Bogullu Vamsi Bogullu is a Database Architect administering database servers, developing applications and designing enterprise BI solutions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2011-01-10

Comments For This Article




Monday, September 19, 2022 - 1:48:45 PM - SSN Generator 2022 Back To Top (90497)
Ssn Generator Coding
United States

# SSNPad.py - creates a SSN scratch pad. This is used to map one SSN value to
# a random replacement value. Useful to protect a SSN number while preserving
# the format of the data. Reasonably secure if the scratch pad is kept secure.

# This script creates a text file with the following structure:
# XXXXXXXXX|YYYYYYYYY
# where XXXXXXXXX is the source SSN and
# YYYYYYYYY is a psuedo(semi)-random alternate SSN

# Sample:

# SSN numbers are composed of three parts:
# AAA-GG-IIII
# AAA = The Area (used to map to a state, doesn't anymore)
# GG = Group 01-99
# IIII a zero padded number between 0001-9999

# This process takes a specific source Area and Group (i.e. 123-45) and maps it
# to a randomly chosen different replacement Area and Group (i.e. 763-22) and
# then generates a random replacement ID for each source ID.

# A sample of the output:
# 001010001|686075132
# 001010002|686074919
# 001010003|686077114
# 001010004|686078786

import random
from itertools import product

f = open("SSNList.txt",'w+')

master_groups = list(product(range(1, 899 + 1), range(1, 99+1)))
area_groups = list(master_groups)

for (area, group) in master_groups:
area_group = random.choice(area_groups)
id_list = range(1, 9999 + 1)
for n in range(1, 9999 + 1):
leftSSN = "{0:0>3}{1:0>2}{2:0>4}".format(area, group, n)

random_id = random.choice(id_list)

rightSSN = '{0:0>3}{1:0>2}{2:0>4}'.format(
area_group[0], area_group[1], random_id)

id_list.remove(random_id)

f.write('{0}|{1}n'.format(leftSSN,rightSSN))
area_groups.remove(area_group)

f.close()

https://en.gravatar.com/ssngeneratororg

Wednesday, November 11, 2020 - 4:22:24 PM - Greg Robidoux Back To Top (87786)
Hi Dray,

here is one way you can do this. I am just creating this table to store the SSNs and adding a few rows, but you could use any table that has SSNs.

CREATE TABLE [dbo].[SSN_List]
(
[SSN] varchar(15) NOT NULL
) ON [PRIMARY]

Here is how you could create the view using the above table.

CREATE VIEW vwSSNcheck
AS
SELECT
SSN,
CASE
WHEN ISNUMERIC(REPLACE(SSN, '-', '')) <> 1 OR Sequence = 0 OR HighGroup = 0 THEN 'Invalid SSN'
ELSE
CASE WHEN Groupcode % 2 = 1 THEN --odd groupcode
CASE WHEN GroupCode <10 THEN
CASE WHEN HighGroup % 2 = 1 AND HighGroup <= GroupCode THEN 'Valid SSN'
ELSE 'Invalid SSN'
END
ELSE
CASE WHEN HighGroup % 2 = 0 THEN 'Valid SSN'
WHEN HighGroup % 2 = 1 THEN
CASE WHEN HighGroup <= GroupCode THEN 'Valid SSN' ELSE 'Invalid SSN' END
END
END
ELSE --even groupcode
CASE WHEN GroupCode >= 10 THEN
CASE WHEN HighGroup %2 =1 THEN
CASE WHEN HighGroup <=9 THEN 'Valid SSN' ELSE 'Invalid SSN' END
WHEN HighGroup %2 = 0 THEN
CASE WHEN HighGroup <= GroupCode THEN 'Valid SSN' ELSE 'Invalid SSN' END
END
ELSE
CASE WHEN HighGroup %2 =1 THEN
CASE WHEN HighGroup <=9 THEN 'Valid SSN' ELSE 'Invalid SSN' END
WHEN HighGroup %2 = 0 THEN
CASE WHEN HighGroup >=10 THEN 'Valid SSN'
WHEN HighGroup < 10 THEN
CASE WHEN HighGroup <= GroupCode THEN 'Valid SSN' ELSE 'Invalid SSN' END
END
END
END
END
END [ValidSSN]
FROM
(
SELECT
LEFT(REPLICATE('0', 9 - LEN(LTRIM(REPLACE(SSN, '-', '')))) + REPLACE(SSN, '-', ''), 3) Area,
SUBSTRING(REPLICATE('0', 9 - LEN(LTRIM(REPLACE(SSN, '-', '')))) + REPLACE(SSN, '-', ''), 4, 2) HighGroup,
RIGHT(REPLICATE('0', 9 - LEN(REPLACE(SSN, '-', ''))) + REPLACE(SSN, '-', ''), 4) Sequence,
REPLICATE('0', 9 - LEN(REPLACE(SSN, '-', ''))) + REPLACE(SSN, '-', '') SSN
FROM SSN_List
) v
LEFT JOIN SSN_RangeList sr
ON v.Area = CAST(sr.AreaCode AS VARCHAR(3))

Query the data
SELECT * FROM vwSSNcheck

Sunday, November 8, 2020 - 9:53:56 PM - Dray Back To Top (87774)
When you said to create view from select statement and join to table how would you do that?

Wednesday, May 27, 2020 - 5:31:15 PM - Greg Robidoux Back To Top (85788)

Hi Mike,

I just tested the code and it seems to be working.  Can you let me know what problems you are having.

-Greg


Wednesday, May 27, 2020 - 4:19:18 PM - mike Back To Top (85787)

This is not working















get free sql tips
agree to terms