SQL Server Management Studio 2017 Data Discovery and Categorization

By:   |   Comments (5)   |   Related: > SQL Server 2017


Problem

How does the new data discovery and categorization features in SQL Server Management Studio 2017 work?

Solution

For many years, it seemed that SQL Server Management Studio (SSMS) languished as the basic go to tool for SQL Server querying and management, but very few new features were added. With the recent revision of SSMS 2017, one very beneficial new feature was implemented. The Data Discovery and Classification function provides, at its most true level, a way to investigate and classify the data in your database. The tool not only makes the search and investigation process easy, but it also provides rich reporting functions. The tool searches out sensitive data whether it be social security numbers, account numbers, healthcare details, or other personally identifiable data. It is no coincidence that this process is used to help with compliance of regulations such as GDPR. Additionally, the tool not only tells you where sensitive data may be, but it also assists with determine the level of sensitivity with the data.

We will look at how to do this with SSMS as well as with T-SQL scripts. 

Before being able to use the data discovery tool, the 17.6 version of SSMS 2017 must be downloaded and then installed. You can have both SSMS 2016 and SSMS 2017 installed on the same machine so no need to worry about using both tools if need.

Using SQL Server Data Categorizations with SSMS GUI

The Microsoft website has similar steps as to what I show below, but I thought it would be good to just quickly show how this can be done using the SSMS GUI.

Upon installation, a new option is available in the Tasks context menu for a database, as shown below.

right contexxt menu - Description: right context menu

Initiating the process generates the following, fairly non-descript report. We do need to drill into the various options to further utilize this tool. However, at the onset, we see that no current fields have been classified. However, we have a recommendation, shown at both the top and bottom of the window, that potentially 92 columns have traits that would possibly indicate classification would be beneficial.

classificaiton report - Description: classification report

Clicking on the recommend option provides us with a list of fields (and their related tables and schemas) and suggestions for the type of information along with the sensitivity level. Both these last items can be changed as needed.

recommendation list  - Description: recommendation list
Information Type Changes - Description: Information Type Changes

senstivity level changes - Description: senstivity level changes

Selecting the “Accept selected recommendations” option applies the classification details to the checked fields.

data classification

As an alternative, individual classifications can be added manually using the Add Classification option shown in step 1 below. After selecting Add Classification, the schema, table, column, information type, and sensitivity level are specified. Finally, to add the classification, select Add.

classified columns

Deleting the classification for a particular column is as simple as clicking the trash can button and then saving the changes.

Delete Classification - Description: Delete Classification

The classification report can be executed using the View Report option.

classification report - Description: classification report

This report could easily be the first step in providing you with the list of tables and fields that may require further review and possibly the application of encryption, masking, or other means of restricting access to the data or even ultimately removal of the data. Additionally, the report could easily be provided to external parties who are inquiring about the sensitivity of your organization’s data, such as an auditor.

Using SQL Server Data Categorizations with T-SQL

As an alternative, the classification data is actually stored in the extended properties for each table, so you could directly query the sys.extended_properties view and specifically look for the sys_sensitivity_label_name and sys_information_type_name data rows. I have included a “sample query” below which illustrates some of the data that can be queried.

WITH INFOTYPE
   AS
   (
SELECT
    SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name
   ,TABLELIST.name AS Table_Name   ,COLLIST.name AS Column_Name
   ,TABLELIST.object_id
   ,COLLIST.column_id
   ,EXTPROP.value AS Information_Type 
   FROM
   sys.extended_properties AS EXTPROP
   INNER JOIN
   sys.tablesAS TABLELIST
   ON
   EXTPROP.major_id = TABLELIST.object_id
   INNER JOIN
   sys.all_columns AS COLLIST
   ON 
   TABLELIST.object_ID = COLLIST.object_id
   AND
   EXTPROP.minor_id = COLLIST.column_id
   WHERE
   EXTPROP.name ='sys_information_type_name'
   )
   ,
  SENSITIVITY_LEVEL
  AS
  (
   SELECT
   SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name
   ,TABLELIST.name AS Table_Name   ,COLLIST.name AS Column_Name
   ,TABLELIST.object_id
   ,COLLIST.column_id
   ,EXTPROP.value AS Sensitivity_Level
   FROM
   sys.extended_properties AS EXTPROP
   INNER JOIN
   sys.tablesAS TABLELIST
   ON
   EXTPROP.major_id = TABLELIST.object_id
   INNER JOIN
   sys.all_columns AS COLLIST
   ON 
   TABLELIST.object_ID = COLLIST.object_id
   AND
   EXTPROP.minor_id = COLLIST.column_id
   WHERE
   EXTPROP.name ='sys_sensitivity_label_name'
   )
 
   SELECT
   INFOTYPE.Schema_Name
   ,INFOTYPE.Table_Name   ,INFOTYPE.Column_Name
   ,INFOTYPE.Information_Type
   ,SENTLEVEL.Sensitivity_Level
   FROM
   INFOTYPE AS INFOTYPE
   INNER JOIN
   SENSITIVITY_LEVEL AS SENTLEVEL
   ON
   INFOTYPE.object_id = SENTLEVEL.object_id
   AND
   INFOTYPE.column_id = SENTLEVEL.column_id;
GO

This query provides the following results.

extended properties results - Description: extended properties results


As the data for the categories is stored in the extended properties for each table, we can insert the data directly into the properties using the sp_addextendedproperty function.  Both, the sensitivity level and the information type require two items, a name and a GUID value, which means four executions of sp_addextendedproperty. Thus, four data points are inserted using sp_addextendedproperty. 

For the sensitivity level, you will want to insert only one of the following values for the name:

  • Public
  • General
  • Confidential
  • Confidential – GDPR
  • Highly Confidential
  • Highly Confidential – GDPR

For the information type, you will want to insert only one of the following values for the name:

  • Banking
  • Contact Info
  • Credentials
  • Credit Card
  • Date of Birth
  • Financial
  • Health, Name
  • National ID
  • Networking
  • SSN
  • Other

For the ID field, we would want to generate a GUID to insert for that value. A sample set of the 4 queries is illustrated below.

EXEC sp_addextendedproperty 
  @name=N'sys_sensitivity_label_id', 
  @level0type= N'schema', 
  @level0name=N'Purchasing', 
  @level1type=N'table', 
  @level1name= N'Suppliers', 
  @level2type=N'column', 
  @level2name=N'BankAccountCode', 
  @value= '8F2DA722-043B-452D-87F2-02239F9613D6'; 
GO 
 
EXEC sp_addextendedproperty 
  @name=N'sys_sensitivity_label_name', 
  @level0type= N'schema', 
  @level0name=N'Purchasing', 
  @level1type=N'table', 
  @level1name= N'Suppliers', 
  @level2type=N'column', 
  @level2name=N'BankAccountCode', 
  @value= N'Confidential'; 
GO 
 
EXEC sp_addextendedproperty 
  @name=N'sys_information_type_id', 
  @level0type= N'schema', 
  @level0name=N'Purchasing', 
  @level1type=N'table', 
  @level1name= N'Suppliers', 
  @level2type=N'column', 
  @level2name=N'BankAccountCode', 
  @value= 'B6900930-EBAF-468D-A4E9-DC7142F95C38'; 
GO 
 
EXEC sp_addextendedproperty 
  @name=N'sys_information_type_name', 
  @level0type= N'schema', 
  @level0name=N'Purchasing', 
  @level1type=N'table', 
  @level1name= N'Suppliers', 
  @level2type=N'column', 
  @level2name=N'BankAccountCode', 
  @value= N'Banking'; 
GO			

Executing these four statements successfully adds categorization to the Suppliers.BankAccountCode field.

sp_addextendedpropery result - Description: sp_addextendedproperty result

Certainly, situations may arise where you will have a large number of fields to update with these categorizations. Using the tool in SSMS could be tedious as the number of fields grows. Thus, we could use the power of SQL to generate the sp_addextendedproperty statements needed for a long list of columns.  As illustrated in the following code, we use SQL to generate each of the four sp_addextendedproperty statements for any column \ field that contains “BANK” in the field name. The parameter at the top of the query allows us to quickly change what column names potentially would be included.

DECLARE @COLUMN_SEARCH varchar(100);SET @COLUMN_SEARCH = 'BANK';
 
SELECT
  SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name,
  TABLELIST.name AS Table_Name,
  COLLIST.name AS Column_Name,
  'exec sp_addextendedproperty @name=N''sys_information_type_name'',@level0type=N''schema'',@level0name=N'''
  +
  SCHEMA_NAME(TABLELIST.schema_id)
  +
  ''' ,@level1type=N''table'',@level1name=N'''
  +
  TABLELIST.name
  +
  ''' ,@level2type=N''column'',@level2name=N'''
  +
  COLLIST.name
  +
  ''' ,@value=N''Banking'';'  --Change this line to one of the information types: Banking, Contact Info, Credentials, Credit Card, Date of Birth, Financial, Health, Name, National ID, Networking, SSN, Other
FROM sys.tables AS TABLELIST
INNER JOIN sys.all_columns AS COLLIST
  ON TABLELIST.object_ID = COLLIST.object_id
WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%'
 
UNION ALL
 
SELECT
  SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name,
  TABLELIST.name AS Table_Name,
  COLLIST.name AS Column_Name,
  'exec sp_addextendedproperty @name=N''sys_information_type_id'',@level0type=N''schema'',@level0name=N'''
  +
  SCHEMA_NAME(TABLELIST.schema_id)
  +
  ''' ,@level1type=N''table'',@level1name=N'''
  +
  TABLELIST.name
  +
  ''' ,@level2type=N''column'',@level2name=N'''
  +
  COLLIST.name
  +
  ''' ,@value=''' + CAST(NEWID() AS varchar(100)) + ''';'
FROM sys.tables AS TABLELIST
INNER JOIN sys.all_columns AS COLLIST
  ON TABLELIST.object_ID = COLLIST.object_id
WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%'
 
UNION ALL
  
SELECT
  SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name,
  TABLELIST.name AS Table_Name,
  COLLIST.name AS Column_Name,
  'exec sp_addextendedproperty @name=N''sys_sensitivity_label_name'',@level0type=N''schema'',@level0name=N'''
  +
  SCHEMA_NAME(TABLELIST.schema_id)
  +
  ''' ,@level1type=N''table'',@level1name=N'''
  +
  TABLELIST.name
  +
  ''' ,@level2type=N''column'',@level2name=N'''
  +
  COLLIST.name
  +
  ''' ,@value=N''Confidential'';' -- Change Sensitivity Level to one of the levels: Public, General, Confidential, Confidential - GDPR, Highly Confidential, Highly Confidential - GDPR 
FROM sys.tables AS TABLELIST
INNER JOIN sys.all_columns AS COLLIST
  ON TABLELIST.object_ID = COLLIST.object_id
WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%'
 
UNION ALL
 
SELECT
  SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name,
  TABLELIST.name AS Table_Name,
  COLLIST.name AS Column_Name,
  'exec sp_addextendedproperty @name=N''sys_sensitivity_label_id'',@level0type=N''schema'',@level0name=N'''
  +
  SCHEMA_NAME(TABLELIST.schema_id)
  +
  ''' ,@level1type=N''table'',@level1name=N'''
  +
  TABLELIST.name
  +
  ''' ,@level2type=N''column'',@level2name=N'''
  +
  COLLIST.name
  +
  ''' ,@value=''' + CAST(NEWID() AS varchar(100)) + ''';'
 
FROM sys.tables AS TABLELIST
INNER JOIN sys.all_columns AS COLLIST
  ON TABLELIST.object_ID = COLLIST.object_id
WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%'
ORDER BY TABLELIST.name
, COLLIST.name;
			

As you review the code, there are two additional items that need to be highlighted. 

For the below lines, in the first and third queries respectively, you would want to adjust for the correct information type and sensitivity level based on the values needed. These values are the available values in the Categorization tool in SSMS, and I would recommend not utilizing any other values.

''' ,@value=N''Banking'';'      --change this line to one of the information types: Banking, Contact Info, Credentials, Credit Card, Date of Birth, Financial, Health, Name, National ID, Networking, SSN, Other
''' ,@value=N''Confidential'';' --change Sensitivity Level to one of the levels: Public, General, Confidential, Confidential - GDPR, Highly Confidential, Highly Confidential - GDPR 
			

The second item is that the SELECT statement generates a new GUID using the NEWID() function; this methods provides a GUID ID for each ID field.

Running the above code generates the following query results.

extended property query results - Description: extended property query results

Utilizing the fourth column, we can copy and paste the results from that column into a new query tab and then execute the SQL code. I know it does not look pretty, but it functions well.

sql query

Using this method, as illustrated next, we were able to quickly apply categorization properties to 10 columns in just a few easy steps. Of course, using this method, requires us to be cautious if future changes to the Data Discovery and Categorization requires additional properties to be adjusted or set.

data classification
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Scott Murray Scott Murray has a passion for crafting BI Solutions with SharePoint, SSAS, OLAP and SSRS.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, September 15, 2020 - 8:07:58 AM - Greg Robidoux Back To Top (86477)
Hi Donna,

Take a look at this tip: https://www.mssqltips.com/sqlservertip/2040/use-sql-server-profiler-to-trace-database-calls-from-third-party-applications/

-Greg

Tuesday, September 15, 2020 - 7:46:56 AM - Scott Murray Back To Top (86476)
You could run a profiler session to see where the data is being saved in the database.

Tuesday, September 15, 2020 - 6:56:27 AM - Donna Back To Top (86472)
Scott,
thanks for your response.
Perhaps you could be a little more explicit about "try a profiler session"?
I 'm at a loss as to how that would help.
Cheers,
Donna Kelly

Monday, September 14, 2020 - 9:44:03 AM - Scott Murray Back To Top (86470)
Unfortunately, I do not. Maybe try a profiler session?

Monday, September 14, 2020 - 8:37:04 AM - Donna Back To Top (86468)
Scott,

thank you for an informative and useful article.

One question, if I may?

Do you happen to know where the rules for identifying Information Type and Recommended Label are stored? And if the base ruleset can be edited?

I need to apply different classifications e.g. Staff Ethnic Origin is OFFICIAL - MEDIUM and DPA SENSITIVE.

Cheers,
Donna Kelly














get free sql tips
agree to terms