By: Scott Murray | Updated: 2018-05-29 | Comments (5) | Related: > SQL Server 2017
Problem
How does the new data discovery and categorization features in SQL Server Management Studio 2017 work?
Solution
For many years, it seemed that SQL Server Management Studio (SSMS) languished as the basic go to tool for SQL Server querying and management, but very few new features were added. With the recent revision of SSMS 2017, one very beneficial new feature was implemented. The Data Discovery and Classification function provides, at its most true level, a way to investigate and classify the data in your database. The tool not only makes the search and investigation process easy, but it also provides rich reporting functions. The tool searches out sensitive data whether it be social security numbers, account numbers, healthcare details, or other personally identifiable data. It is no coincidence that this process is used to help with compliance of regulations such as GDPR. Additionally, the tool not only tells you where sensitive data may be, but it also assists with determine the level of sensitivity with the data.
We will look at how to do this with SSMS as well as with T-SQL scripts.
Before being able to use the data discovery tool, the 17.6 version of SSMS 2017 must be downloaded and then installed. You can have both SSMS 2016 and SSMS 2017 installed on the same machine so no need to worry about using both tools if need.
Using SQL Server Data Categorizations with SSMS GUI
The Microsoft website has similar steps as to what I show below, but I thought it would be good to just quickly show how this can be done using the SSMS GUI.
Upon installation, a new option is available in the Tasks context menu for a database, as shown below.
Initiating the process generates the following, fairly non-descript report. We do need to drill into the various options to further utilize this tool. However, at the onset, we see that no current fields have been classified. However, we have a recommendation, shown at both the top and bottom of the window, that potentially 92 columns have traits that would possibly indicate classification would be beneficial.
Clicking on the recommend option provides us with a list of fields (and their related tables and schemas) and suggestions for the type of information along with the sensitivity level. Both these last items can be changed as needed.
Selecting the “Accept selected recommendations” option applies the classification details to the checked fields.
As an alternative, individual classifications can be added manually using the Add Classification option shown in step 1 below. After selecting Add Classification, the schema, table, column, information type, and sensitivity level are specified. Finally, to add the classification, select Add.
Deleting the classification for a particular column is as simple as clicking the trash can button and then saving the changes.
The classification report can be executed using the View Report option.
This report could easily be the first step in providing you with the list of tables and fields that may require further review and possibly the application of encryption, masking, or other means of restricting access to the data or even ultimately removal of the data. Additionally, the report could easily be provided to external parties who are inquiring about the sensitivity of your organization’s data, such as an auditor.
Using SQL Server Data Categorizations with T-SQL
As an alternative, the classification data is actually stored in the extended properties for each table, so you could directly query the sys.extended_properties view and specifically look for the sys_sensitivity_label_name and sys_information_type_name data rows. I have included a “sample query” below which illustrates some of the data that can be queried.
WITH INFOTYPE AS ( SELECT SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name ,TABLELIST.name AS Table_Name ,COLLIST.name AS Column_Name ,TABLELIST.object_id ,COLLIST.column_id ,EXTPROP.value AS Information_Type FROM sys.extended_properties AS EXTPROP INNER JOIN sys.tablesAS TABLELIST ON EXTPROP.major_id = TABLELIST.object_id INNER JOIN sys.all_columns AS COLLIST ON TABLELIST.object_ID = COLLIST.object_id AND EXTPROP.minor_id = COLLIST.column_id WHERE EXTPROP.name ='sys_information_type_name' ) , SENSITIVITY_LEVEL AS ( SELECT SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name ,TABLELIST.name AS Table_Name ,COLLIST.name AS Column_Name ,TABLELIST.object_id ,COLLIST.column_id ,EXTPROP.value AS Sensitivity_Level FROM sys.extended_properties AS EXTPROP INNER JOIN sys.tablesAS TABLELIST ON EXTPROP.major_id = TABLELIST.object_id INNER JOIN sys.all_columns AS COLLIST ON TABLELIST.object_ID = COLLIST.object_id AND EXTPROP.minor_id = COLLIST.column_id WHERE EXTPROP.name ='sys_sensitivity_label_name' ) SELECT INFOTYPE.Schema_Name ,INFOTYPE.Table_Name ,INFOTYPE.Column_Name ,INFOTYPE.Information_Type ,SENTLEVEL.Sensitivity_Level FROM INFOTYPE AS INFOTYPE INNER JOIN SENSITIVITY_LEVEL AS SENTLEVEL ON INFOTYPE.object_id = SENTLEVEL.object_id AND INFOTYPE.column_id = SENTLEVEL.column_id;
GO
This query provides the following results.
As the data for the categories is stored in the extended properties for each table, we can insert the data directly into the properties using the sp_addextendedproperty function. Both, the sensitivity level and the information type require two items, a name and a GUID value, which means four executions of sp_addextendedproperty. Thus, four data points are inserted using sp_addextendedproperty.
For the sensitivity level, you will want to insert only one of the following values for the name:
- Public
- General
- Confidential
- Confidential – GDPR
- Highly Confidential
- Highly Confidential – GDPR
For the information type, you will want to insert only one of the following values for the name:
- Banking
- Contact Info
- Credentials
- Credit Card
- Date of Birth
- Financial
- Health, Name
- National ID
- Networking
- SSN
- Other
For the ID field, we would want to generate a GUID to insert for that value. A sample set of the 4 queries is illustrated below.
EXEC sp_addextendedproperty @name=N'sys_sensitivity_label_id', @level0type= N'schema', @level0name=N'Purchasing', @level1type=N'table', @level1name= N'Suppliers', @level2type=N'column', @level2name=N'BankAccountCode', @value= '8F2DA722-043B-452D-87F2-02239F9613D6'; GO EXEC sp_addextendedproperty @name=N'sys_sensitivity_label_name', @level0type= N'schema', @level0name=N'Purchasing', @level1type=N'table', @level1name= N'Suppliers', @level2type=N'column', @level2name=N'BankAccountCode', @value= N'Confidential'; GO EXEC sp_addextendedproperty @name=N'sys_information_type_id', @level0type= N'schema', @level0name=N'Purchasing', @level1type=N'table', @level1name= N'Suppliers', @level2type=N'column', @level2name=N'BankAccountCode', @value= 'B6900930-EBAF-468D-A4E9-DC7142F95C38'; GO EXEC sp_addextendedproperty @name=N'sys_information_type_name', @level0type= N'schema', @level0name=N'Purchasing', @level1type=N'table', @level1name= N'Suppliers', @level2type=N'column', @level2name=N'BankAccountCode', @value= N'Banking'; GO
Executing these four statements successfully adds categorization to the Suppliers.BankAccountCode field.
Certainly, situations may arise where you will have a large number of fields to update with these categorizations. Using the tool in SSMS could be tedious as the number of fields grows. Thus, we could use the power of SQL to generate the sp_addextendedproperty statements needed for a long list of columns. As illustrated in the following code, we use SQL to generate each of the four sp_addextendedproperty statements for any column \ field that contains “BANK” in the field name. The parameter at the top of the query allows us to quickly change what column names potentially would be included.
DECLARE @COLUMN_SEARCH varchar(100);SET @COLUMN_SEARCH = 'BANK'; SELECT SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name, TABLELIST.name AS Table_Name, COLLIST.name AS Column_Name, 'exec sp_addextendedproperty @name=N''sys_information_type_name'',@level0type=N''schema'',@level0name=N''' + SCHEMA_NAME(TABLELIST.schema_id) + ''' ,@level1type=N''table'',@level1name=N''' + TABLELIST.name + ''' ,@level2type=N''column'',@level2name=N''' + COLLIST.name + ''' ,@value=N''Banking'';' --Change this line to one of the information types: Banking, Contact Info, Credentials, Credit Card, Date of Birth, Financial, Health, Name, National ID, Networking, SSN, Other FROM sys.tables AS TABLELIST INNER JOIN sys.all_columns AS COLLIST ON TABLELIST.object_ID = COLLIST.object_id WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%' UNION ALL SELECT SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name, TABLELIST.name AS Table_Name, COLLIST.name AS Column_Name, 'exec sp_addextendedproperty @name=N''sys_information_type_id'',@level0type=N''schema'',@level0name=N''' + SCHEMA_NAME(TABLELIST.schema_id) + ''' ,@level1type=N''table'',@level1name=N''' + TABLELIST.name + ''' ,@level2type=N''column'',@level2name=N''' + COLLIST.name + ''' ,@value=''' + CAST(NEWID() AS varchar(100)) + ''';' FROM sys.tables AS TABLELIST INNER JOIN sys.all_columns AS COLLIST ON TABLELIST.object_ID = COLLIST.object_id WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%' UNION ALL SELECT SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name, TABLELIST.name AS Table_Name, COLLIST.name AS Column_Name, 'exec sp_addextendedproperty @name=N''sys_sensitivity_label_name'',@level0type=N''schema'',@level0name=N''' + SCHEMA_NAME(TABLELIST.schema_id) + ''' ,@level1type=N''table'',@level1name=N''' + TABLELIST.name + ''' ,@level2type=N''column'',@level2name=N''' + COLLIST.name + ''' ,@value=N''Confidential'';' -- Change Sensitivity Level to one of the levels: Public, General, Confidential, Confidential - GDPR, Highly Confidential, Highly Confidential - GDPR FROM sys.tables AS TABLELIST INNER JOIN sys.all_columns AS COLLIST ON TABLELIST.object_ID = COLLIST.object_id WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%' UNION ALL SELECT SCHEMA_NAME(TABLELIST.schema_id) AS Schema_Name, TABLELIST.name AS Table_Name, COLLIST.name AS Column_Name, 'exec sp_addextendedproperty @name=N''sys_sensitivity_label_id'',@level0type=N''schema'',@level0name=N''' + SCHEMA_NAME(TABLELIST.schema_id) + ''' ,@level1type=N''table'',@level1name=N''' + TABLELIST.name + ''' ,@level2type=N''column'',@level2name=N''' + COLLIST.name + ''' ,@value=''' + CAST(NEWID() AS varchar(100)) + ''';' FROM sys.tables AS TABLELIST INNER JOIN sys.all_columns AS COLLIST ON TABLELIST.object_ID = COLLIST.object_id WHERE COLLIST.name LIKE '%' + @COLUMN_SEARCH + '%' ORDER BY TABLELIST.name , COLLIST.name;
As you review the code, there are two additional items that need to be highlighted.
For the below lines, in the first and third queries respectively, you would want to adjust for the correct information type and sensitivity level based on the values needed. These values are the available values in the Categorization tool in SSMS, and I would recommend not utilizing any other values.
''' ,@value=N''Banking'';' --change this line to one of the information types: Banking, Contact Info, Credentials, Credit Card, Date of Birth, Financial, Health, Name, National ID, Networking, SSN, Other ''' ,@value=N''Confidential'';' --change Sensitivity Level to one of the levels: Public, General, Confidential, Confidential - GDPR, Highly Confidential, Highly Confidential - GDPR
The second item is that the SELECT statement generates a new GUID using the NEWID() function; this methods provides a GUID ID for each ID field.
Running the above code generates the following query results.
Utilizing the fourth column, we can copy and paste the results from that column into a new query tab and then execute the SQL code. I know it does not look pretty, but it functions well.
Using this method, as illustrated next, we were able to quickly apply categorization properties to 10 columns in just a few easy steps. Of course, using this method, requires us to be cautious if future changes to the Data Discovery and Categorization requires additional properties to be adjusted or set.
Next Steps
- See all the tips on enencrypting data in SQL Server
- Masking Sensitive Data
- Dynamic Data Masking
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2018-05-29