New Command in SQL Server 2019 ADD SENSITIVITY CLASSIFICATION

By: Aaron Bertrand | Updated: 2018-10-12 | Comments (3) | Related: > SQL Server 2019

Problem

Microsoft added a task in SQL Server Management Studio 17.5 called "Classify Data." The purpose is to help identify columns that are potentially sensitive in nature, and that may need to be protected in order to remain compliant under various regulations, including SOX, HIPAA, PCI, and GDPR. It does a pretty good job of identifying vulnerable columns and helping you classify them depending on the risk and the type of exposure they present. But aside from showing you the results of this classification, this feature does not raise any additional visibility to these sensitive columns, never mind any suspicious access to them.

Solution

For background on the functionality introduced in SQL Server 2017 and SSMS 17.5, imagine we have this table:

CREATE TABLE dbo.Contractors
(
  ContractorID  int,
  FirstName     sysname,
  LastName      sysname,
  SSN           char(9),
  Email         varchar(320),
  PasswordHash  varbinary(256),
  HourlyRate    decimal(6,2)
);

When I run the wizard, I get the following recommendations:

Data sensitivity classification recommendations in SSMS 17.5

You might not agree with all of the classifications, so you can change them, and even can add your own, specifying various information types and sensitivity levels. These are:

sql server 2019 add sensitivity classification function 002

When you select the columns you want to classify, click Accept, and then Save, it adds extended properties around those columns. The report in SSMS uses those extended properties, and displays them as follows:

SQL Data Classification Report in SSMS 17.5

You could of course write your own app that consumes them, and maybe there are already tools out there that do, but nothing really happens here inside of SQL Server, except these extended properties exist and they surface the relevant columns on the report.

SQL Server SENSITIVITY CLASSIFICATION

In SQL Server 2019, there is a lot more automation built into the system, allowing you to easily add sensitive data labels to columns, which will get pulled into audits by default. Let's take the same table above, and add our own sensitivity classifications similar to the ones we added above with extended properties, using the ADD SENSITIVITY CLASSIFICATION command (already available in Azure SQL Database):

ADD SENSITIVITY CLASSIFICATION TO
    dbo.Contractors.FirstName, 
    dbo.Contractors.LastName
WITH (LABEL = 'Confidential - GDPR', INFORMATION_TYPE = 'Contact Info');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.SSN
WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'National ID');

ADD SENSITIVITY CLASSIFICATION TO
    dbo.Contractors.email, 
    dbo.Contractors.PasswordHash
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Credentials');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.HourlyRate
WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'Financial');

These don't create extended properties, but rather we can see these in sys.sensitivity_classifications:

sql server 2019 add sensitivity classification function 004

If we are creating an audit, we don't have to do anything special to pick up these classifications, except be auditing the table (in other words, existing audits will simply start inheriting these classifications as they are added). So, if we have a server audit:

USE master;
GO

CREATE SERVER AUDIT GDPRAudit TO FILE (FILEPATH = 'C:\temp\Audit\');
GO  

ALTER SERVER AUDIT GDPRAudit WITH (STATE = ON);
GO

Then a database audit that is monitoring read activity on the table:

USE HR;
GO

CREATE DATABASE AUDIT SPECIFICATION AuditContractors
 FOR SERVER AUDIT GDPRAudit
 ADD (SELECT ON dbo.Contractors BY dbo) WITH (STATE = ON);

With the audit enabled, if we run a couple of queries:

SELECT * FROM dbo.Contractors;
SELECT FirstName, LastName, HourlyRate FROM dbo.Contractors;

We can observe access to specific types of information this way, with the new column data_sensitivity_information that is included in the audit:

SELECT
  session_server_principal_name,
  event_time,
 [host_name],
  [object] = [database_name] + '.' + [schema_name] + '.' + [object_name],
  [statement],
  data_sensitivity_information = CONVERT(xml, data_sensitivity_information)
FROM sys.fn_get_audit_file ('c:\temp\Audit\GDPRAudit_*.sqlaudit', default, default)
WHERE action_id = 'SL'; -- SELECT

Results:

You can click on any XML column value to see these results:

-- from the first row, with SELECT *:
<sensitivity_attributes>
  <sensitivity_attribute label="Confidential - GDPR" information_type="Contact Info" />
  <sensitivity_attribute label="Highly Confidential" information_type="National ID" />
  <sensitivity_attribute label="Confidential" information_type="Credentials" />
  <sensitivity_attribute label="Highly Confidential" information_type="Financial" />
</sensitivity_attributes>
 
-- from the second row, with specific columns:
<sensitivity_attributes>
  <sensitivity_attribute label="Confidential - GDPR" information_type="Contact Info" />
  <sensitivity_attribute label="Highly Confidential" information_type="Financial" />
</sensitivity_attributes>

It is interesting to note that FirstName and LastName combined to yield only one element in the XML. This means that whether you have one or 50 columns with a specific label and information type combination, it won’t add any unnecessary data to the audit. Anyway, for those of you that are XML-savvy, you can likely see how easy it would be to extract that information as nodes and then run queries against it, making it very easy to identify all the accesses to specific labels or label/type combinations.

I feel like there might be a market for a conversion tool that would translate these data classifications you may have already invested in (or built tools that consume) from extended properties to sensitivity classifications, or vice-versa. I'm not going to handle synchronization here, but I'll offer up a start, taking the extended properties and building dynamic statements to migrate them:

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'ADD SENSITIVITY CLASSIFICATION TO ' 
  + QUOTENAME(s.name) + '.' + QUOTENAME(o.name) + '.' + QUOTENAME(c.name)
  + ' WITH (LABEL = ''' 
  + REPLACE(CONVERT(nvarchar(256), l.value), '''', '''''') 
  + ''', INFORMATION_TYPE = ''' 
  + REPLACE(CONVERT(nvarchar(256), t.value), '''', '''''') 
  + ''');' + CHAR(13) + CHAR(10)
FROM sys.extended_properties AS t
INNER JOIN sys.extended_properties AS l
ON t.class = l.class AND t.major_id = l.major_id AND t.minor_id = l.minor_id
INNER JOIN sys.objects AS o
ON t.major_id = o.[object_id]
INNER JOIN sys.columns AS c
ON t.major_id = c.[object_id]
AND t.minor_id = c.column_id
INNER JOIN sys.schemas AS s
ON o.[schema_id] = s.[schema_id]
WHERE t.name = N'sys_information_type_name'
  AND l.name = N'sys_sensitivity_label_name';

PRINT @sql;

That's just a quick exercise. To keep things relatively consistent between the SSMS report and the audit (and any other tool that consumes either or both), and potentially between different versions of SQL Server in your environment, you'll also want to work out the opposite: how to convert sensitivity classifications to sp_addextendedproperty calls.

Summary

There is still going to be some subjectivity here; only you can truly know how sensitive each entity in your data model might be. But SQL Server continues to introduce innovate features that help you protect your sensitive data and remain compliant with various standards and regulations.

Next Steps

Read on for related tips and other resources:

About the author

Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2018-10-12

Monday, September 14, 2020 - 8:41:43 AM - Donna	Back To Top (86469)
Aaron, thank you for an informative and useful article. One question, if I may? Do you happen to know where the rules for identifying Information Type and Recommended Label are stored? And if the base ruleset can be edited? I need to apply different classifications e.g. Staff Ethnic Origin is OFFICIAL - MEDIUM and DPA SENSITIVE. Cheers, Donna Kelly

Tuesday, April 2, 2019 - 8:14:38 PM - Joe McBrat	Back To Top (79449)
Good article. Seems MS has a long way to go on this. I would like to customize my own categories CJIS, NFIRS and a few more I have to worry about and report on. The Alerting aspects is nice also. I am not using Azure, yet. So I will have to wait for some of these features. But this helps get me going.