Use Dynamic Data Masking in SQL Server 2016 CTP 2.1

By:   |   Updated: 2015-07-09   |   Comments (4)   |   Related: > SQL Server 2016


Problem

SQL Server 2016 introduces a new security feature called Dynamic Data Masking (DDM). In this tip we'll describe its purpose, show a brief example of how it works, list some limitations, and note how the feature has already changed since CTP 2.0 was first released in May.

Note: This tip was written when CTP 2.1 was the most recent CTP. Some of these details may change by the time SQL Server 2016 is released.

Solution

Dynamic Data Masking is not meant to be a bulletproof security measure like proper encryption. You can consider it to be more like obfuscation of personally-identifiable information (PII) from lesser-privileged users, and it can be easy to apply the masking constraints to, say, a development or test environment where this information should not be visible.

Note that for CTP 2.0, at least at the time of writing, the documentation shows that trace flags 209 and 219 are required to use Dynamic Data Masking. This restriction has been lifted since CTP 2.1, and should only be necessary if you are still running build 13.0.200. (In fact if you enable the trace flags in newer builds, it seems to have the opposite effect: You get an error message when trying to use DDM.)

To show how the feature works, let's create a simple set of 10 rows from the metadata:

USE tempdb;
GO
SELECT TOP (10) 
  ID        = IDENTITY(INT, 1, 1),
  FirstName = RIGHT(o.name, 8), 
  LastName  = LEFT(o.name, 12), 
  Email     = LEFT(o.name, 9) + '@' + RIGHT(o.name, 11) + '.net',
  SSN       = STUFF(STUFF(RIGHT('000000000' 
            + RTRIM(ABS(CHECKSUM(NEWID()))),9),4,0,'-'),7,0,'-'),
  BirthDate = DATEADD(DAY, -ABS(CHECKSUM(NEWID())%10000), o.modify_date)
INTO dbo.DDM
FROM sys.all_objects AS o
ORDER BY NEWID();

Now, let's add some masks to some of these columns to protect the privacy of the data from normal (non-sysadmin) users. There are three types of masks currently supported:

  • default()
    • for strings, shows x for each character (up to 4)
    • for numeric types, shows 0
    • for dates, shows 2000-01-01
  • email()
  • partial()
    • you can define a custom string to represent the mask, including how many leading and trailing characters to reveal from the original string (examples below)

So for these columns, the following mask settings might make sense:

ALTER TABLE dbo.DDM ALTER COLUMN FirstName 
    ADD MASKED WITH (FUNCTION = 'partial(1, "XXXXX", 0)');
    -- show only the first character of the first name

ALTER TABLE dbo.DDM ALTER COLUMN LastName  
    ADD MASKED WITH (FUNCTION = 'partial(2, "XXXXXXXX", 1)');
    -- show the first two characters and the last character of the last name

ALTER TABLE dbo.DDM ALTER COLUMN Email     
    ADD MASKED WITH (FUNCTION = 'email()');
    -- all addresses will show as [email protected]

ALTER TABLE dbo.DDM ALTER COLUMN SSN       
    ADD MASKED WITH (FUNCTION = 'partial(1,"XX-XX-XXX",1)');
    -- SSNs will become nXX-XX-XXXn

ALTER TABLE dbo.DDM ALTER COLUMN BirthDate 
    ADD MASKED WITH (FUNCTION = 'default()');
    -- all Birthdates will show as 2000-01-01 

Now, what does this data look like? Well, to a sysadmin or db_owner, there is no masking implemented at all (note that the script populates arbitrary data, so your values won't be the same):

SELECT * FROM dbo.DDM;

Dynamic Data Masking not quite working

However, if we create a peon user and just grant them SELECT on the table:

CREATE USER blat WITHOUT LOGIN;
GRANT SELECT ON dbo.DDM TO blat;

Then we can try running the same query in their security context:

EXECUTE AS USER = N'blat';
GO
SELECT * FROM dbo.DDM;
GO
REVERT;

And the results:

Dynamic Data Masking in action

Now we can see that the data is masked as expected.

Limitations

Bypassing masking

The current version of the documentation states that you can bypass the dynamic data masking by applying explicit conversions to masked columns in ad hoc queries. This was possible in CTP 2.0, but is no longer possible as of CTP 2.1.

EXECUTE AS USER = N'blat';
GO
SELECT TOP (1) FirstName, CAST(FirstName AS VARCHAR(32)) FROM dbo.DDM;
GO
REVERT;

In CTP 2.0, this would have yielded something like:

sXXXXX  sactions

In CTP 2.1, the data masking does not apply exactly as specified, but still protects the data; it reverts to the default four-character masking:

sXXXXX  xxxx

There is another way to bypass the masking functionality, at least as of CTP 2.1: Involve a second table.

CREATE TABLE dbo.SecondTable(ID INT);
INSERT dbo.SecondTable(ID) VALUES(1);
GO
EXECUTE AS USER = N'blat';
GO
SELECT d.FirstName FROM dbo.DDM AS d
  WHERE EXISTS (SELECT 1 FROM dbo.SecondTable AS s
    WHERE s.ID = d.ID);
GO
REVERT;

This will yield the un-masked value:

sactions

In fact all that's really required is to give the table an alias other than its own name; even when a join is not present, this will disable any masking when the data is returned (again, in CTP 2.1; I'm sure this will be fixed in future builds):

EXECUTE AS USER = N'blat';
GO
SELECT TOP (1) FirstName FROM dbo.DDM AS d ORDER BY d.ID;
GO
REVERT;

Results:

sactions

If you drop the alias or use "DDM" as the alias, masking still occurs, so it only seems to get thwarted if there is some type of mapping to a new alias. It is important to note that this means that, even if you create stored procedures, your users may still have access to the un-obfuscated data, as they likely would with ad hoc query privileges.

Data Types

As with most v1 features, there are some limitations here, which may or may not be permanent. The following data types are unlikely to be supported when SQL Server 2016 is released:

  • (var)binary / image
  • xml
  • sql_variant
  • hierarchyid
  • uniqueidentifier
  • rowversion (timestamp)
  • spatial types

Also, not all masking functions are relevant for all data types: One of the first things I tried was to only mask the month/day of the birthdate, so you could determine approximate age without really getting into PII. I received this error message:

Msg 16003, Level 16, State 0
The data type of column 'BirthDate' does not support data 
masking function 'partial'.

Finally, you won't be able to apply data masking functions against encrypted columns or columns that participate in a sparse column set.

Changing/removing a mask

To change the definition for an existing masked column, you can just run the same ALTER COLUMN command (it is not intuitive, but there is no change/alter mask DDL (yet)):

ALTER TABLE dbo.DDM ALTER COLUMN SSN       
    ADD MASKED WITH (FUNCTION = 'partial(2,"X-XX-XXX",1)');
    -- SSNs will now become nnX-XX-XXXn

And to remove the mask definition altogether, you can just say DROP MASKED:

ALTER TABLE dbo.DDM ALTER COLUMN SSN DROP MASKED;
Summary

Dynamic data masking provides a simple way to implement obfuscation of private data (and in my tests so far, the performance impact is negligible). It is important to understand the limitations, and keep in mind that it is not true encryption and that the data cannot be protected in all scenarios. In another tip I will talk about a more end-to-end way to protect sensitive data, even from the sysadmin role, in SQL Server 2016: Always Encrypted.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2015-07-09

Comments For This Article




Tuesday, September 10, 2019 - 9:51:09 AM - Ivan Saez Back To Top (82335)

Aaron,

Great article! I've been looking information about the pricing of Dynamic Data Masking but haven't found any usable information. Do you have some url, info about the pricing?

regards,

Ivan


Wednesday, July 15, 2015 - 1:09:23 PM - Aaron Bertrand Back To Top (38218)

AZJim, it's meant to hide or obscure data from users who don't have direct ad hoc query access. So for example, if you have an application that shows SSN on a form, it can show the masked or unmasked value depending on the user's permissions, with no changes to the app and no need to create a view or dynamic SQL.

If you have users with direct access to the tables, the ability to run ad hoc queries, etc., then this is not going to provide a lot of value (even when they fix these minor issues I've pointed out, more still remain).

It is meant mostly as a complement to other data protection techniques and/or to keep the casual observer from accidentally seeing more than they should. If you're truly trying to protect your data from malicious activity, to meet HIPAA and other regulations, etc., you'll want to use encryption, not masking.


Wednesday, July 15, 2015 - 11:48:30 AM - AZJim Back To Top (38217)

 

Aaron ... I am wondering how useful this truly is.  I have done this in the past with the use of views.  Am I missing something?


Wednesday, July 15, 2015 - 1:43:13 AM - Edward Back To Top (38210)

Nice to know about this feature that would help setting up the test and non prod environments. Good to know the limitiations as of now and hope MS solves these in the final release. Thanks for sharing...















get free sql tips
agree to terms