Protecting Yourself from SQL Injection in SQL Server

Problem

In my last tip, I showed how to validate and protect yourself from user-supplied input instead of blindly concatenating the values into dynamic SQL strings. This included table, schema, server, database and procedure names, but there are other common scenarios that I wanted to address as well.

Solution

Some other areas where people often write dynamic SQL that is vulnerable to SQL injection are when they allow users to pass in the entire output column list, an IN (@list) of string or numeric literals (usually separated by commas), and entire WHERE clause expressions. I have some ideas for at least partially protecting yourself from some of these patterns, too.

Output column list

Sometimes you want to allow users to select which columns to include. So you may be expected to build this:

SET @sql = N'SELECT ' + @ColumnList + N' FROM dbo.Users;';

The column list might come in looking like this:

N'col1, col2, col3'

On the other hand if someone does something malicious, it may come in looking like this instead:

N'foo = 1; DROP TABLE dbo.Users; SELECT * '

To protect yourself against malicious behavior, you can use a split function to validate the existence of each column in the list (hopefully, though, you don't allow column names that contain things like commas or semi-colons). I like this function because it preserves the order (which might be important when reconstructing the column list, if end users care about column order):

CREATE FUNCTION dbo.ParseColumnList
(
  @List NVARCHAR(MAX)
)
RETURNS TABLE
AS
  RETURN 
  (
    SELECT i, 
      c = LTRIM(CONVERT(SYSNAME, SUBSTRING(@List, i, CHARINDEX(',', @List + ',', i) - i)))
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_columns) AS n(i)
    WHERE i <= LEN(@List) AND SUBSTRING(',' + @List, i, 1) = ','
  );
GO

Note: sys.all_columns was used coincidentally, and not because this problem has anything to do with columns; it just happens to be a reliably large-ish catalog view. On my system, the number of rows in this catalog view will support strings up to ~9K, which would allow over 70 columns at the maximum identifier length (128 characters). If you need to cover more than this, then you can use a cross join or a numbers table (see some other splitting techniques here).

As an example, given a comma-separated list like this:

SELECT c FROM dbo.ParseColumnList(N'col1, col2, col3');

This function would return a set like this:

c
----
col1
col2
col3

Now, you can choose to return an error if any column in the list doesn't exist in the source table:

IF OBJECT_ID(N'dbo.Users') IS NOT NULL AND EXISTS 
(
  SELECT 1 FROM dbo.ParseColumnList(N'col1, col2, col3') AS f
  LEFT OUTER JOIN sys.columns AS c
  ON f.c = c.name
  AND c.[object_id] = OBJECT_ID(N'dbo.Users')
  WHERE c.name IS NULL
)
BEGIN
  RAISERROR(N'Not all columns exist.', 11, 1);
END

Or you can choose to build a column list anyway, ignoring any invalid columns and dropping them on the floor. You can guard against the case where this results in an empty list, too.

DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

SELECT @cols = STUFF((SELECT N',' + QUOTENAME(c.name)
  FROM sys.columns AS c
  INNER JOIN dbo.ParseColumnList(N'col1, col2, col3') AS f
  ON c.name = f.c
  AND c.[object_id] = OBJECT_ID(N'dbo.Users')
  ORDER BY f.i -- important to maintain same order
  FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'');
  
SELECT @cols = COALESCE(@cols, N' TOP (0) [Invalid column list]=''?''');

SET @sql = N'SELECT ' + @Cols + N' FROM dbo.Users;';

If you expose aliases to users and they're not actually selecting the underlying columns, you may want to create a table that represents that mapping, so you still have something tangible to validate against.

Another workaround, in SQL Server 2012 and above, would be to build the query with the column list supplied by the user and pass it into the new dynamic management function, sys.dm_exec_describe_first_result_set. This function is meant to show you all of the column names and data types that are output by an ad hoc query, but we can take advantage of the fact that it will expose (but not raise) an error message if the query is invalid - without having to run the actual query. A limitation is that the function only looks at the first query, so that would be easy to bypass with a semi-colon. So a quick example that eliminates semi-colons from the passed-in column list (if you allow semi-colons in column names, then you likely have different problems to prioritize):

DECLARE @cols NVARCHAR(MAX) = N'col1, col2, col3';

DECLARE @sql SYSNAME = N'SELECT ' + REPLACE(@cols, ';', '') + N' FROM dbo.table;';

IF EXISTS 
(
  SELECT 1 
    FROM sys.dm_exec_describe_first_result_set(@sql, N'', 0)
    WHERE error_type IS NOT NULL
)
BEGIN
  RAISERROR(N'Column list was invalid', 11, 1);
END

IN (@list)

Often people will want to pass strings like this:

@list = '1,4,12'
--or 
@list = N'Ford,Fritchey,White'

Into a query like this:

SELECT * FROM dbo.table WHERE column IN (@list);

Parameters just don't work this way in SQL Server; @list in this context is not an array or a set, but a single string value. So the next thing people will often try is:

SET @sql = N'SELECT * FROM dbo.table WHERE column IN (' + @list + N');';

This works as is for the integer case, but for the string case, it doesn't work until you manipulate the string to surround each individual value in @list with its own string delimiters (and also handle any embedded delimiters). In both cases, though, the code is still vulnerable to SQL injection.

There are two easy workarounds to this scenario. One is to use the same split function as above, and join against the output:

DECLARE @list NVARCHAR(MAX) = N'Ford,Fritchey,White';

SELECT t.*
  FROM dbo.table AS t
  INNER JOIN dbo.ParseColumnList(@list) AS f
  ON t.column = f.c;

Note: You would probably give the function a different name; I'm just reusing it here for convenience. And if you are passing in lists of different data types, you may want to consider a different function for each data type to avoid things like implicit conversions.

Another workaround (in SQL Server 2008 and above) is to use a table-valued parameter, and pass in a set in the first place. A common trend I see is people who already have set data in a structure like a DataTable in C#, and they will transform the rows into a single, comma-separated string (or worse, XML) and pass it in to SQL Server as a simple parameter, only to have to parse it and split it apart again. We can do better than that in modern versions of SQL Server (2008+). First, create a table type:

CREATE TYPE dbo.Usernames AS TABLE(Username NVARCHAR(128));

Then you can actually populate a local variable directly:

DECLARE @list dbo.Usernames;

INSERT @list(Username) VALUES(N'Ford'),(N'Fritchey'),(N'White');

SELECT t.*
  FROM dbo.table AS t
  INNER JOIN @list AS u
  ON t.column = u.Username;

Or pass your DataTable from C# into a stored procedure parameter:

CREATE PROCEDURE dbo.FindUsers
  @list dbo.Usernames READONLY
AS
BEGIN
  SET NOCOUNT ON;
  SELECT t.*
    FROM dbo.table AS t
    INNER JOIN @list AS u
    ON t.column = u.Username;
END
GO

Then from C#, given a DataTable tvp containing those same usernames:

DataTable tvp = new DataTable();
tvp.Columns.Add(new DataColumn("Username"))
// populate DataTable

using (connectionObject)
{
    SqlCommand cmd = new SqlCommand("dbo.FindUsers", connectionObject);
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter tvparam = cmd.Parameters.AddWithValue("@list", tvp);
    tvparam.SqlDbType = SqlDbType.Structured;
    // execute, consume results
}

Yes, it's more code, but whichever method you choose is no longer vulnerable to SQL injection, and you don't have to worry about issues with string delimiters embedded inside actual string values (you may have to choose a value delimiter other than comma, though, if it's possible for a comma to be in your data and you are using the function to parse the comma-separated list).

WHERE Clause

One of the most common uses I see for dynamic SQL is to solve the "kitchen sink" problem. You want a stored procedure that can handle any number of optional parameters, let's say for a silly table like this:

CREATE TABLE dbo.[table]
(
  col1 INT, 
  col2 NVARCHAR(MAX), 
  col3 NVARCHAR(MAX),
  col4 DATETIME
);

The first attempt to support optional parameters to search against all columns usually goes something like this (grossly simplified):

SELECT * FROM dbo.[table]
WHERE (col1 = @p1 OR @p1 IS NULL)
  AND (col2 = @p2 OR @p2 IS NULL)
  AND (col3 = @p3 OR @p3 IS NULL)
  AND (col4 = @p4 OR @p4 IS NULL);

SQL Server has a terrible time optimizing this, because it builds a plan based both on the first set of parameters that are populated and on the values of those parameters. The whole point is flexibility, but reusing that same plan over and over again for what ultimately are very different queries can lead to unpredictable performance. You can add OPTION (RECOMPILE) to eliminate plan reuse, but this is sometimes a bit heavy-handed - it means you pay compilation costs every time, even when the same parameters are sent with the same values, over and over again. In most environments, it's likely to be a mix - for some sets of parameters, you want a new plan every time, and for other sets of parameters that are more stable, plan reuse is okay.

A compromise here can be dynamic SQL; you only include the clauses that are required given the parameters that are actually populated. Generally it looks like this (and I elaborate in this video):

SET @sql = N'SELECT * FROM dbo.[table] WHERE 1 = 1';

IF @p1 IS NOT NULL
  SET @sql += N' AND col1 = @p1';
IF @p2 IS NOT NULL
  SET @sql += N' AND col2 = @p2';
IF @p3 IS NOT NULL
  SET @sql += N' AND col3 = @p3';

EXEC sys.sp_executesql @stmt = @sql,
  @params = N'@p1 INT, @p2 DATETIME, @p3 SYSNAME',
  @p1 = @p1, @p2 = @p2, @p3 = @p3;

This is relatively safe because everything that is user input is not trusted directly - the values are strongly typed and parameterized (please don't just concatenate the parameter values into the SQL string). This works best with the server option optimize for ad hoc workloads, which doesn't cache a full plan until the same query has been requested more than once (this prevents the plan cache from filling up with single-use plans). You may still find it advantageous to optionally include OPTION (RECOMPILE) at the end of @sql for certain sets of parameters that tend to lead to poor performance due to parameter sniffing, lack of updated statistics, or extreme data skew and volatility.

Now, it may be the case that you actually have to be more flexible than that - let's say you have a LOT of columns and users can also specify the operators (>, <=, LIKE, etc.) that are used to compare the column to the parameter values, and may have to be able to specify the same column more than once (for example a date range query). You might break down and let people pass in complete where clauses, with literals instead of parameters, like this:

@wc1 = N'col1 >= 5',
@wc2 = N'col2 < 20',
@wc3 = N'col3 LIKE ''%foo%'''
@wc4 = N'col4 >= ''20140101'''

Now your query is probably going to be something more like this:

SET @sql = N'SELECT * FROM dbo.[table] WHERE 1 = 1'
  + COALESCE(N' AND ' + @wc1, N'')
  + COALESCE(N' AND ' + @wc2, N'')
  + COALESCE(N' AND ' + @wc3, N'')
  + COALESCE(N' AND ' + @wc4, N'');

You start getting into pretty dangerous territory here, because it is much harder to validate these expressions and prevent things like semi-colons - what if they are literally searching for LIKE '%;%'? As long as the supported expressions are relatively simple, you can still take steps to protect yourself, but nothing is 100% foolproof. For example, you could parse each expression and ensure that everything up to the first non-alphanumeric character represents a column in that table (again, this relies on sensible naming conventions, as well as column names that don't end in Unicode or other characters outside of the search range, and expressions that list the column name first without applying any expressions to it):

DECLARE @cols TABLE(name NVARCHAR(MAX),expr NVARCHAR(MAX));

INSERT @cols(name,expr) VALUES
  (LEFT(@wc1,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc1)-1,-1)),@wc1),
  (LEFT(@wc2,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc2)-1,-1)),@wc2),
  (LEFT(@wc3,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc3)-1,-1)),@wc3),
  (LEFT(@wc4,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc4)-1,-1)),@wc4);

-- if you want to raise an error:
IF EXISTS 
(
  SELECT 1 FROM @cols AS c
  WHERE NOT EXISTS
  (
    SELECT 1 FROM sys.columns
	WHERE [object_id] = OBJECT_ID(N'dbo.Users')
	AND name = c.name
  )
)
BEGIN
  RAISERROR(N'Invalid experession.', 11, 1);
END

-- if you just want to concatenate the ones with valid column references:
DECLARE @wc NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX);

SELECT @wc += N' AND ' + expr
  FROM @cols AS c
  WHERE EXISTS (SELECT 1 FROM sys.columns
  WHERE name = c.name
  AND [object_id] = OBJECT_ID(N'dbo.Users'));

SET @sql += N'SELECT * FROM dbo.table WHERE 1 = 1' + @wc;

Like I said, though, this has many of holes, and the more flexibility you give your users, the tougher it gets to guard against possible exploits. This pattern is a good demonstration of that. Another way to solve this problem is to pass the column name, the operator, and the literal as three different columns in a TVP. You could even have a different column for each data type to prevent conversions and invalid predicates. For example:

CREATE TYPE dbo.WhereClauses AS TABLE
(
  ColumnName SYSNAME,
  Operator VARCHAR(10) NOT NULL CHECK (Operator IN 
    ('>','<','<=','>=','<>','=','LIKE','NOT LIKE')),
  BigintValue BIGINT,
  DatetimeValue DATETIME,
  StringValue NVARCHAR(MAX)
);

Now you could populate a table-valued parameter using this type, for example:

DECLARE @wcs dbo.WhereClauses;

INSERT @wcs(ColumnName,Operator,BigintValue,DatetimeValue,StringValue)
VALUES
  (N'col1', '<',        20,   NULL,       NULL),
  (N'col2', 'LIKE',     NULL, NULL,       N'''%foo%'''),
  (N'col3', 'NOT LIKE', NULL, NULL,       N'''%bar%'''),
  (N'col4', '>',        NULL, '20140101', NULL);

Now, it gets a little complex, because in order to continue passing strongly-typed parameters based on the values inserted into our table type, we need to build up a handful of strings and execute nested dynamic SQL (sp_executesql calling sp_executesql).

DECLARE 
  @sql       NVARCHAR(MAX), 
  @mastersql NVARCHAR(MAX),
  @p         NVARCHAR(MAX) = N'@p0 INT',
  @plist     NVARCHAR(MAX) = N',@p0 = NULL';

SET @sql = N'SELECT * FROM dbo.[table] WHERE 1 = 1';

SELECT @sql += CHAR(13) + CHAR(10) + N' AND ' + QUOTENAME(c.name)
    + N' ' + t.Operator + N' @p' + CONVERT(VARCHAR(11), t.ParamIndex),
 
 @p += N',@p' + CONVERT(VARCHAR(11), t.ParamIndex) 
   + CASE 
     WHEN t.BigintValue IS NOT NULL THEN N' BIGINT'
     WHEN t.DatetimeValue IS NOT NULL THEN N' DATETIME'
     WHEN t.StringValue IS NOT NULL THEN N' NVARCHAR(MAX)'
     ELSE N'' END,
   
 @plist += N',@p' + CONVERT(VARCHAR(11), t.ParamIndex) + N' = ' 
   + CASE 
     WHEN t.BigintValue IS NOT NULL THEN CONVERT(VARCHAR(32), t.BigintValue)
     WHEN t.DatetimeValue IS NOT NULL THEN 
       N'''' + CONVERT(CHAR(8), t.DatetimeValue, 112) + N' ' 
       + CONVERT(CHAR(12), t.DatetimeValue, 114) + N''''
     WHEN t.StringValue IS NOT NULL THEN t.StringValue
     ELSE N'NULL' END
     
 FROM @wcs AS t
 INNER JOIN sys.columns AS c
 ON t.ColumnName = c.name
 WHERE c.[object_id] = OBJECT_ID('dbo.[table]');

PRINT N'-- @sql:';
PRINT @sql;

SET @mastersql = N'EXEC sys.sp_executesql @stmt = N''' 
  + @sql + N';''' + CHAR(13) + CHAR(10) 
  + N',@params = N''' + @p + N'''' + CHAR(13) + CHAR(10)
  + @plist;

PRINT N'-- @mastersql:';
PRINT @mastersql;
EXEC sys.sp_executesql @mastersql;

In this case the query is run against dbo.[table] (returning no results, because the table is empty), and the printed output is:

-- @sql:
SELECT * FROM dbo.[table] WHERE 1 = 1
 AND [col1] < @p1
 AND [col2] LIKE @p2
 AND [col3] NOT LIKE @p3
 AND [col4] > @p4

-- @mastersql:
EXEC sys.sp_executesql @stmt = N'SELECT * FROM dbo.[table] WHERE 1 = 1
 AND [col1] < @p1
 AND [col2] LIKE @p2
 AND [col3] NOT LIKE @p3
 AND [col4] > @p4;'
,@params = N'@p0 INT,@p1 BIGINT,@p2 NVARCHAR(MAX),@p3 NVARCHAR(MAX),@p4 DATETIME'
,@p0 = NULL,@p1 = 20,@p2 = '%foo%',@p3 = '%bar%',@p4 = '20140101 00:00:00:000'

Not the tidiest thing in the world, but it works, and again, is a lot safer than just blindly appending entire ad hoc predicates onto the end of a query. I have tested various combinations, and any manipulating you try to do to the strings stored in the table type lead to syntax errors (which is better than injection!) or just searching for the wrong string value. If you come up with any way to execute an unchecked DROP TABLE, for example, with access only to the values that get inserted into the TVP, please let me know in the tip comments below so I can investigate.

You could go ever further, like adding columns that indicate correlation and predicate nesting, as well as adding ordering, grouping, and various other constructs. Really, you can get as complicated as you want. But this is already pretty complex, and likely a lot to digest, so I'll leave it at that for now.

Next Steps

Make a plan to review your usage of dynamic SQL and identify opportunities to make it safer.
Review the following tips and other resources:

About the author

Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2015-06-01

Protecting Yourself from SQL Injection in SQL Server - Part 2