By: Koen Verbeeck | Updated: 2021-10-05 | Comments | Related: 1 | 2 | 3 | 4 | 5 | 6 | > Functions System
Problem
A common use case when working with data stored in relational databases is "gluing" values together into one single big value, which is often described as concatenating. In this tip, we'll go over the various options you have in Microsoft SQL Server T-SQL for concatenating values.
Solution
To illustrate with an example, we're trying to achieve the following:
If the values are stored in a table over different columns, we want to concatenate over these columns. It's also possible to concatenate row values inside a single column (sometimes referred to as "grouped concatenation").
However, this is not the focus of this SQL tutorial. If you're interested in this scenario, check out the excellent tip Using FOR XML PATH and STRING_AGG() to denormalize SQL Server data.
Some examples in this tip are shown using the Adventure Works sample database. You can install it on your machine so you can execute the queries yourself to see the results.
Concatenate Values Using the Concatenation Operator
In any version of SQL Server, you can use the plus sign as a concatenation operator with the following syntax:
SELECT 'Hello'+' World'+'!';
It doesn't really matter if you're using string literals or variables or columns values to concatenate strings.
DECLARE @string1 VARCHAR(50) = 'Hello' ,@string2 VARCHAR(50) = ' World' ,@string3 VARCHAR(50) = '!'; SELECT @string1 + @string2 + @string3;
SELECT FullName = [FirstName] + ' ' + [LastName] FROM [AdventureWorks2017].[Person].[Person];
Mixed Data Types
When you concatenate values together, you typically have to use a separator. This ensures your values are not "sticking together". In the example above, we used a space as a separator between the first and the last name. Even though the concatenation operator is simple in its use, it has a couple of significant drawbacks. Let's try to mix some data types.
SELECT Test = 'Employee ' + [FirstName] + ' ' + [LastName] + ' (ID = ' + [BusinessEntityID] + ')' -- add int + ' has been modified at ' + [ModifiedDate] -- add date FROM [AdventureWorks2017].[Person].[Person];
Because of data types precedence, the strings are being converted to integers, which of course fails. When using the plus sign, it's meaning depends on the data types used (this is called an overloaded operator). When using strings, it concatenates values. When using integers, it adds them together (and the same goes for dates). When mixing data types, you need to make sure you have converted all columns to strings if necessary. To make our expression, we need to convert the int and datetime columns using either CONVERT or CAST.
SELECT Test = 'Employee ' + [FirstName] + ' ' + [LastName] + ' (ID = ' + CONVERT(VARCHAR(10),[BusinessEntityID]) + ')' -- add int + ' has been modified at ' + CONVERT(CHAR(10),[ModifiedDate],23) -- add date FROM [AdventureWorks2017].[Person].[Person];
Dealing with NULL Values
Another issue with the concatenation operator are NULL values. With the default settings, if any of the expressions you're trying to concatenate is a NULL, the result will also be NULL. Let's take a look at the Person table and concatenate the first name, the middle name and the last name.
SELECT FullName = [FirstName] + ' ' + [MiddleName] + ' ' + [LastName] FROM [AdventureWorks2017].[Person].[Person];
Not all persons have a middle name, so some rows will return a NULL instead of a string. To solve this issue, we can use ISNULL or COALESCE:
SELECT FullName = [FirstName] + ' ' + ISNULL([MiddleName],'') + ' ' + [LastName] FROM [AdventureWorks2017].[Person].[Person];
The downside is you'll have to apply ISNULL/COALESCE to every column or variable that can potentially contain NULLs. Another option might be to set the CONCAT_NULL_YIELDS_NULL session parameter to OFF. However, this option is deprecated and should be avoided. For more information about the SET statement, check out the tip Determining SET Options for a Current Session in SQL Server.
Concatenate Values Using the CONCAT function
In SQL Server 2012 and later, we can use the CONCAT function. With this string function, you can concatenate multiple character string values at once. Let's reprise the example from the previous paragraph:
DECLARE @string1 VARCHAR(50) = 'Hello' ,@string2 VARCHAR(50) = ' World' ,@string3 VARCHAR(50) = '!'; SELECT CONCAT(@string1,@string2,@string3);
The beauty of the CONCAT function is that it implicitly converts all expressions to string values (if possible) within a SELECT statement. This means you don't have to use CAST or CONVERT anymore as shown in the following example!
SELECT Test = CONCAT('Employee ',[FirstName],' ',[LastName] ,' (ID = ',[BusinessEntityID],')' -- add int ,' has been modified at ',[ModifiedDate]) -- add date FROM [AdventureWorks2017].[Person].[Person];
However, as you can see in the results, sometimes you'll still want to use CONVERT with a specific style to get datetime values into the correct format. Another advantage of the CONCAT function is that it also converts NULLs into empty strings. So you don't need to use ISNULL or COALESCE anymore either!
SELECT FullName = CONCAT([FirstName],' ',[MiddleName],' ',[LastName]) FROM [AdventureWorks2017].[Person].[Person];
If all columns are NULL, the result is an empty string. More tips about CONCAT:
- Concatenation of Different SQL Server Data Types
- Concatenate SQL Server Columns into a String with CONCAT()
- New FORMAT and CONCAT Functions in SQL Server 2012
Concatenate Values Using CONCAT_WS
In SQL Server 2017 and later, we can use the function CONCAT_WS. The WS stands for "with separator". It basically does the same thing as the CONCAT function, but you specify a separator as well which is placed between each string value. This makes some expressions a bit easier and shorter to write, especially if there are a lot of arguments. The example of the full name from the previous paragraphs now becomes:
SELECT FullName = CONCAT_WS(' ',[FirstName],[MiddleName],[LastName]) FROM [AdventureWorks2017].[Person].[Person];
Use Case – Calculating a Row Hash
Let's apply everything we learned in this tip in an example. Often, you want to know if a record has changed; meaning one or more columns have been updated. For example when loading data into a dimension table.
When you have incoming data and you compare this with the current data in a table, comparing each individual column is not efficient, especially when the table has a lot of columns. An option for a quick check is calculating a row hash. If the hash of the current row in the table is different from the incoming row, you know that at least one column has a different value. This pattern is described in the tip Additional Optimizations for Type 2 Slowly Changing Dimensions in SQL Server Integration Services - Part 3.
Calculating a hash in Transact-SQL can be done with the function HASHBYTES. To calculate the hash for an entire row, we first need to concatenate all columns together into one single value. This is where CONCAT_WS comes into the picture. Let's calculate a hash for the Person table with the following SQL commands:
SELECT [BusinessEntityID] ,[PersonType] ,[NameStyle] ,[Title] ,[FirstName] ,[MiddleName] ,[LastName] ,[Suffix] ,[EmailPromotion] ,[ModifiedDate] ,MyHash = HASHBYTES('SHA2_256' ,CONCAT_WS('||', [BusinessEntityID] ,[PersonType] ,[NameStyle] ,[Title] ,[FirstName] ,[MiddleName] ,[LastName] ,[Suffix] ,[EmailPromotion] ,[ModifiedDate] ) ) FROM [AdventureWorks2017].[Person].[Person];
Some remarks:
- The algorithm SHA2_256 is used to calculate the hash. The HASHBYTES function supports other algorithms, but since SQL Server 2016 all algorithms except SHA2-256 and SHA2-512 are deprecated.
- The columns AdditionalContactInfo and Demographics are left out of the example since they are of the XML data type and cannot be implicitly converted.
- Two pipe symbols are used as a separator. Any string expression can be used, it doesn't have to be one single character.
You'll definitely want to use a separator, otherwise you might get "collisions": different rows who result in the same hash. For example, if you don't use a separator, the following the rows will return the same hash:
SELECT Column1 = 'AB', Column2 = NULL, MyHash = HASHBYTES('SHA2_256',CONCAT('AB',NULL)) UNION ALL SELECT Column1 = 'A', Column2 = 'B', MyHash = HASHBYTES('SHA2_256',CONCAT('A','B'));
Such collisions can easily be avoided by adding a separator. Either replace CONCAT with CONCAT_WS if you're using SQL Server 2017 or higher, or put a separator manually between each column if you're on an older version of SQL Server.
If you have a lot of nullable columns, a collision can often happen, especially if you use a lot of flag columns (columns with the bit data type) as in this example:
SELECT Flag1 = 1 ,Flag2 = NULL ,Flag3 = NULL ,Flag4 = 0 ,MyHash = HASHBYTES('SHA2_256',CONCAT(1,NULL,NULL,0)) UNION ALL SELECT Flag1 = 1 ,Flag2 = NULL ,Flag3 = 0 ,Flag4 = NULL ,MyHash = HASHBYTES('SHA2_256',CONCAT(1,NULL,0,NULL)) UNION ALL SELECT Flag1 = 1 ,Flag2 = 0 ,Flag3 = NULL ,Flag4 = NULL ,MyHash = HASHBYTES('SHA2_256',CONCAT(1,0,NULL,NULL));
Since the concatenation results in '10' each time, all the rows have the same hash. However, in this example CONCAT_WS doesn't solve the issue!
The problem here is CONCAT_WS doesn't put a separator between two columns if they are NULL, as explained in the documentation.
To solve this, we can either use ISNULL to replace NULL with some sort of dummy value or use CONCAT and put the separator explicitly between all the columns. A dummy value is not always a good idea. For example, suppose you have an integer column that can contain any values an integer can hold. Meaning, 0 and -1 are legit values for this column. What is your dummy value going to be? If you choose 0, an update that changes a NULL value to 0 will go unnoticed. Therefore, using an explicit separator is the best option.
The SQL statement then becomes:
SELECT Flag1 = 1 ,Flag2 = NULL ,Flag3 = NULL ,Flag4 = 0 ,MyHash = HASHBYTES('SHA2_256',CONCAT(1,'||',NULL,'||',NULL,'||',0)) UNION ALL SELECT Flag1 = 1 ,Flag2 = NULL ,Flag3 = 0 ,Flag4 = NULL ,MyHash = HASHBYTES('SHA2_256',CONCAT(1,'||',NULL,'||',0,'||',NULL)) UNION ALL SELECT Flag1 = 1 ,Flag2 = 0 ,Flag3 = NULL ,Flag4 = NULL ,MyHash = HASHBYTES('SHA2_256',CONCAT(1,'||',0,'||',NULL,'||',NULL));
Next Steps
- The following tip goes deeper into concatenation and also shows its usage in different database platforms: Using SQL Server Concatenation Efficiently.
- More tips about HASHBYTES:
- If you don't have a relational source, you can calculate a hash in SSIS as well: Using hash values in SSIS to determine when to insert or update rows.
- Learn about SQL Server Stored Procedures.
- Lean about SQL Server Functions:
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2021-10-05