By: Aaron Bertrand | Updated: 2017-06-22 | Comments | Related: > Functions System
Problem
SQL Server 2016 introduced a new built-in function, STRING_SPLIT(), which offers much better performance than the methods many of us have implemented ourselves. The function does come with some limitations, though; most notably, it only supports a single-character delimiter, which means you can’t use the function directly on data that has a separator using more than one character, like ^^^ or ~|~.
Solution
The most obvious solution would be to change your data so that it only requires a single-character separator. Of course, this is not practical for most users; the data – and all of its touchpoints – are not always in your control, and can’t always all be changed at once. So what else can be done?
Well, let’s say you have a table like this:
CREATE TABLE dbo.Pets
(
StudentID int PRIMARY KEY,
PetNames nvarchar(4000)
);
And it currently stores data like this:
INSERT dbo.Pets(StudentID, PetNames) VALUES
(1, N'Furface~|~Snowball~|~Max'),
(2, N'Kirby~|~Quigley'),
(3, N'Dax~|~Spike~|~Bowser~|~Rosco');
Depending on the balance of reads and writes, and how often you need the data in concatenated vs. separated forms, you have three general options:
1. Change the data on read
If you have an application or report that mostly relies on the concatenated data appearing as above (with the ~|~ delimiter), but still need to occasionally split the data out into a relational form, then you could create a view that STRING_SPLIT looks at, which first replaces the multi-character separator with a single-character separator before handing the data off to the function. The trick is to choose a Unicode character that is not possible to appear in the actual data, like NCHAR(9999), which is a pencil (✏). So your view would look like this:
CREATE VIEW dbo.AltPets
AS
SELECT StudentID,
PetNames = REPLACE(PetNames, N'~|~', NCHAR(9999))
FROM dbo.Pets;
Now the application can still look at the table directly if it wants the 3-character delimiter version. But other code that wants to split the data can instead look at the view, which contains values that can be processed by STRING_SPLIT:
SELECT v.StudentID, s.Value
FROM dbo.AltPets AS v
OUTER APPLY STRING_SPLIT(v.PetNames, NCHAR(9999)) AS s;
You can do this without the view, of course, but you’d have to inline the REPLACE() expression into all the queries:
SELECT p.StudentID, s.Value
FROM dbo.Pets AS p
OUTER APPLY STRING_SPLIT(REPLACE(p.PetNames, N'~|~', NCHAR(9999)), NCHAR(9999)) AS s;
I talked about this recently in an answer on the Database Administrators Stack Exchange.
2. Change the data on write
If your application spends more time splitting the data than storing it, then you may want to handle this the other way around – replace the delimiters as you write the data to the table, either via a trigger or changing the data access layer. Then a view could be created to run the replacement the other way when you need to show the original format with the 3-character separator:
CREATE VIEW dbo.AltPets2
AS
SELECT StudentID,
PetNames = REPLACE(PetNames, NCHAR(9999), N'~|~')
FROM dbo.Pets;
3. Change the storage altogether
The above two options work best when you can’t change all aspects – for example, the application might be hard-coded to concatenate the data with this 3-character delimiter (or expect it to be presented that way), and you can’t change the app due to resources or because the app belongs to a 3rd party.
If you do have control over the app, though, a better solution might be to stop splitting and concatenating in the first place. The schema could instead be:
CREATE TABLE dbo.StudentPets
(
StudentID int,
PetName nvarchar(255)
);
Then you could have a table-valued parameter used to insert/update delete data:
CREATE TYPE dbo.PetNameTVP
(
PetName nvarchar(255)
);
Then the application – which is probably already using a structured data type, like a DataTable, to store the data before passing it to SQL Server anyway – can just send the DataTable to a stored procedure, such as:
CREATE PROCEDURE dbo.OverwriteStudentPets
@StudentID int,
@PetNames dbo.PetNameTVP READONLY
AS
BEGIN
SET NOCOUNT ON;
DELETE dbo.StudentPets WHERE StudentID = @StudentID;
INSERT dbo.StudentPets(StudentID, PetName)
SELECT @StudentID, PetName FROM @PetNames;
END
GO
Now there is no messy splitting happening, delimiters become irrelevant, and performance will probably improve, too.
Next Steps
- Review the following tips and other resources:
- SQL Server 2016 STRING_SPLIT Function
- Parsing string data with the new SQL Server 2016 STRING_SPLIT function
- SQL Server CLR and SQL split functions to parse a delimited string
- Split strings the right way – or the next best way
- Comparing string splitting / concatenation methods
- Performance Surprises and Assumptions : STRING_SPLIT()
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2017-06-22