Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function

By:   |   Updated: 2017-06-22   |   Comments   |   Related: > Functions System


Problem

SQL Server 2016 introduced a new built-in function, STRING_SPLIT(), which offers much better performance than the methods many of us have implemented ourselves. The function does come with some limitations, though; most notably, it only supports a single-character delimiter, which means you can’t use the function directly on data that has a separator using more than one character, like ^^^ or ~|~.

Solution

The most obvious solution would be to change your data so that it only requires a single-character separator. Of course, this is not practical for most users; the data – and all of its touchpoints – are not always in your control, and can’t always all be changed at once. So what else can be done?

Well, let’s say you have a table like this:

CREATE TABLE dbo.Pets
(
  StudentID int PRIMARY KEY,
  PetNames  nvarchar(4000)
);

And it currently stores data like this:

INSERT dbo.Pets(StudentID, PetNames) VALUES
  (1, N'Furface~|~Snowball~|~Max'),
  (2, N'Kirby~|~Quigley'),
  (3, N'Dax~|~Spike~|~Bowser~|~Rosco');

Depending on the balance of reads and writes, and how often you need the data in concatenated vs. separated forms, you have three general options:

1. Change the data on read

If you have an application or report that mostly relies on the concatenated data appearing as above (with the ~|~ delimiter), but still need to occasionally split the data out into a relational form, then you could create a view that STRING_SPLIT looks at, which first replaces the multi-character separator with a single-character separator before handing the data off to the function. The trick is to choose a Unicode character that is not possible to appear in the actual data, like NCHAR(9999), which is a pencil (✏). So your view would look like this:

CREATE VIEW dbo.AltPets
AS
    SELECT StudentID,
           PetNames = REPLACE(PetNames, N'~|~', NCHAR(9999))
    FROM dbo.Pets;

Now the application can still look at the table directly if it wants the 3-character delimiter version. But other code that wants to split the data can instead look at the view, which contains values that can be processed by STRING_SPLIT:

SELECT v.StudentID, s.Value
  FROM dbo.AltPets AS v
  OUTER APPLY STRING_SPLIT(v.PetNames, NCHAR(9999)) AS s;

You can do this without the view, of course, but you’d have to inline the REPLACE() expression into all the queries:

SELECT p.StudentID, s.Value
  FROM dbo.Pets AS p
  OUTER APPLY STRING_SPLIT(REPLACE(p.PetNames, N'~|~', NCHAR(9999)), NCHAR(9999)) AS s;

I talked about this recently in an answer on the Database Administrators Stack Exchange.

2. Change the data on write

If your application spends more time splitting the data than storing it, then you may want to handle this the other way around – replace the delimiters as you write the data to the table, either via a trigger or changing the data access layer. Then a view could be created to run the replacement the other way when you need to show the original format with the 3-character separator:

CREATE VIEW dbo.AltPets2
AS
    SELECT StudentID,
           PetNames = REPLACE(PetNames, NCHAR(9999), N'~|~')
    FROM dbo.Pets;

3. Change the storage altogether

The above two options work best when you can’t change all aspects – for example, the application might be hard-coded to concatenate the data with this 3-character delimiter (or expect it to be presented that way), and you can’t change the app due to resources or because the app belongs to a 3rd party.

If you do have control over the app, though, a better solution might be to stop splitting and concatenating in the first place. The schema could instead be:

CREATE TABLE dbo.StudentPets
(
  StudentID int,
  PetName nvarchar(255)
);

Then you could have a table-valued parameter used to insert/update delete data:

CREATE TYPE dbo.PetNameTVP
(
  PetName nvarchar(255)
);

Then the application – which is probably already using a structured data type, like a DataTable, to store the data before passing it to SQL Server anyway – can just send the DataTable to a stored procedure, such as:

CREATE PROCEDURE dbo.OverwriteStudentPets
  @StudentID int,
  @PetNames dbo.PetNameTVP READONLY
AS
BEGIN
  SET NOCOUNT ON;
  DELETE dbo.StudentPets WHERE StudentID = @StudentID;
  INSERT dbo.StudentPets(StudentID, PetName)
    SELECT @StudentID, PetName FROM @PetNames;
END
GO

Now there is no messy splitting happening, delimiters become irrelevant, and performance will probably improve, too.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2017-06-22

Comments For This Article

















get free sql tips
agree to terms