By: Ben Snaidero | Updated: 2022-02-23 | Comments (9) | Related: > TSQL
Problem
When searching a character-based column in a SQL Server table, it's very rare that we know the exact string we are searching for and can perform the query using the = operator. The SQL LIKE operator can be used to search for static and wildcard string patterns within any character-based column.
In this tutorial we will go through examples showing the many different ways the LIKE operator can be used. We will cover the most basic use cases all the way up to some advanced techniques using indexes on computed columns to speed up text-based searches.
Solution
When learning about any operator, I find the best method for understanding how it works is to just work through a bunch of examples to show what can be done with it. With that being said, it is always a good idea to read through the documentation and become familiar with syntax and the different options available. You can read more about this from this link, but I will be explaining in this tip the main points from the documentation.
SQL LIKE Statement Tutorial
Like any operator, the SQL Server LIKE operator goes between the two expressions/patterns it will be evaluating. The expressions can be columns or hard coded values, both of which can include wildcard characters (more on those below). As with most other operators, the NOT clause can also be added to negate the condition being checked. One other special clause with this operator is the ESCAPE clause which allows you to add wildcard characters in the WHERE clause as literal characters in your expressions. Here is the basic syntax.
match_expression [ NOT ] LIKE pattern [ ESCAPE escape_character ]
There are 4 different SQL LIKE wildcard characters that can be used in the pattern to perform your search in the WHERE clause. We will go through examples of each character to better explain how they work, but here is a short description of each specified pattern.
- % - matches any string of zero of more characters
- _ - matches any single character using an underscore
- [] - matches any single character within the specified range or set of characters
- [^] - matches any single character not within the specified range or set of characters
Now that we understand the basic syntax, let's get into some examples to see how the LIKE operator can be used. Note that all examples below were tested against SQL Server 2019 using the AdventureWorks 2017 database. After restoring the database, we also added some additional data using the T-SQL below to make a few of the examples work.
INSERT INTO Person.EmailAddress (BusinessEntityID,EmailAddress) VALUES (1,'[email protected]'); INSERT INTO Person.EmailAddress (BusinessEntityID,EmailAddress) VALUES (1,'ken[[email protected]'); UPDATE Person.Person SET FirstName = '1Terry' WHERE BusinessEntityID=2;
SQL LIKE Wildcard Example
The most common use case for the LIKE condition is searching for a partial string match. In this first example, we can search for all email addresses that start with 'ken' and have any other string (zero or more characters) following it by adding the % wildcard to the end of the character string.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken%';
Similarly, we can add this LIKE % wildcard to both sides of a string and use pattern matching for any email address with '@adventure-works-test.' in the address as shown here.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE '%@adventure-works-test.%';
SQL NOT LIKE Example
Now, if we didn't already know that these two email addresses existed, we could also use the SQL NOT LIKE clause and instead search for any email addresses that don't match '@adventure-works.'. To find them with this method, we can use the following query with a NOT operator.
SELECT * FROM Person.EmailAddress WHERE EmailAddress NOT LIKE '%@adventure-works.%';
There are some performance implications when you add the wildcard as a prefix to your search pattern that we will discuss a bit later in this tip.
SQL LIKE Wildcard versus Equals
One other thing that I want to point out in regards to comparing text columns using the LIKE operator is that the column type can affect your result comparing values. Unlike the = operator, the LIKE operator takes into account all characters including leading and trailing spaces. This means you need to be careful when looking at char or nchar columns or any other fixed length datatype, but this does not apply to varchar or nvarchar columns. For example, if you were to search for a ProductLine with the value 'T' using the = operator, it would return data as follows.
SELECT ProductID, Name, ProductNumber,ProductLine FROM Production.Product WHERE ProductLine = 'T';
Using the LIKE operator (without a wildcard) will not return any data due to the trailing space. One option is to use the RTRIM function to remove trailing spaces.
SELECT ProductID, Name, ProductNumber,ProductLine FROM Production.Product WHERE ProductLine LIKE 'T';
LIKE Operator in SQL to Match Any Single Character
Another fairly common use case when using the LIKE operator is to match a single character in a string. This is done with the _ (underscore) wildcard.
From our example above using 'ken%', we can see that there were many different matches. But what if we wanted to match 'ken' exactly? We know that all email addresses have a number following the first name so in this case we could use the _ underscore wildcard as follows and only return addresses with the name 'ken'.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE'ken_@%';
SQL LIKE Syntax to Match Any Single Character Within Specified Range
The _ wildcard matches any single character but what if we wanted to be even more specific and force it to match a number in this place? This is where the [] wildcard can be used.
The wildcard matches any single character or set of characters specified between the brackets in a SQL statement. For example, we could specify that the character before the @ symbol must be a number as shown below. Note that the character can be specified as a range (first example) or as a list of characters (second example).
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken[0-9]%';SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken[0123456789]%';
We could also use this wildcard with the NOT clause to ensure that all email addresses follow this format and search for anything that doesn't using the following query.
SELECT * FROM Person.EmailAddress WHERE EmailAddress NOT LIKE '%[0-9]@%.com' AND EmailAddress NOT LIKE '%[0-9]@%.ca';
SQL LIKE Statement to Match Any Single Character Not Within Specified Range
The final wildcard that can be used allows us to ensure that a character in a certain place in the string does not match a specific character. Going back to the earlier example where we were searching for email addresses with only 'ken[0-9]', what if in a different scenario we were looking for the opposite and wanted to exclude these but include all other email address that start with 'ken'?
We could use the [^] operator as follows to get these results.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken[^0-9]%@%';
SQL Server LIKE Searching for Wildcard Characters as Literal Values
What happens if you want to actually match a string for one of these special wildcard characters. Say we wanted to find all email addresses with the '[' character in them. If we write the query as follows, there is no data returned.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken[%';
There are actually two ways we can write this query to match this special character. The first is enclosing the special character using the [] wildcard as shown below.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken[[]%';
The other method is to use the ESCAPE clause and specify an escape character that can be used in your pattern.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken\[%' ESCAPE '\';
You can see that both cases will return the same result.
SQL LIKE Performance Implications
As useful as the LIKE operator can be for matching patterns in strings, there are some performance implications we should be aware of when using it in a query. Let's take a look at the query plan for the first query from our examples above.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE 'ken%';
Everything looks good as the query is doing an index seek and a key lookup. This is always the case when the wildcard is used as the suffix. Let's say we want to look for all email address with a '.ca' suffix. To do this we need to put the wildcard as a prefix in the pattern as shown below.
SELECT * FROM Person.EmailAddress WHERE EmailAddress LIKE '%ca';
If we look at the plan for this query, we can see that the query planner can no longer use the index and we end up doing an index scan.
Using 'SET STATISTICS IO ON' we can see it had to perform quite a few reads to execute this query.
(2 row(s) affected) Table 'EmailAddress'. Scan count 1, logical reads 191, physical reads 1, page server reads 0, read-ahead reads 182, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Fortunately, there is something we can do to address this issue. If we add a computed column to the table and an index on the column, we can get the query planner to use an index seek for this query as well. Here is the T-SQL to add the column and index.
ALTER TABLE Person.EmailAddress ADD EmailAddressRev AS REVERSE(EmailAddress); CREATE NONCLUSTERED INDEX IX_EmailAddress_EmailAddressRev ON Person.EmailAddress (EmailAddressRev);
We also need to rewrite the query slightly as shown below.
SELECT * FROM Person.EmailAddress where REVERSE(EmailAddress) LIKE REVERSE('%ca');
We could also write this query and reference the EmailAddressRev column directly but I think the above query is more readable.
SELECT * FROM Person.EmailAddress where EmailAddressRev LIKE 'ac%';
Now if we take a look at the execution plan, we can see that it is again doing an index seek and lookup.
The number of reads required is much lower, down to 10 from 192 with the original query.
(2 row(s) affected) Table 'EmailAddress'. Scan count 1, logical reads 7, physical reads 3, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
The only downside to this method is the extra space required for the index. The computed column does not even have to be persisted so no extra space is required there. Given that it performed much fewer reads during execution by avoiding the index scan, the extra space used is definitely worth it.
Next Steps
- Read more on Full Text Search
- Read more on Using Regular Expressions with T-SQL
- Read other tips on using computed columns
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2022-02-23