SQL Server RegEx Capabilities Using Python

By:   |   Updated: 2020-07-30   |   Comments (1)   |   Related: > TSQL


Problem

As you may already know, in SQL Server 2017 and later, SQL Server has the ability to execute both R and Python scripts natively. This produces some interesting functionality and patterns that become available because of this feature.

Today we look at the ability to integrate regular expressions straight into our T-SQL statement result sets with a dynamic stored procedure using the Python native feature and a few pre-installed Python libraries.

Before we begin, regular expressions are somewhat supported in the WHERE clause of SQL Server, for instance saying where Field1 like '[0-9]' is using regex type capabilities. However, returning results on a regex match is not natively supported, this means you would have to develop a CLR procedure to do this. Integrating Python is much easier and does not require as much development time as we will see in this tip.

Solution

First off, let's enable external scripts:

--Advanced Options Enable External scripts			
sp_configure 'external scripts enabled', 1; 
RECONFIGURE WITH OVERRIDE;			

Once that is enabled, we should be able to run a test python script:

EXEC sp_execute_external_script
  @language =N'Python',
  @script=N'import sys; print("\n".join(sys.path))'

This will confirm the location that our Python-SQL server environment is running, see subheading section for details of adding additional Python libraries to this environment. If Python is not installed, check out this tutorial.

python scripting

Stored Procedure for String Evaluation

This stored procedure accepts two dynamic parameters, the regular expression pattern and the evaluation string.

Building the procedure like this gives a lot of flexibility to return our evaluation criteria and allows for a single procedure to handle all of our regex needs.

IF EXISTS(SELECT * FROM SYS.procedures WHERE NAME = 'RegexSelect') DROP PROC RegexSelect
GO

CREATE PROCEDURE DBO.RegexSelect ( @REGEXEXPRESSION VARCHAR(MAX), @STRING VARCHAR(MAX) )
AS
 
DECLARE @PYTHON NVARCHAR(MAX)
SET @PYTHON = '
import pandas as pd
import re
import sys
from io import StringIO
Test_String = "' + @STRING + '"
Regex_Pattern = r"'+ @REGEXEXPRESSION + '"
match = re.findall(Regex_Pattern, Test_String)
out = ""
for x in match:
   out += x
df = pd.DataFrame({"results":[out]}) 
OutputDataSet = df'
EXECUTE sp_execute_external_script @language = N'Python'
   , @script = @PYTHON
   , @params = N'@P1 VARCHAR(MAX),@P2 VARCHAR(MAX)'
   , @P1 = N' @REGEXEXPRESSION VARCHAR(MAX)'
   , @P2 = N'@STRING'
   WITH RESULT SETS((RESULT VARCHAR(MAX)));
GO

After creating the stored procedure lets test the execution:

--SAMPLE EXECUTIONS FOR DIFFERENT PATTERNS

EXEC DBO.RegexSelect '(?<=My)(.*?)(?=ABC)','My name ABC' --EXTRACT STRING BETWEEN TWO STRINGS

EXEC DBO.RegexSelect '\d\d\D\d\d\D\d\d\d\d','XXXXXXXXXXXXXX06-11-2015YYYYYYYYY' --EXTRACT DATE PATTERN 

EXEC DBO.RegexSelect '[0-9]','ABC1DEF2GHI3' --EXTRACT DIGITS FROM STRING

Now that we verified that our new procedure is firing let's propose a use case for our newly forged procedure.

Let's imagine that you have a free field to collect phone numbers, that is of type varchar(12). This field will accept parenthesis for area code, hyphens if the user chooses and numeric characters. Using our new procedure, we can extract the numeric characters of these fields.

First, let's build the stage table that we will be processing:

--EXAMPLE PARSING A SINGLE FORMAT TO PHONE NUMBERS
IF OBJECT_ID('TEMPDB..#TEMP_PROCESS') IS NOT NULL DROP TABLE #TEMP_PROCESS

SELECT '911' AS PHONENUMBER --EMERGENCY NUMBER
INTO #TEMP_PROCESS 
UNION 
SELECT '+1442041349987'     --INTERNATIONAL NUMBER
UNION 
SELECT '(123)-6782134'      --US FORMAT WITH AREA CODE
UNION 
SELECT '123-321-4412'       --US FORMAT HYPHEN DELIMITED
UNION 
SELECT '8762341414'         --US FORMAT NOT DELIMITED

Next, we will build a results table:

--RESULTS TEMP TABLE
IF OBJECT_ID('TEMPDB..#TEMP_CLEANED') IS NOT NULL DROP TABLE #TEMP_CLEANED

CREATE TABLE #TEMP_CLEANED ( CLEANPHONENUM VARCHAR(MAX) )

Now let's process all the data in our stage table and push it to the results table:

DECLARE @PHONENUMBER VARCHAR(MAX)
WHILE EXISTS(SELECT * FROM #TEMP_PROCESS)
   BEGIN
      SET @PHONENUMBER = (SELECT TOP 1 PHONENUMBER FROM #TEMP_PROCESS)
      INSERT INTO #TEMP_CLEANED
      EXEC dbo.RegexSelect '[0-9]',@PHONENUMBER
      DELETE FROM #TEMP_PROCESS WHERE PHONENUMBER = @PHONENUMBER
   END

Lastly, let's view the data from the results of our stored procedure:

SELECT * FROM #TEMP_CLEANED
query results

And clean up our temp table:

DROP TABLE #TEMP_CLEANED,#TEMP_PROCESS

Managing the SQL Python Environment

As a quick note, if you ever want to add modules to your python – SQL Server specific environment here is how you can do that. As a small note this Python environment is completely different than if you have an Anaconda or Python stand alone install already on your workstation.

Inside of your command prompt:

cd C:\Program Files\Microsoft SQL Server\SERVERINSTANCE\PYTHON_SERVICES\Scripts
pip list

pip is a CLI tool that helps manage Python libraries, if you run pip in this directory you will see exactly what your SQL python scripts have access to and can add to that if you wish with pip install.

If you wish to install an additional Python library you can do so with the pip.exe as follows:

pip.exe install text-tools

Conclusion

This is one way to get regex type features in SQL Server.  It would be nice if this was a function, but it is not possible to create a function when you use sp_execute_external_script, so stay tuned for other tips on how to get regex features in SQL Server.

Next Steps

Check out these related articles:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ian Fogelman Ian Fogelman is professional DBA with a hunger and interest for data science and application development. Ian enjoys leveraging cloud-based technologies and data to solve and automate tasks. Ian holds a MS in computer science with a concentration in software development. Ian is also a member of his local SQL Server user group and a member of his local data science user group in Columbus Georgia.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2020-07-30

Comments For This Article




Monday, September 7, 2020 - 6:27:57 AM - Jim Dickson Back To Top (86425)
Very nice.

I am new to ms sql dbs.

Will your technique work in Azure SQL database / managed instance?

Is there a similar approach for Java?














get free sql tips
agree to terms