By: Aaron Bertrand | Updated: 2023-03-09 | Comments | Related: > Dates
Problem
A while back, I wrote a couple of tips about simulating TRY_CONVERT
–
a function added in SQL Server 2012 – in earlier versions (see
Part 1 and
Part 2). Recently, someone in the community had a similar requirement: they
wanted to make TRY_PARSE
work for dates in SQL Server 2008. Now, I'm
not one to help people stay on versions of SQL Server that are now 15 years old,
but I do acknowledge that hands may be tied, and they can't always control
the version they're stuck with.
Solution
We can't make TRY_PARSE
just work in SQL Server 2008; if you
want to use the function directly, you'll need to upgrade. What we can do
is make our own version of the function. That said, we can't make the syntax
fully compatible with TRY_PARSE
so that, when we upgrade, the change
is minimal. The syntax for TRY_PARSE
is:
TRY_PARSE(string_input AS data_type [ USING culture ] )
If we write our own function (even in CLR), we won't be able to call it with that syntax exactly, and we won't be able to have optional arguments. It would be more like this (including a type-specific suffix to avoid future naming conflicts):
dbo.try_parse_date(@string_input, @culture)
TRY_PARSE
is complex, and if we're going to write our own
version, we should set some ground rules:
- Let's stay away from CLR (mostly because it's not compatible with Edge, Azure, etc.);
- Let's handle dates only and not try to parse strings into any types that also include time;
- Let's handle numeric dates only and not try to determine the validity of, say, foreign month names; and,
- Let's keep one eyebrow raised for two-digit years; two decades after Y2K, these should be fossils.
When we think about the problem at a high level, we want to provide a generic
way to take a "date" in one of many different formats and interpret
it as accurately as possible, using rules from potentially any culture. If we can't
do that reliably, we should return NULL
instead of an error like we
would see from PARSE
or CONVERT
. However:
- As a human, I can't tell by looking at 12/8/13 whether you meant December 8th or August 12th (and further, whether you meant 2012 or 2013, or 1912 or 1913, for that matter). This can make programmatically parsing strings very hard, and we're not even talking about bringing in regional-specific elements like day names or month names.
- A regional string like m/d/y or d/m/y may have 2- or 4-digit years and may or may not have leading zeros on the day or month. Software is very lenient with some of these aspects – I haven't used Microsoft Access in over two decades, but I remember it being forgiving to a fault. If you typed 12/8/2022, it would store December 8th, but if you entered 13/8/2022, it would assume you messed up that value only and would happily store August 13th – without warning you that it changed its interpretation. This is dangerous behavior and leads to what I consider data corruption. We don't want to do this here.
With those things in mind, let's look at a few formats, how we want to
interpret them based on the specified culture, and how we can silently fail when
a value can't be parsed correctly. The whole point of TRY_PARSE
is that if it can't evaluate the input string as a date according to the culture,
it should return NULL
instead. So consider this input, how we can translate
it to a universally safe and unambiguous string format, and what the output should
be.
Input string | Specified culture | Safe format (style #) | ISDATE(Safe format) | Output (date) |
---|---|---|---|---|
2/30/2022 | en-US | 20220230 (112) | 0 (no Feb 30th) | NULL |
2.7.2022 | en-US | 20220207 (112) | 1 | 2022-02-07 |
2-14-2022 | en-GB | 20221402 (112) | 0 (no 14th month) | NULL |
12/8/22 | en-US | 221208 (12) | 1 | 2022-12-08 |
12/8/22 | en-GB | 220812 (12) | 1 | 2022-08-12 |
We can't convert the last two to a safer format (yyyyMMdd
)
because we don't want to blindly add '20'
and make
assumptions about 2-digit year cutoffs (otherwise, we'd have to add logic
for when we consider a two-digit year to be from last century).
For pseudo-code of how to interpret these correctly, we can think about it this
way, where |
represents any of the three most common separators (/
,
-
, or .
):
- For any culture:
- If the format is
####|##|##
or########
, in spite of known exceptions involving datetime, for dates, we assumeYMD
- If the format is
- If the culture is en-US or the few others that assume
MDY
:- if the format is
##|##|####
then we know it isMDY
- if the format is
##|##|##
then we assumeMDY
and rely on 2-digit year handling
- if the format is
- If the culture is, basically, anything except en-US and a few others:
- If the format is
##|##|####
then we know it isDMY
- If the format is
##|##|##
then we assumeDMY
and rely on 2-digit year handling
- If the format is
- If any of the above is true, rearrange the string to
yyyyMMdd
(112) oryyMMdd
(12) - If none of those formats match, return
NULL
- Even if a format matches, leave the result out if it can't be converted
to a date, according to
ISDATE
To implement this logic, I created two tables, one to store the various formats
and how to parse each one for extracting the intended year, month, and day, and
another for storing the regions/cultures we want to support and which formats belong
to each one. Storing this information in a table means that our code to deal with
any given format can be that much simpler (and not involve a bunch of CASE
expressions and other redundant code).
Since SQL Server doesn't support RegEx natively, I stored a row in our lookup table for every possible format combination.
CREATE TABLE dbo.DateFormats ( DateStyle char(3), -- 'all', 'mdy', 'dmy' Pattern varchar(64), ys tinyint, -- character where year starts yl tinyint, -- length of year ms tinyint, -- character where month starts ml tinyint, -- length of month ds tinyint, -- character where day starts dl tinyint, -- length of day INDEX CIX_DateFormats_DateStyle CLUSTERED(DateStyle) );
Now, all the potential formats and the positional markers to help extract the date parts given the format:
INSERT dbo.DateFormats(DateStyle,Pattern,ys,yl,ms,ml,ds,dl) VALUES
-- universal 112/yyyyMMdd ('all', '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]' ,1,4,5,2,7,2), -- universal 120/yyyy-[M]M-[d]d or 102/yyyy.[M]M.[d]d or 111/yyyy/[M]M/[d]d ('all', '[0-9][0-9][0-9][0-9][./-][0-1][0-9][./-][0-3][0-9]' ,1,4,6,2,9,2), ('all', '[0-9][0-9][0-9][0-9][./-][0-9][./-][0-9]' ,1,4,6,1,8,1), ('all', '[0-9][0-9][0-9][0-9][./-][0-9][./-][0-3][0-9]' ,1,4,6,1,8,2), ('all', '[0-9][0-9][0-9][0-9][./-][0-1][0-9][./-][0-9]' ,1,4,6,2,9,1), -- MDY 1/101 [m]m/[d]d/[yy]yy or [m]m.[d]d.[yy]yy or [m]m-[d]d-[yy]yy ('mdy', '[0-1][0-9][./-][0-3][0-9][./-][0-9][0-9][0-9][0-9]' ,7,4,1,2,4,2), ('mdy', '[1-9][./-][0-3][0-9][./-][0-9][0-9][0-9][0-9]' ,6,4,1,1,3,2), ('mdy', '[0-1][0-9][./-][1-9][./-][0-9][0-9][0-9][0-9]' ,6,4,1,2,4,1), ('mdy', '[1-9][./-][1-9][./-][0-9][0-9][0-9][0-9]' ,5,4,1,1,3,1), ('mdy', '[0-1][0-9][./-][0-3][0-9][./-][0-9][0-9]' ,7,2,1,2,4,2), ('mdy', '[1-9][./-][0-3][0-9][./-][0-9][0-9]' ,6,2,1,1,3,2), ('mdy', '[0-1][0-9][./-][1-9][./-][0-9][0-9]' ,6,2,1,2,4,1), ('mdy', '[1-9][./-][1-9][./-][0-9][0-9]' ,5,2,1,1,3,1), -- DMY 3/103 [d]d/[m]m/[yy]yy or [d]d.[m]m.[yy]yy or [d]d-[m]m-[yy]yy ('dmy', '[0-3][0-9][./-][0-1][0-9][./-][0-9][0-9][0-9][0-9]' ,7,4,4,2,1,2), ('dmy', '[1-9][./-][0-1][0-9][./-][0-9][0-9][0-9][0-9]' ,6,4,3,2,1,1), ('dmy', '[0-3][0-9][./-][1-9][./-][0-9][0-9][0-9][0-9]' ,6,4,4,1,1,2), ('dmy', '[1-9][./-][1-9][./-][0-9][0-9][0-9][0-9]' ,5,4,3,1,1,1), ('dmy', '[0-3][0-9][./-][0-1][0-9][./-][0-9][0-9]' ,7,2,4,2,1,2), ('dmy', '[1-9][./-][0-1][0-9][./-][0-9][0-9]' ,6,2,3,2,1,1), ('dmy', '[0-3][0-9][./-][1-9][./-][0-9][0-9]' ,6,2,4,1,1,2), ('dmy', '[1-9][./-][1-9][./-][0-9][0-9]' ,5,2,3,1,1,1);
Then the table that maps these regional formats to specific cultures, which I determined using the following query:
DECLARE @t table(culture char(5)), @d char(10) = '02/07/2022'; INSERT @t(culture) VALUES ('en-US'),('de-DE'),('fr-FR'),('ja-JP'),('da-DK'),('es-ES'),('it-IT'), ('nl-NL'),('nn-NO'),('pt-PT'),('fi-FI'),('sv-SE'),('Cs-CZ'),('Hu-HU'), ('Pl-PL'),('Ro-RO'),('hr-HR'),('Sk-SK'),('Sl-SI'),('El-GR'),('bg-BG'), ('Ru-RU'),('Tr-TR'),('en-GB'),('Et-EE'),('lv-LV'),('lt-LT'),('pt-BR'), ('zh-TW'),('Ko-KR'),('zh-CN'),('ar-SA'),('Th-TH'); SELECT culture, [month] = DATENAME(MONTH, TRY_PARSE(@d AS date USING culture)) FROM @t ORDER BY [month], culture;
Results:
[month] = NULL | [month] = February (MDY) | [month] = July (DMY) | |
---|---|---|---|
ar-SA | en-US | bg-BG | lv-LV |
Hu-HU | Cs-CZ | nl-NL | |
ja-JP | da-DK | nn-NO | |
Ko-KR | de-DE | Pl-PL | |
lt-LT | El-GR | pt-BR | |
sv-SE | en-GB | pt-PT | |
zh-CN | es-ES | Ro-RO | |
zh-TW | Et-EE | Ru-RU | |
fi-FI | Sk-SK | ||
fr-FR | Sl-SI | ||
hr-HR | Th-TH | ||
it-IT | Tr-TR |
So, ignoring ar-SA
for the time being, we can put these values in:
CREATE TABLE dbo.CultureMapping ( Culture varchar(5), DateStyle char(3) CONSTRAINT PK_CDFM PRIMARY KEY(Culture, DateStyle) ); INSERT dbo.CultureMapping(Culture, DateStyle) VALUES ('en-US', 'mdy'), ('bg-BG', 'dmy'), ('lv-LV', 'dmy'), ('Hu-HU', 'mdy'), ('Cs-CZ', 'dmy'), ('nl-NL', 'dmy'), ('ja-JP', 'mdy'), ('da-DK', 'dmy'), ('nn-NO', 'dmy'), ('Ko-KR', 'mdy'), ('de-DE', 'dmy'), ('Pl-PL', 'dmy'), ('lt-LT', 'mdy'), ('El-GR', 'dmy'), ('pt-BR', 'dmy'), ('sv-SE', 'mdy'), ('en-GB', 'dmy'), ('pt-PT', 'dmy'), ('zh-CN', 'mdy'), ('es-ES', 'dmy'), ('Ro-RO', 'dmy'), ('zh-TW', 'mdy'), ('Et-EE', 'dmy'), ('Ru-RU', 'dmy'), ('fi-FI', 'dmy'), ('Sk-SK', 'dmy'), ('fr-FR', 'dmy'), ('Sl-SI', 'dmy'), ('hr-HR', 'dmy'), ('Th-TH', 'dmy'), ('it-IT', 'dmy'), ('Tr-TR', 'dmy');
Then we create an inline table-valued function that looks at the string and, based on the defined culture, checks which pattern it matches, reconstructs the string in a safe format, and then checks to see if that value is a date:
CREATE FUNCTION dbo.try_parse_date ( @DateString varchar(10), @Culture varchar(5) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN ( SELECT TheDate = CONVERT(date, DateString, DestStyleNumber) FROM ( SELECT TOP (1) DestStyleNumber = CASE WHEN yl = 2 THEN 12 ELSE 112 END, DateString = SUBSTRING(@DateString, ys, yl) + RIGHT('0' + SUBSTRING(@DateString, ms, ml),2) + RIGHT('0' + SUBSTRING(@DateString, ds, dl),2) FROM dbo.DateFormats AS df INNER JOIN dbo.CultureMapping AS cm ON df.DateStyle IN (cm.DateStyle, 'all') WHERE @DateString LIKE df.Pattern AND (@Culture LIKE cm.Culture) ORDER BY df.DateStyle -- pick 'all' first ) AS Translated WHERE ISDATE(DateString) = 1 );
Now all the complicated logic is hidden away in our tables and function, and we can write queries like this:
DECLARE @table TABLE(string varchar(10), culture char(5)); INSERT @table(string, culture) VALUES ('2/30/2022', 'en-US'), -- invalid date ('2.7.2022', 'en-US'), -- perfectly fine ('2-14-2022', 'en-GB'), -- no 14th month ('12/8/22', 'en-US'), -- Aug 12 ('12/8/22', 'en-GB'); -- Dec 8 SELECT t.string, t.culture, f.TheDate FROM @table AS t OUTER APPLY dbo.try_parse_date(t.string, t.culture) AS f;
Output:
string culture TheDate =========== ======= ========== 2/30/2022 en-US NULL 2.7.2022 en-US 2022-02-07 2-14-2022 en-GB NULL 12/8/22 en-US 2022-12-08 12/8/22 en-GB 2022-08-12
Checking my work more thoroughly, I ran this query to spot-check a few formats
against every culture and compared the output to TRY_PARSE
.
WITH src(DateString) AS ( SELECT d FROM ( VALUES ('2/30/2022'), ('2.7.2022'), ('2-14-2022'), ('12/8/22'), ('2022/4/5'), ('2022/04/19'), ('20220419'), ('2/15/2023'),('2/29/2020'), ('2021-31-07') ) AS d(d) ) SELECT DateString, Culture, [Function], TryParse FROM ( SELECT src.DateString, m.Culture, [Function] = f.TheDate, TryParse = TRY_PARSE(src.DateString AS date USING m.Culture) FROM src CROSS JOIN dbo.CultureMapping AS m OUTER APPLY dbo.try_parse_date(src.DateString, m.Culture) AS f ) AS x WHERE [Function] <> TryParse OR ([Function] IS NOT NULL AND TryParse IS NULL) OR ([Function] IS NULL AND TryParse IS NOT NULL) ORDER BY DateString, TryParse, Culture;
Most of the output from the 320 rows returned by the function (277, or 86.6%)
matched TRY_PARSE
, but I did see some interesting results – 43
rows came back where the output was different:
- [7 rows]: For all 7 of the non-
en-US
cultures whereTRY_PARSE
interprets 2/7/2022 asMDY
(Feb. 7th, 2022), 12/8/22 is interpreted asDMY
instead (August 22nd, 2012). So, yes, two-digit years are a problem. - [4 rows]: Using
Th-TH
,TRY_PARSE
converted values like 2.7.2022 -> 1479-07-02, 12/8/22 -> 1979-08-12, and 2022/04/19 -> 1479-04-19. This was unexpected, and the function treats these correctly (the first two as DMY and the third not mysteriously subtracting 543 years). - [32 rows]:
TRY_PARSE
refused to recognizeyyyyMMdd
as a date in any culture, even though this is a universally safe format for usingCAST
orCONVERT
.
If you want behavior perfectly congruent with TRY_PARSE
, this function
won't quite do that. I don't know if we can work around these
exceptions in a tidy way, so unless these are critical formats and cultures to support,
I would probably remove two-digit year formats from mdy,
and –
like ar-SA
– just skip Th-TH
completely.
DELETE dbo.DateFormats WHERE yl = 2 AND DateStyle = 'mdy'; DELETE dbo.CultureMapping WHERE Culture = 'Th-TH';
This also doesn't provide perfect congruence, so which type of different
behavior you want is up to you. An easier decision, I think, is to let the function
treat yyyyMMdd
correctly instead of matching the incorrect (IMHO) behavior
of TRY_PARSE
. However, if you want to match the behavior where these
return NULL
(which would increase the compatibility to about 96.5%),
delete that single row from dbo.DateFormats
:
DELETE dbo.DateFormats WHERE Pattern = '[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]';
There may be other obscure format variations that cause issues with specific cultures, as it is tedious to test this thoroughly. But this should be a good starting point if you have nothing better to do than wait for someone to approve an upgrade.
Next Steps
- This is clearly incomplete – for simplicity, I ignored two cultures altogether, discarded some two-digit year patterns, and stuck to numeric-only date formats. You can extend this with all cultures and any other date formats you need to support, though I caution against trying to interpret month names, for example, in every single language SQL Server supports. If this is where you want to be, you should think about CLR (if Azure isn't on your horizon), upgrading to a version that supports TRY_PARSE, or not handling this type of fragile string interpretation inside SQL Server in the first place. One way to avoid the problem is to make sure your applications and forms use standard date pickers or calendar controls, so you're not ever trying to interpret free text input.
- See these tips and other resources:
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-03-09