Using the SQL Server MERGE Statement to Process Type 2 Slowly Changing Dimensions

By:   |   Updated: 2013-02-18   |   Comments (31)   |   Related: 1 | 2 | 3 | 4 | 5 | 6 | 7 | > TSQL


Problem

In my 18-plus years of T-SQL experience, the MERGE statement has got to be one of the most difficult statements I have had to implement. It is powerful and multifunctional, yet it can be hard to master. Oftentimes I would find examples of the MERGE statement that just didn't do what I needed it to do, that is to process a Type 2 slowly changing dimension.  Check out this tip to learn more.

Solution

The solution presented in this tip will walk through the steps of how to use the MERGE statement nested inside an INSERT statement to handle both new records and changed records in a Type 2 Slowly Changing Dimension table within a data warehouse. This solution will walk through the processing over three days. Each day, new and changed records are processed. Records are first inserted into a staging table and then the MERGE statement will insert new records into the slowly changing dimension table.

I suggest that you copy and paste each of the code samples into its own editor window in SQL Server Management Studio for ease of execution.

In Code Sample 1 below, we will create our staging table and our slowly changing dimension table.

--=============================================================================
-- Code Sample 1 
--=============================================================================
-- Create the staging table for the type two slowly changing dimension table data
create table dbo.tblStaging
(
  SourceSystemID int not null,
  Attribute1 varchar(128) not null 
    constraint DF_tblStaging_Attribute1 default 'N/A',
  Attribute2 varchar(128) not null 
    constraint DF_tblStaging_Attribute2 default 'N/A',
  Attribute3 int not null 
    constraint DF_tblStaging_Attribute3 default -1,
  DimensionCheckSum int not null 
    constraint DF_tblStaging_DimensionCheckSum default -1,
  LastUpdated datetime  not null 
    constraint DF_tblStaging_LastUpdated default getdate(),
  UpdatedBy varchar(50) not null 
    constraint DF_tblStaging_UpdatedBy default suser_sname()
)
-- Create the type two slowly changing dimension table
create table dbo.tblDimSCDType2Example
(
  SurrogateKey int not null identity(1,1) PRIMARY KEY,
  SourceSystemID int not null,
  Attribute1 varchar(128) not null 
    constraint DF_tblDimSCDType2Example_Attribute1 default 'N/A',
  Attribute2 varchar(128) not null 
    constraint DF_tblDimSCDType2Example_Attribute2 default 'N/A',
  Attribute3 int not null 
    constraint DF_tblDimSCDType2Example_Attribute3 default -1,
  DimensionCheckSum int not null 
    constraint DF_tblDimSCDType2Example_DimensionCheckSum default -1,
  EffectiveDate date not null 
    constraint DF_tblDimSCDType2Example_EffectiveDate default getdate(),
  EndDate date not null 
    constraint DF_tblDimSCDType2Example_EndDate default '12/31/9999',
  CurrentRecord char(1) not null 
    constraint DF_tblDimSCDType2Example_CurrentRecord default 'Y',
  LastUpdated datetime  not null 
    constraint DF_tblDimSCDType2Example_LastUpdated default getdate(),
  UpdatedBy varchar(50) not null 
    constraint DF_tblDimSCDType2Example_UpdatedBy default suser_sname()
)

In Code Sample 2, we insert two "new" records from the source system into the staging table. Then, we use the BINARY_CHECKSUM function to create a checksum value for the records in the staging table. This checksum value will be utilized later during the MERGE statement to detect changed records.

--============================================================================= 
--Code Sample 2
--=============================================================================
-- Start of Day 1 - truncate the staging table
truncate table dbo.tblStaging
-- insert a new record into the staging table into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,   Attribute2, Attribute3)
values (1      , 'Mary Brown', 'Single'  , 143)
-- insert a new record into the staging table into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,  Attribute2,  Attribute3)
values (2      , 'Ricky Green', 'Married'  ,   189)
-- Update the checksum value in the staging table
update dbo.tblStaging set DimensionCheckSum=
BINARY_CHECKSUM(SourceSystemID, Attribute1, Attribute2, Attribute3)

Code Sample 3 shows the MERGE statement (which happens to be embedded within an INSERT statement) that will be used to process the records between the two tables. The comments in the T-SQL code will help to explain what is happening throughout the statement. We will reuse Code Sample 3 throughout this tip.

There are two things that need to be noticed.

  • When executing Code Sample 3, only the count of updated records will be shown as "row(s) affected." Therefore, the first time it is executed against an empty slowly changing dimension table, there will be "(0 rows() affected)" because there are only inserts and no updates.
  • Please don't forget the semicolon at the end.
--=============================================================================
-- Code Sample 3
--=============================================================================
-- begin of insert using merge
insert into dbo.tblDimSCDType2Example
( --Table and columns in which to insert the data
  SourceSystemID,
  Attribute1,
  Attribute2,
  Attribute3,
  DimensionCheckSum,
  EffectiveDate,
  EndDate
)
-- Select the rows/columns to insert that are output from this merge statement 
-- In this example, the rows to be inserted are the rows that have changed (UPDATE).
select    
SourceSystemID,
Attribute1,
Attribute2,
Attribute3,
DimensionCheckSum,
EffectiveDate,
EndDate
from
(
  -- This is the beginning of the merge statement.
  -- The target must be defined, in this example it is our slowly changing
  -- dimension table
  MERGE into dbo.tblDimSCDType2Example AS target
  -- The source must be defined with the USING clause
  USING 
  (
    -- The source is made up of the attribute columns from the staging table.
    SELECT 
    SourceSystemID,
    Attribute1,
    Attribute2,
    Attribute3,
    DimensionCheckSum
    from dbo.tblStaging
  ) AS source 
  ( 
    SourceSystemID,
    Attribute1,
    Attribute2,
    Attribute3,
    DimensionCheckSum
  ) ON --We are matching on the SourceSystemID in the target table and the source table.
  (
    target.SourceSystemID = source.SourceSystemID
  )
  -- If the ID's match but the CheckSums are different, then the record has changed;
  -- therefore, update the existing record in the target, end dating the record 
  -- and set the CurrentRecord flag to N
  WHEN MATCHED and target.DimensionCheckSum <> source.DimensionCheckSum 
                                 and target.CurrentRecord='Y'
  THEN 
  UPDATE SET 
    EndDate=getdate()-1, 
    CurrentRecord='N', 
    LastUpdated=getdate(), 
    UpdatedBy=suser_sname()
  -- If the ID's do not match, then the record is new;
  -- therefore, insert the new record into the target using the values from the source.
  WHEN NOT MATCHED THEN  
  INSERT 
  (
    SourceSystemID, 
    Attribute1,
    Attribute2,
    Attribute3,
    DimensionCheckSum
  )
  VALUES 
  (
    source.SourceSystemID, 
    source.Attribute1,
    source.Attribute2,
    source.Attribute3,
    source.DimensionCheckSum
  )
  OUTPUT $action, 
    source.SourceSystemID, 
    source.Attribute1,
    source.Attribute2,
    source.Attribute3,
    source.DimensionCheckSum,
    getdate(),
    '12/31/9999'
) -- the end of the merge statement
--The changes output below are the records that have changed and will need
--to be inserted into the slowly changing dimension.
as changes 
(
  action, 
  SourceSystemID, 
  Attribute1,
  Attribute2,
  Attribute3,
  DimensionCheckSum,
  EffectiveDate,
  EndDate
)
where action='UPDATE';

So let's review the steps to get this example to work:

  • Execute Code Sample 1 to create the tables in this tip.
  • Execute Code Sample 2 to insert records into the staging table.
  • Execute Code Sample 3 to merge the new and changed records into the slowly changing dimension table.
  • Execute Code Sample 4 below to examine the records in both of the tables.
--============================================================================= 
--Code Sample 4
--=============================================================================
select * from dbo.tblStaging order by SourceSystemID
select * from dbo.tblDimSCDType2Example order by SourceSystemID, SurrogateKey

In the figure below, we see the results from Code Sample 4 where our two new records in the staging table have been inserted into the slowly changing dimension table. Please notice the SurrogateKey, EffectiveDate, EndDate, and CurrentRecord columns as they will change as we move forward with this example.

notice the SurrogateKey, EffectiveDate, EndDate, and CurrentRecord columns as they will change as move forward

In Code Sample 5 shown below, two new records (SourceSystemID 3 and SourceSystemID 4) and one updated record (SourceSystemID 2) are inserted into the staging table and checksums are calculated.

 
--=============================================================================
--Code Sample 5
--=============================================================================
--Start of Day 2 - truncate the staging table
truncate table dbo.tblStaging
-- insert a new record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1, Attribute2, Attribute3)
values (3      , 'Jane Doe', 'Single'  , 123)
-- insert a new record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1, Attribute2, Attribute3)
values (4      , 'John Doe', 'Married' , 246)
-- insert a changed record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,       Attribute2, Attribute3)
values (2      , 'Ricky L. Green', 'Married' , 189)
-- Update the checksum value in the staging table
update dbo.tblStaging set DimensionCheckSum=
BINARY_CHECKSUM(SourceSystemID, Attribute1, Attribute2, Attribute3)

After executing Code Sample 5 to insert records into the staging table, execute the MERGE/INSERT statement in Code Sample 3 and the SELECT statements in Code Sample 4. The results of the SELECT statements are shown in the figure below. Notice how there is a second record for SourceSystemID 2 (SurrogateKey 5) that reflects the change in the Attribute2 column. Also, notice how the DimensionCheckSum column is different between SurrogateKeys 2 and 5. Furthermore, the EndDate for SurrogateKey 2 has changed from 12/31/9999 to 01/27/2013 and the CurrentRecord is set to 'N'.

The results of the SELECT statements

In Code Sample 6 shown below, one new record (SourceSystemID 5) and three updated records (SourceSystemIDs 3, 4 and 5) are inserted into the staging table and checksums are calculated.

--=============================================================================
-- Code Sample 6
--=============================================================================
--Start of Day 3 - truncate the staging table
truncate table dbo.tblStaging
-- insert a changed record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,       Attribute2, Attribute3)
values (3      , 'Jane Doe-Jones', 'Married' , 123)
-- insert a changed record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,  Attribute2, Attribute3)
values (4      , 'John Doe', 'Married'  , 220)
-- insert a new record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,   Attribute2, Attribute3)
values (5      , 'Bill Smith', 'Married' , 198)
-- insert a changed record into the staging table
insert into dbo.tblStaging
(SourceSystemID, Attribute1,   Attribute2, Attribute3)
values (1      , 'Mary Brown', 'Single'  , 136)
-- Update the checksum value in the staging table
update dbo.tblStaging set DimensionCheckSum=
BINARY_CHECKSUM(SourceSystemID, Attribute1, Attribute2, Attribute3)

After executing Code Sample 6 to insert records into the staging table, again execute the MERGE/INSERT statement in Code Sample 3 and the SELECT statements in Code Sample 4. The results of the latest SELECT statements are shown in the figure below. Notice how there are new records for SourceSystemID 1, 3 and 4 where the CurrentRecord column is set to 'Y' and the EndDate is 12/31/9999.

there are new records for SourceSystemID 1, 3 and 4 where the CurrentRecord column is set to 'Y' and the EndDate is 12/31/9999
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2013-02-18

Comments For This Article




Friday, January 5, 2024 - 2:32:36 PM - David S Back To Top (91835)
If for example I have a record that's been inserted a bunch of times with minor changes that we see over time.

Then one day it is no longer in the source/staging table. So we close out the record with CurrentRecord=N and set an end date. All versions of the record are N.

Then some time later, the record appears back in the source, so we would expect it to fall into the Changes output part of the merge, and insert a new row for the record. However it only seems to be working by inserting the new record if there was an old record that it found to close out. In this case, there is no record to close out. Any idea why it wouldn't land in the Changes bucket in this scenario?

Friday, March 24, 2023 - 12:00:53 PM - Greg Robidoux Back To Top (91038)
Hi Sara,

here is one option that uses the OUTPUT clause to get values.

https://stackoverflow.com/questions/52372924/get-rowcount-of-affected-rows-by-action-from-merge-without-using-output

-Greg

Friday, March 24, 2023 - 11:14:23 AM - Sara Back To Top (91037)
Thanks for the code.
How can I get the number of inserted and updated records?

Saturday, December 26, 2020 - 10:04:55 AM - Rajeev Back To Top (87953)
I tried with same script in sql dwh but getting below error. could you please help.
Msg 103010, Level 16, State 1, Line 1
Parse error at line: 36, column: 3: Incorrect syntax near 'MERGE'.

Wednesday, August 12, 2020 - 2:42:41 AM - vishnu Back To Top (86286)
Thanks for this Amazing Article.
One question I had was regarding the "AS CHANGES" Keyword. Where can I get the documentation for this? could not find it anywhere.

Thanks.

Sunday, April 12, 2020 - 9:54:21 PM - GS Back To Top (85350)

thanks a lot for explaining the Merge. However there’s certain scenario and a question that I have: can you please help me?

Question: is it OK to have the expired row’s ENDing timestamp same as the STARTing of the newer version? or should there be a intentional lag? (this will affect how the READ program is coded for: the former cannot be written with a BETWEEN clause.

Scenario: for example,

we loaded records on Business day 1 so all are current
On business day 2, there were few records “updated” hence SCD 2 generated expired / current rows

now if there are some retro adjustments needed for Business Day 1, how can this be handled?

Best,

GS


Sunday, December 2, 2018 - 10:23:37 AM - Bharani Back To Top (78390)

Thanks Dallas, nice article. I have NULL's in my source system id and when i run the code 3 sample , they keep on adding to the dimension table. I have 20 rows with total and two rows are NULL's . When i updated a record in the staging and run the code sample 3 , the dimension table is getting 23 instead of 21 rows. Its adding the two null records.


Friday, November 30, 2018 - 11:22:35 AM - Bhargav Kandala Back To Top (78374)

 Great article, just curious what is the use of the constraints and why some are defaulted to -1 and some as 'N/A'


Wednesday, October 26, 2016 - 9:19:43 AM - Bert Zandbergen Back To Top (43640)

To close records that are deleted in the source the merge requires an extra clause:

  WHEN MATCHED and target.DimensionCheckSum <> source.DimensionCheckSum 
                                 and target.CurrentRecord='Y'
  THEN 
  UPDATE SET 
    EndDate=getdate()-1, 
    CurrentRecord='N', 
    LastUpdated=getdate(), 
    UpdatedBy=suser_sname()

The rows that are updated by this clause are added to the MERGE output. Since these rows have no source values, the rows
are all filled with null values. To prevent the inserting of thes, the outer INSERT needs an extra condition in its
WHERE clause:
WHERE action='UPDATE' AND source.SourceSystemID IS NOT NULL;

 


Tuesday, February 2, 2016 - 5:14:17 PM - Kevin Back To Top (40577)

 Could you add script to get count for inserts and updates

 


Tuesday, September 2, 2014 - 3:55:14 PM - Saul Back To Top (34356)

Great example Dallas,

Just would like to add that if for some reason you do not define the default values when creating/defining the tables, then you'll need to add the values for EffectiveFrom, EffectiveTo and IsCurrent to the code:

           ,getdate()
           ,CONVERT(datetime,'12/31/9999',101)
           ,'Y'

Thanks again.

 

Best Regards,

Saul.

 


Wednesday, November 27, 2013 - 5:29:14 PM - Kerri Back To Top (27631)

This just a temporary solution for us....we will move the logic to ETL during our next iteration of the project.


Wednesday, November 27, 2013 - 2:31:12 PM - Darryll Petrancuri Back To Top (27628)

I would strongly advise everyone to stay away from the use of MERGE. It is bug-ladden even today, this according to Microsoft. Also, it does not scale wel at all. It performs very poorly on large datasets. Furthermore, it does nothing that can't be accomplished with INSERT, UPDATE and DELETE and in a much more high performance and maintainable fashion. And for what it's worth, I was a previous user of MERGE. This bias is not uninformed, but rather comes from suffering too many gotchas and learning about the bugs that are still present in the implementation of the statement today.


Sunday, November 24, 2013 - 4:08:59 PM - Kerri Back To Top (27589)

Great Article!!!  Exactly what I was looking for!  Thank you!


Monday, August 12, 2013 - 7:27:26 AM - Dan Back To Top (26243)

After reading some of the comments here I can tell why the industry is lacking on good Ms BI developers. Try reading some Kimball books (for example) before making unnecessary comments and what about being more constructive?

For those who are happy to keep on using insert and update statements, what about trying something new and different? Jz...

Good article, well explained. I also use Merge for types 1/2 dimension loading and Merge only cannot solve the issue with type 2 SCD.


Sunday, June 23, 2013 - 6:43:11 PM - knyazs Back To Top (25531)

Creating merge statement for Slowly Changing Dimension can be very difficult and time consuming, not to mention time to test it. That is why I created FREE helper application for creating MERGE statement called SCD Merge Wizard. As I said, application is free and you can try it here: https://scdmergewizard.codeplex.com.


Saturday, March 30, 2013 - 12:00:37 PM - prince10 Back To Top (23089)

Hello Dallas,

Thanks for your respond. i have the below code that am using, tell me what the difference is between the merge and the insert statement below

 

 


GO
/****** Object:  StoredProcedure [dbo].[LoadDimDepot]    Script Date: 03/30/2013 15:51:03 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO


ALTER PROCEDURE [dbo].[LoadDimDepot]
AS

DECLARE @CurrentTime DATETIME
SET @CurrentTime = GETDATE()

--STEP 1 Insert rows for source rows with changed attributes

Insert into dbo.DimDepot
(
DepotID,
DepotName,
AddressL1,
AddressL2,
TownCity,
County,
PostCode,
Country,
ValidFrom,
ValidTo,
IsActive
)

SELECT S.DepotID,
ISNULL (S.DepotName,'Uknown') as DepotName,
ISNULL (S.AddressL1,'Uknown') as AddressL1,
ISNULL (S.AddressL2,'Uknown') as AddressL2,
ISNULL (S.TownCity,'Uknown') as TownCity,
ISNULL (S.County,'Uknown') as County,
ISNULL (S.PostCode,'Uknown') as PostCode,
ISNULL (S.Country,'Uknown') as Country,
GETDATE() as ValidFrom, '9999-12-31' as ValidTo, 2 as IsActive

FROM dbo.[Staging.FinalDimDepot] S
INNER JOIN dbo.DimDepot D ON D.DepotID=S.DepotID
WHERE D.DepotID=S.DepotID AND IsActive= 1 AND
(
ISNULL (D.DepotName,'Unknown')<> ISNULL (S.DepotName,'Uknown')
OR ISNULL (D.AddressL1,'Unknown')<> ISNULL (S.AddressL1,'Uknown')
OR ISNULL (D.AddressL2,'Unknown')<> ISNULL (S.AddressL2,'Uknown')
OR ISNULL (D.TownCity,'Unknown')<> ISNULL (S.TownCity,'Uknown')
OR ISNULL (D.County,'Unknown')<> ISNULL (S.County,'Uknown')
OR ISNULL (D.PostCode,'Unknown')<> ISNULL (S.PostCode,'Uknown')
OR ISNULL (D.Country,'Unknown')<> ISNULL (S.Country,'Uknown')
)

--- STEP 2. Expire active rows if attributes are changed.

UPDATE dbo.DimDepot
set ValidTo = @CurrentTime, IsActive = 0
FROM dbo.[Staging.FinalDimDepot]S
WHERE DimDepot.DepotID=S.DepotID AND IsActive=1 AND
(
ISNULL (DimDepot.DepotName,'Unknown')<> ISNULL (S.DepotName,'Uknown')
OR ISNULL (DimDepot.AddressL1,'Unknown')<> ISNULL (S.AddressL1,'Uknown')
OR ISNULL (DimDepot.AddressL2,'Unknown')<> ISNULL (S.AddressL2,'Uknown')
OR ISNULL (DimDepot.TownCity,'Unknown')<> ISNULL (S.TownCity,'Uknown')
OR ISNULL (DimDepot.County,'Unknown')<> ISNULL (S.County,'Uknown')
OR ISNULL (DimDepot.PostCode,'Unknown')<> ISNULL (S.PostCode,'Uknown')
OR ISNULL (DimDepot.Country,'Unknown')<> ISNULL (S.Country,'Uknown')
)

-- Step 3. Update IsActive = 2 rows to 1
UPDATE dbo.DimDepot SET IsActive = 1 where IsActive = 2


-- -- Step 4. Insert new rows

INSERT INTO DimDepot
(
DepotID,
DepotName,
AddressL1,
AddressL2,
TownCity,
County,
PostCode,
Country,
ValidFrom,
ValidTo,
IsActive
)

SELECT DepotID,
ISNULL (DepotName,'Uknown') as DepotName,
ISNULL (AddressL1,'Uknown') as AddressL1,
ISNULL (AddressL2,'Uknown') as AddressL2,
ISNULL (TownCity,'Uknown') as TownCity,
ISNULL (County,'Uknown') as County,
ISNULL (PostCode,'Uknown') as PostCode,
ISNULL (Country,'Uknown') as Country,
'1900-01-01', '9999-12-31', 1

FROM dbo.[Staging.FinalDimDepot]
WHERE DepotID NOT IN
(SELECT DISTINCT DepotID FROM dbo.DimDepot)

 

 


Friday, March 29, 2013 - 10:36:11 PM - Dallas Snider Back To Top (23079)

Prince10,

Based on what you are stating, I believe your ValidFrom column is synonymous to my EffectiveDate column, your ValidTo column is synonymous to my EndDate column, and your IsActive column is synonymous to my CurrentRecord column.  Please refer to how I handle these columns above.

I hope this helps.

Dallas


Friday, March 29, 2013 - 4:05:06 PM - prince10 Back To Top (23076)

Nice post

I love the merge statement however, i am confused on some part, I am using a SP to do my SCD, where i call the SP but i do understand that merge statement is good.

My question now is that, when i have  additional columns to my dimensions  namely:,

ValidFrom

ValidTo

IsActive

With this three coloumns  above , how can i update the ValidFrom when it expires or new records comes in?Because most post i have seen dont have endDate, StartDate etc.

 

Any help or explanation on how to update those three columns will be usefull.

 


Friday, March 8, 2013 - 1:00:14 PM - Lamprey Back To Top (22670)

I have no issue with the article for what it is demonstrating. What I do have issue with is passing this off as something to do in a data warehouse. A data warehouse is no place for a dimensional model. The dimensional model fails at so many levels when trying to do a temporal data warehouse. If you are interested in implementing an actual temporal data warehouse, then I suggest you read Temporal Data & the Relational Model by CJ Date (http://www.amazon.com/Temporal-Relational-Kaufmann-Management-Systems/dp/1558608559). Assuming you have a solid relation background it shouldn't be too hard to extend which ever RDBMS you work with to handle temporal data.


Friday, March 8, 2013 - 6:37:39 AM - John Martin Back To Top (22658)

If you are worried about the checksums you can use the following instead of the checksum comparison in the "WHEN MATCHED" clause:

WHEN MATCHED AND NOT EXISTS (
   SELECT
   source.Attribute1,
   source.Attribute2,
   source.Attribute3
   
   INTERSECT
   
   SELECT
   target.Attribute1,
   target.Attribute2,
   target.Attribute3
   )
AND target.CurrentRecord='Y'

This handles nulls also.

Take a look at http://sqlblog.com/blogs/paul_white/archive/2011/06/22/undocumented-query-plans-equality-comparisons.aspx

 


Friday, March 1, 2013 - 2:35:05 PM - Dallas Snider Back To Top (22517)

With a Type 2 Slowly Changing Dimension (SCD), the idea is to track the changes to (or record the history of) an entity over time.  To accomplish this tracking, rows should never be deleted and the attributes are never updated.  Instead, changes in the data are applied through the end-dating of the existing current record and by flagging the record as no longer being current; while a new record is inserted with the changes in the attributes.

The INSERT/MERGE code above accomplishes the goals of maintaining a Type 2 SCD with a “minimal” amount of code to execute.

 


Friday, March 1, 2013 - 2:14:28 PM - Dallas Snider Back To Top (22516)

I understand the well-documented concerns about CHECKSUM() or BINARY_CHECKSUM() failing to detect changes in records.  However, I used the BINARY_CHECKSUM() as shown in the tip on a 4 million records and growing table with 30+ attributes passed to the function for almost four years in a production environment and it never failed to detect a change. 


Friday, March 1, 2013 - 7:29:07 AM - Terence Hill Back To Top (22502)
The comments nicely prove the worth of the article itself. I'd still say that the code lacks handling for the WHEN NOT MATCHED BY SOURCE case. We'd want to end up with no records having Current='Y' for the source ID, the last EndDate showing the point of deletion. Would "where action='UPDATE' and SourceSystemID IS NOT NULL" be sufficient?

Wednesday, February 20, 2013 - 4:38:17 PM - the sqlist Back To Top (22317)

Thaky you both for clarifications. I now know what a Type 2 Slowly Changing Dimension is. I actually used it a lot in my experience, also the The Type 6 pure one, but I never was not familiar with these terms.

I have to admit that using the output result from MERGE statement like that is very clever and it also must be well performing compared to other, more clasic, methods.


Wednesday, February 20, 2013 - 2:41:34 PM - Ty Back To Top (22316)

http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_II


Wednesday, February 20, 2013 - 2:39:30 PM - Ty Back To Top (22315)

The reason why he has the merge inside the insert is because the update in the merge isn't the data update. It's just updating the existing record and invalidating it as the current record. In other words, keeping a history of record updates. You'll note it flags the record as 'N' and updates the enddate. This effectively archives this record to determine when it last changed and what values it had at that time.

It's the outer insert itself which 'updates' the table with the current data record.


Tuesday, February 19, 2013 - 10:31:35 PM - John Back To Top (22290)

I kind of agree with the first post made by "the sqlist".  Normally, when I've used the MERGE statement in the past, it is pretty much a self-contained statement that does the INSERT, UPDATE, and/or DELETE.  Not fully sure why one needs to use a MERGE inside an INSERT statement. 

You can create a "derived table" expression within the USING-clause as necessary.  And it's  easy if you just code your UPDATE section based upon the PK in the target table.   Here's a sample of a simple call:

 

   MERGE INTO [dbo].[dimCUSTOMER_EMAIL] AS TGT
   USING
   (
      SELECT
         C.CUSTOMERID  ,
         C.LASTNAME      ,
         C.FIRSTNAME     ,
         CE.EMAIL 
       FROM dbo.dimCUSTOMER C           
       INNER JOIN dbo.CUSTOMER_EMAIL CE  ON CE.CUSTOMERID = C.CUSTOMERID
      
   ) AS SRC(CUSTOMERID, LASTNAME, FIRSTNAME, EMAIL)
     ON  TGT.CUSTOMERID  = SRC.CUSTOMERID       
   WHEN MATCHED THEN
      UPDATE
      SET
         TGT.LASTNAME   = SRC.LASTNAME ,
         TGT.FIRSTNAME  = SRC.FIRSTNAME,
         TGT.EMAIL      = SRC.EMAIL
   WHEN NOT MATCHED THEN
      INSERT (CUSTOMERID, LASTNAME, FIRSTNAME,  EMAIL)
      VALUES (SRC.CUSTOMERID, SRC.LASTNAME, SRC.FIRSTNAME, SRC.EMAIL)
   WHEN NOT MATCHED BY SOURCE THEN
      DELETE;


Tuesday, February 19, 2013 - 7:19:39 AM - the sqlist Back To Top (22262)

From my understanding what that INSERT from MERGE statement will do is update the target table and then insert the rows that were updated and returned by the OUTPUT clause again into it. Is that really the intention?


Tuesday, February 19, 2013 - 7:12:18 AM - the sqlist Back To Top (22261)

I don't get it. Isn't MERGE supposed to do the insert already, then why the INSERT ststement above it? The purpose of this statement is to do the I/U/D operations in one single shot based on the rules defined and by joining the target table rows with the source rows. The only output from MERGE would be the results of the changes from the OUTPUT clause. Is that needed in the target table too?


Monday, February 18, 2013 - 10:39:35 AM - John Back To Top (22239)

Disclaimer:  I didn't tke the time to read your whole article, so please forgive me if you address this.

My concern is about the "BINARY_CHECKSUM()" function.  Will this option always tell you when data has changed?  Consider a "pathological" worst case where "Attribute1" is set to "... San Diego, CA 91311" for one data row.  Let's say the data row gets changed to "... San Diego, CA 91302".  Here the ASCII value of one byte (1) was decremented by one to (0) and then the ASCII value of the next byte (1) was incremented by one to (2) -- with a net change of zero for the computed checksum.  Will this case be correctly flagged as data changed?















get free sql tips
agree to terms