Comparing Files from Different Folders Using SQL Server and XML

By:   |   Updated: 2009-05-11   |   Comments (8)   |   Related: > XML


Problem

Sometimes there is a need to process files in a folder, but first you need to determine which files need to be processed compared to older files that have already been processed. There are several ways that this can be done, but in this tip I show you a way this can be done using SQL Server and XML.

To further illustrate the issue I have set this up into three folders:

  • Source - files to be processed
  • Destination - files that have been processed
  • NewFiles - holding folder for new files to be processed after they have been compared

Elaborating the problem in depth, let's say I have three text files (or any type of file) in the Source folder and the Destination folder contains two files. The problem is to identify differences between the files in the Source and Destination folders and copy all the modified or new files from the Source Folder to the folder NewFiles.

Shown below are the set of files with their modified date in the Source folder:

folder tasks

Shown below are the set of files with their modified date in Destination folder:

text document

Looking at the images above, it is clear that Source Folder contains one extra file ‘text3' and one modified file ‘text2'.

I want to compare the files in these two folders and copy the modified file (text2) and the new file (text3) into folder NewFiles.

Solution

My first task was to get all the files from the specified folder. This can be achieved via the ‘DIR' command of DOS. DOS commands can be executed in SQL Server through xp_cmdshell system stored procedure, but it needs to be configured first.

The below code creates a new stored procedure called "MakeXML" which will create a XML document of all of the files in a folder.  The code does a few things which you can either turn on or off as desired. 

  • It first drops stored procedure MakeXML so it can create a new sp with this name
  • It enables xp_cmshell so we can shell out to DOS to get the folder listing
  • It does a DIR command against the specified folder with the /O-D parameter to order the list by modified date
  • It then inserts the data into a temporary table
  • Deletes any rows from this table that are not files that we want to compare
  • Parses the modified date
  • Parses the filename
  • Then it creates this data as a temporary XML object
IF OBJECT_ID (N'MakeXML') IS NOT NULL 
   DROP PROCEDURE MakeXML 
GO 

-------INPUT THE FILEPATH FOLDER AND IT WILL RETURN AN XML 
CREATE PROCEDURE MakeXML 
@FilePath VARCHAR(2000), 
@xmlFile XML OUTPUT 
AS 
DECLARE @myfiles TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY,FullPath VARCHAR(2000),ModifiedDate datetime, FileName VARCHAR(100)) 
DECLARE @CommandLine VARCHAR(4000) 

----Configure xp_cmdshell in 2005 
-- To allow advanced options to be changed. 
EXEC sp_configure 'show advanced options', 1 
-- To update the currently configured value for advanced options. 
RECONFIGURE 
-- To enable the feature. 
EXEC sp_configure 'xp_cmdshell', 1 
-- To update the currently configured value for this feature. 
RECONFIGURE 

----- run dir command on the path specified by ordering on the date modified 
SELECT @CommandLine =LEFT('dir "' + @FilePath + '" /O-D',10000) 

------use xp_cmdshell stored procedure to run dos commands 
INSERT INTO @MyFiles (FullPath) 
EXECUTE xp_cmdshell @CommandLine 

------Delete all the files that are other than .txt or can specify the file type you want to compare on 
DELETE FROM @MyFiles WHERE fullpath IS NULL 
OR Fullpath NOT LIKE '%.txt%' 

------Separate the modified date from the fullpath  
UPDATE @myfiles SET ModifiedDate=SUBSTRING(fullpath,1,PATINDEX('%[AP]M%',fullpath)+1)  

------Separate out FileName from fullpath 
UPDATE @myfiles SET FileName=SUBSTRING(fullpath,40,LEN(fullpath)) 

-----Make an xml out of separated ModifiedDate,FileName columns and SET the OUTPUT parameter 
SELECT @xmlFile=(SELECT ModifiedDate,FileName 
                 FROM @MyFiles FOR 
                      XML PATH('FILE'), 
                      ROOT('ROOT') 
                      )  
GO 

For our example the stored procedure would return XML for the Source folder ordered on modified date as shown below.

root file

For our example the stored procedure would return XML for the Destination folder ordered on modified date as shown below.

modified date

Now, the next task is to compare the two XML documents from Source and Destination, this is achieved by the following stored procedure.

It takes three input parameters

  • @Sourcepath which is ‘D:\Source'
  • @DestPath is ‘D:\Destination' and
  • @NewFilesPath as ‘D:\NewFiles'
IF OBJECT_ID (N'CompareXML') IS NOT NULL 
   DROP PROCEDURE CompareXML 
GO 

--------COMPARE XML 
CREATE PROCEDURE CompareXML 
@SourcePath  VARCHAR(255), 
@DestPath  VARCHAR(255), 
@NewFilesPath  VARCHAR(255) 
AS 
DECLARE @SourceXML AS XML,@DestXML AS XML 

---Call the MakeXML sp to return the XML for Source and Destination Folders 
EXECUTE MakeXML @SourcePath,@SourceXML OUTPUT 
EXECUTE MakeXML @DestPath,@DestXML OUTPUT 

DECLARE @CURSOR_File AS CURSOR 
---------Join both the tables retrieved from xml on filename and get all the files having modifieddate mismatch 
---------Fetch all the values into a cursor 
SET @CURSOR_File= CURSOR 
FOR 
      SELECT A.FNAME  
      FROM 
      ( 
            SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , 
            N.value('FileName[1]','varchar(100)') AS FNAME  
            FROM  
            @SourceXML.nodes('ROOT/FILE') AS X(N) 
      )A 
      INNER JOIN  
      ( 
            SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , 
            N.value('FileName[1]','varchar(100)') AS FNAME  
            FROM  
            @DestXML.nodes('ROOT/FILE') AS X(N) 
      )B 
      ON A.FNAME=B.FNAME 
      AND A.MDATE<>B.MDATE 
UNION  
----------Fetch all the extra files in the source folder  
      SELECT A.FNAME  
      FROM 
      ( 
            SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , 
            N.value('FileName[1]','varchar(100)') AS FNAME  
            FROM  
            @SourceXML.nodes('ROOT/FILE') AS X(N) 
      )A 
      LEFT OUTER JOIN  
      ( 
            SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , 
            N.value('FileName[1]','varchar(100)') AS FNAME  
            FROM  
            @DestXML.nodes('ROOT/FILE') AS X(N) 
      )B 
      ON A.FNAME=B.FNAME 
      WHERE b.fname IS NULL 

DECLARE @file AS VARCHAR(255) 
DECLARE @Command AS VARCHAR(500) 

--------First copy all the files found mismatched on modified date into @NewFilesPath 
OPEN @CURSOR_File 
FETCH NEXT FROM @CURSOR_File INTO @file 

WHILE @@Fetch_Status=0 
BEGIN 
      SET @Command='xcopy "'+@SourcePath+'\'+ @file+ '" "' + @NewFilesPath + '"' 

      EXEC xp_cmdshell @Command,no_output 

      FETCH NEXT FROM @CURSOR_File INTO @file 
END 

CLOSE @CURSOR_File 
DEALLOCATE @CURSOR_File 
GO 

The CompareXML stored procedure above will first call the ‘MakeXML' sp and get the source and destination xml. It will fetch all the values from the nodes of both the XML documents and make two types of comparison on those.

  • First the comparison would be on the basis of filename having different modified date values.
  • The second comparison will fetch all the files from Source folder that do not exist in Destination folder.

The output of this is put into a Cursor so that such files are then copied to the ‘NewFiles' folder one by one using the DOS xcopy command.

After the two stored procedures have been created the code is executed as follows:

EXECUTE CompareXML @SourcePath='D:\Source', @DestPath='D:\Destination', @NewFilesPath='D:\NewFiles' 

Based on our example the "NewFiles" folder will now look like this.

file and folder tasks

The NewFiles folder has the same set of files which had a different modified date and the file which was extra in the Source folder. Now, just copy the files from this NewFiles folder to Destination folder.

The problem of comparing files based on modified date and filename has been drastically reduced compared to doing it manually.

Next Steps
  • To test this code create three folders (Source, Destination and NewFiles) and create some dummy (txt) files in the source folder and give it a try.:
  • To take this a step further you can even compare based on the file size
  • The extra or the modified files from ‘Source' folder can be copied directly to the ‘Destination' folder rather than copying them to the ‘NewFiles' folder.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Divya Agrawal Divya Agrawal's bio is coming soon...

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2009-05-11

Comments For This Article




Thursday, August 29, 2013 - 3:48:54 PM - Tony Back To Top (26533)

In your example , as shown in your pics,  the MakeXML stored procedure would return two XML files for the Source and Destination folder . Are those two  files stored on a Temp Table or on a drive? 

What changes do I need to make to the code so that I would be able to store those files in a directory on my drive. I like to do that so I can compare the attributes of those two files against another XML file that I created/exported after  running a SQL query.

 

Thank you


Monday, November 23, 2009 - 8:48:43 PM - MooiLan Back To Top (4490)

Hi

 I tried the codes, it worked but get stucked in the datetime conversion problem as below:

Configuration option 'show advanced options' changed from 1 to 1. Run the RECONFIGURE statement to install.
Configuration option 'xp_cmdshell' changed from 1 to 1. Run the RECONFIGURE statement to install.

(15 row(s) affected)
Msg 242, Level 16, State 3, Line 24
The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.
The statement has been terminated.

(15 row(s) affected)


Below are the sample of MyFiles that I am getting. Can you help ?  Thanks

1   Volume in drive G is Data3 NULL NULL
2   Volume Serial Number is 7474-00B7 NULL NULL
3   NULL NULL NULL
4   Directory of G:\source NULL NULL
5   NULL NULL NULL
  6   24/11/2009  11:47 AM    <DIR>          .. NULL NULL
7  24/11/2009  11:47 AM    <DIR>          . NULL NULL
8  08/09/2009  06:29 PM               471 6.csv NULL NULL
9  08/09/2009  05:57 PM               470 4.csv NULL NULL
10  08/09/2009  05:44 PM               471 5.csv NULL NULL
11 18/06/2008 21:46 793 6om3.csv NULL NULL
12 18/06/2008 21:34 820 6mm.csv NULL NULL
13                5 File(s)          5 025 bytes NULL NULL
14                2 Dir(s)  104 954 339 328 bytes free NULL NULL
15 NULL NULL NULL


Thursday, May 21, 2009 - 9:51:06 PM - divya Back To Top (3419)

Robert,

I think that the filepath (C:\Source) which you have specified does not exist. Please read the whole article.. you need to have some files in the source filepath and destination filepath for comparing.. as i have not checked any exceptions in the article.. whenever no files are present in the filepath it will return one row as "File not found".

So, please make sure  you have kept some files in the source and destination filepath.. otherwise modify the code to return a msg to the user as "No files found"

 


Wednesday, May 20, 2009 - 2:00:15 PM - Robert Back To Top (3411)

i changed this line

set @FilePath='C:\Source'

an got that result

    MyID         FullPath

1                 Volume in drive C has no label.         NULL NULL
2                 Volume Serial Number is F80A-9A48 NULL NULL
3                 NULL                                              NULL NULL
4                 Directory of C:\                                NULL NULL
5                 NULL                                              NULL NULL
6                 File Not Found                                 NULL NULL
7                 NULL                                              NULL NULL

 thanks for you help!


Tuesday, May 19, 2009 - 11:03:11 PM - divya Back To Top (3402)

 Thanks Tom. I have used xml as i found it the best to compare tables with less efforts. It can be done without using XML.

And regarding using xcopy in bulk is possible too.

Thanks for ur suggestion.

 


Tuesday, May 19, 2009 - 3:16:55 PM - Tom Garth Back To Top (3400)

Hi Divya, 

I have similar code that is currently in production. Necessity is the mother, etc.

I was wondering why you chose to use XML.

I didn't do the following because the application I wrote must process and modify the file data before moving them, but had you thought of using xcopy in bulk?

SET @Command='xcopy "'+@SourcePath+'\*.* "' @NewFilesPath '\*.* /D /Q /Y /R /E /C /H"'

Thanks for a nice article.

Tom Garth


Tuesday, May 12, 2009 - 12:38:28 AM - divya Back To Top (3358)

 The error you might be getting because of the different way in the output of DIR command.

Could you please send me the output after running the below script:

DECLARE @myfiles TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY,FullPath VARCHAR(2000),ModifiedDate datetime, FileName VARCHAR(100))
DECLARE @CommandLine VARCHAR(4000),@FilePath VARCHAR(2000)

set @FilePath='D:\source'

----Configure xp_cmdshell in 2005
      -- To allow advanced options to be changed.
      EXEC sp_configure 'show advanced options', 1
      -- To update the currently configured value for advanced options.
      RECONFIGURE
      -- To enable the feature.
      EXEC sp_configure 'xp_cmdshell', 1
      -- To update the currently configured value for this feature.
      RECONFIGURE

      ----- run dir command on the path specified by ordering on the date modified
      SELECT @CommandLine =LEFT('dir "' + @FilePath + '" /O-D',10000)

      ------use xp_cmdshell stored procedure to run dos commands
      INSERT INTO @MyFiles (FullPath)
      EXECUTE xp_cmdshell @CommandLine

     select * from @myfiles

 

The above script would return the list of files got by executing the DIR command and after that i am separating the modified date and filename from a single row returned. The conversion problem is in the same.

So, please mail me () or post the output of the above script, so that i can handle the error.

Thanks


Monday, May 11, 2009 - 2:17:27 PM - Robert Back To Top (3355)
i try running your example but I got the following error message:
 

Configuration option 'show advanced options' changed from 1 to 1. Run the RECONFIGURE statement to install.

Configuration option 'xp_cmdshell' changed from 1 to 1. Run the RECONFIGURE statement to install.

(13 row(s) affected)

(10 row(s) affected)

Msg 241, Level 16, State 1, Procedure MakeXML, Line 32

Conversion failed when converting datetime from character string.

 

Thanks for you help!















get free sql tips
agree to terms