By: Tim Cullen | Updated: 2010-03-05 | Comments (4) | Related: 1 | 2 | 3 | 4 | > XML
Problem
The institution I work with receives data files in a variety of formats for import into a central SQL Server 2000 repository. We have a good handle on importing the files; however, some of the providers want to send data using XML files. Is there a simple process I can create to import these files?
Solution
XML is one of the most common mechanisms for data transfer, so it's no surprise that XML files are being used. While being flexible in how it can be developed, it is rigid in its acceptance of data on the receiving end. Thankfully there is a method available to assist with bulk importing XML files-that method is using SQLXMLBulkLoad. SQLXMLBulkLoad is part of SQLXML, which was originally created after the release of SQL Server 2000 to provide a bridge between SQL Server and XML data. In concept bulk loading XML files is fairly simple, although you must be familiar with the structure of XML files, XML schema language, and Visual Basic Scripting. Let's review the steps required to develop an XML Bulk Load strategy.
Get To Know The XML Files
You no doubt already know the structure of your database, as well as the format of the files you've previously uploaded. XML files follow a format similar to HTML in that data is enclosed within a hierarchical tag structure. Since XML technically stands for "eXtensible Markup Language", tags can be named pretty much whatever you want them to be named. But both the file creator and file recipient must use, and be looking for, consistent tag naming. In order for an XML file to be readable it must be what is called "well-formed". That means:
- Although not required, there is usually a declaration in the first part indicating the XML version and encoding
- There must be a ROOT tag that encloses the dataset. This tag can simply be called "ROOT" or other meaningful name
- There are nodes that can be one level or more
- Each tag must have an opening and closing tag. How you provide the opening and closing tag is up to you and the file creator
As an example we will import an XML file consisting of simple computer inventory data into a SQL Server instance. To house the information we create the following table and XML file:
CREATE TABLE dbo.tbl_ComputerInventory ( ComputerName VARCHAR(50) , IPAddress VARCHAR(15) , MACAddress VARCHAR(17) , ComputerType VARCHAR(20) , ModelNumber VARCHAR(20) ) |
Click here for a sample XML file for the table above |
Develop The XML Schema Definition file
The XML Schema Definition file (XSD) is simply a blueprint of the XML file structure and is recommended by the World Wide Web Consortium (W3C) as a standard for defining XML files. It defines the expectations of both the preparer and recipient, so an attempt to validate an improperly formatted XML file against a XSD file will result in failure. This applies to both tag structure as well as data incompatibilities (i.e., an alphabetic character in an integer field).
If you are familiar with XML formats you can create the schema in Notepad; if not you can use XML Notepad 2007 to create the schema file. In XML schema files there are two types of data types: complex and simple. To simplify the concept there is only one difference between simple and complex types: simple types cannot have child elements or attributes, whereas complex types can. In cases where you're bulk-loading XML data you will use the complex type. Also in the file is a declaration that provides information on how the XML data is related to the SQL database. Each document instance is then mapped to a table using the "sql:relation" phrase. In this case the "InventoryItem" document instance is mapped to the "tbl_ComputerInventory" table. Click here for a sample XML Schema Definition file based on the XML files above.
Using Visual Basic Script to Create the SQLXMLBulkLoad Object
Now that we have the XML file to test and the XML Schema file created, the next step is to create the Visual Basic script to bulk load the files. This Visual Basic script can be either a .vbs file or an ActiveX Script Task in a DTS package. The first section of the script is creation of a SQLXMLBulkLoad object. Within the SQLXMLBulkLoad object are a number of parameters that can be set-some mandatory, others optional, depending on the needs of your organization.
set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0") objBL.ConnectionString = "provider=SQLOLEDB;data source=CULLENWS01\PRD2008;database=MSSQLTIPS;integrated security=SSPI" objBL.KeepIdentity=False objBL.Transaction=True objBL.ErrorLogFile = "F:\xml\error.xml" objBL.Execute "F:\xml\ComputerInventorySchema.xml", "F:\xml\ComputerInventory02.xml" set objBL=Nothing MsgBox "The SQLXMLBulkLoad process completed successfully",0,"XML Bulk Load Process" |
In our case we created a DTS package that is executed by a SQL Server Agent job on a regular basis. The package actually loops each file in the network share and attempts the bulk load on each one.
One interesting thing to note is the dataset below. The first time I executed the bulk load statement, the IPAddress and MACAddress were not imported. Now, no error was thrown during the process, but the bulk process ignored the columns because IPAddress and MACAddress were spelled IPaddress and MACaddress. Once corrected to the actual case sensitive spelling, the import populated all of the information. Yet another reminder that XML is case-sensitive and to keep that in mind when creating the XML Schema Definition file. Also, there is another method for formatting XML files (click here for a sample) where all tags are enclosed in a single line. Although this file format is technically "properly formed", the XML Schema Definition file could not map the information to their respective columns. The bulk load process, however, did not fail-when a query was executed the rows that could not be mapped were NULL.
During testing with live data, I found that the process works for both local files and files located on network shares. For an XML file with over 10,700 rows, and four columns in each row, the average execution time was 1.8 seconds for local files and around 2.6 seconds for files located on network shares.
Next Steps
- Review current methods used by your company to import XML files to see if using this bulk import process would be more efficient and faster
- Download SQLXML 3.0 Service Pack 3
- Read documentation on XML on the Microsoft Developer Network
- Read information on XML Technology from the World Wide Web Consortium
- Read about the SQL Server XML Bulk Load Object Model for SQLXML 4.0 and SQLXML 3.0
- Read up on Examples of Bulk Loading XML Documents (the article is for SQLXML version 3, but you can find additional information on the SQLXML version on your SQL Server)
- Read how to bulk import files for SQL Server 2005 and SQL Server 2008 and
- Refer to the SQLXML 4.0 Programming Concepts if using SQL Server 2005 or 2008
- Read more XML tips on http://www.MSSQLTIPS.com!
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2010-03-05