By: Diana Moldovan | Updated: 2015-06-26 | Comments | Related: > PowerShell
Problem
A few months ago I had to fix a SQL Server collation issue for a database, so that the application could access the database. This involved the change of the collation for about 4000 columns for 700 tables. I used a PowerShell script to create the "alter column" and "drop / create index" T-SQL statements for each table. Working on the script, I noticed that SMO is very slow when having to iterate through a considerable number of database objects. The database I worked with is not that "big" (its size is around 64 GB), but it hosts over 1000 tables. Eventually the solution I've chosen was a mix between T-SQL (I've obtained the table metadata from the Information Schema Views) and PowerShell. However a question remained to be answered - is there a way to speed up SMO? Yes, there is - fortunately.
Solution
In order to work with database objects, SMO needs to retrieve the corresponding metadata from the current instance of SQL Server. You can see this if you start a SQL Trace or an Extended Events session while running a SMO based application.
Let's run a simple PowerShell script which uses SMO to retrieve the name and collation for a few columns in a table and see what's happening behind the scenes. I've run these examples on a SQL Server 2012 SP2 x64 instance.
Create a Database for Testing
First of all I created a simple test environment - a database with only one table containing varchar columns of different collations.
CREATE DATABASE [TestSMO] CONTAINMENT = NONE ON PRIMARY ( NAME = N'TestSMO', FILENAME = N'C:\SQLDataLocation\TestSMO.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ) LOG ON ( NAME = N'TestSMO_log', FILENAME = N'C:\SQLLogLocation\TestSMO_log.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%) GO USE TestSMO GO CREATE TABLE [dbo].[Table_1]( [id] [int] NOT NULL, [CharColumn_1] [varchar](50) NULL, [CharColumn_2] [varchar](50) NULL, CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED ( [id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO --I'm changing the collation of this column to something else than the default database / server collation ALTER TABLE Table_1 ALTER COLUMN CharColumn_1 varchar(50) COLLATE SQL_Latin1_General_CP850_BIN GO
Retrieve Data With SMO Using Default Settings
The PowerShell script below will list all the columns having a collation different from the database default collation.
try{ $ErrorActionPreference = "Stop" [System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null #I'm using the the "stopwatch" to display how much time needs this script to run $watch = [system.diagnostics.stopwatch]::StartNew() $server = new-object ('Microsoft.SqlServer.Management.Smo.Server') 'YOUR_SERVER_NAME' $db = $server.Databases["TestSMO"] $db_collation = $db.Collation $tables = $db.Tables foreach($table in $tables){ foreach ($column in $table.Columns){ if($column.Collation -and $column.Collation -ne $db_collation){ Write-Host $table.Name - $column.Name - $Column.Collation } } } Write-Host $watch.Elapsed.Hours : $watch.Elapsed.Minutes : $watch.Elapsed.Seconds } catch{ Write-Host -ForegroundColor Magenta $Error[0].Exception while($Error[0].InnerException){ Write-Host $Error[0].InnerException.Message } }
I've reproduced this test in my environment using both a trace which records "Stored Procedures ---> RPC:Starting" and "TSQL ---> SQL:BatchStarting" events and an Extended Events "TSQL detail tracking" session (almost nothing else than this test was running on my development machine). Regardless of the method I used, I noticed that there was a lot of T-SQL going on to retrieve information about the SQL Server instance, master database, TestSMO database, Table_1 and its columns.
For each of the Table_1 columns there were 3 queries which retrieved metadata. For example, the query below will retrieve the metadata for the Table_1.ID column. We only need the column name and collation to run the PowerShell script, but SMO retrieves much more information:
exec sp_executesql N'SELECT clmns.name AS [Name], clmns.column_id AS [ID], clmns.is_nullable AS [Nullable], clmns.is_computed AS [Computed], CAST(ISNULL(cik.index_column_id, 0) AS bit) AS [InPrimaryKey], clmns.is_ansi_padded AS [AnsiPaddingStatus], CAST(clmns.is_rowguidcol AS bit) AS [RowGuidCol], CAST(ISNULL(COLUMNPROPERTY(clmns.object_id, clmns.name, N''IsDeterministic''),0) AS bit) AS [IsDeterministic], CAST(ISNULL(COLUMNPROPERTY(clmns.object_id, clmns.name, N''IsPrecise''),0) AS bit) AS [IsPrecise], CAST(ISNULL(cc.is_persisted, 0) AS bit) AS [IsPersisted], ISNULL(clmns.collation_name, N'''') AS [Collation], CAST(ISNULL((select TOP 1 1 from sys.foreign_key_columns AS colfk where colfk.parent_column_id = clmns.column_id and colfk.parent_object_id = clmns.object_id), 0) AS bit) AS [IsForeignKey], clmns.is_identity AS [Identity], CAST(ISNULL(ic.seed_value,0) AS bigint) AS [IdentitySeed], CAST(ISNULL(ic.increment_value,0) AS bigint) AS [IdentityIncrement], (case when clmns.default_object_id = 0 then N'''' when d.parent_object_id > 0 then N'''' else d.name end) AS [Default], (case when clmns.default_object_id = 0 then N'''' when d.parent_object_id > 0 then N'''' else schema_name(d.schema_id) end) AS [DefaultSchema], (case when clmns.rule_object_id = 0 then N'''' else r.name end) AS [Rule], (case when clmns.rule_object_id = 0 then N'''' else schema_name(r.schema_id) end) AS [RuleSchema], ISNULL(ic.is_not_for_replication, 0) AS [NotForReplication], CAST(COLUMNPROPERTY(clmns.object_id, clmns.name, N''IsFulltextIndexed'') AS bit) AS [IsFullTextIndexed], CAST(COLUMNPROPERTY(clmns.object_id, clmns.name, N''StatisticalSemantics'') AS int) AS [StatisticalSemantics], CAST(clmns.is_filestream AS bit) AS [IsFileStream], CAST(clmns.is_sparse AS bit) AS [IsSparse], CAST(clmns.is_column_set AS bit) AS [IsColumnSet], usrt.name AS [DataType], s1clmns.name AS [DataTypeSchema], ISNULL(baset.name, N'''') AS [SystemType], CAST(CASE WHEN baset.name IN (N''nchar'', N''nvarchar'') AND clmns.max_length <> -1 THEN clmns.max_length/2 ELSE clmns.max_length END AS int) AS [Length], CAST(clmns.precision AS int) AS [NumericPrecision], CAST(clmns.scale AS int) AS [NumericScale], ISNULL(xscclmns.name, N'''') AS [XmlSchemaNamespace], ISNULL(s2clmns.name, N'''') AS [XmlSchemaNamespaceSchema], ISNULL( (case clmns.is_xml_document when 1 then 2 else 1 end), 0) AS [XmlDocumentConstraint], CASE WHEN usrt.is_table_type = 1 THEN N''structured'' ELSE N'''' END AS [UserType] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS clmns ON clmns.object_id=tbl.object_id LEFT OUTER JOIN sys.indexes AS ik ON ik.object_id = clmns.object_id and 1=ik.is_primary_key LEFT OUTER JOIN sys.index_columns AS cik ON cik.index_id = ik.index_id and cik.column_id = clmns.column_id and cik.object_id = clmns.object_id and 0 = cik.is_included_column LEFT OUTER JOIN sys.computed_columns AS cc ON cc.object_id = clmns.object_id and cc.column_id = clmns.column_id LEFT OUTER JOIN sys.identity_columns AS ic ON ic.object_id = clmns.object_id and ic.column_id = clmns.column_id LEFT OUTER JOIN sys.objects AS d ON d.object_id = clmns.default_object_id LEFT OUTER JOIN sys.objects AS r ON r.object_id = clmns.rule_object_id LEFT OUTER JOIN sys.types AS usrt ON usrt.user_type_id = clmns.user_type_id LEFT OUTER JOIN sys.schemas AS s1clmns ON s1clmns.schema_id = usrt.schema_id LEFT OUTER JOIN sys.types AS baset ON (baset.user_type_id = clmns.system_type_id and baset.user_type_id = baset.system_type_id) or ((baset.system_type_id = clmns.system_type_id) and (baset.user_type_id = clmns.user_type_id) and (baset.is_user_defined = 0) and (baset.is_assembly_type = 1)) LEFT OUTER JOIN sys.xml_schema_collections AS xscclmns ON xscclmns.xml_collection_id = clmns.xml_collection_id LEFT OUTER JOIN sys.schemas AS s2clmns ON s2clmns.schema_id = xscclmns.schema_id WHERE (clmns.name=@_msparam_0)and((tbl.name=@_msparam_1 and SCHEMA_NAME(tbl.schema_id)=@_msparam_2))',N'@_msparam_0 nvarchar(4000),@_msparam_1 nvarchar(4000),@_msparam_2 nvarchar(4000)',@_msparam_0=N'id',@_msparam_1=N'Table_1',@_msparam_2=N'dbo'
SMO will run such a query for every column in every user table. This example involves a database with only one table, but you can imagine how this trace would look for a database with hundreds or thousands of tables.
Reduce Amount of Data Retrieved with SMO
Fortunately there is a simple way to reduce the quantity of T-SQL that SMO needs to do its job. You'll just need to specify the default initial fields for the column object - thus you "tell" SMO to retrieve only the metadata it needs.
By default, the only "default initial field" for the Column object is Name. Run the below script to see this:
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null $server = new-object ('Microsoft.SqlServer.Management.Smo.Server') 'YOUR_SERVER_NAME' $server.GetDefaultInitFields([Microsoft.Sqlserver.Management.Smo.Column])
Let's add the Collation property to the set of "default initial fields". This script is the same as above, but I added the lines in red and ran it again with a SQL trace.
try{
$ErrorActionPreference = "Stop"
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null
#I'm using the the "stopwatch" to display how much time needs this script to run
$watch = [system.diagnostics.stopwatch]::StartNew()
$server = new-object ('Microsoft.SqlServer.Management.Smo.Server') 'YOUR_SERVER_NAME'
#setting up the default initial fields
$initfields = $server.GetDefaultInitFields([Microsoft.Sqlserver.Management.Smo.Column])
$initfields.Add("Collation")
$server.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Column], $initfields)
$db = $server.Databases["TestSMO"]
$db_collation = $db.Collation
$tables = $db.Tables
foreach($table in $tables){
foreach ($column in $table.Columns){
if($column.Collation -and $column.Collation -ne $db_collation){
Write-Host $table.Name - $column.Name - $Column.Collation
}
}
}
Write-Host $watch.Elapsed.Hours : $watch.Elapsed.Minutes : $watch.Elapsed.Seconds
}
catch{
Write-Host -ForegroundColor Magenta $Error[0].Exception
while($Error[0].InnerException){
Write-Host $Error[0].InnerException.Message
}
}
This time I did not find the 3 "huge" queries that retrieve the column metadata as we saw above. Instead SMO uses only a simple query that gets the "default initial" properties - the name and the collation of the column.
exec sp_executesql N'SELECT clmns.column_id AS [ID], clmns.name AS [Name], ISNULL(clmns.collation_name, N'''') AS [Collation] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS clmns ON clmns.object_id=tbl.object_id WHERE (tbl.name=@_msparam_0 and SCHEMA_NAME(tbl.schema_id)=@_msparam_1) ORDER BY [ID] ASC',N'@_msparam_0 nvarchar(4000),@_msparam_1 nvarchar(4000)',@_msparam_0=N'Table_1',@_msparam_1=N'dbo'
Consequently, the script should take much less time to run. If the number of user tables in the database is significant, the difference will be significant. For example, on my development machine (Windows 8.1 Pro x64 INTEL I5 1.7Ghz 2 cores, 4GB RAM) for a database with ~100,000 tables, this version takes about 5 minutes to run. If I omit to set the "default initial fields", the script will "crawl" and takes about hour and a half - yes, the difference is that big.
Retrieving Additional Columns
Above I added to the "initfields" the "Collation" property of the SMO Column class. SMO queried sys.tables and sys.all_columns and retrieved the corresponding sys.all_columns.collation_name value. However, there isn't a "one to one" relationship between the Column class properties and the data from the system catalogs. For example, I've tried to add to the "initfields" set the Parent property of the column. For a column, the "parent" is the table it belongs to.
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null $server = new-object ('Microsoft.SqlServer.Management.Smo.Server') 'TOSHIBA' $initfields = $server.GetDefaultInitFields([Microsoft.Sqlserver.Management.Smo.Column]) $initfields.Add("Parent") $server.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Column], $initfields) $db = $server.Databases["TestSMO"] foreach($table in $db.Tables){ foreach($col in $table.Columns){ Write-Host $col.Parent - $col.Name } }
This script returns an error message that says that the Parent property is "unknown":
The following exception occurred while trying to enumerate the collection: "unknown property Parent". At C:\Notes\MSSQLTips\Powershell\SMO_behind_the_scenes\parent_unknown_property.ps1:9 char:21 + foreach($col in $table.Columns){ + CategoryInfo : NotSpecified: (:) [], ExtendedTypeSystemException + FullyQualifiedErrorId : ExceptionInGetEnumerator
Indeed there is no mention of the column "parent" in sys.all_columns and the table name is not returned as the column "parent" by any query.
Some Items Still Return More Data
When I added DataType to the "initfields", the system catalogs query returned data not only about the column type, but also about their scale and precision:
exec sp_executesql N'SELECT clmns.column_id AS [ID], clmns.name AS [Name], usrt.name AS [DataType], ISNULL(baset.name, N'''') AS [SystemType], CAST(CASE WHEN baset.name IN (N''nchar'', N''nvarchar'') AND clmns.max_length <> -1 THEN clmns.max_length/2 ELSE clmns.max_length END AS int) AS [Length], CAST(clmns.precision AS int) AS [NumericPrecision], CAST(clmns.scale AS int) AS [NumericScale], ISNULL(xscclmns.name, N'''') AS [XmlSchemaNamespace], ISNULL(s2clmns.name, N'''') AS [XmlSchemaNamespaceSchema], ISNULL( (case clmns.is_xml_document when 1 then 2 else 1 end), 0) AS [XmlDocumentConstraint], s1clmns.name AS [DataTypeSchema] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS clmns ON clmns.object_id=tbl.object_id LEFT OUTER JOIN sys.types AS usrt ON usrt.user_type_id = clmns.user_type_id LEFT OUTER JOIN sys.types AS baset ON (baset.user_type_id = clmns.system_type_id and baset.user_type_id = baset.system_type_id) or ((baset.system_type_id = clmns.system_type_id) and (baset.user_type_id = clmns.user_type_id) and (baset.is_user_defined = 0) and (baset.is_assembly_type = 1)) LEFT OUTER JOIN sys.xml_schema_collections AS xscclmns ON xscclmns.xml_collection_id = clmns.xml_collection_id LEFT OUTER JOIN sys.schemas AS s2clmns ON s2clmns.schema_id = xscclmns.schema_id LEFT OUTER JOIN sys.schemas AS s1clmns ON s1clmns.schema_id = usrt.schema_id WHERE (tbl.name=@_msparam_0 and SCHEMA_NAME(tbl.schema_id)=@_msparam_1) ORDER BY [ID] ASC',N'@_msparam_0 nvarchar(4000),@_msparam_1 nvarchar(4000)',@_msparam_0=N'Table_3',@_msparam_1=N'dbo'
However the Column class does not feature any property related to the scale or precision of a numeric column.
Next Steps
- I've re-written the script I described in my previous article adding Collation and DataType as "default initial fields" for the Microsoft.SqlServer.Management.Smo.Column object. This time I'm not using TSQL to gather information about the database objects. I've tested the script in my environment (Windows 8.1 Pro x64 INTEL I5 1.7Ghz 2 cores, 4GB RAM) against a database with around 100,000 tables and it took about 25 minutes to run. I'd still stick with the SMO and Invoke-Sqlcmd version, which is even faster (~10 minutes) and uses less memory. You can download this version here. Setting the "default initial fields" makes the script functional (even if not super-fast) when dealing with a database with many objects (tables).
- I also tried a version which does not set the Column object "default initial fields", and after 2 hours there were no results and 98% of my 4GB of memory was in use.
- Try to run the 3 versions (SMO + Invoke-Sqlcmd, SMO + "default init fields", SMO) against a database having at least a few hundreds of tables, if possible, to notice any significant duration differences.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2015-06-26