SQL Server Database Partitioning Myths and Truths

By:   |   Updated: 2010-01-04   |   Comments (36)   |   Related: 1 | 2 | 3 | 4 | 5 | 6 | > Partitioning


Problem

Partitioning was introduced with SQL 2005 Enterprise Edition, is a godsend for working with very large tables. It provides the means to effectively manage and scale your data at a time when tables are growing exponentially, but maintenance windows are either shrinking or non-existent as with many 24/7 online environments.  In this tip I will cover some data partitioning myths and truths you should know about.

Solution

Having worked with large partitioned databases for a few years, I'd like to share some of my lessons learned. Its important to understand what partitioning is and also what it is not.

SQL Server Table and Index Partitioning

Table and index partitioning was an Enterprise only feature in SQL Server 2005 and remained so until SQL Server 2016 prior to SP1. The headaches, maintenance outages, weekend work et al associated with maintaining Very Large Databases can often be eliminated or minimized with a well thought out partitioning scheme.

Note: All the Enterprise features are also in Developer edition, so installing Developer edition on your workstation is a good way to test and play with Enterprise features. Often features that are Enterprise now could be in Standard down the road so you need to keep up to speed as a SQL Server professional.

You can't restore a database with partitioning to an instance where partitioning is not supported. Important to bear this in mind.  Any partition functions and schemes would need to be removed prior to backing up the database and restoring it to an edition that supports partitioning.

Processed 208 pages for database 'myDB', file 'myDB' on file 1.
Processed 4 pages for database 'myDB', file 'myDB_log' on file 1.
 
Msg 3167, Level 16, State 1, Line 1
RESTORE could not start database 'myDB'.
Msg 3013, Level 16, State 1, Line 1
 
RESTORE DATABASE is terminating abnormally.
Msg 905, Level 21, State 1, Line 1
Database 'myDB' cannot be started in this edition of SQL Server because it contains a partition function 'myRangePF1'. 
Only Enterprise edition of SQL Server supports partitioning.

Partitioned Views offer similar properties to partitioned tables, but do not require the partitioning feature in SQL Server. Partitioned views was a surprisingly effective but complicated way of 'partitioning' data in SQL 2000 and still works just as well in SQL 2005 and later.

Myth 1: Partitioning is a "Scale-Out" solution

Partitions cannot span servers or instances. Partitions have to be in the same instance and in the same database. Partitioning therefore is a scale-up solution. A scale-out solution for SQL Server can be implemented through distributed partitioned views hosted on Federated Database Servers. This is an advanced solution that I have yet to encounter in my career (I would like to hear from anyone who has actually implemented this in a production environment).

Myth 2: Partitions must be created on different filegroups

The partition scheme definition defines on which filegroup a partition resides. It's a common misconception that you have to spread your partitions out among multiple filegroups. Most code examples seem to use multiple filegroups and I think this is where this myth stems from. The only reason, in my opinion, that you would want multiple filegroups is if those filegroups reside on physically separate drives and you were doing this to improve your I/O for large range queries. I have not seen any performance benefit to having multiple filegroups located on the same drive.

Only split your partitions among multiple filegroups if you have a compelling reason to do so. That reason should be increased performance or manageability.

Myth 3: To partition a non-partitioned table you will need to drop and recreate it

Not true. You can partition an existing table by creating a clustered index (or rebuilding an existing clustered index) on your new Partition Scheme. This will effectively partition your data as the leaf level of a clustered index is essentially the data.  The example below creates a partitioned clustered index on a partition scheme PScheme_Day.

CREATE CLUSTERED INDEX idx ON tblPartitioned(SQLCreated) 
WITH DROP_EXISTING
ON PScheme_Day(SQLCreated)

Myth 4: Partitioning an existing table is a strictly offline operation

It's true that rebuilding or creating a clustered index is indeed an offline operation. Your table will not be available for querying during this operation. However, partitioning is an Enterprise feature, so we have the online index rebuild feature available to use. The ONLINE = ON option allows us to still query the table while under the covers the partitioning operation is going on. SQL Server does this by using an internal snapshot of the data. Obviously, there is a performance hit and I don't recommend you do this during a busy time but if you have a requirement for 24x7 availability then this is a possible solution.

CREATE CLUSTERED INDEX idx ON tblPartitioned(SQLCreated) 
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON PScheme_Day(SQLCreated)

Myth 5: SWITCH'ing partitions OUT or IN in only a few seconds

You'll often read that that the reason partitioning operations like SWITCH IN and SWITCH OUT are so fast and efficient is that they are "metadata" only operations meaning that there is no actual data movement but only the pointers or internal references to the data get changed.

SWITCH'ing partitions OUT or IN is a truly "meta data" operation in that although the partitioned data has magically moved from the partitioned table to the SWITCH table or vice versa there hasn't actually been any movement of data on the disk or inside the data file. However, in my experience, I have found that with highly transactional partitioned tables the SWITCH operation can get blocked or cause blocking itself. This is due to the ALTER TABLE...SWITCH operation requiring a schema modify lock on both the source and target tables.

To get around this potential blocking issue I always set a timeout before any switch operation so that if the ALTER TABLE...SWITCH does start to cause blocking it will only be for as long as the timeout. I will then re-issue the ALTER TABLE...SWITCH statement repeatedly with the timeout until the operation goes through successfully. In this way the partition maintenance task will not interfere with the OLTP function of the database:

In the example below I set a 30 second timeout for the SWITCH to complete. If it timeouts I will repeat until it completes successfully.

SET LOCK_TIMEOUT 30000
ALTER TABLE myPartionedTable SWITCH PARTITION 2 TO SwitchTable
Msg 1222, Sev 16, State 56, Line 1 : Lock request time out period exceeded

Myth 6: Altering a partition function is a metadata only operation

In practice you may find out that a MERGE or SPLIT operation may take much longer than the few seconds expected. Altering a partition function is an offline operation and can also result in movement of data on disk and so become extremely resource intensive.

As long as the 2 partitions you are MERGEing together are empty then there will be no data movement and it will be a metadata only operation. If you have a sliding window and you are MERGEing the oldest 2 partitions together then a rule to follow is to always make sure both partitions are empty (SWITCHed OUT) before you MERGE. 

To avoid or minimize data movement when SPLITing a partition function always ensure that you know beforehand how many rows are in the underlying partition that is being SPLIT and how many rows would fall on each side of the new boundary. Armed with this information you should be able to determine beforehand the amount of data movement to expect depending on whether you have defined your partition function using RANGE RIGHT or LEFT.

Be aware that more than one table or index can reside on the same partition scheme or use the same partition function. When you run ALTER PARTITION FUNCTION it will affect all of these tables and indexes in a single transaction.

Myth 7: Partitioned tables improve query performance

If your query is written in such a way that it can read only the partitions it needs the data from then you will get partition elimination and therefore an equivalent performance improvement. If your query does not join or filter on the partition key then there will be no improvement in performance over an unpartitioned table i.e. no partition elimination. In fact, a query that hits a partitioned table has the potential to be even slower than an unpartitioned table even if both tables have the same index defined. This is due to the fact that each partition in a partitioned table is actually its own b-tree which means that a partitioned index seek will need to do one seek per partition as opposed to one seek per table for an unpartitioned index seek.

Both tables below have a nonclustered index defined on the commodity column. The index on table tblPartitioned has been created on the partition scheme to align it with the clustered index but as we can see from the below this causes each partition to be scanned because the partition key is actually on a different column. 

select count(*)
from tblPartitioned
where commodity = 'BOFRB'
 
select count(*)
from tblUnPartitioned
where commodity = 'BOFRB'
 
-----------
371
Table 'tblPartitioned'. Scan count 54, logical reads 109, physical reads 0, read-ahead reads 0, 
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
-----------
371
Table 'tblUnPartitioned'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, 
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 

The graphical query plan clearly shows the difference in cost between the two queries. The top partitioned table query

graphical query plan

If we also filter on the partition key column we get the desired performance improvement over the equivalent query on the unpartitioned table. If you are partitioning tables in an existing database then its a good idea to test all your queries for performance. Don't assume that once the data is partitioned that your queries will improve. Queries may need to be updated to include the new partition key in the where clause or join to take advantage of the performance benefits of partitioning.

Myth 8: Partitioned tables ease maintenance of VLDBs and Very Large Tables

Partitioning can be effectively used to break up a very large table into multiple partitions based on a column or partitioning key. However, there can be significant management overhead in maintaining partitioned tables. The features of partitioned tables that give you the ease of management can also conspire against you to make your maintenance a nightmare!

Thinking of each partition as its own table will help you understand how best to approach your maintenance. Index rebuilds can be quicker because you can rebuild just those partitions that you identify as being fragmented. Conversely, full partitioned Index rebuilds will take longer as each partition is its own b-tree.

Note that old style DBCC SHOWCONTIG command does not work for partitioned indexes. You need to familiarize yourself with the more powerful DMV sys.dm_db_index_physical_stats. This DMV returns fragmentation information at a partition level and provides you the necessary information to make a decision on which partitions require rebuilding or reorganizing.

-- to return fragmentation information on partitioned indexes
SELECT
 object_name(a.object_id) AS object_name,
 a.index_id,
 b.name,
 b.type_desc,
 a.partition_number,
 a.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED') a
JOIN sys.indexes b on a.object_id = b.object_id and a.index_id = b.index_id
order by object_name(a.object_id), a.index_id, b.name, b.type_desc, a.partition_number
 
fragmentation

Once fragmentation has been identified we can rebuild the index statement for just that single partition:

--Rebuild only partition 11.
ALTER INDEX IX_alert_events_timestamp
ON dbo.alert_events
REBUILD Partition = 11;
GO

If you're partitioning your data on a date field and you have a sliding window and you are only ever adding data to the right most partitions then you can save unnecessary overhead by not checking for fragmentation of older partitions where you already know they are not fragmented. sys.dm_db_index_physical_stats DMV allows you to specify a partition number. This greatly improves IO over full index scans and has the potential to shorten maintenance job durations.

Conclusion

Well designed database partitioning gives the DBA the opportunity to effectively work with massive amounts of data while at the same time providing the same level of performance as a small database. Having a sound partition management and maintenance plan will free up valuable DBA time to concentrate on more challenging problems and opportunities.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Alan Cranfield Alan Cranfield is a versatile SQL Server DBA with over 10 years experience managing critical systems in large distributed environments.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2010-01-04

Comments For This Article




Friday, March 3, 2023 - 3:11:25 PM - Duane Back To Top (90979)
I believe this statement is incorrect:

You can't restore a database with partitioning to an instance where partitioning is not supported. Important to bear this in mind. Any partition functions and schemes would need to be removed prior to backing up the database and restoring it to an edition that supports partitioning.

I believe the last sentence should end with "... an edition that does not support partitioning."

Tuesday, February 28, 2017 - 11:47:19 AM - vims Back To Top (46779)

 

Below scenirion how to create table parttion in SQL Server 2012 , We have datatime column  

 Partition the exiting table depend on day like Hot day <5 , warm day >5 and day <10 . cold day >=10 

I am awaiting your reply,

 


Thursday, April 21, 2016 - 12:42:14 AM - eric81 Back To Top (41290)

Great article, and you mentioned you mentioned unless you have multiple drives there is no need to create multiple filegroup for your partitioned column.  If discovered on my server all the LUNs share the same storage pool so no speration of drives.  In that case I would define my index under the PRIMARY file group?

 


Wednesday, March 2, 2016 - 12:28:50 PM - Ali Back To Top (40845)

 This is a great article. Took me some time to find information that could help me deciding partition key on a table.  Great work Alan.

 


Tuesday, November 3, 2015 - 6:42:30 PM - Alan Cranfield Back To Top (39013)

thanks Jeff!  I just reread and most still applies... some improvements with later versions around re-indexing individual partitions..  cheers


Tuesday, November 3, 2015 - 10:12:37 AM - Jeff Moden Back To Top (39010)

I know this article is a half decade old but wanted to say "Thanks for the great article, Alan".  Things like this are timeless.


Friday, May 29, 2015 - 12:03:30 PM - Chris Laffey Back To Top (37329)

Great post! This is very useful information, thank you!


Wednesday, October 16, 2013 - 5:53:05 AM - Juan Miguel Jimenez Back To Top (27165)

Thank you very much for this very helpful post, Alan.

In our enterprise we are starting using partitioning aproach and your experiences and the comments written by other user will be very interesting four us.


Thursday, March 28, 2013 - 12:03:03 PM - Alan Cranfield Back To Top (23060)

Nimesh

I would suggest creating a whole bunch of future hourly partitions and not running your window slide jobs so often.  Why do you need to move the partition window every hour?  would daily or weekly also work? Then you could do the sliding window move during a maintenance window or low traffic time. 


Wednesday, March 27, 2013 - 12:00:25 PM - Nimesh Back To Top (23034)

I've a sliding window approach implemented in production and this job runs on top of hour and slides partition by an hour. This job acquires a Schema modified lock correctly.

However, there is a .net process using bulk insert on these partitioned tables and gave following error:-Insert bulk failed due to a schema change of the target table

The error happened as the sliding partition job was running arround the same time. However, I assumed that SQL Server will handle this internally and make bulk insert  wait untill the partition job is over vs throwing an error.

Any inputs are appriciated.


 


Monday, September 3, 2012 - 4:29:30 PM - andrew hopkinson Back To Top (19377)

I have to disagree about multiple filegroups on the same disk, as long as the number is not too absurd, there are still advantages so subdividing your partitions by filegroup, for instance it makes it easier to remove the oldest partition by simply removing the filegroup, also it leaves you the option of moving to different drives in the future if necessary, makes it easier to visually see how your data is divided, easier for backups, restore specific partitions to other servers etc.

 

 


Tuesday, May 29, 2012 - 2:08:04 AM - Tejasvi Back To Top (17689)

Hey Thanks,

Certainly cleared by doubts.


Monday, May 28, 2012 - 1:26:52 AM - Tharindu Dhaneenja Back To Top (17678)

Hi alan, this is very helpful post.What are Disadvantages having table Partition? is Insert,Update and delete will slow?
Thanks
Tharindu Dhaneenja


Friday, May 25, 2012 - 4:14:45 AM - Just Back To Top (17657)

Great job! This clarify some doubts I had.

I have this table currently clustered by it's own key. It's growing big and have most of the acceses by another field (a foreign key to another table) We were thinking in to soluctions:

1) Change the current clustered PK for a nonclustered one. Add a new clustered index on the FK field. Assign specific filegroups to the table: one for the data, one for the indexes. Add just one file for each phisical unit and assign them to the filegroups. We hope SQL Server to split the table somehow equaly (with time) in the files and so the load.

2) Change the clustered index to the FK and partition the table by this key.  If most of the queries are including the key index, we shoul see and improvement. But: what happens with the PK? (We can´t include the partitioning key in every PK/UQ key? Will the PK nonclustered index be not aligned? Is it so terrible as I've read?

What would you recomend for this case?

Thanks in advance and keep on with the great job!


Monday, May 21, 2012 - 9:50:09 PM - Alan Cranfield Back To Top (17577)

The underlying partitioning on a table is transparent to your application. You dont need to test the application. What you do need to test maintenanc scripts that are using ALTER the partition schemes and functions, if any..SWITCH IN/OU etc...  Also, you may want to test that your SQL queries and SPs are using the partitioninh key in WHERE clauses and joins to take advantage of partition elinination.

Examine your original reason for using partitioning and then see if you have achieved what you set out by implementing this Enterprise feature.


Friday, May 18, 2012 - 2:45:15 AM - tejasvi Back To Top (17519)

Hey hi,

I have appilcation which has on SQL Enterprise database. If I have partitioned few tables in my databse, will there be any effects of these partitioned tables on my running application? What all things I will need to test from the application point of view, if I have partitioned some table underlying it?


Tuesday, February 21, 2012 - 8:45:20 AM - James Back To Top (16111)

Oh shoot...  Sorry in my previous comment, I meant to say "Another thing to add is that UPDATE STATISTICS operations can NOT be split by partition in partitioned tables."


Tuesday, February 21, 2012 - 8:44:17 AM - James Back To Top (16110)

Another thing to add is that UPDATE STATISTICS operations can be split by partition in partitioned tables.  The only workaround is to implement partitioned views.  I have encountered this and I wish I knew about this prior to implementing partitioning.


Tuesday, November 8, 2011 - 7:56:14 AM - arjun Back To Top (15046)

hi pinal,

partition column and UNIQUE clustered index key column is different. that time i created partioned for table. but table–>properties–>storage doesn’t show partition. but if i use select * from sys.partitions where object_name(object_id)=’tablename’. it shows partioning except clustered index..what is the difference?..is this correct way to partition?

advance THANKS for you help


Wednesday, June 1, 2011 - 1:44:12 AM - Alan Cranfield Back To Top (13943)

thanks for your interest....We have a monthly automated tape backup restore drill. The backup team restores a selection of database dumps from tape to a test server and we do the restores and verifies.   We also 'verify' all our db backups staright after the backups.  

We are paranoid so in addition to backups we also use db mirroring onsite and log shipping offsite...


Tuesday, May 31, 2011 - 12:15:09 PM - Feodor Georgiev Back To Top (13939)

Alan,

I am sincerely sorry that you have not succeeded in creating a way to avoid the non-clustered index pages during backup. 

 

You are right about not backing up the replicated data and about 'dumping' some of the old data to differernt storage and different formats. 

 

Today I read this amazing story on Paul Randal's blog about a failed backup and successful restore (http://sqlskills.com/BLOGS/PAUL/post/Unbelievable-tale-of-disaster-and-recovery.aspx). 

 

I know we are getting a bit off topic here, but I am curious - is your backup team testing the backups, and are they charging per MB of tested backup? (well, the second question is a joke, of course :) ) 

 

Regards,

Feodor


Tuesday, May 31, 2011 - 5:02:14 AM - Alan Back To Top (13935)

Hi

Sorry for the late reply.  Thanks for the info.  We've decided not to back up data that we can derive or put back together again from other data sources. For example we no longer back up replicas (replicated databases). 

We dont have a problem finding a good backup solution. We have a backup department who do all that stuff for us and merely charge us a fee/mbyte.

What we really need to do more aggresively is take offline the data that is no longer current or needed.  In the past I've used copy agents to move data data from expensive OLTP databases to less expensive, slower reporting databases. A second step is to BCP out the old data to flat files and restore only if needed...

Actually, once upon a time I was on the Quest Software SQL Server Customer Advisory Board and I told them that they needed to build into their LiteSpeed backup tool the ability to only backup the data and leave all the non-clustered indexes behind. they thought it was a great idea but nothing came of it....

Alan Cranfield


Wednesday, May 11, 2011 - 12:19:47 PM - Feodor Georgiev Back To Top (13804)

Cranfield,

you are correct that good design is essential for avoiding the maintenance overhead of the filegroups. I am far from the thought that someone will have 50 partitioned tables in one database with 10 partitions each on different filegroup each. (This actually sounds funny :) )

Here is a post I wrote a while back on how to treat your data. If you look at 'Steinar's graph' you will notice that the data can be categorized by 4 types: current (used or not used often) and archive (used or not used). This can be used as a general guideline for creating filegroups, do you agree? And also for making decisions about placing the filegroups on proper storage units.

There is a 5th type of data, though, and maybe I should think about integrating it in the 'Steinar's graph'. More specifically the 5th type of data is something you use, but you dont really need: non-clustered indexes.

I hope you get my point: you can save a lot on backup solutions by not backing up the non-clustered indexes; I am also very disapointed that SQL Server still does not engage an adequate solution for this, i.e. you cannot backup your database without the non-clustered indexes. (Well, technically you can, but then you will run in all kinds of problems).

So, out of curiosity - which is your biggest problem with finding a good backup solution:

  • storage price
  • maintenance price
  • volume of non-clustered indexes
  • else?

Feodor


Wednesday, May 11, 2011 - 10:29:07 AM - --cranfield Back To Top (13802)

Thanks for the comments on multiple filegroups.  Yes, you are correct - there are benefits to multiple filegroups. If correctly designed they can aid quick recovery and can also be used as a tool to improve backups time and keep down your tape backup costs.  But its a lot of overhead to manage these filegroups.   At the moment I'm looking at a solution to reduce our backup costs by putting some data into read_only filegroups. If consideration has not been given during the design phase then trying to implement this change once your database is in the terrabytes is a real pain.

--cranfield


Wednesday, May 11, 2011 - 8:24:22 AM - Feodor Back To Top (13799)

Alan, great job writing this article!

There is one thing I would like to add, though. You are saying that:

The only reason, in my opinion, that you would want multiple filegroups is if those filegroups reside on physically separate drives and you were doing this to improve your I/O for large range queries.

What I would like to add here is, that a very smart selection of filegroup strategy is essential to having the least amount of downtime.

In the Enterprise edition you also have a feature called ONLINE piecemeal restore, which allows you to restore a filegroup from backup, while the rest of the database is available to your users.

Here is an example: you have a table with 500 million rows, of which only 10% are used and the rest are never updated, just read. let's say you partition this table and set the partitions in the same filegroup. In case of a failure or data corruption - guess what - your entire filegroup will be unavailable. Your users will still be able to read and write, as long as the data is in cache, but if you would like to restore the filegroup, you wiould have to mark it as OFFLINE and then restore from backup (last full + differential + logs + tail of log + 'with reovery').

Obviously, if you have all eggs in the same basket and your filegroup has all 500 million rows, then it will take you significantly longer time to restore the filegroup as opposed to restoring only 50 million rows of important data and having the rest 450 million rows still available to the users.

/Feodor 


Friday, July 16, 2010 - 2:20:54 AM - ALZDBA Back To Top (5855)

Thank you for the feedback.

I'll give your alternative a try.


Thursday, July 15, 2010 - 10:50:21 PM - --cranfield Back To Top (5851)

[quote user="ALZDBA"]Would there be a performance (or any other) gain by giving each table its own PF/PS?[/quote]

 

No, the performance and managability gains come from having tables share the same PF/PS


Thursday, July 15, 2010 - 10:44:41 PM - --cranfield Back To Top (5850)

[quote user="ALZDBA"]Would it be enough to just select those for tables with tablokx ? (to avoid having to put the db in single-user mode and kick out everybody else)
[/quote]

Why doesn't your alter partition function succeed? does it timeout?   Instead of a try catch block and setting your DB to single-user try the following:

1. Put the  alter partition function step in a job step

2. set a timeout of 30 seconds (or whatever seconds is appropriate)

3. issue your  alter partition function statement

4. put a retry of 5 times on the alter partition function step above

This is what I have done in the past and it always works...  very rarely having to retry.  It eventually gets the schema lock it needs for the couple of seconds during the alter partition.

 e.g.

SET LOCK_TIMEOUT 30000

  -- Extend PARTITION SCHEME to accommodate new partition   
  ALTER PARTITION SCHEME MyPartScheme NEXT USED [PRIMARY]
 
  -- add new partition  
  ALTER PARTITION FUNCTION MyPartFunc() SPLIT RANGE (@partition_key)

 

 


Wednesday, July 14, 2010 - 1:22:44 PM - ruthst00 Back To Top (5825)

Alan,

With regard to a "metadata-only" trick to fool SQL Server into keeping all the data in partition 1, we spoke with a MS SQL Server expert and, just as we suspected, he said there are only 3 ways to partition an existing table that is clustered on PK (as mentioned in my previous post). So, we're sticking with CREATE CLUSTERED INDEX WITH DROP_EXISTING = ON which provides us the best performance.

With regard to your question about rebuilding the clustered index online... I just finished perf testing ONLINE = ON and am very pleased with the results: although the time it takes to rebuild the clustered and non-clustered indexes doubles, there is *no* impact to SELECT queries executed before, during, or after the indexes are rebuilt. After digging into BOL and other articles, I learned this is because SQL Server takes a snapshot of the table prior to the index rebuild, and routes queries to the source and/or target as it bulk-copies data into the new index. There is only a brief lock on the table before and after the rebuild to change meta data, swap the source with the target, etc.

The other lesson I learned during our perf testing is that when you specify ONLINE = ON and SORT_IN_TEMPDB = ON, make sure you have enough space for the entire clustered index in TempDb, and a copy of the clustered and non-clustered indexes on the drive where your data files live. So if you have a lot of tables to partition, then you can only do a few at a time otherwise you'll fill up the log and/or tempdb.

With regard to measuring potential data movement when adding partitions to an existing, partitioned table (via ALTER PARTITION FUNCTION...SPLIT), I learned you can use the $PARTITION function which will tell you the number of rows that will be inserted/deleted. The lesson that I learned here during our perf testing is to always add an empty partition to the end of a partitioned table to ensure a metadata-only operation and no data movement. Then all new rows will be inserted into this new partition.

On the the other hand, when I measured the impact of splitting an existing partition containing 60-days of data into two partitions each containing 30-days of data, SQL Server took a schema lock on the entire table and inserted/deleted 1.6m rows which took 2.5 minutes on a 4-socket, hex-core server with 50GB of memory assigned to our SQL Server 2008 instance. For us, that's too long to lock out our application which is used by hospitals and needs to be up 24/7. And those inserts/deletes are logged as one transaction so be careful of log growth...

Thanks for your help,

Steve


Wednesday, July 14, 2010 - 7:24:02 AM - ALZDBA Back To Top (5823)

 Did you perform tests regarding multiple tables using the same PF/PS or having a single PF/PS for each table?

I'm designing a set of 4 tables which are partitioned on a datetime column and are mapped to a filegroup per month.
Those 4 tables have the same life cycle, so it seemed logical to me to have them use the same PS.

Every month we alter the PS to accept the next month of data.
in a try catch (t-sql) block we first try the alter partition function. If that doesn't succeed the catch block will jump in. In that catch block we change database to master, set the userdatabase to single-user, and the run the alter partition function sql once more. After that, the database is being set multi-user. 

Would it be enough to just select those for tables with tablokx ? (to avoid having to put the db in single-user mode and kick out everybody else)

Would there be a performance (or any other) gain by giving each table its own PF/PS?

 


Tuesday, June 8, 2010 - 4:36:35 AM - --cranfield Back To Top (5666)

Hi

 Whats the performance hit when you create the partitioned index online? At least the data will stay online for your OLTP system and you can still get the partitioning done.

Have you had a look at Kalen's article on partitioning and how to determine when there will be data movement with you ALTER a paritition? It may be relevant:

http://sqlblog.com/blogs/kalen_delaney/archive/2009/08/16/altering-a-partition-function.aspx

Let me dig around and see if I can find a definitive answer for you. Watch this space.


Friday, June 4, 2010 - 12:33:54 PM - ruthst00 Back To Top (5649)

Alan, great article. Thanks for posting. There aren't many articles out there from folks who have partitioned existing, large tables, so I appreciate your effort...

At my company, we are prototyping methods for partitioning existing tables in our OLTP database that are clustered on PK, and your recommendation to use CREATE UNIQUE CLUSTERED INDEX idx ... WITH (DROP_EXISTING =ON) confirms the results of our performance testing which suggests this technique is several times faster than using "DROP CONSTRAINT PK WITH MOVE TO..." (as recommended by the MS SQL Server 2008 partitioning white paper) or "DROP / ADD CONSTRAINT", which is the code the SQL Server 2008 "Partition Wizard" generates.

 Nevertheless, it takes at least 20 minutes for us to partition a 17GB table, even though we have created a partition function that results in all the data staying on partition 1, and no data in partition 2. So we are puzzled why SQL Server has to touch all (or most of) the leaf data pages of the clustered index. If they are already associated with partition 1, why all the data movement going on? Do you have any insights on this? Is there a "meta-data only" method for telling SQL Server that the existing clustered index should belong to partition 1 of the table?

Essentially, we are looking for ways to reduce/eliminate data movement while partitioning an existing table that is clustered on PK, primarily because this table resides in an OLTP database and we don't have 20 minutes to partition a single table.

Any ideas you or the others on this thread have is greatly appreciated.

Thanks!


Monday, January 4, 2010 - 3:41:59 PM - --cranfield Back To Top (4638)

Thank you for reading, Armando.


Monday, January 4, 2010 - 3:27:38 PM - --cranfield Back To Top (4637)

yes, I've encountered a couple installations now where a partition scheme has been distibuted across 40+ filegroups all on the same drive - crazy. I usually change them back to 1 or 2 filegroups only.

Most of these partitioned schemas have been designed with good intentions by developers but become very difficult to administer when they are moved into a production environment.

 

thank you


Monday, January 4, 2010 - 2:53:48 PM - SankarReddy Back To Top (4634)

Alan,

Good Job on laying out the details on these Myths. I see people asking for multiple file groups all the time on the forums when planing for Partitioning.


Monday, January 4, 2010 - 2:49:20 PM - aprato Back To Top (4633)

 Good stuff, Alan















get free sql tips
agree to terms