
Deleting from big tables.

Let’s try this backwards.

The moral of the story is testing and more testing. If it takes you a day to develop, it's going to take at least twice as long and probably more effort to test it properly.

The problem that I want to address: I've got a table with a billion rows; the rows are not too wide and the table is around 250 GB.

The problem – how do I effectively delete from this table without blocking / locking?

Here is how I would do it.

The first thing that comes to mind is batches; a single delete simply won't do. (Unless of course you have a transaction log greater than 250 GB. Then sure, you can go for it. It might actually be interesting to find out how long that would run, especially in the full recovery model.)

For the rest of us who don't have transaction logs that size (or static systems), we need to delete as much as possible while maintaining operations.

The magic batch number.

Pre-SQL 2005 the delete top (xxx) wasn't available. You could still do batches with some clever joins etc., but for this example we are running on 2005+ so we can do delete top (xxx).

delete top (xxx) from table

How do I calculate the x? Well… I get the number of rows on a page. You can do this by grabbing a page of the clustered index and then doing a dbcc page (dbid, fileid, pageid, printoption) – I think that's correct. You can then surmise from the output how many rows are on the page. Then I take this number and apply a multiplier to it.
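A rough sketch of that lookup, assuming a database called MyDatabase and a table called dbo.BigTable (both made-up names) and using the undocumented dbcc ind / dbcc page commands, might look like this:

dbcc traceon (3604)                           -- route dbcc output to the client
dbcc ind ('MyDatabase', 'dbo.BigTable', 1)    -- list the pages of the clustered index (index_id = 1)
-- pick any data page (PageType = 1) from that output, then inspect its header:
dbcc page ('MyDatabase', 1, 12345, 0)         -- dbid, fileid, pageid, print option; 12345 is a placeholder page id
-- m_slotCnt in the page header is the number of rows on that page

If you'd rather stay on documented ground, sys.dm_db_index_physical_stats in DETAILED mode returns record_count and page_count per index level, which gets you an average rows-per-page figure just as well.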

So

512 rows per page.
To start, my multiplier is 1.

declare @multiplier int
set @multiplier = 1
declare @rows int
set @rows = 512 * @multiplier

delete top (@rows) from table

Now, in the above code I'm missing a crucial hint, or hints. I prefer a rowlock or paglock.
The difference between the two: one locks rows and the other a page. It can be likened to eating grapes from a bunch.
You can pick the grapes off one by one and put them in your mouth, the grapes being rows… or you could pick grapes from the bunch until your hand is full and then put them in your mouth. Basically, if you were sharing the grapes, picking them off one by one would improve concurrency for the other person also eating grapes. However, if you were grabbing a handful, you might struggle to spot some grapes since the other person might have intent to take them 🙂

Besides the fact that I'm actually in the mood for some grapes, I don't think I should use any food analogies. I can't concentrate when thinking of food…

Anyhow.

declare @multiplier int
set @multiplier = 1
declare @rows int
set @rows = 512 * @multiplier

delete top (@rows) from table with (paglock)
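And the rowlock flavour is just the same statement with the other hint (using the hypothetical dbo.BigTable name in place of the placeholder):

delete top (@rows) from dbo.BigTable with (rowlock)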

Now the fun starts: delete some rows and get the timings. You can toggle the multiplier and it will most likely increase speed up to a point and then decrease it again. You want to find the sweet spot. While doing this, keep caching etc. in mind.
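A minimal way to capture those timings, again assuming the hypothetical dbo.BigTable, is something like this (SET STATISTICS TIME ON works too):

declare @multiplier int, @rows int, @t0 datetime
set @multiplier = 4                     -- toggle this between runs
set @rows = 512 * @multiplier
set @t0 = getdate()

delete top (@rows) from dbo.BigTable with (paglock)

select @rows as batch_size,
       datediff(ms, @t0, getdate()) as elapsed_ms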

Once you have the optimal amount, i.e. the multiplier, you can add your where clause if you have a condition to delete on, such as a deleted flag or a date or similar.
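For example, with a hypothetical deleted flag and cut-off date (both made-up column names), it would look something like:

delete top (@rows) from dbo.BigTable with (paglock)
where IsDeleted = 1
  and ModifiedDate < '2010-01-01'       -- hypothetical purge condition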

You can always keep an eye on sys.dm_tran_locks to view the locks acquired by your process. You can also use sp_who2 'active' or other DMVs to see any contention.
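Something along these lines shows what your delete session is holding (57 is a made-up session id; substitute the spid of your delete):

select request_session_id, resource_type, request_mode, request_status
from sys.dm_tran_locks
where resource_database_id = db_id()
  and request_session_id = 57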

Don’t be in a rush.

Deleting data in this way is never a good way to claw back disk space if you are low on disk, simply because the delete operation will make the transaction log grow.
Your aim is to delete without contention. If the table hasn't been purged in two years, don't feel obligated to get it done today.

Create a job that removes more rows per day than new rows come in. As long as you remove more, you should be fine.

Obviously the code above will only run once. You need a loop.

If it's a once-off, you can use this condition for your loop:
while @@rowcount > 0

If you delete on a date range and you don't want every new millisecond that ticks over to trigger your loop, specify this instead:
while @@rowcount > 100
It means that your job will run, and if there are more than 100 rows it will continue looping until you get to that negative-growth point.

If you don't want to use @@rowcount because of the
(1000 row(s) affected) output, then build in your own row count. You can do this since you have a batch size (@rows).
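Pulling it all together, a sketch of the whole loop might look like this (dbo.BigTable, IsDeleted, the multiplier of 4 and the 100-row threshold are all assumptions; swap in your own table, condition and sweet spot):

declare @multiplier int, @rows int, @deleted int
set @multiplier = 4                     -- whatever your testing showed was the sweet spot
set @rows = 512 * @multiplier
set @deleted = @rows                    -- prime the loop

while @deleted > 100                    -- stop once a batch comes back nearly empty
begin
    delete top (@rows) from dbo.BigTable with (paglock)
    where IsDeleted = 1                 -- hypothetical purge condition

    set @deleted = @@rowcount           -- our own row count, independent of the "rows affected" output

    waitfor delay '00:00:01'            -- small breather so other work can get a look in
end

Scheduled as a job, this chips away at the table without ever holding one big lock.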

