Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Performance issue upon deleting large data

Author  Topic 

MIK_2008
Master Smack Fu Yak Hacker

1054 Posts

Posted - 2013-11-07 : 09:36:03
Folks, some info about the table

1) no of records 192 million
2) records needs to be removed are around 132 million
3) all fields are having non clustered indexes, individually
4) there's a clustered index on all columns of that table
DB is in simple recovery model.

The delete script I've written is

CREATE Procedure Xyz
AS

Declare @rowCount int
Set @rowCount=1
While (@rowcount>1)
Begin
DELETE top 10000
FROM TableName
WHERE Col1 = 'CharacterValue'
AND Col2 < DateAdd(dd,-30,getdate())

SET @RowCount=@@RowCount
END


Initially I was handling the while loop a bit different but I have tuned it like this.

It was taking few seconds to complete a batch .. I wanted to make it faster. So I disabled an index (not knowing that it was clustered index), turned the whole table inaccessable. Re-enabled it and updated the statistics. As a result, the execution plan changed from index Seek to Index scan, and started taking longer time for one loop to complete. I thought to make an other non clustered index just for the columns used in where clause, created it and recreated/recompile the stored procedure to see the affect, it turned out to be more slower - "select top 1" is now taking minutes to complete.
Any thought as why this is happening?

Secondly, I am thinking to drop all the indexes, delete the required data, and then recreate them again. However I am feeling that since the table is too big and might not affect since the query will then be scanning the whole table everytime for each chunk.

I would appreciate your suggestions/thoughts on how to handle it?

Thank you!

Cheers
MIK

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2013-11-07 : 11:24:41
Because you are deleting about 70% of the data - One suggestion is to select just the data you want into a new table, then drop the old table, then rename the new table back to what the original name.

Be One with the Optimizer
TG
Go to Top of Page

MIK_2008
Master Smack Fu Yak Hacker

1054 Posts

Posted - 2013-11-07 : 12:39:32
thanks TG, that's exactly what I proposed too but don't know why they are stick to the deletions. Though I believe my script is fine but I am just confused as why the "Select top 1 * from tableName where Col='X' and Col2>DateAdd(dd,-30,getdate()) is taking so long to execute. I just restored the database again with another name and executing the same query on both DB copies, showing clear difference, the one I restored executes it within a second ... whereas the older one, where I played with Indexes (removed some records, disabled clustered index, rebuild the clustered index, and updated the statistics) is taking 22 seconds to pull top 1 record.

Cheers
MIK
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2013-11-07 : 13:14:48
A bit off topic, but if you have access to drop clustered indexes, you should be able to do whatever you want. I'm not sure who "they" are, but give them the query and let it rip.

As far as the difference's between two databases, well you'll have to dig in to see if there are any real differences (file groups, indexes, etc..). Maybe one has gotten it's statistics out of wack. Maybe the hardware is different. Maybe one server is much more busy than the other. Hard to tell without digging into it.
Go to Top of Page

MIK_2008
Master Smack Fu Yak Hacker

1054 Posts

Posted - 2013-11-07 : 14:01:42
quote:
Originally posted by Lamprey

A bit off topic, but if you have access to drop clustered indexes, you should be able to do whatever you want. I'm not sure who "they" are, but give them the query and let it rip.

I've access to alter/create/drop/dml operations on the database only. 'they" = My boss and the other team heads :). Yes, ultimately I would be giving my query :)

quote:

As far as the difference's between two databases, well you'll have to dig in to see if there are any real differences (file groups, indexes, etc..). Maybe one has gotten it's statistics out of wack. Maybe the hardware is different. Maybe one server is much more busy than the other. Hard to tell without digging into it.


Its the same backup copy restored twice with two different names on same server.
Firs time when it was restored (database A.), I played with indexes by disabling an index (which I came to know that it was clustered one), which locked the whole table for any select/dml operation, so I rebuilt it and updated statistics. But then, the queries started executing very slowly, even the case of select top 1 * .....
The purpose for disabling indexes was to reduce the execution time, since the plan was showing that 95% of cost was for the "Cluster Index Delete" operation.


Then I asked our DBA to restore another copy on same machine with a new name. If I execute same queries on the database B they are executing much faster than the database A.

I am just wondering this slowness is only because of the Index rebuild and Update Statistics?

Cheers
MIK
Go to Top of Page
   

- Advertisement -