
 Deadlock during bulk delete


Kristen
Test

22859 Posts

Posted - 2011-08-02 : 07:02:00
I have a table recording Sessions on a web site. Each new session adds records at the end. PK=Identity, clustered. Records are updated periodically during the user's session.

I need to delete a large number of records from the table. I think I have set this up to be "gentle", but I'm getting deadlocks after a number of iterations (the number varies). As this is a single-table delete I'm surprised by that (although maybe checking the Foreign Key is adding locks and causing the deadlocks? There are 22,406 rows that have a value in the FKey column, out of 20,000,000 rows)

My thinking is that all the deletes are at low ID numbers, and new rows will be added with high IDs, so why are the deletes conflicting with any insert/update activity? There are no triggers on the table.

Any thoughts would be appreciated

SELECT [T_ID] = IDENTITY(int, 1, 1),
[T_ThePK_ID] = ThePK_ID
INTO #MyTempTable
FROM dbo.MyTable WITH (NOLOCK)
WHERE MyDateColumn < @MyStartDate -- 7 months ago

DECLARE @intBatchStart int,
@intBatchSize int,
@intRowCount int,
@dtStart datetime,
@strMsg nvarchar(1000)

SELECT @intBatchStart=MIN(T_ID), @intBatchSize=100, @intRowCount=1
FROM #MyTempTable AS X
JOIN dbo.MyTable AS D
ON D.ThePK_ID = X.ThePK_ID

WHILE @intRowCount >= 1
BEGIN
SELECT @dtStart = GetDate()
DELETE D
FROM #MyTempTable AS X
JOIN dbo.MyTable AS D
ON D.ThePK_ID = X.ThePK_ID
WHERE T_ID BETWEEN @intBatchStart
AND @intBatchStart + @intBatchSize - 1
SELECT @intRowCount = @@ROWCOUNT,
@intBatchStart = @intBatchStart + @intBatchSize
SELECT @strMsg = 'Loop:'
+ CONVERT(varchar(20), @intBatchStart)
+ ', Elapsed(ms)='
+ CONVERT(varchar(20), DATEDIFF(Millisecond, @dtStart, GetDate()))
+ ', Deleted=' + CONVERT(varchar(20), @intRowCount)
RAISERROR (@strMsg, 10, 1) WITH NOWAIT
WAITFOR DELAY '00:00:05'
END

I started off with @intBatchSize=50000 but I am getting the same thing with @intBatchSize=100

Table script:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[MyTable]
(
[ThePK_ID] [int] IDENTITY(10000,1) NOT NULL,
[MyDateColumn] [datetime] NOT NULL,
...
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[ThePK_ID] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index1] ON [dbo].[MyTable]
(
[SomeColumn1] ASC
)WITH
(
PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index2] ON [dbo].[MyTable]
(
[SomeColumn2] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index3] ON [dbo].[MyTable]
(
[MyDateColumn] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index4] ON [dbo].[MyTable]
(
[SomeColumn4] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index5] ON [dbo].[MyTable]
(
[SomeColumn5] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index6] ON [dbo].[MyTable]
(
[SomeColumn6] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Index7] ON [dbo].[MyTable]
(
[SomeColumn7] ASC
)WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MyTable] WITH CHECK ADD CONSTRAINT [FK_MyTable_SomeTable4] FOREIGN KEY
(
[sm_ysvl_SomeColumn4]
)
REFERENCES [dbo].[SomeTable4] ([SomeColumn4])
GO
ALTER TABLE [dbo].[MyTable] CHECK CONSTRAINT [FK_MyTable_SomeTable4]
GO
ALTER TABLE [dbo].[MyTable] ADD CONSTRAINT [DF_kk_SomeColumn3] DEFAULT (0) FOR [SomeColumn3]
GO

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-08-02 : 08:02:55
Deadlock graph?

--
Gail Shaw
SQL Server MVP

lappin
Posting Yak Master

182 Posts

Posted - 2011-08-02 : 10:22:51
Your problem is most likely due to deleting in a loop. Use a set-based approach, i.e. don't delete the rows in a loop; delete every row at once using a WHERE clause. Also, your temporary table is written to disk, which can slow the process - consider deleting directly from dbo.MyTable, or using a table variable instead of a temp table.

Most locking can be prevented if queries can be sped up.
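For example, something along these lines (sketch only, reusing the column names from the original post; the date calculation is an assumption based on the "7 months ago" comment):

```sql
-- Single set-based delete: one statement, no loop, no temp table
DECLARE @MyStartDate datetime
SELECT @MyStartDate = DATEADD(Month, -7, GETDATE()) -- "7 months ago", per the original comment

DELETE FROM dbo.MyTable
WHERE MyDateColumn < @MyStartDate
```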

lappin
Posting Yak Master

182 Posts

Posted - 2011-08-02 : 10:26:48
You might want to look at SWITCH.
http://sqlserverpedia.com/wiki/Switching_Partitions_-_Example

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-08-02 : 10:28:37
Batching is best for large deletes, otherwise the lock impact and log impact can get too large. Kristen's not deleting one row at a time; the deletes are just batched (sets of rows at a time).

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-08-02 : 11:11:00
I've had a go with Deadlock Graph, thanks for that Gail.

The two culprits are my delete batch and a call to a New Session Sproc which will be inserting a new record.

What I'm not understanding is that my delete is atomic (I think?) and, as far as I can see, has no Foreign Keys / triggers that might cause it to need to reference something else. Thus I would have expected it to be able to go through as a "one-shot" - but I expect I'm missing something :(

Can the Insert of a row cause deadlock by needing to lock pages of other indexes that the Delete is using? (and if so can I deduce which index that might be from the Deadlock Graph?)

The XML for the Deadlock Graph is on pastebin if that helps

http://pastebin.com/UCBGtjnB

Thanks for your help, as always.

Kristen
Test

22859 Posts

Posted - 2011-08-02 : 11:19:53
quote:
Originally posted by lappin

Your problem is most likely due to deleting in a loop. Use a set-based approach, i.e. don't delete the rows in a loop; delete every row at once using a WHERE clause. Also, your temporary table is written to disk, which can slow the process - consider deleting directly from dbo.MyTable, or using a table variable instead of a temp table.

Most locking can be prevented if queries can be sped up.



Thanks for that, but performance is NOT the key here; what I am after is minimum impact. The server is used by thousands of concurrent web users 24/7 who must not be disrupted by the deletion of some 20,000,000 rows.

In my experience, deleting from a table using a range on a date column (which would be the case here), where that date column is not the PK / clustered index, causes severe performance issues. We have always found it more efficient to run a query on the date column (or whatever the criteria columns are) and store the PK keys in a temporary table. This we can do safely with NOLOCK (we don't care about dirty reads, or falsely missing rows, as they will be removed on a future run of the housekeeping task). We then join the temporary table to the primary table, by clustered index PK, to actually delete the rows.

If the delete is large we can easily do this in batches - indeed the batch size can be adjusted according to how long the deletes are taking, so that when the system is busy the delete batch size moderates itself and becomes as small as is still efficient. We can also force transaction log backups at frequent intervals (e.g. every time 100,000 rows have been deleted) to ensure that we do not extend the size of the LDF file (which, in our case, would wreck the careful hand-optimisation we have made of the physical file and the number of VLFs it contains).
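Roughly, the shape of it (a sketch only - the backup path, thresholds, and timings here are made up for illustration, not our production code):

```sql
-- Sketch: shrink the batch when deletes run slowly, grow it when quiet,
-- and take a log backup every ~100,000 deletions
DECLARE @intBatchSize int,
        @intDeletedSinceBackup int,
        @intRowCount int,
        @dtStart datetime
SELECT @intBatchSize = 10000, @intDeletedSinceBackup = 0, @intRowCount = 1

WHILE @intRowCount >= 1
BEGIN
    SELECT @dtStart = GETDATE()

    DELETE TOP (@intBatchSize) D
    FROM dbo.MyTable AS D
    JOIN #MyTempTable AS X
        ON X.T_ThePK_ID = D.ThePK_ID

    SELECT @intRowCount = @@ROWCOUNT,
           @intDeletedSinceBackup = @intDeletedSinceBackup + @@ROWCOUNT

    -- Halve the batch if it ran slowly; grow it again when things are quiet
    IF DATEDIFF(Millisecond, @dtStart, GETDATE()) > 2000
        SELECT @intBatchSize = CASE WHEN @intBatchSize > 200
                                    THEN @intBatchSize / 2 ELSE 100 END
    ELSE
        SELECT @intBatchSize = @intBatchSize + 1000

    IF @intDeletedSinceBackup >= 100000
    BEGIN
        BACKUP LOG MyDatabase TO DISK = 'D:\Backup\MyDatabase_log.bak' -- hypothetical path
        SELECT @intDeletedSinceBackup = 0
    END

    WAITFOR DELAY '00:00:05'
END
```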

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-08-02 : 11:55:06
Personally I'd prefer the deadlock graph output from the 1222 traceflag, I don't use the XML much.

Check the resources that are locked and the resources that are being waited for by both.

--
Gail Shaw
SQL Server MVP

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-08-02 : 11:57:51
quote:
Originally posted by Kristen
Thanks for that, but performance is NOT the key here; what I am after is minimum impact. The server is used by thousands of concurrent web users 24/7 who must not be disrupted by the deletion of some 20,000,000 rows.


In that case I'd be looking at partitioning the tables. If you can partition in such a way that all the rows you'll need to delete at any point are in one partition, you can get rid of them with a metadata operation that's near-instantaneous and has no impact.

Does need to be Enterprise Edition and there are other considerations for partitioned tables.
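The switch-out itself looks something like this (sketch only - the staging table name is hypothetical; it must have an identical structure to MyTable and sit on the same filegroup):

```sql
-- Sketch: switch the oldest partition out to a staging table.
-- This is the near-instant metadata operation; no rows are moved.
ALTER TABLE dbo.MyTable SWITCH PARTITION 1 TO dbo.MyTable_SwitchTarget

-- The old rows are now in the staging table; removing them
-- no longer touches the live table at all
TRUNCATE TABLE dbo.MyTable_SwitchTarget
```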

--
Gail Shaw
SQL Server MVP

lappin
Posting Yak Master

182 Posts

Posted - 2011-08-02 : 12:05:37
Sorry, I didn't read the code properly. If locking is caused by the index being rebuilt, you could look at splitting up the PK, i.e. keep a unique constraint on your existing column but cluster on the date field. This should mean that finding the date range, deleting it, re-indexing etc. causes less fragmentation and faster disk reads. There should also be less table locking, as the data is more likely to be stored together on the same pages, so page locks are more likely to meet the optimiser's needs than a table lock.
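Roughly (a sketch only, reusing the names from the table script above):

```sql
-- Sketch: cluster on the date column, keep the ID unique via a
-- nonclustered primary key
CREATE TABLE dbo.MyTable
(
    ThePK_ID     int IDENTITY(10000,1) NOT NULL,
    MyDateColumn datetime NOT NULL,
    -- ... other columns as in the original script ...
    CONSTRAINT PK_MyTable PRIMARY KEY NONCLUSTERED (ThePK_ID)
)
GO
CREATE CLUSTERED INDEX IX_MyTable_MyDateColumn
    ON dbo.MyTable (MyDateColumn)
GO
```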

Kristen
Test

22859 Posts

Posted - 2011-08-03 : 02:10:07
"In that case I'd be looking at partitioning the tables"

Sadly we don't have Enterprise version on this server. As a matter of interest could the partitioning been created with zero/minimal impact?

"Personally I'd prefer the deadlock graph output from the 1222 traceflag"

Sorry, just to clarify: do you mean the "picture" output by SQL Profiler, or something you track yourself using the traceflag?

(I have the picture, which I could post somewhere, or you can see it by saving the XML to file and opening it in SSMS if you can be bothered ... although I never found a Zoom button in SSMS, only in SQL Profiler)

Kristen
Test

22859 Posts

Posted - 2011-08-03 : 02:21:12
quote:
Originally posted by lappin

Sorry, I didn't read the code properly. If locking is caused by the index being rebuilt, you could look at splitting up the PK, i.e. keep a unique constraint on your existing column but cluster on the date field. This should mean that finding the date range, deleting it, re-indexing etc. causes less fragmentation and faster disk reads. There should also be less table locking, as the data is more likely to be stored together on the same pages, so page locks are more likely to meet the optimiser's needs than a table lock.



The existing PK on ID is used by all the other tables connecting to this table; it would hurt performance to change that.

The ID is allocated in chronological order, but it sometimes happens that a session is held open for a prolonged period of time, hence we are deleting based on the date. It is extremely unlikely that there are any rows in the deletion zone that are NOT being included because their date is outside the desired range, and thus the deletion will almost certainly be a contiguous range of PK/clustered index records. So there should be no fragmentation issues nor, or so I thought, any locking issues (which is the nub of my question - how to find where the locking issues ARE coming from).

However, with a PK/clustered index on [MyID]:

DELETE D
FROM dbo.MyTable AS D
WHERE D.MyID BETWEEN @StartID AND @EndID
AND D.MyDate < @EndDate

performs disastrously IME (i.e. when the date column is added to the WHERE clause, compared to just using a delete range on MyID). Dunno why - the records are in [MyID] order; I have not looked into why checking [MyDate] slows the thing down.

Either way, we have found that the performance of a deletion where a simple TEMP table is joined to the main table on the PK/Clustered index works at very good speed and allows us to delete in batches (giving us the other benefits mentioned earlier).

I'm happy with the housekeeping method, just trying to establish why I am getting a deadlock periodically.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-08-03 : 02:27:52
Nothing to add to the discussion other than Kristen's back!!!

(and yeah, I should read the post to try to help, but heck, it's late)

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-08-03 : 02:43:07
Ok, read it...

What else is going on? Something is operating against the records you're deleting?

What happens if you DELETE myTable WHERE T_id = xxx? (just one single record)

Look at the execution plan too.

Also, as Gail said, a look at the deadlock graph would be helpful.

What if you remove the NOLOCK hint from the initial insert? It's possible that, if inserts are happening rapidly, splits are occurring, causing your NOLOCK to actually be the culprit.

Anyway, WELCOME BACK! You've been missed. Stay away much longer and I'll catch ya on posts lol

Kristen
Test

22859 Posts

Posted - 2011-08-03 : 02:50:06
Just in case any help:

On the SQL profiler I ran to catch this I got:

Lock:Deadlock Chain Deadlock Chain SPID = 64 1:5538641 (that's my query's SPID)
Lock:Deadlock Chain Deadlock Chain SPID = 72 1:5575207
Lock:Deadlock Chain Parallel query worker thread was involved in a deadlock
Lock:Deadlock Chain Parallel query worker thread was involved in a deadlock
Lock:Deadlock 1:5538641 (this also showing my SPID)
then (a few seconds later):
Lock:Timeout object_id = 1238607801, stats_id = 2 (this on the other SPID)

the object_id is for a table I would not expect to have any reference to the Session table I am deleting from (but may well be included in an SProc that also accesses the session table)

I did see quite a few lock timeouts whilst running the Profiler trace to catch the deadlock. Various SPIDs from the website, and also blocks of them (i.e. 600-ish within a couple of seconds) for SA on TEMPDB.

Kristen
Test

22859 Posts

Posted - 2011-08-03 : 02:50:46
quote:
Originally posted by russell

Nothing to add to the discussion other than Kristen's back!!!



Yeah, but only 'coz I'm freeloading for some help ... :)

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-08-03 : 02:52:38
You're going to want to review the query SPID 72 is executing.

Also, please say you don't have cascading deletes...

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-08-03 : 02:53:17
quote:
Originally posted by Kristen

quote:
Originally posted by russell

Nothing to add to the discussion other than Kristen's back!!!



Yeah, but only 'coz I'm freeloading for some help ... :)



always here for you sir!!!

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-08-03 : 04:02:03
quote:
"Personally I'd prefer the deadlock graph output from the 1222 traceflag"

Sorry, just to clarify: do you mean the "picture" output by SQL Profiler, or something you track yourself using the traceflag?



If you turn traceflag 1222 on and a deadlock occurs, SQL writes a text-version of the deadlock graph into the error log. I prefer that for working with deadlocks as it's easy to read once you're familiar with it and all the info is in one place.
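e.g. to turn it on globally for the running instance (it can also be set as a -T1222 startup parameter):

```sql
-- Write a text version of the deadlock graph to the SQL Server
-- error log whenever a deadlock occurs; -1 = all sessions
DBCC TRACEON (1222, -1)
```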

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-08-03 : 04:14:17
"What else is going on? Something is operating against the records you're deleting?"

I've racked my brains to think of what it might be. There are inserts, but they should be at the "far end" of the table. However, any inserts will be creating entries all over the indexes, so there will be contention in some areas of the indexes. Other than updates to reset the MyDate column, this is basically a write-and-forget logging table (not used in JOINs except for the occasional admin query).

"What happens if you DELETE myTable WHERE T_id = xxx? (just one single record)"

That works fine. (Maybe if I did enough of them I would incur a deadlock though ...). I ran the loop this morning several times. The "best" run did 130 loops (of 10,000 deletes per loop) before it got a deadlock, so it seems that it's dependent on exactly what other data is being manipulated. The websites being served were relatively quiet at that time, but still plenty of traffic.

Any chance that the deletes are triggering an index rebuild?

"Look at the execution plan too."

Yeah, I should do that!

"Also, as Gail said, a look at the deadloc graph would be helpful."

I'll find somewhere to post them. I don't know how to interpret them :(

"What if you remove the nolock hint from the initial insert? Possible that if inserts are happening rapidly that splits are occuring that's causing your NOLOCK to actually be the culprit."

That's outside my loop. What I'm actually doing is making the #TempTable, and then (manually) running the inner loop - i.e. re-running it each time it fails. I can only do this for about 60 minutes at the quietest time of the day, as it is causing degradation of performance for users on the site, which in turn triggers alerts on the monitoring software that get the Client agitated!

Kristen
Test

22859 Posts

Posted - 2011-08-03 : 04:42:46
OK, I should have looked at the Query Plan sooner ... I assumed a delete by PK was straightforward ... it ain't :(

There are a bunch of Foreign Key checks; my inclination is to DROP those constraints. I'll keep them on DEV and QA systems, but we've found previously that they interfere with trying to bulk-delete from these, essentially, logging-only tables. The session-log table isn't too bad - it's 10,000s or 100,000s of rows per day - but the detailed logs are 10,000,000s per day, and bulk-deleting those with FKs on them used to be a nightmare. I thought we had thrown away the FKs on this Session table too, but obviously not :(

(I had tried to look for FKeys referring to this table, obviously I failed :( - what's the best way to do that?)

I've stuck the Query Plan on pastebin (there is "copy to clipboard" at the top, and if pasted somewhere it will give a correctly formatted version), as it would wreck the width of this post if I put it here as CODE:

http://pastebin.com/3bAUc4YJ

Not-a-clue why the outer is:

Index Delete(OBJECT:([MyDatabase].[dbo].[MyTable].[IX_MyDate]

when I'm deleting on PK, but that doesn't look good (or maybe that's just the first of many index deletes with PREFETCH, and I'm overreacting because it's the first one!).

The Clustered Index Seek/Scans for the FKey checks don't look good either ...
