WITH(NOLOCK VS. READ_COMMITED_SNAPSHOT

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

WITH(NOLOCK VS. READ_COMMITED_SNAPSHOT

Author

Topic

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2010-05-25 : 16:33:53

In other thread, I see lot of talk about NOLOCK and READ_COMMITED_SNAPSHOT, so I spent last couple days to review these topics, with articles on MS site and SQL Sever blogs, then did test with my developers to confirm those things. Here is my little comparision between the two.

NOLOCK:
Over intenet, people loudly say that WITH(NOLOCK) is missused. I agree 100%, but want to add litle more : WITH(NOLOCK) is "misNOTused" too. It means that WITH(NOLOCK) is used in some cases it should not and it is not used in some cases it should.

GOOD:
Using WITH(NOLOCK), no lock is generated. As a result, the chance to get deadlock is dramatic decreased. Also, lock is expensive, using WITH(NOLOCK) has much better perfomance than using any kind of locks.

BAD:
Using WITH(NOLOCK) (READ UNCOMMITED), it returns uncommited data as the name suggests, sometime very wrong in case page split during the time statement WITH(NOLOCK) access the source.

READ_COMMITED_SNAPSHOT
READ_COMMITED_SNAPSHOT is likely combination of READ_COMMITED + Row Visioning. And as the name suggests, READ_COMMITED_SNAPSHOT and READ_COMMITED return data that already committed. However this isolation level do NOT 100% guarantee correct data (some my developers belive this only after we did the test). Why commited data is incorrect data? The reason is that the isolate level get the commited data at begining of statement, not at begining of transactions. So in some cases, it will return data different from our expectation.

GOOD:
return data is "likely" correct, but not guarantee + reduce deadlock because it uses Sch-S lock, not page or row lock.

BAD:
decrease performance, because it applies row versioning, server has to do extra work to copy data to tempdb before the data is updated; plus, it still requires a lock(Sch-S).

WITH(NOLOCK VS. READ_COMMITED_SNAPSHOT
It depends on business requirement of the application.

In cases incorrect data is somehow acceptable and performance is more important: Using WITH(NOLOCK). For example: counting number of users in a community forum. The number serves some purposes, but everybody knows this number may be incorrect or even being cheated. Nobody bases on this number for serious investments. So why not using WITH(NOLOCK) for better performance?

In cases correct data is important factors: Using READ_COMMITED_SNAPSHOT

In cases of critical data like financial area, auction sites: DON'T use either one.

Another choice : SNAPSHOT
The difference between SNAPSHOT and READ_COMMITED_SNAPSHOT is like the difference between REPEATABLE and READ_COMMITED. With SNAPSHOT isolation level, data return is commited data at the begin of transaction, not statement, so return data is much more trustable than READ_COMMITED_SNAPSHOT. It should be used for applications with critical data.

SIRIALIZABLE is another choice to guarantee correct data and no update conflict detection. With this isolation level, deadlock will pop up like hell. So we have to do lot of work on application/database/query design to deal with deadlock issue.

That is my view point, which can be right or wrong or whatever. So any comment is appreciated.

Kristen
Test

22859 Posts

Posted - 2010-05-25 : 18:48:51

Good post

I think this article is useful:

http://sqlblog.com/blogs/andrew_kelly/archive/2009/04/10/how-dirty-are-your-reads.aspx

Couple of thoughts:

Getting COUNT(*) for Number of Members is indeed non-critical data. However, using NOLOCK you may still raise a non-recoverable error if the query is made at the time of an page-split. This, like other NOLOCK side effects, is extremely difficult to "force" during testing, so normally winds up being a behaviour only found on Production, and users are either annoyed or will report an error that takes time to track down and, usually, because it is not repeatable, no cause is found.

If I used a forum where I occasionally got non-repeatable error I would lose trust in the forum.

READ_COMMITED_SNAPSHOT / NOLOCK are not interchangeable options - READ_COMMITED_SNAPSHOT is set for the database, not the transaction.

I don't have any, good, like-for-like, before & after comparisons. We introduced READ_COMMITED_SNAPSHOT with our upgrade away from SQL 2000 so systems both upgraded and got READ_COMMITED_SNAPSHOT at the same time. However, we have not noticed any appreciable side effects.

We only use NOLOCK for SysAdmin queries on very-high transaction Logging Tables where it is critical that our diagnostic reports do not block processes that are essentially "write only" to the LOG tables.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-25 : 20:08:44

We saw a huge performance improvement when we switched to READ_COMMITTED_SNAPSHOT. At the time, we weren't even aware of that isolation level on SQL 2005 and only heard about when we opened a case with MS PSS to help us with performance improvements. According to MS, READ_COMMITTED_SNAPSHOT is the recommended isolation level for OLTP databases. It is not the default because they wanted to keep the setting the same from 2000.

With READ_COMMITTED_SNAPSHOT, you do not get dirty reads so I'm confused why you are saying the data is likely correct with this isolation level.

Tempdb should be optimized when using READ_COMMITTED_SNAPSHOT. It should be on its own set of disks and have as many data files as there are CPUs (cores not sockets).

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-25 : 20:10:42

READ_COMMITTED_SNAPSHOT avoids deadlocks between reads and writes, deadlocks will still happen between writes and writes.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2010-05-26 : 01:30:03

Thanks for comment, Krisen and tkizer.

However, using NOLOCK you may still raise a non-recoverable error if the query is made at the time of an page-split. This, like other NOLOCK side effects, is extremely difficult to "force" during testing, so normally winds up being a behaviour only found on Production, and users are either annoyed or will report an error that takes time to track down and, usually, because it is not repeatable, no cause is found.

Yes, I should add that in the BAD of NOLOCK.

I just want to present pro-con of the two. I did not say NOLOCK is good nor READ_COMMITED_SNAPSHOT is bad. Each has it's place depending on business requirement.

I read article in the link you provide,
http://sqlblog.com/blogs/andrew_kelly/archive/2009/04/10/how-dirty-are-your-reads.aspx
Thanks

NOLOCK may be bad but not as bad as some people think (it provides good performance in cases performance is weight more than other factors). READ_COMMITED_SNAPSHOT may be good but not as good as some people think (it also returns wrong data in some cases).

And more important, with critical data, we have another option that is much better than READ_COMMITED_SNAPSHOT. That is SNAPSHOT isolation level I described in the previous message.

READ_COMMITED_SNAPSHOT / NOLOCK are not interchangeable options - READ_COMMITED_SNAPSHOT is set for the database, not the transaction.

Sure, one is on database level and the other on transaction level.

READ_COMMITTED_SNAPSHOT avoids deadlocks between reads and writes, deadlocks will still happen between writes and writes

I agree with you tkizer.

We saw a huge performance improvement when we switched to READ_COMMITTED_SNAPSHOT

If you switch from READ_COMMITTED to READ_COMMITTED_SNAPSHOT, YES, off course, the performance is must better because READ_COMMITTED_SNAPSHOT using schema-s lock, not row or page lock. If you mean READ_COMMITTED_SNAPSHOT is better than NOLOCK in performance, perhaps, I have to do more research on that because I think different as I explained in the previous message.

With READ_COMMITTED_SNAPSHOT, you do not get dirty reads so I'm confused why you are saying the data is likely correct with this isolation level.

As I mentioned in previous message, with READ_COMMITTED_SNAPSHOT, the committed data we get is committed data at beginning of statement, not transaction. That is the reason we may not get the correct data as we expect. And off course getting incorrect data with READ_COMMITTED_SNAPSHOT is rare (but it happened on our test). That is what I mean about "is likely correct".

Kristen
Test

22859 Posts

Posted - 2010-05-26 : 05:14:15

" If you mean READ_COMMITTED_SNAPSHOT is better than NOLOCK in performance"

I think Tara was comparing READ_COMMITTED_SNAPSHOT to "no hints"

" NOLOCK may be bad but not as bad as some people think (it provides good performance in cases performance is weight more than other factors). "

I still think that programmers will not bother to program against the possible side effects.

Also, I cannot think of a good use in an end-user application. You mention "Member count", and I agree the accuracy of that is not very important, however, as a user, I would be alarmed if it suddenly showed 10 fewer or 10 more (i.e. because there were page splits occurring at the time of the query).

In that particular example I would change the program to have a "count column" of the number of members - doesn't make sense (to me) to issue a COUNT(*) every time you need to know the number of members, better to store that information (using a trigger or whatever).

But I don't mean to argue about your chosen example

I just mean I'm struggling to think of a good real-world example where NOLOCK would be acceptable (other than my sysadmin only report where the syssadmins will be aware of the possible fragility of the report and the only aim is to prevent any interference with the application, as per my earlier example)

But I'm happy to be educated better on this topic

READ_COMMITTED_SNAPSHOT is a blunt weapon, but I think its a very good fix in all the OLTP scenarios I've encountered. Perhaps if I was starting from scratch I might us ALLOW_SNAPSHOT_ISOLATION and explicitly use the READCOMMITTED hint when needed ... but ... I think I would spend far too much time deciding whether it was needed, or not!, on each query and I don't think my development budget would stretch to that amount of time & testing ... sadly

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2010-05-26 : 07:52:59

I've been advocating the use of NOLOCK for quite some time now and I just can't resist doing it again :) I would guess that on average about 80% of the data in any given system could go under the definition of "historic data", data that is unlikely to change, and issuing locks on these data is usually waste of valuable resources. But of course you need to know the risks involved and never use the dirty data as a part of any transactions or anything involved in critical business information or the like.

- Lumbago
My blog (yes, I have a blog now!) -> www.thefirstsql.com

Kristen
Test

22859 Posts

Posted - 2010-05-26 : 08:23:07

I'm very happy to have the debate

"never use the dirty data as a part of any transactions or anything involved in critical business information or the like"

But ... if there is a page-split whilst you are reading the data you will show some-missing, some-twice, or get application error. I don't think that's ever acceptable (for something you are showing to a user) - even if it is historic log data they'll still be scratching their heads, or making a duff decision, on the basis of what they see?

If the table is (effectively) read-only then what?

Does NOLOCK make much difference (if there are no WRITES on the table)? (I dunno the answer to that one, but quite possibly) ...

... BUT ... presumably there is some sort of archive routine that DOES write to that table? Middle-of-the-night or once-a-month. What then of any queries during page-splits? Lock the system so that no one could accidentally access the system during such data-archiving periods?

(We've had clock-errors or scheduled tasks that were disabled, forgotten, and then someone realised an re-enabled them and, being overdue, they ran immediately - in the middle of the day - even though that was never the intention.

I prefer to assume the worst in these sorts of scenarios

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2010-05-26 : 08:51:14

But do you have page-splits on historic data? I guess you could if you use GUIDs or composite primary keys in a clustered index, but usually (at least often enough to prove my point) historic data doesn't change that often and in my opinion doesn't need locking. Just to be clear: I'm not saying you should always use nolock on historic data, just that it's perfectly justifiable knowing the risks involved.

About READ_COMMITED_SNAPSHOT; this is actually the first time I hear about inaccurate data when using this isolation level so it would be great if anyone could confirm it. As far as I can remember this is the recomended isolation level from MS on all 2005/2008 databases so I find it somewhat hard to believe that it could cause problems...?

- Lumbago

My blog (yes, I have a blog now! just not that much content yet)
-> www.thefirstsql.com

Kristen
Test

22859 Posts

Posted - 2010-05-26 : 09:40:42

"But do you have page-splits on historic data?"

Only when more data is being "archived" into the historic data table, and then I think there is a potential annoyance for users.

If you have a partition table and there is genuinely no further updates, ever, then fine

" inaccurate data when using this isolation level "

I think namman's point was that the snapshot is as-of the start of the statement-block, rather than the start of the transaction?

I can see that this could cause a Gotcha if you weren't expecting that behaviour.

@namman have you got an example we could play with?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-26 : 12:19:21

I'm going to contact Microsoft regarding namman's claim about READ_COMMITTED_SNAPSHOT. Stay tuned...

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2010-05-27 : 12:36:52

About READ_COMMITED_SNAPSHOT; this is actually the first time I hear about inaccurate data when using this isolation level so it would be great if anyone could confirm it. As far as I can remember this is the recomended isolation level from MS on all 2005/2008 databases so I find it somewhat hard to believe that it could cause problems...?
- Lumbago

With read_committed_snapshot, we always get the committed data, no question about that. "incorrect data" I mentioned is something else.

We did tests on our dev server to show developers "incorrect data" when using read_committed_snapshot. If I have time I will do the again and keep it as a reference. But for now I can present a simple example that may illustrate "incorrect data" when using read_committed_snapshot isolation level.

Suppose, I work for an auction site. One of BR is that: When users click on an item, they will see info of the highest bid of that item; also, in db, we will update ViewItem table and some other actions (let make simple, just update ViewItem).

So I write a sp with itemId parameter somehow like this:
................
(
@itemId int
)

declare @HB decimal(8,2)
begin tran
set @HB = (select max(Bid) from BitInfo where id = @itemId)
update ItemView set LatestView = getdate(), HighestBid = @HB where id = @itemId
select [needed info] from BidInfo where id = @itemId and Bid = @HB
committed tran
(forgive error syntax if any)

The sp is not super, off course, but look normal huh? Support, we currently have 1000 bids on that item. When users click on the item, the system runs the sp with read_committed_snapshot isolation level, and returns the result.

Normally, users will get info of the highest bid. However, as you can see, sometime, users get nothing, even though no exception occurs and there are 1000 bids currently on that item, thus there always exists the highest bid(bid is unique). That is what I mean about "incorrect data" when using read_committed_snapshot.

Using snapshot (I think MS should named this as repeatable_snapshot or seiralizable_snapshot

) isolation level, it just works fine with that sp.

That is one of example about "incorrect data" I imply. You can create other examples that show "incorrect data" is more severe.

Don't get me wrong. I am not against read_committed_snapshot. For me, it is good if used properly (with care). I just think snapshot handles data better.

READ_COMMITTED_SNAPSHOT is a blunt weapon, but I think its a very good fix in all the OLTP scenarios I've encountered. Perhaps if I was starting from scratch I might us ALLOW_SNAPSHOT_ISOLATION and explicitly use the READCOMMITTED hint when needed ... but ... I think I would spend far too much time deciding whether it was needed, or not!, on each query and I don't think my development budget would stretch to that amount of time & testing ... sadly

Yes, if I was you, I might do the same thing

I'm going to contact Microsoft regarding namman's claim about READ_COMMITTED_SNAPSHOT. Stay tuned...

Thanks
It's good to have reponse from MS even the message is "namman is stupid, just ignore him".

Couple years ago, I emailed MS with some questions about sql server. They just ignored me, really.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-27 : 15:15:21

We have a support contract with Microsoft, which is why I'm able to get answers.

Here's the response:

quote:

There is no data inaccuracy. It is accurate according to when you selected it and what isolation level you are operating in. For RCSI you get the values based on exactly what was committed in the database at the start of the query. For read uncommitted (nolock) you get the correct values at exactly the time the query reads them without consideration to any transactions in flight. For read committed you get the exact values committed at the time you read them and the statement will be blocked if a transaction is in flight, so you wait until the transaction either changes the value or rolls back. Full Snapshot Isolation you get the correct values at exactly the time just prior to any transaction in flight started.

RCSI allows you to read data without being blocked even though it is locked and potentially being changed by a transaction. So in order processing, there are 5 units on hand, your RCSI query reads five, the buyer orders all five, but another buyer was already ordering two of those, the transaction was in flight and by the time the buyer of 5 finishes the order there are no longer 5 to buy. If nolock is used in this same scenario the amount on hand would be either 5 or 3 depending upon how far along the transaction was. Then the other risk here is, the nolock reads 3 available because the change has already been made to the amount but the transaction has not committed, so you display 3 available to your customer, he goes elsewhere cause he needs 5 right now, the other buyer cancels and microseconds after you lose the sale, 5 are available. In the case of read committed, the buyer of 5 would wait until the other order was either completed or cancelled, then he would see exactly how many were available to him, of course you can lose the sale here because he gets tired of waiting and leaves. Snapshot isolation level adds another caveat to complex transactions and there is a good example in Books Online showing this that my simple example above doesn’t cover. See http://msdn2.microsoft.com/library/df3134f5-396c-406e-a7d9-b94b59277445.aspx

These are all correct scenarios and why we have multiple isolation levels, but the applications have to understand which isolation level they are using and take that into account when making business decision using the values returned to them. RCSI was a great addition to SQL as it provides another method to reduce a great deal of blocking without any application changes required, but it does provide the data from a bit different time related to transactions in flight. Nolock became heavily used over the years because of the large number of blocking problems that arose in SQL. None of the above approaches is wrong, all are viable options, it’s up to the reader to understand the isolation level they are reading in and what that means to the overall state of the data.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2010-05-28 : 14:13:16

Good to have response from MS, and I think it is necessary to highlight some important parts.

Thanks, tkizer

There is no data inaccuracy. It is accurate according to when you selected it and what isolation level you are operating in. For RCSI you get the values based on exactly what was committed in the database at the start of the query. For read uncommitted (nolock) you get the correct values at exactly the time the query reads them without consideration to any transactions in flight. For read committed you get the exact values committed at the time you read them and the statement will be blocked if a transaction is in flight, so you wait until the transaction either changes the value or rolls back. Full Snapshot Isolation you get the correct values at exactly the time just prior to any transaction in flight started.

RCSI allows you to read data without being blocked even though it is locked and potentially being changed by a transaction. So in order processing, there are 5 units on hand, your RCSI query reads five, the buyer orders all five, but another buyer was already ordering two of those, the transaction was in flight and by the time the buyer of 5 finishes the order there are no longer 5 to buy. If nolock is used in this same scenario the amount on hand would be either 5 or 3 depending upon how far along the transaction was. Then the other risk here is, the nolock reads 3 available because the change has already been made to the amount but the transaction has not committed, so you display 3 available to your customer, he goes elsewhere cause he needs 5 right now, the other buyer cancels and microseconds after you lose the sale, 5 are available. In the case of read committed, the buyer of 5 would wait until the other order was either completed or cancelled, then he would see exactly how many were available to him, of course you can lose the sale here because he gets tired of waiting and leaves. Snapshot isolation level adds another caveat to complex transactions and there is a good example in Books Online showing this that my simple example above doesn’t cover. See http://msdn2.microsoft.com/library/df3134f5-396c-406e-a7d9-b94b59277445.aspx

These are all correct scenarios and why we have multiple isolation levels, but the applications have to understand which isolation level they are using and take that into account when making business decision using the values returned to them. RCSI was a great addition to SQL as it provides another method to reduce a great deal of blocking without any application changes required, but it does provide the data from a bit different time related to transactions in flight. Nolock became heavily used over the years because of the large number of blocking problems that arose in SQL. None of the above approaches is wrong, all are viable options, it’s up to the reader to understand the isolation level they are reading in and what that means to the overall state of the data.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-28 : 14:34:35

I agree that those parts are important. My systems are perfectly happy with the data being accurate at the start of the transaction, so RCSI works very well for us.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources