Transaction Isolation and the New Snapshot Isolation Level

By Paul Alcon on 9 July 2007 | Tags: Transactions


Concurrency and transaction isolation are a prickly subject, difficult to explain with any kind of clarity without boring the reader and leaving their poor brain in a complete muddle. Therefore, it is often ignored in the vain hope it won't affect us and we can forget all about it. Well you can't ignore it any more and with SQL Server 2005 there's a whole new isolation level added to the four that already exist. So I donned my leather gloves to make a brave attempt to make this subject approachable and get you to the end, without putting you to sleep or letting you wander over to the more interesting flames going on at Slashdot. So here it goes.

Imagine you're at home and the mother-in-law is around for tea. She and your wife request a cup of tea each, one with sugar and one without. Being a man (humor me here -- no comments about me being sexist please) you can only assimilate one thing at a time and as your dear partner was the last to make a request by a few milliseconds, it stamps all over your mother-in-law's order in your short term memory and you forget to add the sugar for hers. Drat! Minus points for you again, on top of that you spent an inordinate amount of time hovering over the sugar bowl trying to remember if anyone wanted sugar, so the tea got a bit cold. It's an age old problem: How best to balance concurrency with data integrity in a multi-user environment. We want two cups of tea arriving together at high temperature, and we want the tea to have sugar in it when sugar was requested.

There are traditionally two models for database concurrency: Optimistic Concurrency and Pessimistic Concurrency. With pessimistic concurrency transactions are more highly isolated. This involves locking the data in the database when you read or modify it so that it can't be modified by anyone else until you are finished with it. By default SQL Server acquires a shared lock for read operations. (A read being a select statement.) And an exclusive lock for modifying operations. (Update, insert and delete statements.) Shared locks are compatible with other shared locks. But, exclusive locks share with no-one. The trouble with locking, is that it forces other transactions to wait. (Our tea gets cold.) And even worse: Occasionally, two transactions end up waiting permanently for each other in a deadlock. This is where transaction one has a read lock on row one of a table and wants to modify row two. Transaction two wants to modify row one and has a read lock on row two. Neither can acquire the necessary exclusive lock to complete. SQL Server can detect deadlocks; will abandon one of the transactions; and raise a deadlock error.

With optimistic concurrency isolation is reduced. It dispenses with the shared lock and is optimistic that no-one will change the data that was just read, before you make changes to it and save it back. The burden is now on the application using the database to check for changes to the original data read, and deal with it appropriately should any 'collisions' be detected. Therefore, modifying transactions are not kept waiting by read locks, and deadlocks cannot occur. But, the application now has to deal with a more complex situation.

If you don't explicitly declare a transaction and isolation level around your SQL statements, each atomic statement is implicitly a single transaction at READ COMMITTED isolation level. A select statement will acquire a shared lock on the rows it is reading. While it's reading, the data can't be modified by any update, insert or delete statements. Those statements have to wait to obtain an exclusive lock to modify the data. Any subsequent selects on that data in turn have to wait for the exclusive lock to be released, before they can obtain a shared lock and read the newly committed data. As the number of transactions rise in your database, concurrency and therefore performance can begin to degrade. As your application gets more sophisticated with multi-statement procedures your data integrity can also suffer. Now that we know what concurrency and isolation are all about, we can look at the various levels available and their trade-off's with some practical examples. The ultimate goal being to show how the new SNAPSHOT isolation in SQL Server 2005 can increase concurrency performance without the loss of data integrity in certain situations enabling us to leverage optimistic concurrency which before now wasn't available in SQL Server.

I have already mentioned the READ_COMMITTED and SNAPSHOT isolation levels, but here's the complete list along with their concurrency effects:

Isolation level

Dirty Reads

Non-repeatable reads

Phantom reads

Concurrency control

READ UNCOMMITTED

Yes

Yes

Yes

Pessimistic

READ COMMITTED (with locking)

No

Yes

Yes

Pessimistic

READ COMMITTED (with snapshot)

No

Yes

Yes

Optimistic

REPEATABLE READ

No

No

Yes

Pessimistic

SNAPSHOT

No

No

No

Optimistic

SERIALIZABLE

No

No

No

Pessimistic

READ UNCOMMITED

Starting at the beginning and working through, lets look at READ UNCOMMITED. As its name suggests, this allows us to read data that maybe in the middle of being modified by another transaction that is yet to complete. The data is uncommitted. This is the lowest level of isolation and leaves us open to all three concurrency effects. We will look at these effects in turn with examples. To do that we need to familiarize ourselves first with some syntax: how to set an isolation level for a transaction; how to explicitly declare the start and end of a transaction.

To set an isolation level and declare a transaction you use the following syntax:

SET TRANSACTION 
ISOLATION LEVEL READ UNCOMMITTED 
GO
BEGIN TRANSACTION T1

--SQL statements here ...
IF <some condition>
	ROLLBACK TRANSACTION T1
ELSE
	COMMIT TRANSACTION T1

In this example I have named the transaction T1, this is optional. Transactions can also be nested. You are allowed to abbreviate the TRANSACTION keyword to just TRAN. With that under our belts we're ready at last to fire up the query window and get our hands dirty.

The first effect to receive our attention will be the Dirty Read. No this has nothing to do with top shelf media! Sorry, I told you it was hard to make transactions interesting! To keep things approachable and simple enough for our puny brains we will set up a table with a single row like this:

CREATE TABLE tbl 
(
id int IDENTITY(1,1)
PRIMARY KEY,
val int NOT NULL
)
GO
INSERT INTO tbl (val)
VALUES(1)

Open SQL Server Management Studio, choose or create a new database, click the 'New query' button in the toolbar and paste the above syntax into the query window. Finally click 'Execute' or press the F5 key. (You can execute all the examples in this article in this manner)

Open two 'New query' tabs in Management Studio. From the window menu choose 'New horizontal tab group.' Paste the following two TSQL snippets, one into each query window.

Statement 1:

BEGIN TRAN longUpdate

UPDATE tbl 
SET val = -1 

WAITFOR '00:00:05'

ROLLBACK TRAN longUpdate

Because humans are a bit slow, to see what's happening we're using WAITFOR to put a long pause in the middle of the transaction. This is to give you time to click the execute button for each statement.

Statement 2:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
BEGIN TRAN

SELECT val FROM tbl 

COMMIT TRAN

Click the execute button for the first statement. Click in the second window to make it active and click the execute button again. The second statement should execute immediately and show the following result:

val
-------------
-1

Finally the first statement should finish:

(1 row(s) affected)

Run the second statement again on it's own, you should see this:

val
--------------
1

The result of -1 was a dirty read. We were able to read the data without having to wait for our long update to finish so concurrency is good, but we got dirty data. We saw the uncommitted -1 value that got rolled back to 1 at the end of statement 1. Imagine that was an inventory report. Fred in sales is having one of those days, a customer who just cannot make up their mind. He's just initiated an order reducing widget stock by 10 items. Meanwhile in stock control Alice is looking forward to going home, she runs the inventory report. “Thank goodness our system's so fast.” she thinks. “I'll be home in no time. Hmm widgets are low!” She mails an order to the supplier for more widgets. Back in sales the customer changes their mind at the last minute. Fred cancels the order and with an inaudible sigh, hangs up the phone and also leaves for the day. Next morning Alice finds Bob struggling under a mini widget mountain in the warehouse! Oh dear.

READ COMMITED

So lets look at this again, but this time we're going to go up an isolation level to the default for SQL Server READ COMMITED. With the same query windows replace the select query in statement 2 with this:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO

BEGIN TRAN
SELECT val FROM tbl 
COMMIT TRAN

It's almost identical to the previous statement, we have simply changed the isolation level from READ UNCOMMITTED to READ COMMITTED. Just as before execute statement 1 and then statement 2. Oh ho! This time if you got it right statement 2 took muuuuuch longer and returned this result:

val
--------------
1

What's happened this time is the update statement begins and acquires an exclusive lock on the record, the read statement requests a shared lock on the same record, it is blocked by the executing update statement and must wait for it to complete and release it's exclusive lock before it can read the data. Our inventory report is now safe, but it sometimes takes longer to run and Alice doesn't always get out at 17:00 on the dot anymore. Oh well sorry Alice, it's a balancing act like I said. But wait, maybe we can help poor Alice after all. Widget Co. just got upgraded to SQL Server 2005 and READ COMMITTED isolation has a new mode, let's take a look. Get a new third query window open and execute the following statement in it:

ALTER DATABASE <your DB name here>
SET READ_COMMITTED_SNAPSHOT ON

The effect of this is immediate. But, you do need to be in single user mode which we are (or you should be, I hope you're not doing these exercises on your production DB!). Back to our two little statements we were playing with earlier. Alter statement 2 again so it looks like this:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO

BEGIN TRAN
SELECT val FROM tbl 
COMMIT TRAN

Execute them just as before. Set the update running first and then the select statement. What happened this time? Our select statement performs immediately just like it did at READ UNCOMMITTED level. But, we also got a correct result of 1 for column val. No more dirty read. Wow! Magic, how did it pull that little trick? Well the moment the update transaction started a version of the row is squirreled away in tempdb by SQL Server. When the select statement starts, SQL Server dishes up the versioned row straight away, as this is the most recently committed data for that row. So the select statement doesn't have to wait for the slow update. Lucky Alice.

REPEATABLE READ

Moving on then, lets take a look at REPEATABLE READ. We need to swap our statements around so our select statement becomes statement 1, and we move the WAITFOR into that statement and repeat the select again afterwards. Our new statement 1 looks like this:

Statement 1

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO

BEGIN TRAN repeatRead

SELECT * FROM tbl 
WAITFOR '00:00:05'
SELECT * FROM tbl 

COMMIT TRAN repeatRead 

New statement 2

BEGIN TRAN shortUpdate

UPDATE tbl
set val = -1 

COMMIT TRAN shortUpdate

This time we will commit our update, and our select statement is initially at READ COMMITTED to illustrate what happens under the default level. Run the two statements, starting the select statement 1 first. Notice how the update is allowed in between the two reads of the same row, the results of each select are different:

val
--------------
1

val
--------------
-1

Imagine in a single procedure you were selecting data into a temporary table for statistical analysis. Then the same set of data is read into another temp table for some different analysis, to both then be be combined in a report. The second read may now contain changed data and therefore not agree with the first result set. To overcome this scenario we can use REPEATABLE READ. Alter the select statement 1 to REPEATABLE READ so it looks like this:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO

BEGIN TRAN repeatRead

SELECT * FROM tbl 
WAITFOR '00:00:05'
SELECT * FROM tbl 

COMMIT TRAN repeatRead

Change the update statement 2 so that it updates val back to 1 instead of -1:

BEGIN TRAN shortUpdate

UPDATE tbl 
SET val = 1 

COMMIT TRAN shortUpdate

Run the two statements, starting the select in statement 1 first. This time the update has to wait, it is blocked by the select statement until it completes, and this time we get consistent results.

val
-------------
-1

val
--------------
-1

SNAPSHOT

Our statistical operation is now safe. But, we have had to pay the cost of concurrency again. Our update was held up. Let's try our new friend SNAPSHOT. Alter the select statement in statement 1 so it looks like this:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO

BEGIN TRAN repeatRead

SELECT * FROM tbl 
WAITFOR '00:00:05'
SELECT * FROM tbl 

COMMIT TRAN repeatRead

Run the two statements again, starting the select statement first and then the update. This time the update executes immediately. But, the repeated select also gets consistent results. Thanks to row versioning concurrency is restored. We have paid a little in overhead required for row versioning, but it is far less significant than the effects of locking.

So far we have looked exclusively at read and update transactions. It is time to widen the net a little and introduce inserts to demonstrate the effects of phantom reads, which are a special case of non-repeatable reads. Delete actions can also cause phantom reads but it is no different to inserts, so we will stick with insert to keep things simple. A phantom read is where a select statement produces more or less results than a previous select, using the same criteria. Let's demonstrate with an example. Change the select statement 1 back to REPEATABLE READ and create a new insert for statement 2 as follows:

Statement 1

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
BEGIN TRAN repeatRead

SELECT * FROM tbl 
WAITFOR '00:00:05'
SELECT * FROM tbl 

COMMIT TRAN repeatRead

Statement 2

BEGIN TRAN shortInsert

INSERT INTO tbl 
VALUES (2) 

COMMIT TRAN shortInsert

Execute the statements in order. The long running select will produce the following two result sets:

val
---------------
1

val
---------------
1
2

This can have exactly the same repercussions as non-repeatable read where statistical analysis is the goal. Change the long running select transaction back to SNAPSHOT isolation level. Run the two statements again in order. This time both selects return the same result. Even though a third record has now been added in between the two selects they both return just two records.

Snapshot Isolation

Up to this point we have engaged only the READ COMMITTED SNAPSHOT database option. We will now turn on ALLOW_SNAPSHOT_ISOLATION to see the effects. To do this execute the following statement:

ALTER DATABASE <your DB name here>
SET ALLOW_SNAPSHOT_ISOLATION ON

The ALLOW_SNAPSHOT_ISOLATION will go into PENDING_ON mode if any existing transactions are running. As soon as they complete the option will become fully on and active. With the ALLOW_SNAPHOT_ISOLATION on, statements that modify data can take advantage of optimistic concurrency also. But, it adds a new level of complexity that the application using the database will now have to handle. Such is the price of optimism. In snapshot mode an update statement for example, no longer has to compete with shared locks. This means an update can occur in between a read and an update that are within a single transaction. In this situation SQL Server detects this and throws an error that the application can catch and deal with accordingly. Here is a demonstration to illustrate.

Statement 1

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO

BEGIN TRAN optimisticTran

SELECT * FROM tbl 

WAITFOR '00:00:05'

UPDATE tbl 
SET val = 3
WHERE id = 1 

COMMIT TRAN optimisticTran

Statement 2

BEGIN TRAN shortUpdate

UPDATE tbl 
SET val = 4
WHERE id = 1

COMMIT TRAN shortUpdate

Run these together starting statement 1 first. The short update will complete quickly. But, statement 2 will raise an error that the data has been modified in between reading the data and then attempting to update it.

Msg 3960, Level 16, State 2, Line 15

Snapshot isolation transaction aborted due to update conflict. 
You cannot use snapshot isolation to access table 'dbo.tbl' directly 
or indirectly in database 'yourdbname' to update, delete, or insert 
the row that has been modified or deleted by another transaction. 
Retry the transaction or change the isolation level for the 
update/delete statement.

Phew! Finally for those of you still with me, it wouldn't do to finish without looking at the last level SERIALIZABLE. This is the ultimate level of isolation. Like SNAPSHOT it also protects us from dirty reads, non-repeatable reads and phantom reads but without the update conflict we just saw above. However, this is expensive in terms of concurrency as we shall see in this final example:

Statement 1

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO

BEGIN TRAN pessimisticTran

SELECT * FROM tbl 

WAITFOR '00:00:05'

UPDATE tbl 
SET val = 3
WHERE id = 1

COMMIT TRAN pessimisticTran

Statement 2

BEGIN TRAN shortUpdate

UPDATE tbl 
SET val = 4
WHERE id = 1

COMMIT TRAN shortUpdate

Run the two statements. The insert in statement 2 is forced to wait on the long running pessimistic transaction. But, we avoid the update conflict experienced with SNAPSHOT isolation.

To conclude then. We have explored the new SNAPSHOT isolation in contrast to those that were already in SQL Server prior to SQL Server 2005 and the opportunities it provides. The main goal of SNAPSHOT isolation is to bring optimistic concurrency to SQL Server. It allows us to avoid the concurrency effects of Dirty read, non-repeatable read and Phantom read. But, without the full cost to performance that SERIALIZABLE isolation brings. The responsibility for update conflicts is moved to the application to be handled there. It is no mistake that this fits in nicely with the disconnected nature of n-tier data applications that can be created with Microsoft's .NET Framework.

Well I hope you enjoyed our little journey, and feel armed to deal with transactions using them to your advantage. Thank you for reading this article, your comments are welcome. If there are any battle hardened transaction warriors out there, I welcome your comments too.


- Advertisement -