Return to Transaction Isolation and the New Snapshot Isolation Level
Transaction Isolation and the New Snapshot Isolation Level
Written by Paul Alcon on 09 July 2007
Concurrency and transaction
isolation are a prickly subject, difficult to explain with any kind of
clarity without boring the reader and leaving their poor brain in a complete
muddle. Therefore, it is often ignored in the vain hope it won't affect
us and we can forget all about it. Well you can't ignore it any more and with SQL Server 2005 there's a whole new isolation level added to the four that already exist. So I donned my leather gloves to make a brave attempt to make this subject approachable and get you to the end, without putting you to sleep or letting you wander over to the more interesting flames going on at Slashdot. So here it goes.
Imagine you're at home and the
mother-in-law is around for tea. She and your wife request a cup of tea
each, one with sugar and one without. Being a man (humor me here -- no
comments about me being sexist please) you can only assimilate one
thing at a time and as your dear partner was the last to make a request by a
few milliseconds, it stamps all over your mother-in-law's order in your
short term memory and you forget to add the sugar for hers. Drat! Minus
points for you again, on top of that you spent an inordinate amount of time
hovering over the sugar bowl trying to remember if anyone wanted sugar, so
the tea got a bit cold. It's an age old problem: How best to balance
concurrency with data integrity in a multi-user environment. We want
two cups of tea arriving together at high temperature, and we want the tea
to have sugar in it when sugar was requested.
There are traditionally two models
for database concurrency: Optimistic Concurrency and Pessimistic
Concurrency. With pessimistic concurrency transactions are more highly
isolated. This involves locking the data in the database when you read or
modify it so that it can't be modified by anyone else until you are finished
with it. By default SQL Server acquires a shared lock for read operations.
(A read being a select statement.) And an exclusive lock for modifying
operations. (Update, insert and delete statements.) Shared locks are
compatible with other shared locks. But, exclusive locks share with
no-one. The trouble with locking, is that it forces other transactions to
wait. (Our tea gets cold.) And even worse: Occasionally, two transactions
end up waiting permanently for each other in a deadlock. This is where
transaction one has a read lock on row one of a table and wants to modify
row two. Transaction two wants to modify row one and has a read lock on row
two. Neither can acquire the necessary exclusive lock to complete. SQL
Server can detect deadlocks; will abandon one of the transactions; and raise
a deadlock error.
With optimistic concurrency
isolation is reduced. It dispenses with the shared lock and is optimistic
that no-one will change the data that was just read, before you make changes
to it and save it back. The burden is now on the application using the
database to check for changes to the original data read, and deal with it
appropriately should any 'collisions' be detected. Therefore, modifying
transactions are not kept waiting by read locks, and deadlocks cannot occur.
But, the application now has to deal with a more complex situation.
If you don't explicitly declare a
transaction and isolation level around your SQL statements, each atomic
statement is implicitly a single transaction at READ COMMITTED isolation
level. A select statement will acquire a shared lock on the rows it is
reading. While it's reading, the data can't be modified by any update,
insert or delete statements. Those statements have to wait to obtain an
exclusive lock to modify the data. Any subsequent selects on that data in
turn have to wait for the exclusive lock to be released, before they can
obtain a shared lock and read the newly committed data. As the number of
transactions rise in your database, concurrency and therefore performance
can begin to degrade. As your application gets more sophisticated with
multi-statement procedures your data integrity can also suffer. Now that we
know what concurrency and isolation are all about, we can look at the
various levels available and their trade-off's with some practical examples.
The ultimate goal being to show how the new SNAPSHOT isolation
in SQL Server 2005 can increase concurrency performance without the loss of
data integrity in certain situations enabling us to leverage
optimistic concurrency which before now wasn't available in SQL Server.
I have already mentioned the
READ_COMMITTED and SNAPSHOT isolation levels, but here's the complete list
along with their concurrency effects:
|
Isolation level
|
Dirty Reads
|
Non-repeatable reads
|
Phantom reads
|
Concurrency control
|
|
READ UNCOMMITTED
|
Yes
|
Yes
|
Yes
|
Pessimistic
|
|
READ COMMITTED (with
locking)
|
No
|
Yes
|
Yes
|
Pessimistic
|
|
READ COMMITTED (with
snapshot)
|
No
|
Yes
|
Yes
|
Optimistic
|
|
REPEATABLE READ
|
No
|
No
|
Yes
|
Pessimistic
|
|
SNAPSHOT
|
No
|
No
|
No
|
Optimistic
|
|
SERIALIZABLE
|
No
|
No
|
No
|
Pessimistic
|
READ UNCOMMITED
Starting at the beginning and
working through, lets look at READ UNCOMMITED. As its name suggests, this
allows us to read data that maybe in the middle of being modified by another
transaction that is yet to complete. The data is uncommitted. This is the
lowest level of isolation and leaves us open to all three concurrency
effects. We will look at these effects in turn with examples. To do that we
need to familiarize ourselves first with some syntax: how to set an
isolation level for a transaction; how to explicitly declare the start
and end of a transaction.
To set an isolation level and
declare a transaction you use the following syntax:
SET TRANSACTION
ISOLATION LEVEL READ UNCOMMITTED
GO
BEGIN TRANSACTION T1
--SQL statements here ...
IF <some condition>
ROLLBACK TRANSACTION T1
ELSE
COMMIT TRANSACTION T1
In this example I have named the
transaction T1, this is optional. Transactions can also be nested.
You are allowed to abbreviate the TRANSACTION keyword to just TRAN. With
that under our belts we're ready at last to fire up the query window and get
our hands dirty.
The first effect to receive our
attention will be the Dirty Read. No this has nothing to do with top shelf
media! Sorry, I told you it was hard to make transactions interesting! To
keep things approachable and simple enough for our puny brains we will set
up a table with a single row like this:
CREATE TABLE tbl
(
id int IDENTITY(1,1)
PRIMARY KEY,
val int NOT NULL
)
GO
INSERT INTO tbl (val)
VALUES(1)
Open SQL Server Management Studio, choose or
create a new database, click the 'New query' button in the toolbar and paste
the above syntax into the query window. Finally click 'Execute' or press the
F5 key. (You can execute all the examples in this article in this manner)
Open two 'New query' tabs in Management Studio.
From the window menu choose 'New horizontal tab group.' Paste the following
two TSQL snippets, one into each query window.
Statement 1:
BEGIN TRAN longUpdate
UPDATE tbl
SET val = -1
WAITFOR '00:00:05'
ROLLBACK TRAN longUpdate
Because humans
are a bit slow, to see what's happening we're using WAITFOR to put a long
pause in the middle of the transaction. This is to give you time to click
the execute button for each statement.
Statement 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
BEGIN TRAN
SELECT val FROM tbl
COMMIT TRAN
Click the
execute button for the first statement. Click in the second window to make it
active and click the execute button again. The second statement should
execute immediately and show the following result:
val
-------------
-1
Finally the
first statement should finish:
(1 row(s) affected)
Run the second
statement again on it's own, you should see this:
val
--------------
1
The result of -1 was a dirty read.
We were able to read the data without having to
wait for our long update to finish so concurrency is good, but we got dirty
data. We saw the uncommitted -1 value that got rolled back to 1 at the end
of statement 1. Imagine that was an inventory report. Fred in sales is
having one of those days, a customer who just cannot make up their mind.
He's just initiated an order reducing widget stock by 10 items. Meanwhile in
stock control Alice is looking forward to going home, she runs the inventory
report. “Thank goodness our system's so fast.” she thinks. “I'll be home in
no time. Hmm widgets are low!” She mails an order to the supplier for more
widgets. Back in sales the customer changes their mind at the last minute.
Fred cancels the order and with an inaudible sigh, hangs up the phone and
also leaves for the day. Next morning Alice finds Bob struggling under a
mini widget mountain in the warehouse! Oh dear.
READ COMMITED
So lets look at this again, but this time we're going to
go up an isolation level to the default for SQL Server READ COMMITED. With
the same query windows replace the select query in statement 2 with this:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
BEGIN TRAN
SELECT val FROM tbl
COMMIT TRAN
It's almost
identical to the previous statement, we have simply changed the isolation
level from READ UNCOMMITTED to READ COMMITTED. Just as before execute
statement 1 and then statement 2. Oh ho! This time if you got it right
statement 2 took muuuuuch longer and returned this result:
val
--------------
1
What's
happened this time is the update statement begins and acquires an exclusive
lock on the record, the read statement requests a shared lock on the same
record, it is blocked by the executing update statement and must wait for it
to complete and release it's exclusive lock before it can read the data. Our
inventory report is now safe, but it sometimes takes longer to run and Alice
doesn't always get out at 17:00 on the dot anymore. Oh well sorry Alice,
it's a balancing act like I said. But wait, maybe we can help poor Alice
after all. Widget Co. just got upgraded to SQL Server 2005 and READ COMMITTED
isolation has a new mode, let's take a look. Get a new third query window
open and execute the following statement in it:
ALTER DATABASE <your DB name here>
SET READ_COMMITTED_SNAPSHOT ON
The effect of this is immediate. But, you do need to be
in single user mode which we are (or you should be, I hope you're not doing
these exercises on your production DB!). Back to our two little statements
we were playing with earlier. Alter statement 2 again so it looks like this:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO
BEGIN TRAN
SELECT val FROM tbl
COMMIT TRAN
Execute them just as before. Set the update running first
and then the select statement. What happened this time? Our select
statement performs immediately just like it did at READ UNCOMMITTED level.
But, we also got a correct result of 1 for column val. No more dirty read.
Wow! Magic, how did it pull that little trick? Well the moment the update
transaction started a version of the row is squirreled away in tempdb by
SQL Server. When the select statement starts, SQL Server dishes up the
versioned row straight away, as this is the most recently committed data for
that row. So the select statement doesn't have to wait for the slow update.
Lucky Alice.
REPEATABLE READ
Moving on then, lets take a look at REPEATABLE READ. We
need to swap our statements around so our select statement becomes statement
1, and we move the WAITFOR into that statement and repeat the select again
afterwards. Our new statement 1 looks like this:
Statement 1
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
BEGIN TRAN repeatRead
SELECT * FROM tbl
WAITFOR '00:00:05'
SELECT * FROM tbl
COMMIT TRAN repeatRead
New statement 2
BEGIN TRAN shortUpdate
UPDATE tbl
set val = -1
COMMIT TRAN shortUpdate
This time we
will commit our update, and our select statement is initially at READ
COMMITTED to illustrate what happens under the default level. Run the two
statements, starting the select statement 1 first. Notice how the update is
allowed in between the two reads of the same row, the results of each select
are different:
val
--------------
1
val
--------------
-1
Imagine in a
single procedure you were selecting data into a temporary table for
statistical analysis. Then the same set of data is read into another
temp table for some different analysis, to both then be be combined in a
report. The second read may now contain changed data and therefore not agree
with the first result set. To overcome this scenario we can use REPEATABLE
READ. Alter the select statement 1 to REPEATABLE READ so it looks like this:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
BEGIN TRAN repeatRead
SELECT * FROM tbl
WAITFOR '00:00:05'
SELECT * FROM tbl
COMMIT TRAN repeatRead
Change the
update statement 2 so that it updates val back to 1 instead of -1:
BEGIN TRAN shortUpdate
UPDATE tbl
SET val = 1
COMMIT TRAN shortUpdate
Run the two
statements, starting the select in statement 1 first. This time the update
has to wait, it is blocked by the select statement until it completes, and
this time we get consistent results.
val
-------------
-1
val
--------------
-1
SNAPSHOT
Our statistical operation is now safe. But, we have had
to pay the cost of concurrency again. Our update was held up. Let's try our
new friend SNAPSHOT. Alter the select statement in statement 1 so it looks
like this:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO
BEGIN TRAN repeatRead
SELECT * FROM tbl
WAITFOR '00:00:05'
SELECT * FROM tbl
COMMIT TRAN repeatRead
Run the two
statements again, starting the select statement first and then the update.
This time the update executes immediately. But, the repeated select also
gets consistent results. Thanks to row versioning concurrency is restored.
We have paid a little in overhead required for row versioning, but it
is far less significant than the effects of locking.
So far we have
looked exclusively at read and update transactions. It is time to widen the
net a little and introduce inserts to demonstrate the effects of phantom
reads, which are a special case of non-repeatable reads. Delete actions can
also cause phantom reads but it is no different to inserts, so we will stick
with insert to keep things simple. A phantom read is where a select
statement produces more or less results than a previous select, using the
same criteria. Let's demonstrate with an example. Change the select
statement 1 back to REPEATABLE READ and create a new insert for statement 2
as follows:
Statement 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
BEGIN TRAN repeatRead
SELECT * FROM tbl
WAITFOR '00:00:05'
SELECT * FROM tbl
COMMIT TRAN repeatRead
Statement 2
BEGIN TRAN shortInsert
INSERT INTO tbl
VALUES (2)
COMMIT TRAN shortInsert
Execute the
statements in order. The long running select will produce the following two
result sets:
val
---------------
1
val
---------------
1
2
This can have
exactly the same repercussions as non-repeatable read where statistical
analysis is the goal. Change the long running select transaction back to
SNAPSHOT isolation level. Run the two statements again in order. This time
both selects return the same result. Even though a third record has now been
added in between the two selects they both return just two records.
Snapshot Isolation
Up to this
point we have engaged only the READ COMMITTED SNAPSHOT database option. We
will now turn on ALLOW_SNAPSHOT_ISOLATION to see the effects. To do this
execute the following statement:
ALTER DATABASE <your DB name here>
SET ALLOW_SNAPSHOT_ISOLATION ON
The
ALLOW_SNAPSHOT_ISOLATION will go into
PENDING_ON mode if any existing transactions are running. As soon as they
complete the option will become fully on and active. With the
ALLOW_SNAPHOT_ISOLATION on, statements that modify data can take advantage
of optimistic concurrency also. But, it adds a new level of complexity
that the application using the database will now have to handle. Such is the
price of optimism. In snapshot mode an update statement for example, no
longer has to compete with shared locks. This means an update can occur
in between a read and an update that are within a single transaction. In this
situation SQL Server detects this and throws an error that the application
can catch and deal with accordingly. Here is a demonstration to illustrate.
Statement 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO
BEGIN TRAN optimisticTran
SELECT * FROM tbl
WAITFOR '00:00:05'
UPDATE tbl
SET val = 3
WHERE id = 1
COMMIT TRAN optimisticTran
Statement
2
BEGIN TRAN shortUpdate
UPDATE tbl
SET val = 4
WHERE id = 1
COMMIT TRAN shortUpdate
Run these
together starting statement 1 first. The short update will complete quickly.
But, statement 2 will raise an error that the data has been modified
in between reading the data and then attempting to update it.
Msg 3960, Level 16, State 2, Line 15
Snapshot isolation transaction aborted due to update conflict.
You cannot use snapshot isolation to access table 'dbo.tbl' directly
or indirectly in database 'yourdbname' to update, delete, or insert
the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the
update/delete statement.
Phew! Finally
for those of you still with me, it wouldn't do to finish without looking at
the last level SERIALIZABLE. This is the ultimate level of isolation. Like
SNAPSHOT it also protects us from dirty reads, non-repeatable reads and
phantom reads but without the update conflict we just saw above. However,
this is expensive in terms of concurrency as we shall see in this final
example:
Statement 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRAN pessimisticTran
SELECT * FROM tbl
WAITFOR '00:00:05'
UPDATE tbl
SET val = 3
WHERE id = 1
COMMIT TRAN pessimisticTran
Statement 2
BEGIN TRAN shortUpdate
UPDATE tbl
SET val = 4
WHERE id = 1
COMMIT TRAN shortUpdate
Run the two
statements. The insert in statement 2 is forced to wait on the long running
pessimistic transaction. But, we avoid the update conflict experienced with
SNAPSHOT isolation.
To conclude
then. We have explored the new SNAPSHOT isolation in contrast to those that
were already in SQL Server prior to SQL Server 2005 and the opportunities it
provides. The main goal of SNAPSHOT isolation is to bring optimistic
concurrency to SQL Server. It allows us to avoid the concurrency effects of
Dirty read, non-repeatable read and Phantom read. But, without the full cost
to performance that SERIALIZABLE isolation brings. The responsibility for
update conflicts is moved to the application to be handled there. It is no
mistake that this fits in nicely with the disconnected nature of
n-tier data applications that can be created with Microsoft's .NET Framework.
Well I hope you enjoyed our little journey, and feel
armed to deal with transactions using them to your advantage. Thank you for
reading this article, your comments are welcome. If there are any battle
hardened transaction warriors out there, I welcome your comments too.
|