BEGIN TRY
declare @NewChild int
insert into Child ( blah, blah, blah ) values ( @blah, @blah, @blah )
select @NewChildId = SCOPE_IDENTITY()
update Parent set CurrentChildId = @NewChildId where Parent = @ParentId
if @@TRANCOUNT > 0 commit tran TxExample
END TRY
BEGIN CATCH
if @@TRANCOUNT > 0 rollback tran
END CATCH
This is relatively straight forward. It inserts the child, then updates the parent's cached data. Those two operations are wrapped in a transaction and a try catch to ensure that if anything should fail for any reason, both statements will be rolled back. This ensures data integrity.
And now it's time to talk about deadlocks. This code is susceptible to deadlocks. As a relatively contrived but none the less possible example suppose the following SQL could also be run:
update Parent set blah = @blah where ParentId = @ParentId
...
select * from Child where ChildParentId = @ParentId
commit tran
This is because when the second query tries to select from the Child table, it will wait because the first query has inserted a new row and SQL Server's default isolation level is read committed, which means dirty data will not be read, instead it will wait for the data to be committed. So it's going to sit there, waiting for the first query to commit.
This isn't a deadlock yet. The deadlock happens when SQL switches back to the first query and attempts to execute the update on the parent. When it does this, it will try to obtain an exclusive lock on that parent row, but it won't be able to because the second query already has an exclusive lock from it's update. So it will wait for the second query to commit.
The first query is now waiting for the second query which is waiting for the first query and you have yourself a deadlock.
Before we fix it, we should ask ourselves "is this a big deal?" The answer is, it depends, but in general yes. If all your SQL is small and all your transactions complete quickly and you don't have very many users banging on the system concurrently then you probably wont see any deadlocks. But unless you can guarantee that all those conditions will remain the same you have to be at least a little worried. And if those conditions don't apply to you, you definitely have to be worried.
So how do we fix it? First thing we could do is to commit the transaction in the second query before executing the select. If this is possible, then it's a good idea. You want your transaction to commit as quickly as possible and you want to touch as few objects as you can while in the transaction. That said, there are plenty of reasons why you might not be able to commit the transaction after the update. For example, maybe you're reading the child data because you need it to perform another update, and those two updates have to be in the same transaction. In that case, there is nothing you can do to fix query #2.
But even if you could fix query #2, someone could some day come along and write query #3 which would introduce the same problem again. So what we really need to do is fix query #1. The way we do that is by having query #1 obtain a shared lock on all the resources we know it will need to touch, immediately at the top of the query.
Add this code after the BEGIN TRY:
select ParentId from Parent where ParentId = @ParentId
set transaction isolation level readcommitted
With this code in place, query #2 will not be able to execute it's update until query #1 completes. Thus, preventing the deadlock and saving the day!
This example was simple but the deadlock was still subtle and hard to see. This problem just gets more complicated the more complicated your SQL gets. And your SQL will get more complicated in direct relation to how complicated your data schema is. So you really have to be on the look out for this issue.
Before I wrap this up, I should mention that if you need to lock more than just one row in one table at the top of your query (like we did in query #1), life can get interesting. If the tables you are locking are all related you can lock them by inner joining to them. But if they are unrelated, things get interesting. If they're unrelated, you can't join from one to the next, so you need to execute separate select statements. And if two queries need to lock the same records in two unrelated tables, but they lock them in different orders (A, B vs. B, A) you can end up with a deadlock! For these cases you have to resort to what you learned in your operating systems class: always lock all your resources in the same order. Good luck with that.
I'll leave you with some rules of thumb, which apply to most cases but, of course, not all:
- Keep your transactions as small as possible by touching as few objects as possible
- Keep your transactions as fast as possible: if you have a query that can execute on n records in a single transaction where n is unbounded you are likely to find yourself in a world of hurt
- Obtain shared locks on everything your transaction will eventually require exclusive locks on before you acquire any other locks
- If you need to do any reads that don't need to be repeatable, do them before you obtain any shared or exclusive locks (this is really just in keeping with #2)
- If you set the transaction isolation level to repeatable read make sure you're setting it back to read committed (even if its the last line of your query, this will make sure triggers don't execute in repeatable read)