Deprecate UPDATE FROM? Not if I can help it!
Fellow MVP Hugo Kornelis (blog) has suggested that the proprietary UPDATE FROM and DELETE FROM syntax, which has worked for several SQL Server versions, should be deprecated in favor of MERGE. Here is the Connect item he raised:
As you can see, the response is quite divided (more so than any other item that I can recall) – at the time of writing, it was 11 up-votes and 12 down-votes. I have no shame in admitting that I am one of the people who down-voted Hugo's suggestion back when it was originally posted. And this blog post may seem like I'm replying quite late (almost three years after the fact), but the truth is that this debate was re-hashed recently due to a DELETE FROM issue that turned out to be a simple aliasing problem. But nonetheless, it got my gears spinning, so…
Why don't I give my love to MERGE?
The proprietary syntax works fine, in most cases. I find UPDATE FROM to be a very elegant way to update the contents of a table based on a join to one or more other tables. Let's say I have an Orders table and an OrderDetails table. Since each OrderDetails item may be shipped and handled (and even canceled) separately, there is a [Status] column in the OrderDetails table. And let's also say that we have discovered that Customer #1 would like to cancel all of his pending orders. There is no built-in cascade option that will cover this scenario, and we want to make sure that OrderDetails isn't inspected directly to see the (now out-of-date) status on any of this customer's line items. Before updating the parent row to 'Canceled', I would write an UPDATE FROM like this to affect all of the "child" rows:
UPDATE od SET od.[Status] = 'Canceled' FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = o.OrderID WHERE o.CustomerID = @CustomerID AND o.[OrderStatus] = 'Pending';
ANSI (and –CELKO–, no doubt) would have me change it to this:
UPDATE dbo.OrderDetails SET [Status] = 'Canceled' WHERE EXISTS ( SELECT * FROM dbo.Orders AS o WHERE o.OrderID = dbo.OrderDetails.OrderID AND o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' );
Or perhaps a CTE:
;WITH cte AS ( SELECT od.[Status] FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.OrderID WHERE o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' ) UPDATE cte SET [Status] = 'Canceled';
And Hugo would apparently rather see this (and wait 20 minutes for me to figure out how to write it):
MERGE dbo.OrderDetails AS od USING ( SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID AND OrderStatus = 'Pending' ) AS o ON od.OrderID = o.OrderID WHEN MATCHED THEN UPDATE SET od.[Status] = 'Canceled';
While completely subjective, and this only represents how I feel at this exact moment in time, I would much rather write the first piece of code than ever see the latter three examples. The primary reason is that I can comment out the first two lines of the proprietary UPDATE FROM, and change the entire query to a SELECT if I want to check that my JOIN results are correct:
SELECT * --UPDATE od -- SET od.[Status] = 'Canceled' FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = o.OrderID WHERE o.CustomerID = @CustomerID AND o.[OrderStatus] = 'Pending';
MERGE is not perfect, either.
I will concede to Hugo that there are cases where you can write an UPDATE FROM statement that would lead to unpredictable results, but I will submit that you can render a lot of unpredictable results with ANSI-compliant syntax as well (I'll show one below). In the meantime, here are just a few *still active* Connect items (or that are closed as "Won't Fix") where users have discovered bugs in the way that MERGE has been implemented. Most importantly, that it has problems with filtered indexes, and causes unexpected @@ROWCOUNT results in triggers. You can see more details here:
And here are a few cautions from Books Online about MERGE that require a lot of parsing and may raise doubt about its use in your scenario:
It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
Specifying READPAST with WHEN NOT MATCHED [ BY TARGET ] THEN INSERT may result in INSERT operations that violate UNIQUE constraints
I'll say all this a different way: the syntax is downright cumbersome. You've seen my above example, and no doubt the syntax diagram and examples from Books Online; it's not for the faint of heart. I took a quick poll on twitter and, while not pretending that it is a scientific result, all but one respondent admitted that to write a MERGE statement, they would have to look up the syntax in Books Online. And I was tempted to challenge the one respondent to a closed-book test. This is syntax that has been around for three years (and longer for those of us involved in the SQL Server 2008 beta), but is not easy to master AT ALL.
Other ways to shoot yourself in the foot
I'm all for deprecation, when warranted (TIMESTAMP or the use of sys.objects, anyone?). Look, Hugo is a very smart guy, and I respect him greatly, but his reasons for deprecating this syntax just don't hold up. He says that "improperly code join criteria You can shoot yourself in the foot with any standard syntax, too. A very common example is to not properly correlate the EXISTS clause, in which case ALL rows in OrderDetails are updated:
UPDATE dbo.OrderDetails SET [Status] = 'Canceled' WHERE EXISTS ( SELECT * FROM dbo.Orders AS o WHERE o.OrderID = o.OrderID -- bad correlation: no error, no warning! AND o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' );
And you can do something quite similar with a CTE as well:
;WITH cte AS ( SELECT od.[Status] FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = o.OrderID -- bad correlation: no error, no warning! WHERE o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' ) UPDATE cte SET [Status] = 'Canceled';
A major benefit of MERGE, as Hugo points out, is that an error message is produced if you accidentally program the same bad correlation:
MERGE dbo.OrderDetails AS od USING ( SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID AND OrderStatus = 'Pending' ) AS o ON o.OrderID = o.OrderID -- bad correlation WHEN MATCHED THEN UPDATE SET od.[Status] = 'Canceled';
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
So, MERGE will better protect you from shooting yourself in the foot. But if you're not prone to shooting yourself in the first place…
Hugo also suggests simply removing this construct is not going to magically free up hours and hours of development time that will be better spent elsewhere. How much time does he really believe the development team is wasting on UPDATE FROM today?
I'm not vehemently opposed to MERGE. I think we should all become familiar with the syntax at some point, and in fact I've implemented it in a few places – but not as a replacement for UPDATE FROM, but rather to get rid of multi-statement "UPSERT" constructs like:
UPDATE ... ; IF @@ROWCOUNT = 0 INSERT ... ;
… or …
IF EXISTS (...) UPDATE ... ; ELSE INSERT ...;
To me, the "UPSERT" scenario is squarely in the wheelhouse of MERGE, as opposed to "UPDATE FROM." I just think that it should present the right benefits so that you'll want to use it on your own schedule, rather than when the dev team decides it's time to deprecate the method you're currently using to achieve the same result. Just be careful to use SERIALIZABLE (aka HOLDLOCK) when using MERGE for more than one operation – for the guts on why, see this blog post from Dan Guzman, and this article of mine.
If we're going to do anything with the way UPDATE FROM works, how about narrowing it down to something more specific and troublesome, such as the "quirky update" syntax? I think there would be a lot less opposition to having this kind of code raise an 'Incorrect syntax' error, and replace it with better windowed aggregate support:
CREATE TABLE dbo.OrderHistory ( OrderID INT PRIMARY KEY, OrderTotal DECIMAL(6,2) NOT NULL, RunningTotal DECIMAL(8,2) NULL ); GO INSERT dbo.OrderHistory(OrderID, OrderTotal) SELECT 1, 12.74 UNION SELECT 2, 213.55 UNION SELECT 3, 67.44; GO DECLARE @RunningTotal DECIMAL(8,2) = 0.00; UPDATE dbo.OrderHistory SET @RunningTotal = RunningTotal = @RunningTotal + OrderTotal FROM dbo.OrderHistory; GO SELECT OrderID, OrderTotal, RunningTotal FROM dbo.OrderHistory; GO DROP TABLE dbo.OrderHistory; GO
Having these same running totals aggregated via an OVER() clause – now THAT is something where I can justify the dev team spending more time.