Deprecate UPDATE FROM? Not if I can help it!
Fellow MVP Hugo Kornelis (blog) has suggested that the proprietary UPDATE FROM and DELETE FROM syntax, which has worked for several SQL Server versions, should be deprecated in favor of MERGE. Here is the Connect item he raised:
As you can see, the response is quite divided (more so than any other item that I can recall) – at the time of writing, it was 11 up-votes and 12 down-votes. I have no shame in admitting that I am one of the people who down-voted Hugo's suggestion back when it was originally posted. And this blog post may seem like I'm replying quite late (almost three years after the fact), but the truth is that this debate was re-hashed recently due to a DELETE FROM issue that turned out to be a simple aliasing problem. But nonetheless, it got my gears spinning, so…
Why don't I give my love to MERGE?
The proprietary syntax works fine, in most cases. I find UPDATE FROM to be a very elegant way to update the contents of a table based on a join to one or more other tables. Let's say I have an Orders table and an OrderDetails table. Since each OrderDetails item may be shipped and handled (and even canceled) separately, there is a [Status] column in the OrderDetails table. And let's also say that we have discovered that Customer #1 would like to cancel all of his pending orders. There is no built-in cascade option that will cover this scenario, and we want to make sure that OrderDetails isn't inspected directly to see the (now out-of-date) status on any of this customer's line items. Before updating the parent row to 'Canceled', I would write an UPDATE FROM like this to affect all of the "child" rows:
UPDATE od SET od.[Status] = 'Canceled' FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = o.OrderID WHERE o.CustomerID = @CustomerID AND o.[OrderStatus] = 'Pending';
ANSI (and –CELKO–, no doubt) would have me change it to this:
UPDATE dbo.OrderDetails SET [Status] = 'Canceled' WHERE EXISTS ( SELECT * FROM dbo.Orders AS o WHERE o.OrderID = dbo.OrderDetails.OrderID AND o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' );
Or perhaps a CTE:
;WITH cte AS ( SELECT od.[Status] FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.OrderID WHERE o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' ) UPDATE cte SET [Status] = 'Canceled';
And Hugo would apparently rather see this (and wait 20 minutes for me to figure out how to write it):
MERGE dbo.OrderDetails AS od USING ( SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID AND OrderStatus = 'Pending' ) AS o ON od.OrderID = o.OrderID WHEN MATCHED THEN UPDATE SET od.[Status] = 'Canceled';
While completely subjective, and this only represents how I feel at this exact moment in time, I would much rather write the first piece of code than ever see the latter three examples. The primary reason is that I can comment out the first two lines of the proprietary UPDATE FROM, and change the entire query to a SELECT if I want to check that my JOIN results are correct:
SELECT * --UPDATE od -- SET od.[Status] = 'Canceled' FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = o.OrderID WHERE o.CustomerID = @CustomerID AND o.[OrderStatus] = 'Pending';
MERGE is not perfect, either.
For a lot more:
I will concede to Hugo that there are cases where you can write an UPDATE FROM statement that would lead to unpredictable results, but I will submit that you can render a lot of unpredictable results with ANSI-compliant syntax as well (I'll show one below). In the meantime, here are just a few *still active* Connect items (or that are closed as "Won't Fix") where users have discovered bugs in the way that MERGE has been implemented. Most importantly, that it has problems with filtered indexes, and causes unexpected @@ROWCOUNT results in triggers. You can see more details here:
And here are a few cautions from Books Online about MERGE that require a lot of parsing and may raise doubt about its use in your scenario:
It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
Specifying READPAST with WHEN NOT MATCHED [ BY TARGET ] THEN INSERT may result in INSERT operations that violate UNIQUE constraints
I'll say all this a different way: the syntax is downright cumbersome. You've seen my above example, and no doubt the syntax diagram and examples from Books Online; it's not for the faint of heart. I took a quick poll on twitter and, while not pretending that it is a scientific result, all but one respondent admitted that to write a MERGE statement, they would have to look up the syntax in Books Online. And I was tempted to challenge the one respondent to a closed-book test. This is syntax that has been around for three years (and longer for those of us involved in the SQL Server 2008 beta), but is not easy to master AT ALL.
Other ways to shoot yourself in the foot
I'm all for deprecation, when warranted (TIMESTAMP or the use of sys.objects, anyone?). Look, Hugo is a very smart guy, and I respect him greatly, but his reasons for deprecating this syntax just don't hold up. He says that "improperly code join criteria You can shoot yourself in the foot with any standard syntax, too. A very common example is to not properly correlate the EXISTS clause, in which case ALL rows in OrderDetails are updated:
UPDATE dbo.OrderDetails SET [Status] = 'Canceled' WHERE EXISTS ( SELECT * FROM dbo.Orders AS o WHERE o.OrderID = o.OrderID -- bad correlation: no error, no warning! AND o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' );
And you can do something quite similar with a CTE as well:
;WITH cte AS ( SELECT od.[Status] FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = o.OrderID -- bad correlation: no error, no warning! WHERE o.CustomerID = @CustomerID AND o.OrderStatus = 'Pending' ) UPDATE cte SET [Status] = 'Canceled';
A major benefit of MERGE, as Hugo points out, is that an error message is produced if you accidentally program the same bad correlation:
MERGE dbo.OrderDetails AS od USING ( SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID AND OrderStatus = 'Pending' ) AS o ON o.OrderID = o.OrderID -- bad correlation WHEN MATCHED THEN UPDATE SET od.[Status] = 'Canceled';
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
So, MERGE will better protect you from shooting yourself in the foot. But if you're not prone to shooting yourself in the first place…
Hugo also suggests simply removing this construct is not going to magically free up hours and hours of development time that will be better spent elsewhere. How much time does he really believe the development team is wasting on UPDATE FROM today?
I'm not vehemently opposed to MERGE. I think we should all become familiar with the syntax at some point, and in fact I've implemented it in a few places – but not as a replacement for UPDATE FROM, but rather to get rid of multi-statement "UPSERT" constructs like:
UPDATE ... ; IF @@ROWCOUNT = 0 INSERT ... ;
… or …
IF EXISTS (...) UPDATE ... ; ELSE INSERT ...;
To me, the "UPSERT" scenario is squarely in the wheelhouse of MERGE, as opposed to "UPDATE FROM." I just think that it should present the right benefits so that you'll want to use it on your own schedule, rather than when the dev team decides it's time to deprecate the method you're currently using to achieve the same result. Just be careful to use SERIALIZABLE (aka HOLDLOCK) when using MERGE for more than one operation – for the guts on why, see this blog post from Dan Guzman, and this article of mine.
If we're going to do anything with the way UPDATE FROM works, how about narrowing it down to something more specific and troublesome, such as the "quirky update" syntax? I think there would be a lot less opposition to having this kind of code raise an 'Incorrect syntax' error, and replace it with better windowed aggregate support:
CREATE TABLE dbo.OrderHistory ( OrderID INT PRIMARY KEY, OrderTotal DECIMAL(6,2) NOT NULL, RunningTotal DECIMAL(8,2) NULL ); GO INSERT dbo.OrderHistory(OrderID, OrderTotal) SELECT 1, 12.74 UNION SELECT 2, 213.55 UNION SELECT 3, 67.44; GO DECLARE @RunningTotal DECIMAL(8,2) = 0.00; UPDATE dbo.OrderHistory SET @RunningTotal = RunningTotal = @RunningTotal + OrderTotal FROM dbo.OrderHistory; GO SELECT OrderID, OrderTotal, RunningTotal FROM dbo.OrderHistory; GO DROP TABLE dbo.OrderHistory; GO
Having these same running totals aggregated via an OVER() clause – now THAT is something where I can justify the dev team spending more time.
I like MERGE but I always have to lookup the syntax every time I use it. Load of people struggle with the UPDATE FROM syntax. Let's see how they cope when we throw MERGE into the mix.
Removing or deprecating the UPDATE / DELETE FROM syntax would be a mistake IMHO.
I recommend developers use a Orm. Entity framework will do it right everytime. No need to worry. Then SQL is more of a escape hatch then a hammer for every problem. Who needs a stored proc for every piece of crud? Heaven forbid if you have to add a field to a table. In that case your resigned to an afternoon of plumbing to make sure all you queries on that table are updated.
Yes, validating that the code works is huge. I can't test a MERGE without either (a) making a copy of all the tables and data involved, or (b) wrapping everything in transactions. Arguably we should be doing (b) anyway, but it's not necessary when you can easily test your assumptions by performing a SELECT instead of affecting data.
Now, it is worth it to test "the hard way" sometimes, especially in the UPSERT scenario. But in a lot of cases I just can't justify the time sink.
Yup. Totally agree that it would be dump to deprecate this option. Folks that want fuller ANSI compliance can code things that way. (But they're STILL going to run the code on SQL Server and any illusions about being able to magically just port that code to another RDBMS without all sorts of extra work are just delusions.)
And @RichB nails it: Making this change would impact SOOOOO much code already out there that it would definitely hinder sales.
Great post though (and I concur: my fav reason for the 'older' syntax is that it's so much easier to switch to a SELECT and then test).
I agree 100% with you about the dev team dedicating time to support the OVER clause in full, instead focusing on deprecating UPDATE FROM or any other statement / command.
It would also be interesting, and perhaps productive, if the dev team also implement the support for the ANSI standard row value constructor, which could be very handy for multiple columns updates.
SET (c1, c2, c3) = (SELECT T2.c1, T2.c2, T2.c3
WHERE T2.keycol = T1.keycol)
WHERE keycol = @key;
Concidently, the suggestion was also made by Hugo and it got my vote.
Of course, there would be 2 big consequences involved with deprecating it:
1. Few organisations would get round to upgrading to that version.
2. A massive boom in short terms contract work to rewrite the exceptions to the above points code base.
Of course it might also provide an additional stimulus for folk to make the break with SQL entirely – if they have to effectively rewrite it all they may as well take the plunge to one of the competitions products like their boards golf buddies keep suggesting…
"(and wait 20 minutes for me to figure out how to write it)" – LOL. This is the biggest reason MERGE has been met with a decided lack of enthusiasm with our developers. No one can ever remember how to use the blasted thing, and they just graviate back to the simpler, and easier-to-use methods that work just as well.
I personally think removing UPDATE FROM would be a horrible move on Microsoft's part. As with any language one can shoot themselves in the foot with just about everything, so should all instances where this can happen, in all languages, just be removed?
UPDATE FROM is far easier to understand and write than the MERGE syntax, and I hope Microsoft sees the light and doesn't remove it from TSQL.