Bad habits to kick : relying on undocumented behavior
Originally published February 2010; updated May 2020.
See the full index.
In my last post in this series, I talked about the common habit of creating an IDENTITY column on every single table. Today I want to talk about a more broad concept: relying on undocumented (and therefore probably undefined, and certainly far from guaranteed) behavior and objects.
This undocumented and unsupported system procedure creates a cursor, loops through every database, and performs some action against each of them. The problem? It randomly skips databases (because of the flaky cursor they use), and Microsoft refuses to acknowledge the problem (never mind document or fix it). So, I wrote my own, called sp_ineachdb, and wrote about all the improvements here:
- Execute a Command in the Context of Each Database – Part 1
- Execute a Command in the Context of Each Database – Part 2
Initially I had created this one, and wrote about it here, but it had its own limitations (most notably, it couldn't be used in
INSERT...EXEC and, like the original, it didn't actually run in the context of the target database).
ORDER BY in a view
This is probably the most infamous of all SQL Server undocumented behaviors. In SQL Server 2000, users learned to create views with a built-in ordering, like this:
CREATE VIEW dbo.MyView AS SELECT TOP 100 PERCENT a,b,c FROM dbo.MyTable ORDER BY c;
The problem is, they assume that if they then issue…
SELECT * FROM dbo.MyView;
…the results are guaranteed to come back ordered by
c (because that's what the view says).
Without an ORDER BY on the outer query, SQL Server is free to return the results in any order it chooses (more about this later). To dissect the syntax in the view a bit, I want to explain that this is a symptom of a poor design choice in the T-SQL dialect — the way TOP was implemented, ORDER BY can serve two functions: to determine the rows included in the result, and to indicate the order of the result. Which is why you sometimes see the following logic to get the top 10 finishers of a race, with 10th place listed first (well, let's ignore ranking functions and the OVER() clause for now):
SELECT RunnerID, [time_in_seconds] FROM ( SELECT TOP (10) RunnerID, [time_in_seconds] FROM dbo.RaceResults WHERE RaceID = 15 ORDER BY [time_in_seconds] ) AS x ORDER BY [time_in_seconds] DESC;
Going back to the view, the TOP 100 PERCENT is required in this case because the ORDER BY is only supposed to be used to determine the rows included in TOP, not to guarantee the order of the results. The syntax for a view allows you to include an ORDER BY clause *only* if you also include a TOP clause. In SQL Server 2000, you would in fact get the results in the order defined by the ORDER BY within the view. In SQL Server 2005, changes to the optimizer "broke" this behavior, and this was considered a bug by a large segment of the SQL Server community. I don't really consider this broken because the observed behavior was not documented (never mind guaranteed).
In any case, one rather large customer of Microsoft, with plenty of leverage against the SQL Server team, made such a fuss about this change in behavior that a trace flag was introduced to instruct the optimizer to obey the ORDER BY in the view when returning rows. (You can get a lot more information about this trace flag in KB #926292.) Should you use this trace flag? In my opinion, no — I think the squeaky wheel was just too lazy to fix their application code so that it properly dictated the order in which it wanted its results. Should you be aware of what can happen when you rely on undocumented behavior, and that behavior changes? Absolutely.
Ordering without an ORDER BY
Aside from the view case above, there is a more general myth out there that SQL Server will always return rows in a specific order – the myth varies from person to person, but it is either the order of the clustered index, or the order of the IDENTITY column, or the chronological order of insertion. (Similarly, the assumption is that using TOP without an ORDER BY should have a predictable meaning; as with ordering results, it does not.) Often people come to this conclusion because, in a limited sample size, that's the behavior they "always" see. I want to make it quite clear: ordering is arbitrary unless you use an ORDER BY clause. The optimizer is free to find the most efficient way to return the results to you, which can mean any ordering whatsoever.
I see a lot of cases where people expect SQL Server to short-circuit their clauses — basically, read the conditions from left to right (or top to bottom), and when any of the criteria returns false, don't bother evaluating any subsequent conditions. An example is where you have chosen to store DATETIME data in a VARCHAR column, then try to evaluate some DATETIME aspect — but only after determining which values really are dates (this pattern of poor data type choices is another really common bad habit, by the way). So you see queries like this:
CREATE TABLE dbo.foo(bar VARCHAR(20)); INSERT dbo.foo(bar) SELECT '20090201' UNION ALL SELECT 'last week' UNION ALL SELECT 'next Tuesday'; SELECT bar FROM dbo.foo WHERE ISDATE(bar) = 1 AND DATEPART(MONTH, bar) = 2; DROP TABLE dbo.foo;
Conversion failed when converting date and/or time from character string.
The truth is, SQL Server is free to evaluate the conditions in any order, so it doesn't necessarily have to first check the result of ISDATE() before trying to treat the data like DATETIME values. And like with the ORDER BY cases above, just because you observe a specific behavior today, does not mean you will see the same behavior tomorrow — many things can change the way SQL Server processes a query. So how do you get around it? Well, there are a few ways. The most obvious one in this specific case is, using a DATETIME column to store DATETIME data. But this isn't always possible, and it is not always the issue in this scenario anyway — it was just the handiest example I could come up with. The next thing people try, is to use a sub-query to first return only the rows that have DATETIME values:
SELECT bar FROM ( SELECT bar FROM dbo.foo WHERE ISDATE(bar) = 1 ) AS x WHERE DATEPART(MONTH, bar) = 2;
The result: same error (and same happens with a CTE). It's like the optimizer is reading your mind and knows you're trying to trick it; it still sees that there are rows in the table that would not pass the DATEPART check, but is evaluating things in the wrong order.
So how do we get this "short circuiting" to really work? The CASE expression is your friend (and thanks Alex for the improved syntax suggestion):
SELECT bar FROM dbo.foo WHERE CASE WHEN ISDATE(bar) = 1 THEN DATEPART(MONTH, bar) END = 2;
I'm sure I've seen this solved in other clever ways before, and that someone will remind me how to really make this short circuit, but on a Monday night, this is the only workaround I know is guaranteed to work. In some cases, SQL Server *will* short-circuit (see the comments from Nigel Ellis about 2/3 through this TechNet chat transcript), and you can see some working examples from Mark Cohen. My point, though, remains: feel free to take advantage of these optimizations, but do not rely on them.
You're probably thinking, wow, that's a strange one to mention. sp_who2 is undocumented and unsupported? Absolutely. Try to find sp_who2 in any current version of Books Online. I reminded them that there was a reference to sp_who2 in the 2005 version of Books Online in Connect #207997 back in September of 2006. Shortly thereafter, the last remaining trace of this procedure was removed. I also asked for a supported and documented version in March of 2007, in Connect #264681. Don't hold your breath, because all sp_* procedure development has been abandoned (which is why a lot of the sp_help style procedures haven't been updated to reflect new 2008 features). I am also not trying to scare you into believing that sp_who2 will suddenly stop working in the next release or service pack – I think it is pretty much here to stay. But with the advancements in performance tuning and troubleshooting, I question if it is of any value to use these days at all. Instead, there are many alternatives. Adam Machanic has written a great replacement with more flexibility and much more useful information than the new stuff (sp_WhoIsActive). I also documented a few approaches to querying the DMVs for certain information in Chapter 29 of SQL Server MVP Deep Dives, entitled, "My Favorite DMVs and Why."
xp_fileexist / xp_getfiledetails
The writing is on the wall: the days of extended procedures are numbered, especially those that are undocumented. Writing code today that relies on these modules being available in future versions of SQL Server is a little like Russian roulette. The last thing you want to do during a SQL Server upgrade is worry about breaking code that accesses the file system. Thankfully, there are many tutorials out there that will help you bring CLR to the rescue, in the event that you really need to perform file handling from within SQL Server. Greg Larsen does a pretty good job of priming you to write your own xp_getfiledetails.
Sadly, I am just skimming the surface on features and behaviors that are undocumented and/or undefined. There are literally dozens and dozens of undocumented stored procedures and extended procedures, and many engine behaviors that we all take for granted. There are probably several behaviors I rely on every day that I don't realize are undocumented… it is definitely a difficult habit to avoid altogether. But the realization that you *might* be using undocumented methods, or making assumptions based on observed and not guaranteed behavior, is often a good chunk of the battle.
See the full index.