December 14, 2010 | SQL Server

The old "do as I say, not as I do" problem

Microsoft is often considered a leader, an innovator, a trend-setter.  The same could be said for Apple, Google, and a host of other tech companies.  And each of those has its set of critics as well, who think that the company is the opposite – or worse.  Some people think it is a good idea to model their own code, architecture or applications after things that these companies have done, but this is not always the best approach.  Humans work at these companies too, and everyone is prone to mistakes, even when there are 40 sets of eyes that review a piece of code before it ships.

I am a consumer of many vendors' products, but Microsoft's quite easily are the ones with which I am most familiar.  Even being an MVP, I don't have access to all of the source code for all of the products, but I do have a tendency to tinker with SQL Server stuff.  Again, not source code, per se, but definitely a lot of the objects that get created in SQL Server system databases.

In a recent private discussion with some MVPs, we were talking about sp_who2 – how it shows 'RUNNABLE' for the spid that called it, even though that is clearly not accurate.  When I was looking through the code for sp_who2, I noticed the final SELECT (against a #temp table) had the following ORDER BY, but commented out:

-- (Seems always auto sorted.)   order by spid_sort

Implying that commenting out the ORDER BY has no impact on the order of the results.  This is a terrible, terrible, terrible myth to perpetuate.  As many SQL Server experts (and many folks within Microsoft) will tell you: if you, your users, or your applications are assuming and expecting a certain sort order, use an ORDER BY clause.  In this case, the word "Seems" should be a clue.  What is the point of commenting out this ORDER BY clause?  All this does is open the door to unexpected behavior should the code above it change (or should the optimizer behave differently in the future, which can happen for a variety of reasons).  Once I pointed this out to a couple of Microsoft folks, there's a good chance this will be fixed in the Denali time frame.  But anyone looking at the code in the meantime will just have further "proof" that the myth is real.

Just because you observe a certain order when you run the query once (or a hundred times) does not mean that that ordering is guaranteed.  Trust me, this will burn you eventually if you keep relying on it!  In my opinion, you should ALWAYS, ALWAYS, ALWAYS use an ORDER BY clause from the get-go, even if you don't care how the data is ordered.  Without it, certain behaviors will be assumed, and if you later add ordering or the native (undefined!) ordering changes for whatever reason, you'll be fixing it somehow.  If you want the ordering to be random, then do that: ORDER BY NEWID() or some similar construct.  If you want the ordering to follow the clustered index, then say so: ORDER BY <clustered index column(s)>.

If you still think that relying on undefined ordering is okay, please go back and read my bad habit post about ORDER BY in a view, and also see Dave Ballantyne's blog post where he reminds us that Without "ORDER BY", order is not guaranteed.  Also Alex Kuznetsov has two great and to-the-point posts about this: Without ORDER BY, there is no default sort order and Uncertainty erodes confidence, so add that ORDER BY clause.

9 comments on this post

    • SQLDenis - December 14, 2010, 4:25 PM

      Aaron,
      In can tell you for a fact I have seen sp_who2 not return results ordered by SPID.
      I always notice it when there are 400+ connections, it won't always start with SPID 1, I see it start with high 200s up to 400 then from 1 to 200

    • mjswart - December 14, 2010, 5:00 PM

      In sql 2008, I count 10 sprocs/functions in the resource database with comments containing seems. Each one makes me raise an eyebrow.

    • AaronBertrand - December 14, 2010, 5:27 PM

      Yes, I've seen it before too Denis.  And Michael, thanks, it is not the only instance of this bizarre comment, but it is probably one of the oldest.  πŸ™‚
      A

    • Paul White - December 14, 2010, 6:05 PM

      Aaron,
      What do you mean "…even though that is clearly not accurate."?  It is accurate – it's just not showing what you were expecting!
      The (undocumented) procedure sp_who2 is calling OpenRowSet(TABLE SYSPROCESSES) at the point information is captured, which happens to executes in a separate TSQL context.  So, while the user session is waiting for that call to complete, it is indeed RUNNABLE, and not RUNNING.
      This is true for anything that calls sys.sysprocesses or sys.sysprocesses_ex (the views that wrap the SYSPROCESSES call), so it's not peculiar to sp_who2.
      The equivalent sTVF for the sys.dm_exec_requests view (OpenRowset(TABLE SYSREQUESTS)) executes in the user context, so it shows RUNNING, as you would expect.
      Much of the Microsoft code (even the documented stuff) leaves much to be desired, as even a quick look reveals.  The commented-out ORDER BY is a classic example.
      Nevertheless, I cannot agree with your advice to ALWAYS, ALWAYS, ALWAYS use an ORDER BY clause.  'Almost always' would have sufficed.
      Last thing, I cannot believe Adam hasn't mentioned (his award-winning) sp_WhoIsActive in the comments yet πŸ™‚
      Paul

    • AaronBertrand - December 14, 2010, 6:12 PM

      I disagree Paul.  The user shouldn't need to understand the inner mechanics of a process to be confused by the fact that their SPID is returning data to them but still says "runnable."
      And can you give some examples of where you recommend AGAINST an ORDER BY clause?

    • Paul White - December 14, 2010, 6:32 PM

      Sure, it's an interesting implementation quirk – it just seems to me that it's rather too accurate for your liking.  Perhaps it would have been better to refer to the behaviour as confusing πŸ™‚
      Almost always use an ORDER BY clause, unless doing so would be counter-productive, and you fully understand the implications of omitting it.
      If I really don't care about returned order, and the addition an ORDER BY would result in an unnecessary sort, or adds a new hidden dependency for no gain, or produces a daft plan, I will omit it.
      My personal gripe is with the use of TOP without an ORDER BY.  Another case of 'almost always' (though TOP OVER (ORDER BY…)syntax would be even better).
      Paul

    • AaronBertrand - December 14, 2010, 6:47 PM

      The person writing the code is not the only person who cares or doesn't care about returned order.  If you have anyone else using the output, they are most certainly going to complain if the behavior changes.  I would say an ORDER BY that doesn't result in any extra work (or even if it is a little extra work) is going to be better than knowingly producing unpredictable results (even if, most of the time, they "seem" to order in a specific way).  Results being sent to an end user should not be so large that a predictable sort will make or break the performance expectations of the application.  And I am trying to think of real-world examples where you truly don't care about the order – I can't think of a single result in my 13-year career where a known and defined order wasn't helpful.

    • Paul White - December 14, 2010, 7:20 PM

      In my (16-year) career I have come across plenty of examples where an explicit ORDER BY would have been dumb.  Not all query results are small, or destined for display to an end user!
      I find it difficult to credit that you have never written a query to feed a batch process or mid-tier component that required a set (in the relational sense) as input.  You've never written a query to feed a client application that sorts data for display in complex ways locally, for which any database-supplied order is irrelevant?
      You might add an ORDER BY that seems for-free and harmless today, but what about when the statistics change, and the optimizer chooses a different plan shape (perhaps a hash join) that no longer preserves order?
      What if your query suddenly starts running in parallel and the merging exchanges required to respect the (pointless) presentation order result in parallelism deadlocks?
      What will you do when paged because a new index changed the plan resulting in a huge mid-plan sort and application time-outs?
      So, I stand by what I wrote:
      "Almost always use an ORDER BY clause, unless doing so would be counter-productive, and you fully understand the implications of omitting it."
      There are very few guarantees about intra-operator ordering anyway, so relying on order provided by an operator earlier in the plan to avoid a sort is no more safe than relying on observed sort orders without ORDER BY πŸ˜‰
      Paul

    • GrumpyOldDba - December 15, 2010, 1:01 PM

      I have an example of where an order by is not required – I have  reports in SSRS which return I guess you'd say, multiple facets of the data which you may want to order by to see different views of that data – ordering will highlight certain features. For this reason the queries do not have order by statements as SSRS allows you to see the data how you want to see it. I hope that makes sense?
      as an aside to sp_who2  I never use procs, always using queries to look at sessions/connections because i can change the order by to suit whatever I want to see at that time, by blocking spid, by database, by hostname, by user, by last session etc. etc.
      Another interesting post, thanks.

Comments are closed.