Bad Habits to Kick : ORDER BY ordinal
See the full index.
A few weeks ago, I wrote a post about forming a new habit: always terminating statements with semi-colons. To continue this series, today's topic is using ordinal numbers in our
ORDER BY clauses.
At least once a week, I catch myself using ordinal position to define order. For example, I might have this:
SELECT foo, bar = COUNT(*) FROM dbo.splunge ORDER BY 1, 2 DESC;
This is lazy shorthand, and is bound to get screwed up at some point, unless I have encrypted the stored procedure and thrown away the key. Invariably, someone at some point is going to come in and add a column, remove a column, or change the order of the columns, and the order is going to be wrong. Incorrect order may not be a big deal, but it can be disastrous.
Remember that a lot of us produce stored procedures to be consumed by essentially "dumb" clients – Windows forms or web applications that lack the ability to perform any ordering on their own. So often the
ORDER BY clause of a resultset coming back from a stored procedure is pretty important. When editing a stored procedure that has an error-prone
ORDER BY clause like the above, it is very easy to make changes to the
SELECT list without even looking at the
ORDER BY clause, or the reverse. Imagine the web developer using the above procedure asks to change the
SELECT list. Now he wants
mort, foo, bar.
Now, assume you do exactly as the developer asks, and edit the
SELECT clause and alter the procedure:
SELECT mort, foo, bar = COUNT(*) FROM dbo.splunge ORDER BY 1, 2 DESC;
Oops! Now the developer is complaining that his UI is broken because the order is all messed up. This would not have happened if you had used the following syntax in the first place:
SELECT mort, foo, bar = COUNT(*) FROM dbo.splunge ORDER BY foo, bar DESC;
I know it is cumbersome to spell out those column names when you're doing ad hoc stuff, but getting in the habit will potentially save your bacon someday. And you can't use the excuse that you have very complicated expressions in your SELECT list, because you can reference the alias in the
ORDER BY. As above, the following is completely legal:
SELECT blat = RTRIM(CONVERT(int, ABS(foo) % 2)) FROM dbo.splunge ORDER BY blat;
There is nothing lost by being explicit, except for a few keystrokes. But there is plenty to lose by not doing so – including some patience for the developer running to your cube like their head is on fire. Ideally, when you are making changes to existing code, you examine the entire statement to ensure consistency. But try to be proactive and protect yourself against the inevitable rushed and possibly botched emergency fix.
See the full index.
To each their own. IMO. your method mostly works, but mine is better. 🙂 Ultimately, it's a matter of preference, convenience, and ease of use. All of which are subjective.
Sorry Brian, I understand your reasoning to some extent, but I'm not sold. it still seems to me that the primary benefit is avoiding typing 8 characters. I value consistency over typing efficiency and total number of characters in a module. <shrug>
"avoiding the strenuous work of typing 8 characters is not of value to me"
Arg! There goes my good argument! 🙂
Redundancy annoys me greatly. It's not the eight characters, it's eight unneeded characters. The pain grows geometrically which each succeeding characters.
But, enough of that, the benefits as i see them, are two-fold.
1) To me, TABLEs are objects (or classes). A record in a TABLE is an instantiation of that object. Within that paradigm, when FKing to a TABLE, i want to object mentioned, not the technical detail of which COLUMN it is.
I find this paradigm helps people understand relationships in between TABLEs, put COLUMNs (attributes) in the correct TABLE, and write more intelligent queries.
2) Having Id, Name, Description as standard names for every object (where applicable) makes it very easy to remember what i need. The id is always "id", the name is always "name", the description is always "description".
Ad-hoc gets really easy:
SELECT * FROM Customer WHERE Id = 42;
SELECT * FROM People WHERE Name = 'Aaron Bertrand';
SELECT * FROM Product WHERE Description IS NULL;
If the Id COLUMN is always <table-name>ID, that would also work.
Though, in two ad-hoc cases even this is slightly inconvenient.
1) Many times i call TABLEs by their SYNONYMs because of long names of the TABLE. For a quick example, i may alias Customer to Cust.
If i use the SYNONYM, i have to remember that my <table-name>ID convention won't work on a SYNONYM: SELECT * FROM Cust WHERE CustID = 42; fails. However, SELECT * FROM Cust WHERE Id = 42; will always work.
2) During development and debugging, often i have to search multiple TABLEs. It becomes very convenient of all lookup TABLEs are only different to query by TABLE name and not also COLUMN name.
SELECT * FROM Customer WHERE Id = 42;
SELECT * FROM Product WHERE Id = 17;
SELECT * FROM People WHERE Id = 14;
It's just so much easier.
I guess my point is more along these lines:
In order to get someone to change their ways, there has to be something in it *for them*. I think I've demonstrated cases where having the same column name in the same database to mean different things could cause problems. What problem is caused by always calling a Customer's ID CustomerID? Or on the flip side, what problem is solved by calling it a more generic "ID"? I will certainly consider the value of your convention if you can tell me what I gain by using "ID" instead of "CustomerID" in that case. And no, avoiding the strenuous work of typing 8 characters is not of value to me, in case you're wondering.
Brian, I used to do it exactly as you do, for the same reason. I believe I even once said to a co-worker, "You don't need to say CustomerID because it's so obvious it's a Customer we're talking about; it's in the Customers table! Duh!" Later on as I was writing and maintaining more complex queries involving these objects, I found that when the column was explicitly named in all cases, it just led to more self-documenting code, and much less scrolling to figure out which ID that one is. Adding the fact that I had to change c.ID to CustomerID to make the resultset make sense just added to the pain.
I am all for changing my own conventions when it prevents *me* from shooting myself in the foot. None of us is always at 100%. Anyway, as for the "value," I see no value in it for me, to stop using Customers.CustomerID and switch to a less explicit Customers.ID. So don't expect to see it in any of my code examples in the future. 🙂
Yes, i meant no value *to me*. Good point.
If the other TABLE will always use Customer, the benefit is the same. The only difference is that in Customer the COLUMN is called Id, everywhere else it is called Customer. Having the same name is only different in that the parent will have the same name the child uses.
As for qualifying, if the query writer chooses a bad alias that's his problem. I don't believe is changing the COLUMN name so the user cannot shoot himself in the foot. That is his prerogative.
Brian, I'm not sure why you see no value in columns having the same name throughout the data model. Not only do you always know you'll be calling something a CustomerID, it also makes it very easy to find every reference to it in the metadata. Maybe that's not valuable *TO YOU* but that does not mean it has no value in general.
Qualifying the table names is part of it, but when you say:
SELECT o.ID, c.ID FROM Orders AS o INNER JOIN Customers AS c …
Your resultset still has two columns, both named "ID". This can cause problems if you need to then put the result inside a CTE, or select the results into a table, or need distinct column names in the consumer (reporting services, home-grown applications, etc). You can alias, sure, but then what have you gained? Now instead of typing:
You are typing:
o.ID AS OrderID
OrderID = o.ID
To me this is just another case where being explicit outweighs the cost of typing those extra characters.
As for "understanding the model," we don't always have that luxury. Ever outsource? Do you think they will ever have a full understanding of the model before they are allowed to produce any work? Fat chance; the bean counters need results too.
Yeah, these are preferences.
To the point:
1) see no value in COLUMNs having the same name.
2) treat TABLEs as objects, and prefer referring to the object rather than an attribute
3) qualify all COLUMN names in a query that has multiple TABLEs, so there is never any confusion.
4) don't want somebody who "quickly understands" the model anywhere near it. That approach is error prone, and usually quite inefficient.
Brian, the reason I prefer ApplicationID over Id is so that every column in the entire data dictionary that represents that column is named the exact same way. This way, among other things, I never have multiple columns named "Id" coming back in a join, and have to figure out which is which. And I can immediately understand this without first having to "know the model."
But again, these are preferences. If Id makes more sense for your applications, by all means, go ahead. Maybe I should start another series focused on "why these preferences are better than those preferences"? 🙂
I don't have GUIDs where one TABLE is the master. The idea itself raises a "red flag" for me. Of course, it could be justified, i just personally have never seen the case in the applications i have worked on.
When not constrained by company standards, i use the TABLE name as the FK COLUMN Name. For example:
Application: Id, Name, Description, Vendor, Licenses
Application_Version: Application, Version
In this case, Application_Version.Application is an FK to Application.Id. It is obvious, as Id is the PK.
If the same COLUMN is pulled twice, i add underscore and decription. For example:
Application: Id, Name, Description, Vendor, Licenses
Application_Version: Application, Version, Application_Replaced
Here, both Application and Application_Replaced FK to the same COLUMN, but the second which is an attribute and not a child gets an underscore and a description which makes it obvious to anyone who knows the model. (To get the know the model there is documentation.)
The point is, this is obvious to anyone using it, and thus becomes easy to types as well.
Brian, in your previous post you said, "Why do i use ordinals at all? Because of annoying naming conventions. FROM Customer ORDER BY Id, is easy to remember (and type!). FROM Customer ORDER BY CustomerId is redundant (annoying and more typo prone) and hard to remember (underscores this time? How about vowels?). FROM MCustomer_TBL ORDER BY X234_CustomerID_Q is hard to remember what it is called at all, redundant, and hard to remeber formatting."
Formatting aside, what happens if you have more than one column that is an id (typically a foreign key)..GUIDs? I deal with tables that do not include the actual text in the tables themselves, rather a GUID reference to a master values table. I would never shorthand my column names because the next guy that comes along will have to interpret what is written. I agree with Aaron, always best to type them out or use the ObjectBrowser..or Intellisense. One thing I've tried to remind myself of is that the time I may save by taking shortcuts may cost the next guy more time the next pass through. So, its a wash, if not more costly.
Aaron, you are right. The drag is faster.
Though, i tend to be a keyboard user. So, i find alt-tab, ctrl-f, arrows, shift-end, crtl-c (ctrl-insert), alt-tab, ctrl-v (shift-insert) to be much faster and more convenient than f8, reach for mouse, move mouse over section, find scroll bar location with mouse and move mouse there, click, drag, let go, find table, expand table, expand columns, drag column into editor, alt-w h.
But it is a matter of conveniece, and each user's experience should be quite different.
Brian, even faster than notepad would be to drag the column name from Object Explorer. Just expand the Columns node for the table and click and drag into the query editor window. I mentioned a similar tip for grabbing *all* the columns from the top node in my post about SELECT *:
Other than in ad-hoc queries (and UNION ALLs) i do not use ordinals in an ORDER BY. Why do i use ordinals at all? Because of annoying naming conventions. FROM Customer ORDER BY Id, is easy to remember (and type!). FROM Customer ORDER BY CustomerId is redundant (annoying and more typo prone) and hard to remember (underscores this time? How about vowels?). FROM MCustomer_TBL ORDER BY X234_CustomerID_Q is hard to remember what it is called at all, redundant, and hard to remeber formatting.
So, i use ordinals. When it's real code, i open up notepad with the TABLE definition and copy&paste (faster than using the mouse withe the editor.)
dd, that seems ambiguous to me. Do you have to order the columns in the select list in the same way that you want the order of the rows? E.g. should SQL Server sort that statement by foo first, then bar desc, just because that is the order in the select list? What if I want to return foo, bar in that order in the select list, but ordering of the rows should be first by bar and then by foo?Also, it seems cumbersome to have to add SORT 15 times. It also precludes the use of conditional ordering.
Why have a separate order by clause at all (unless the ordering columns does not appear in the select list)?
SELECT foo SORT, bar SORT DESC
Even without 3rd party tools, a column list is far too easy to get. In Object Explorer, expand your table, and drag the "columns" node onto the query window. They're probably not formatted the way you want them, but reformatting is less work than typing.
I always find the use of ordinals in the order by interesting because I have never used them and didn't even know you could until I saw it in some vendor application.
I'd never do it. I have to admit to not using a column list for inserts though. Bad habit, I know and I will work on it. Of course if you use SQLPrompt it does it for you, so I do it less now.
Hear Hear say I. And in SQL 2008, intellisense will in many cases complete the column names for you so you have even less excuse to use ordinals. Even when just hacking a query for a quick report, it still makes sense to avoid ordinals as by using them you are introducing an easy place to put a hard to spot bug.
My other peeves are with using SELECT * and with not naming the columns involved in an INSERT…SELECT
INSERT INTO some_table
SELECT foo, bar FROM some_other_table
Add a column to some_table and you're potentially breaking your code. To me, it's lazy not to be explicit with either ORDER BY or with INSERT…SELECT