Bad habits to kick : using table aliases like (a, b, c) or (t1, t2, t3)
In my last post in this series, I talked about using old-style JOINs. Today I'd like to touch on using aliases. No, I don't mean fake passports and a life of crime, I mean using an alias as shorthand for referencing a table or view in a query.
Some code I once had to clean up on an inherited system looked a lot like this:
SELECT * FROM Orders a INNER JOIN OrderDetails b ON < ... > INNER JOIN SalesPerson c ON < ... > INNER JOIN Customers d ON < ... > INNER JOIN Addresses e ON < ... > WHERE < ... a bunch of criteria against the a, b, c, d, e aliases ... >;
I'm not talking about one query here. I am talking about dozens and dozens of stored procedures all coded like this. And not even consistently… in the same procedure, OrderDetails could be aliased as "a" in one query, "e" in another, and "c" in two others. I don't like this kind of convention one bit. For one, it makes the code very hard to maintain, because when you are looking at the filter criteria, you constantly have to scan up in the query to see what table or view the "e" alias refers to *this time*…
Some people will argue that applying some kind of order or sequence to the tables in the join might make sense, especially when the order does matter (for example when you mix in OUTER JOIN or CROSS APPLY operations). But those details, if they are not apparent enough from the code itself, should be conveyed via comments. If some of those rules change, nobody wants to go back and change all your b aliases to c and c to b, for example, if those two tables must be joined in the opposite order for some reason.
So, what is a better way to do it? In my opinion, the alias should be between 1-3 letters, and give some sort of indication of which table or view it is referencing. So taking the above example, I would re-write it as follows:
SELECT < ... column list ... > FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON < ... > INNER JOIN dbo.SalesPerson AS sp ON < ... > INNER JOIN dbo.Customers AS c ON < ... > INNER JOIN dbo.Addresses AS a ON < ... > WHERE < ... >;
Note that I would not leave it as a SELECT * query. And since I have adopted the schema concept for a lot of my work, I always add the dbo. prefix where other developers may have forgotten. Also, I have been getting into the habit of using AS to alias my tables ad views, just to be more standards-compliant; as explained in previous posts, this isn't something I feel strongly enough to go revamp my entire codebase, but I certainly strive to add in the AS keyword whenever I am visiting or maintaining existing code. (I still don't like aliasing *columns* using AS. But that's a story for another day.)
If you have a very complex query, it can get a little tougher to stick to any sensible rules, because if you have three tables or views that all start with c, you need to get more creative. An option is to not alias the ones with shorter names, and reference them directly. Or at least take their two- or three- or four-part name and alias it as just the table name itself. I see many queries like this (probably the product of one of the
crappy crutches visual designers in SSMS):
SELECT DATABASE1.dbo.Customers.key_column, DATABASE1.dbo.Customers.Column2, DATABASE2.dbo.Orders.Column3, DATABASE2.dbo.Orders.Column4 FROM DATABASE1.dbo.Customers INNER JOIN DATABASE2.dbo.Orders ON DATABASE1.dbo.Customers.key_column = DATABASE2.dbo.Orders.key_column;
Blecch! Imagine that the query is more complex, and there are other tables or views that start with C, O, etc. In that case, I would much rather see:
SELECT Customers.key_column, Customers.Column2, Orders.Column3, Orders.Column4 FROM DATABASE1.dbo.Customers AS Customers INNER JOIN DATABASE2.dbo.Orders AS Orders ON Customers.key_column = Orders.key_column;
Or, of course, if these really are the only two tables in the query:
SELECT c.key_column, c.Column2, o.Column3, o.Column4 FROM DATABASE1.dbo.Customers AS c INNER JOIN DATABASE2.dbo.Orders AS o ON c.key_column = o.key_column;
The queries don't make sense logically, but I was just trying to quickly illustrate the point that 1-3 letter aliases are not always appropriate. But please, don't take this as a suggestion to not use aliases at all, regardless of complexity. I love using aliases and I think they make queries much simpler looking and easier to maintain than they could potentially be otherwise.
I will confess that, while I always use sensible alias names for tables and views in all of my JOINs, I am guilty of using single letters like x, y and z for things like CTEs or subqueries (usually when I am not referencing them in the final result). I am trying to get better at this, and noticing my own habit is actually what spurred this post. So please don't take it like I am wagging my fingers at everyone; I am right there with you on some of my bad habits, shortcuts and other lazy behavior. When I have actually blogged about it, this gives me much less of an excuse to tolerate seeing it in my own code.
I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code. Up next: Declaring VARCHAR without (length).