Bad habits to kick : using old-style JOINs
In my last post in this series, I talked about using simple loops to populate large tables. This time I'd like to focus on getting rid of old, ANSI-89 joins.
I am sure most veterans know better than to use old ANSI-89 JOIN syntax, such as:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o, dbo.OrderDetails AS od WHERE o.OrderDate >= '20091001' AND o.OrderID = od.ProductID;
One reason to avoid this syntax is that the query is often less readable than when you separate the joining criteria from the filter criteria. For example, the above query re-written to use a "proper" join becomes:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
This is subjective, of course, but seems to be a consensus. Now, bring in four or five more tables to the query, and then try to debug when you are not getting the expected number of rows. When the joined-in tables are grouped together with their joining criteria, commenting them out or changing the criteria becomes a much simpler task.
The OUTER JOIN version of this syntax (which uses *=, =*) is discontinued in SQL Server 2012, and has been deprecated since SQL Server 2008, so there is another reason to avoid the syntax in general. Many people have memorized which is which, but deprecation aside, isn't it better to explicitly state what you are doing? For example:
SELECT p.ProductName, p.ProductID FROM dbo.Products AS p LEFT OUTER JOIN dbo.OrderDetails AS od ON p.ProductID = od.ProductID WHERE od.ProductID IS NULL;
There are also documented cases where the old-style outer joins produce incorrect results depending on the ordering of evaluation of the "filter" criteria (which happened to include the joining criteria). Largely a moot point since *= / =* is deprecated, but I wanted to point it out all the same. After all, since I know many of us will be on 2000, 2005 or 2008 for some time, we *could* continue using this syntax if we wanted to. But I don't think we should.
Are you sold yet? No? Well, there is an even bigger potential problem in using the implicit INNER JOIN syntax. What if you forget the JOIN criteria in the WHERE clause? You've turned it into a CROSS JOIN (otherwise known as the Cartesian Product). Take the first query again, and leave out the last line:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o, dbo.OrderDetails AS od WHERE o.OrderDate >= '20091001';
For every single order after October 1st, you're going to get a copy of every single row from OrderDetails *for all of time* since you've lost the joining criteria *and* the filtering criteria for that table. This could be a disastrous mistake on a production system, and while it can be caught relatively quickly after deployment and testing, it can be prevented entirely at compile time, since an error will be thrown if you leave out the ON clause for an INNER JOIN. Sure, you can still get your join criteria wrong and push the change without noticing (for example, ON o.OrderID = o.OrderID), but with the newer syntax, there's one less thing that is likely to go wrong.
My biggest beef about trying to help people kick these habits is that Microsoft themselves keep publishing sample code that promotes the syntax. Again, this isn't so much a problem for the veterans (except those with insurmountable inertia), who fully understand how JOINs work, and know the history of ANSI-89 and ANSI-92. But for someone brand new to SQL Server, they will pick up a current copy of Books Online, and see sample queries that enforce these coding standards. The other day, I complained about this on Connect, when I came across yet another example of old-style joins in the SQL Server 2008 version of Books Online. They quickly agreed that, when they review the documentation for 2008 R2, they will update any code samples they deem "suspect."
I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code. Up next: using table aliases like (a, b, c) or (t1, t2, t3).