Bad Habits to Kick: Old-style JOINs
October 8th, 200930
Bad Habits to Kick: Old-style JOINs
October 8th, 200930
 
 
This is part of more than a decade of posts involving Bad Habits and Best Practices.
See the full index.

I am sure most veterans know better than to use old ANSI-89 JOIN syntax, such as:

SELECT o.OrderID, od.ProductID
  FROM dbo.Orders AS o, dbo.OrderDetails AS od
  WHERE o.OrderDate >= '20091001'
  AND o.OrderID = od.ProductID;

One reason to avoid this syntax is that the query is often less readable than when you separate the join criteria from the filter criteria. For example, the above query re-written to use a "proper" join becomes:

SELECT o.OrderID, od.ProductID
  FROM dbo.Orders AS o
  INNER JOIN dbo.OrderDetails AS od
  ON o.OrderID = od.ProductID
  WHERE o.OrderDate >= '20091001';

This is subjective, of course, but seems to have broad support in my circles.

Now, bring in four or five more tables to the query, and then try to debug when you are not getting the expected number of rows. When tables are grouped together with their joining criteria, commenting clauses out or changing the criteria becomes a much simpler task.

The OUTER JOIN version of this syntax (which uses *= and =* syntax) is discontinued in SQL Server 2012, and has been deprecated since SQL Server 2008, so there is another reason to avoid the syntax in general. Many people have memorized which is which, but deprecation aside, isn't it better to explicitly state what you are doing? For example:

SELECT p.ProductName, p.ProductID
  FROM dbo.Products AS p
  LEFT OUTER JOIN dbo.OrderDetails AS od
  ON p.ProductID = od.ProductID
  WHERE od.ProductID IS NULL;

There are also documented cases where the old-style outer joins produce incorrect results depending on the ordering of evaluation of the "filter" criteria (which happened to include the joining criteria). Largely a moot point now, since the old outer join syntax is deprecated, but I wanted to point it out all the same

Are you sold yet? No? Well, there is an even bigger potential problem in using the implicit syntax. What if you forget the join criteria? You've inadvertently turned it into a CROSS JOIN (otherwise known as the Cartesian Product). Take the first query again, and leave out the last line:

SELECT o.OrderID, od.ProductID
  FROM dbo.Orders AS o, dbo.OrderDetails AS od
  WHERE o.OrderDate >= '20091001';

For every single order after October 1st, you're going to get a copy of every single row from OrderDetails *for all of time* since you've lost the joining criteria *and* the filtering criteria for that table. This could be a disastrous mistake on a production system, and while it can be caught relatively quickly after deployment and testing, it could have been prevented entirely at compile time. The explicit INNER JOIN syntax requires an ON clause. Sure, you can still get your join criteria wrong and push the change without noticing (for example, ON o.OrderID = o.OrderID), but with the newer syntax, there's one less thing that is likely to go wrong.

My biggest beef about trying to help people kick these habits is that Microsoft themselves keep publishing sample code that promotes the syntax. Again, this isn't so much a problem for the veterans (except those with insurmountable inertia), who fully understand how joins work, and know the history of ANSI-89 and ANSI-92. However, for someone brand new to SQL Server, they will pick up a current copy of Books Online, and see sample queries that use these bad examples.

I once complained about this on () Connect, when I came across yet another example of old-style joins in the SQL Server 2008 version of Books Online. They quickly agreed that, when they review the documentation for 2008 R2, they will update any code samples they deem "suspect."

This is part of more than a decade of posts involving Bad Habits and Best Practices.
See the full index.
By: Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a husband, a father of two, a huge hockey and football fan, and my pronouns are he/him.

30 Responses

  1. philip says:

    The *=/=* syntax was never ANSI; it was from Oracle, it never had a clearly defined meaning when more than two tables were joined, and arose from a misapplication of a notation in a Chris Date article about outer joins.
    Something like ON was needed for proper outer join syntax. Coincidentally it made sense to use the same syntax for inner join, but since nested selects were added at the same time, ON was not needed for mixing outer and inner joins.
    Comma has never been deprecated. The other joins are still defined in the SQL standard in terms of it.
    Comma is just a different spelling for cross join with lower precedence. Arguments that comma is problematic are specious. (Especially considering how ridiculously & unnecessarily arcane and redundant the rest of select statement syntax and semantics is.)

  2. William says:

    I used to like the old style, but now I am a fan of the new style. I have been bitten with the accidental Cartesian join by the old style, and, with lots of tables in your query, the old style is very hard to trace.
    I find the formatting style below to be quite readable… (using tab chars that are 8 spaces wide):
    select a.col1,
          b.col2,
          c.col3
    from   table1 a
    join   table2 b    on a.col2 = b.col2
    left outer
    join   (
          select col3
          from   table3
          ) c         on b.col3 = c.col3
    where  a.col2 = '2'
    and    a.col3 = '3'
    order by a.col1, b.col2, c.col3

  3. Gordon Ellis says:

    I fully agree with ogrig, Brian, and Nate about the readability and clause separation of the old-style syntax. After all, logically, joins (inner joins anyway) ARE simply filtered cross joins, and as Sheila's post illustrates, there really is NO DIFFERENCE between a join condition and a filter condition. But I also agree with Aaron and others about the dangers of the old-style syntax.
    Personally, I hate the "new" style so much that I avoid it where possible, and clearly separate my joins and conditions in the where clause, as so:
    select
     a.SomeColumn,
     b.SomeOtherColumn
    from
     tableA a,
     tableB b,
    where
    — joins
     a.ID = b.a_ID and
    — conditions
     a.SomeValue > 10 and
     b.SomeString = 'abc'
    A much better solution in my opinion would be for a new syntax where the type of join is specified in the FROM clause, and the join conditions are expressed separately in a new JOIN clause, followed by filter conditions in the WHERE clause. Something like:
    select
     (whatever)
    from –these are your tables and how they're joined
     tableA a left
     tableB b inner
     tableC c cross
     tableD
    join –these are your join conditions (one condition or set of conditions for each join mentioned above)
     a.Field1 = b.Field1 and a.Field2 = b.Field2,
     a.Field3 > c.Field4
    where –these are your filter conditions
     a.Field5 = 0 and
     d.Field6 <> 'abc' and
     etc.
    The parser could then check to see that you provided join conditions that make sense for your join type, and throw an error if there is a mismatch (e.g. trying to do an inner or outer join without any conditions or with conditions that don't compare fields from both sides of the join, or trying to put conditions on a cross join, etc.)

  4. MatthewMcK says:

    I agree with Aaron,  but I find that given a choice I prefer to hire people that know what they are doing rather than a WTF? person. (their Uni. has failed them).  Not everything is development, and in some shops there are a lot of these queries. Maintenance matters.
    In any case after a few years with any model a boilerplate query will already exist for most scenarios. I have no preference, and we must know both, but ANSI style join does allow bad behaviour such as not having the selection criteria in the same order as the tables are named.
    Personally I like A,B,C,D,E etc. This should also be the way the DB designer thinks when modeling.  
    It disturbs me if I see someone drafting a query without drawing a picture first (not using an ERD).
    (SQLServer from 1993, DB2 from 1992, dabbler in Mapreduce)

  5. AaronBertrand says:

    @Nate ok, readability is quite subjective, and I am never going to convince everyone about any aspect of what makes code more or less readable – way too many preferences, styles, habits, etc.
    But what about the other points that even ogrig conceded (and Brian ignored)? Are you really going to continue using old-style joins just for readability, even after they're deprecated? Are you really going to insist that it makes sense to use old-style inner joins even though you can no longer use old-style outer joins? Are "readable" inner joins – prone to losing important filter criteria – really worth having forced inconsistency?
    I'm trying to provide advice that is most logical, especially for newer users who might be "learning" from legacy code or veterans. You run ignore that and run your shop however you like, of course. 🙂

  6. Nate Brunner says:

    I agree COMPLETELY with ogrig and Brian Tkatch. The "old school" is much cleaner and more readable.

  7. John says:

    So true about the outdated join syntax, expecially the deprecated outer joins.
    We have a couple of commercial products (same vendor) that persist in using the deprecated outer join syntax.  We brought this to the attention of the vendor over 5 years (yes, years) ago and they have yet to correct their source code.
    Thus, we are "locked in" to using SQL 2008 R2 because the only way we can get the application to run correctly is to set the database compatability level to "80" (SQL 2000) for these databases.  This database compatability level no longer exists in SQL 2012 so we cannot upgrade our SQL Servers until the vendor fixes their code. Oh well.

  8. Simon says:

    FYI the link to the deprecation of old-style joins is now outdated (it now points at features which will be cut from vNext after 2012)
    Correct link is now: http://msdn.microsoft.com/en-us/library/ms144262.aspx (Discontinued Database Engine Functionality in SQL Server 2012)

  9. Stephen says:

    I have to agree with Aaron on this as well.  Trying to decipher 5+ table joins in the old comma syntax is a hell I'd wish on no man.  
    We've had accidental Cartesian products in our application a few times due to the comma syntax which have resulted in 60,000+ row tables being Cartesian joined instead of Inner joined.  I've never had this problem as my intentions are clear when I use the newer syntax, even though I'm guilty of shorthanding JOIN and LEFT JOIN. 😛

  10. AaronBertrand says:

    Syed no, there isn't, but I think you may have missed the point of the article…

  11. Syed says:

    Is there any difference in the performance when using either of them.

  12. Edward says:

    I have to agree with Aaron and disagree with ogrig.  Had a query I inherited once with Joins to 6 tables and a total of ten criteria in the Where clause.  The query kept blowing up because one of the tables had changed structure.  Trying to trouble shoot the joins on that was a godawful nightmare.  When I finally finished re-writing it, it was very straight forward with my Five JOIN lines and their criteria separated from the 4 Where criteria.  Because of that I was able to then simplify the where criteria down to two because of the equi-join required that criteria equal to each other didn't have to both be listed in the Where clause as equalling the same value.

  13. Code Guy says:

    I asked a member of the ANSI SQL board why they went with this more complicated join clause.  He, (Joe Celko) said that the original syntax is in fact the correct method.  The Join syntax was for systems that didn't have the optimizers such as SQL CE and SQL Lite and was never intended to be used in the way we have stared using it. Microsoft got it confused and made it policy.  
    The Cartesian join issue is not much of a problem if you test your sql.
    He also said that they met for 2 days and decided that the language is produced SQL not Sequel.  I was greatful to have an answer to both of those questions.

  14. quizno says:

    so i guess i'm confused…. i, too, think the "x INNER JOIN y ON z" syntax is LESS readable. INNER joins are never the problem, it's the left and right joins that seem to frustrate. What is the "correct" syntax to use when wanting (x LEFT JOIN y) RIGHT JOIN z is wanted (e.g. the RIGHT JOIN is between table z and the result of the LEFT JOIN of tables x and y)?

  15. RBrown says:

    Just a quick comment about your first "proper" join.  The aliasing seems to be inconsistent with the comparison in the ON clause…
    SELECT o.OrderID, od.ProductID  FROM dbo.Orders AS o  INNER JOIN dbo.OrderDetails AS d  ON o.OrderID = od.ProductID  WHERE o.OrderDate >= '20091001';
    should be:
    SELECT o.OrderID, od.ProductID  FROM dbo.Orders AS o  INNER JOIN dbo.OrderDetails AS od  ON o.OrderID = od.ProductID  WHERE o.OrderDate >= '20091001';

  16. Brian Tkatch says:

    I agree with ogrig . ANSI-89 syntax is much easier to read. I despise the later syntax, and use it only if i have too.
    Besides the lack of readability, there is another issue with the later new-fangled syntax. It destroys clause separation. Traditionally, the FROM clause includes records, the WHERE clause excludes records. A join clause usually excludes records and never includes records. Hence, it belongs in the WHERE clause.

  17. Aaron Bertrand says:

    For the most part, by readable, I meant that it is much more visually obvious where the join criteria lives and where the filter criteria lives.  Obviously if you've used the old style your entire career then your point of view is different from someone who hasn't.  To me the new style is more readable, but only part of that is for the subjective reason.

  18. ogrig says:

    Sorry to disagree, "old style ANSI-89 joins" are NOT less readable, quite the opposite.
    They are non-standard, on the deprecation list (for a good reason), … and have other issues you would be more aware of than I am, but not less readable.
    It is you that got used to the new standard (not so new anymore:-) and are not confortable anymore with the old one.
    My "problem" with the new standard is proper formatting of the code.
    Let's take 3 ways of writing the same simple 3 tables query:
    — style 1
    SELECT  dt.someColumn
    FROM      tblInvoiceRun    ir
        JOIN tblInvoiceHeader hd ON ir.invoiceRunNo = hd.invoiceRunNo
        JOIN tblInvoiceDetail dt ON hd.invoiceNo    = dt.invoiceNo
    WHERE ir.invoiceRunNo = 666
    — style 2
    SELECT  dt.someColumn
    FROM      tblInvoiceRun    ir
        JOIN tblInvoiceHeader hd
        ON ir.invoiceRunNo = hd.invoiceRunNo
        JOIN tblInvoiceDetail dt
        ON hd.invoiceNo    = dt.invoiceNo
    WHERE ir.invoiceRunNo = 666
    — style 3
    SELECT  dt.someColumn
    FROM tblInvoiceRun    ir
        tblInvoiceHeader hd
        tblInvoiceDetail dt
    WHERE ir.invoiceRunNo = 666
    AND   ir.invoiceRunNo = hd.invoiceRunNo
    AND   hd.invoiceNo    = dt.invoiceNo
    style 1: clear separation of participants (tables) and rules (ON clause), but a bit of a pain to align properly and maintain (just think about adding a LEFT JOIN or multiple conditions in one of the ON clauses)
    style 2: I know this is your recommended style, but for me it is just messy. The participants and rules are interspersed, which means you cannot use a proper table-like alignament. Just unnecessarily hard to follow.
    style 3: very clear separation of participants (FROM clause) and rules (WHERE clause), very easy to write and align properly.
    Just to be clear: I am not saying that ANSI-89 is the style to use, quite the contrary. But there are enough valid reasons to change. The fact that you, me, or someone else's cat do not find it easy to the eye is NOT one of them.

  19. Jogo says:

    One feature of oracle join syntax that i really like is the "USING" clause.  If you want to join two tables on a set of columns with the same name in both tables, you just say:
    select * from table1 t1 join table2 t2
    USING (SharedIdColumn1)
    just so much clearer to read and less fiddly to write than
    select * from table1 join table2
    ON t1.SharedIdColum1 = t2.SharedIdColumn2
    apparently this is actually part of the ansi specification.  I wish microsoft would just go ahead and implement it.  Or maybe not part of ansi spec in which case I wish MS would just go ahead and "borrow" it from Oracle 🙂

  20. AaronBertrand says:

    Wild Rumpus, a lot of the "good" vs. "bad" habits are quite subjective, so I think it would be a stretch to expect a tool (particularly a free one) that will identify these things for you, never mind convert them.
    That said, there are some tools that Microsoft ships that will help identify some things, such as old-style outer joins and usage of other deprecated features.
    SQL Server 2008 Upgrade Advisor:
    http://microsoft.com/downloads/details.aspx?FamilyID=F5A6C5E9-4CD9-4E42-A21C-7291E7F0F852
    SQL Server 2005 Upgrade Advisor:
    http://microsoft.com/downloads/details.aspx?FamilyID=1470e86b-7e05-4322-a677-95ab44f12d75
    SQL Server 2005 Best Practices Analyzer:
    http://microsoft.com/downloads/details.aspx?FamilyID=DA0531E4-E94C-4991-82FA-F0E3FBD05E63

  21. Wild Rumpus says:

    I'm a software engineer, but my current focus is writing application code.  I totally understand where you're coming from, but my development team doesn't have a DBA, and a lot of us "coders" are from back-in-the-day.  Are there any FREE, AUTOMATED tools that we can run our scripts through to highlight where we could be switching to new, good habits?

  22. YoungerGuy says:

    I'm the younger guy coming to my company and with less experience but a
    lot fresher education and there in my company they do this kind of join statements all the time. It was a lot of "WTF" in the beginning, I could hardly believe my eyes. But the atmosphere got me and I'm guilty of writing this kind of statements. And another sin they do is to use cursors excessively and then they sometimes complain of a slow application? Why can it be so? He he!!

  23. RussellH says:

    @Glenn Berry
    I agree, and Oracle people shouldn't have an excuse because Oracle has supported the modern syntax since version 9i.

  24. AaronBertrand says:

    I guess it's semantics, Sheila, and you can decide for yourself whether the syntax you propose is more meaningful for you.  I prefer putting filter criteria in the WHERE clause, and join criteria in the ON clause.  The OrderDate has nothing to do with the join (in other words, the join will work the same way if you comment that line out, it will just return more rows), so I don't think it belongs there.

  25. Sheila Filteau says:

    What is your opinion on including the filter criteria with the join clause, when the filter is based on the value of a specific column? In your first example the WHERE would be an AND.
    SELECT o.OrderID, od.ProductID  
    FROM dbo.Orders AS o  
    INNER JOIN dbo.OrderDetails AS d  
    ON o.OrderID = od.ProductID  
    AND o.OrderDate >= '20091001';

  26. Tony Sawyer says:

    @mjswart
    How about:
    select a.number + b.number * 1000
    from numbers a
    join numbers b
    on 1 = 1
    ?

  27. AaronBertrand says:

    Yes, I would still use the latter.  Then I *know* by looking at the code that my intention was to cross join.  Remember that if you wrote that code, you might not be the next person looking at it, scratching their head wondering about your intention.  When you state your intention, there is no mystery.
    Arguably, you could state your intention by placing a comment at the beginning of the code block.  But that comment can still be ignored or get lost when copying the code to another place.  And you'd still have to form the habit of commenting in the first place.

  28. mjswart says:

    Here's a question though. What if you actually *want* a cartesian product.
    For example, say you want the numbers 1-1,000,000 but only have a numbers table that stores 1000 numbers.
    Then the choice is between:
    select a.number + b.number * 1000
    from numbers a, numbers b
    and:
    select a.number + b.number * 1000
    from numbers a
    cross join numbers b
    In this case, the old syntax seems a bit less wrong than other scenarios you mentioned. But I guess the second example is saying in effect "I'm cross joining and I know it".

  29. mjswart says:

    Man, ANSI-89. The name itself isn't even Y2K compliant.
    But the old syntax makes me nostalgic. Do you remember? It was used at a time when people's computer choices were between *real* PCs and clones (non-IBM).

  30. Glenn Berry says:

    I completely agree with you about this. I see a lot of people with Oracle experience in classes that I teach using ANSI-89 JOINs, but I quickly break them of that habit (at least while the class lasts).
    It is much harder to look at a query that uses ANSI-89 JOINs, and quickly pick out the actual part WHERE clause that matters the most for query optimization.