See the full index.
I am sure most veterans know better than to use old ANSI-89 JOIN syntax, such as:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o, dbo.OrderDetails AS od WHERE o.OrderDate >= '20091001' AND o.OrderID = od.ProductID;
One reason to avoid this syntax is that the query is often less readable than when you separate the join criteria from the filter criteria. For example, the above query re-written to use a "proper" join becomes:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
This is subjective, of course, but seems to have broad support in my circles.
Now, bring in four or five more tables to the query, and then try to debug when you are not getting the expected number of rows. When tables are grouped together with their joining criteria, commenting clauses out or changing the criteria becomes a much simpler task.
The OUTER JOIN
version of this syntax (which uses *=
and =*
syntax) is discontinued in SQL Server 2012, and has been deprecated since SQL Server 2008, so there is another reason to avoid the syntax in general. Many people have memorized which is which, but deprecation aside, isn't it better to explicitly state what you are doing? For example:
SELECT p.ProductName, p.ProductID FROM dbo.Products AS p LEFT OUTER JOIN dbo.OrderDetails AS od ON p.ProductID = od.ProductID WHERE od.ProductID IS NULL;
There are also documented cases where the old-style outer joins produce incorrect results depending on the ordering of evaluation of the "filter" criteria (which happened to include the joining criteria). Largely a moot point now, since the old outer join syntax is deprecated, but I wanted to point it out all the same
Are you sold yet? No? Well, there is an even bigger potential problem in using the implicit syntax. What if you forget the join criteria? You've inadvertently turned it into a CROSS JOIN
(otherwise known as the Cartesian Product). Take the first query again, and leave out the last line:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o, dbo.OrderDetails AS od WHERE o.OrderDate >= '20091001';
For every single order after October 1st, you're going to get a copy of every single row from OrderDetails
*for all of time* since you've lost the joining criteria *and* the filtering criteria for that table. This could be a disastrous mistake on a production system, and while it can be caught relatively quickly after deployment and testing, it could have been prevented entirely at compile time. The explicit INNER JOIN
syntax requires an ON
clause. Sure, you can still get your join criteria wrong and push the change without noticing (for example, ON o.OrderID = o.OrderID
), but with the newer syntax, there's one less thing that is likely to go wrong.
My biggest beef about trying to help people kick these habits is that Microsoft themselves keep publishing sample code that promotes the syntax. Again, this isn't so much a problem for the veterans (except those with insurmountable inertia), who fully understand how joins work, and know the history of ANSI-89 and ANSI-92. However, for someone brand new to SQL Server, they will pick up a current copy of Books Online, and see sample queries that use these bad examples.
I once complained about this on () Connect, when I came across yet another example of old-style joins in the SQL Server 2008 version of Books Online. They quickly agreed that, when they review the documentation for 2008 R2, they will update any code samples they deem "suspect."
See the full index.
The *=/=* syntax was never ANSI; it was from Oracle, it never had a clearly defined meaning when more than two tables were joined, and arose from a misapplication of a notation in a Chris Date article about outer joins.
Something like ON was needed for proper outer join syntax. Coincidentally it made sense to use the same syntax for inner join, but since nested selects were added at the same time, ON was not needed for mixing outer and inner joins.
Comma has never been deprecated. The other joins are still defined in the SQL standard in terms of it.
Comma is just a different spelling for cross join with lower precedence. Arguments that comma is problematic are specious. (Especially considering how ridiculously & unnecessarily arcane and redundant the rest of select statement syntax and semantics is.)
I used to like the old style, but now I am a fan of the new style. I have been bitten with the accidental Cartesian join by the old style, and, with lots of tables in your query, the old style is very hard to trace.
I find the formatting style below to be quite readable… (using tab chars that are 8 spaces wide):
select a.col1,
b.col2,
c.col3
from table1 a
join table2 b on a.col2 = b.col2
left outer
join (
select col3
from table3
) c on b.col3 = c.col3
where a.col2 = '2'
and a.col3 = '3'
order by a.col1, b.col2, c.col3
I fully agree with ogrig, Brian, and Nate about the readability and clause separation of the old-style syntax. After all, logically, joins (inner joins anyway) ARE simply filtered cross joins, and as Sheila's post illustrates, there really is NO DIFFERENCE between a join condition and a filter condition. But I also agree with Aaron and others about the dangers of the old-style syntax.
Personally, I hate the "new" style so much that I avoid it where possible, and clearly separate my joins and conditions in the where clause, as so:
select
a.SomeColumn,
b.SomeOtherColumn
from
tableA a,
tableB b,
where
— joins
a.ID = b.a_ID and
— conditions
a.SomeValue > 10 and
b.SomeString = 'abc'
A much better solution in my opinion would be for a new syntax where the type of join is specified in the FROM clause, and the join conditions are expressed separately in a new JOIN clause, followed by filter conditions in the WHERE clause. Something like:
select
(whatever)
from –these are your tables and how they're joined
tableA a left
tableB b inner
tableC c cross
tableD
join –these are your join conditions (one condition or set of conditions for each join mentioned above)
a.Field1 = b.Field1 and a.Field2 = b.Field2,
a.Field3 > c.Field4
where –these are your filter conditions
a.Field5 = 0 and
d.Field6 <> 'abc' and
etc.
The parser could then check to see that you provided join conditions that make sense for your join type, and throw an error if there is a mismatch (e.g. trying to do an inner or outer join without any conditions or with conditions that don't compare fields from both sides of the join, or trying to put conditions on a cross join, etc.)
I agree with Aaron, but I find that given a choice I prefer to hire people that know what they are doing rather than a WTF? person. (their Uni. has failed them). Not everything is development, and in some shops there are a lot of these queries. Maintenance matters.
In any case after a few years with any model a boilerplate query will already exist for most scenarios. I have no preference, and we must know both, but ANSI style join does allow bad behaviour such as not having the selection criteria in the same order as the tables are named.
Personally I like A,B,C,D,E etc. This should also be the way the DB designer thinks when modeling.
It disturbs me if I see someone drafting a query without drawing a picture first (not using an ERD).
(SQLServer from 1993, DB2 from 1992, dabbler in Mapreduce)
@Nate ok, readability is quite subjective, and I am never going to convince everyone about any aspect of what makes code more or less readable – way too many preferences, styles, habits, etc.
But what about the other points that even ogrig conceded (and Brian ignored)? Are you really going to continue using old-style joins just for readability, even after they're deprecated? Are you really going to insist that it makes sense to use old-style inner joins even though you can no longer use old-style outer joins? Are "readable" inner joins – prone to losing important filter criteria – really worth having forced inconsistency?
I'm trying to provide advice that is most logical, especially for newer users who might be "learning" from legacy code or veterans. You run ignore that and run your shop however you like, of course. 🙂
I agree COMPLETELY with ogrig and Brian Tkatch. The "old school" is much cleaner and more readable.
So true about the outdated join syntax, expecially the deprecated outer joins.
We have a couple of commercial products (same vendor) that persist in using the deprecated outer join syntax. We brought this to the attention of the vendor over 5 years (yes, years) ago and they have yet to correct their source code.
Thus, we are "locked in" to using SQL 2008 R2 because the only way we can get the application to run correctly is to set the database compatability level to "80" (SQL 2000) for these databases. This database compatability level no longer exists in SQL 2012 so we cannot upgrade our SQL Servers until the vendor fixes their code. Oh well.
FYI the link to the deprecation of old-style joins is now outdated (it now points at features which will be cut from vNext after 2012)
Correct link is now: http://msdn.microsoft.com/en-us/library/ms144262.aspx (Discontinued Database Engine Functionality in SQL Server 2012)
I have to agree with Aaron on this as well. Trying to decipher 5+ table joins in the old comma syntax is a hell I'd wish on no man.
We've had accidental Cartesian products in our application a few times due to the comma syntax which have resulted in 60,000+ row tables being Cartesian joined instead of Inner joined. I've never had this problem as my intentions are clear when I use the newer syntax, even though I'm guilty of shorthanding JOIN and LEFT JOIN. 😛
Syed no, there isn't, but I think you may have missed the point of the article…
Is there any difference in the performance when using either of them.
I have to agree with Aaron and disagree with ogrig. Had a query I inherited once with Joins to 6 tables and a total of ten criteria in the Where clause. The query kept blowing up because one of the tables had changed structure. Trying to trouble shoot the joins on that was a godawful nightmare. When I finally finished re-writing it, it was very straight forward with my Five JOIN lines and their criteria separated from the 4 Where criteria. Because of that I was able to then simplify the where criteria down to two because of the equi-join required that criteria equal to each other didn't have to both be listed in the Where clause as equalling the same value.
I asked a member of the ANSI SQL board why they went with this more complicated join clause. He, (Joe Celko) said that the original syntax is in fact the correct method. The Join syntax was for systems that didn't have the optimizers such as SQL CE and SQL Lite and was never intended to be used in the way we have stared using it. Microsoft got it confused and made it policy.
The Cartesian join issue is not much of a problem if you test your sql.
He also said that they met for 2 days and decided that the language is produced SQL not Sequel. I was greatful to have an answer to both of those questions.
so i guess i'm confused…. i, too, think the "x INNER JOIN y ON z" syntax is LESS readable. INNER joins are never the problem, it's the left and right joins that seem to frustrate. What is the "correct" syntax to use when wanting (x LEFT JOIN y) RIGHT JOIN z is wanted (e.g. the RIGHT JOIN is between table z and the result of the LEFT JOIN of tables x and y)?
Just a quick comment about your first "proper" join. The aliasing seems to be inconsistent with the comparison in the ON clause…
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS d ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
should be:
SELECT o.OrderID, od.ProductID FROM dbo.Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
I agree with ogrig . ANSI-89 syntax is much easier to read. I despise the later syntax, and use it only if i have too.
Besides the lack of readability, there is another issue with the later new-fangled syntax. It destroys clause separation. Traditionally, the FROM clause includes records, the WHERE clause excludes records. A join clause usually excludes records and never includes records. Hence, it belongs in the WHERE clause.
For the most part, by readable, I meant that it is much more visually obvious where the join criteria lives and where the filter criteria lives. Obviously if you've used the old style your entire career then your point of view is different from someone who hasn't. To me the new style is more readable, but only part of that is for the subjective reason.
Sorry to disagree, "old style ANSI-89 joins" are NOT less readable, quite the opposite.
They are non-standard, on the deprecation list (for a good reason), … and have other issues you would be more aware of than I am, but not less readable.
It is you that got used to the new standard (not so new anymore:-) and are not confortable anymore with the old one.
My "problem" with the new standard is proper formatting of the code.
Let's take 3 ways of writing the same simple 3 tables query:
— style 1
SELECT dt.someColumn
FROM tblInvoiceRun ir
JOIN tblInvoiceHeader hd ON ir.invoiceRunNo = hd.invoiceRunNo
JOIN tblInvoiceDetail dt ON hd.invoiceNo = dt.invoiceNo
WHERE ir.invoiceRunNo = 666
— style 2
SELECT dt.someColumn
FROM tblInvoiceRun ir
JOIN tblInvoiceHeader hd
ON ir.invoiceRunNo = hd.invoiceRunNo
JOIN tblInvoiceDetail dt
ON hd.invoiceNo = dt.invoiceNo
WHERE ir.invoiceRunNo = 666
— style 3
SELECT dt.someColumn
FROM tblInvoiceRun ir
tblInvoiceHeader hd
tblInvoiceDetail dt
WHERE ir.invoiceRunNo = 666
AND ir.invoiceRunNo = hd.invoiceRunNo
AND hd.invoiceNo = dt.invoiceNo
style 1: clear separation of participants (tables) and rules (ON clause), but a bit of a pain to align properly and maintain (just think about adding a LEFT JOIN or multiple conditions in one of the ON clauses)
style 2: I know this is your recommended style, but for me it is just messy. The participants and rules are interspersed, which means you cannot use a proper table-like alignament. Just unnecessarily hard to follow.
style 3: very clear separation of participants (FROM clause) and rules (WHERE clause), very easy to write and align properly.
Just to be clear: I am not saying that ANSI-89 is the style to use, quite the contrary. But there are enough valid reasons to change. The fact that you, me, or someone else's cat do not find it easy to the eye is NOT one of them.
One feature of oracle join syntax that i really like is the "USING" clause. If you want to join two tables on a set of columns with the same name in both tables, you just say:
select * from table1 t1 join table2 t2
USING (SharedIdColumn1)
just so much clearer to read and less fiddly to write than
select * from table1 join table2
ON t1.SharedIdColum1 = t2.SharedIdColumn2
apparently this is actually part of the ansi specification. I wish microsoft would just go ahead and implement it. Or maybe not part of ansi spec in which case I wish MS would just go ahead and "borrow" it from Oracle 🙂
Wild Rumpus, a lot of the "good" vs. "bad" habits are quite subjective, so I think it would be a stretch to expect a tool (particularly a free one) that will identify these things for you, never mind convert them.
That said, there are some tools that Microsoft ships that will help identify some things, such as old-style outer joins and usage of other deprecated features.
SQL Server 2008 Upgrade Advisor:
http://microsoft.com/downloads/details.aspx?FamilyID=F5A6C5E9-4CD9-4E42-A21C-7291E7F0F852
SQL Server 2005 Upgrade Advisor:
http://microsoft.com/downloads/details.aspx?FamilyID=1470e86b-7e05-4322-a677-95ab44f12d75
SQL Server 2005 Best Practices Analyzer:
http://microsoft.com/downloads/details.aspx?FamilyID=DA0531E4-E94C-4991-82FA-F0E3FBD05E63
I'm a software engineer, but my current focus is writing application code. I totally understand where you're coming from, but my development team doesn't have a DBA, and a lot of us "coders" are from back-in-the-day. Are there any FREE, AUTOMATED tools that we can run our scripts through to highlight where we could be switching to new, good habits?
I'm the younger guy coming to my company and with less experience but a
lot fresher education and there in my company they do this kind of join statements all the time. It was a lot of "WTF" in the beginning, I could hardly believe my eyes. But the atmosphere got me and I'm guilty of writing this kind of statements. And another sin they do is to use cursors excessively and then they sometimes complain of a slow application? Why can it be so? He he!!
@Glenn Berry
I agree, and Oracle people shouldn't have an excuse because Oracle has supported the modern syntax since version 9i.
I guess it's semantics, Sheila, and you can decide for yourself whether the syntax you propose is more meaningful for you. I prefer putting filter criteria in the WHERE clause, and join criteria in the ON clause. The OrderDate has nothing to do with the join (in other words, the join will work the same way if you comment that line out, it will just return more rows), so I don't think it belongs there.
What is your opinion on including the filter criteria with the join clause, when the filter is based on the value of a specific column? In your first example the WHERE would be an AND.
SELECT o.OrderID, od.ProductID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderDetails AS d
ON o.OrderID = od.ProductID
AND o.OrderDate >= '20091001';
@mjswart
How about:
select a.number + b.number * 1000
from numbers a
join numbers b
on 1 = 1
?
Yes, I would still use the latter. Then I *know* by looking at the code that my intention was to cross join. Remember that if you wrote that code, you might not be the next person looking at it, scratching their head wondering about your intention. When you state your intention, there is no mystery.
Arguably, you could state your intention by placing a comment at the beginning of the code block. But that comment can still be ignored or get lost when copying the code to another place. And you'd still have to form the habit of commenting in the first place.
Here's a question though. What if you actually *want* a cartesian product.
For example, say you want the numbers 1-1,000,000 but only have a numbers table that stores 1000 numbers.
Then the choice is between:
select a.number + b.number * 1000
from numbers a, numbers b
and:
select a.number + b.number * 1000
from numbers a
cross join numbers b
In this case, the old syntax seems a bit less wrong than other scenarios you mentioned. But I guess the second example is saying in effect "I'm cross joining and I know it".
Man, ANSI-89. The name itself isn't even Y2K compliant.
But the old syntax makes me nostalgic. Do you remember? It was used at a time when people's computer choices were between *real* PCs and clones (non-IBM).
I completely agree with you about this. I see a lot of people with Oracle experience in classes that I teach using ANSI-89 JOINs, but I quickly break them of that habit (at least while the class lasts).
It is much harder to look at a query that uses ANSI-89 JOINs, and quickly pick out the actual part WHERE clause that matters the most for query optimization.