T-SQL Tuesday #23: Well, that's a kick in the JOIN…
October 4th, 2011
T-SQL Tuesday #23: Well, that's a kick in the JOIN…
October 4th, 2011
 
 

This month's T-SQL Tuesday is a week early due to the upcoming PASS Summit. It is being hosted by Stuart Ainsworth (blog | @codegumbo) and the topic is a general one: JOINs. I had an idea for a very brief post showing the typical join operations we use to solve common queries, and a more interesting but less common way we can solve the same queries with less code.

I often see people performing explicit outer or implicit anti semi joins to return "all of some set, except for this other set." Sometimes the query can get quite convoluted, leading to self-joins and CTEs – especially when aggregates are involved.

As a fictitious example, let's say we wanted all of the object_id values from sys.objects, except for those objects that also have a column named id – hey, I didn't say this was a real-world example, it's just a very simple way to demonstrate using objects I know you have installed. So you may see this type of explicit outer join:

SELECT o.[object_id]
    FROM sys.objects AS o
    LEFT OUTER  JOIN sys.columns AS c
    ON o.[object_id] = c.[object_id]
    AND c.[name] = N'id'
    WHERE c.[object_id] IS NULL;

You may also see these NOT EXISTS or NOT IN variants:

-- NOT EXISTS:
 
SELECT [object_id]
    FROM sys.objects AS o
    WHERE NOT EXISTS
    (
      SELECT 1
        FROM sys.columns
        WHERE [name] = N'id'
        AND [object_id] = o.[object_id]
    );
 
-- NOT IN:
 
SELECT [object_id]
    FROM sys.objects
    WHERE [object_id] NOT IN
    (
      SELECT [object_id]
        FROM sys.columns
        WHERE [name] = N'id'
    );
As a side note, be very careful when choosing between a correlated EXISTS and a direct NOT IN – if the subject column is nullable, the latter will produce unexpected results. To avoid confusion and to program consistently, I always choose EXISTS even when it means I need to add a clause to correlate it to the outer query.

However, since SQL Server 2005, we've been able to use EXCEPT to write a slightly simpler version of this query:

SELECT [object_id]
  FROM sys.objects
EXCEPT
SELECT [object_id]
  FROM sys.columns
  WHERE [name] = N'id';

Sure, we've only saved a line or two, but I find the syntax much less complex – even on a trivial query like this. It doesn't look or smell like a join, does it? Well, it certainly behaves like one. In this case it also performs like one (though you should always test performance when weighing queries that produce the same results – readability should only be one of your criteria).

Let's generate the actual execution plans from within Plan Explorer. The runtime results are nearly identical for all four queries:

The graphical plans are almost identical. The outer join version looks like this:

While the other three are all performed using one less filter, but otherwise the same plan:

In a future blog post, I'll go into more detail about EXCEPT and I'll also talk about how INTERSECT can make certain inner join queries more intuitive. These are definitely two operators that should be in your toolkit if you are writing a lot of joins…

By: Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a husband, a father of two, a huge hockey and football fan, and my pronouns are he/him.