Sometimes it's the small things : match column names in subqueries
The behavior of column matching in subqueries is a little peculiar, to say the least. If you've been bitten by this behavior once, you're unlikely to have been bitten a second time, but for some of us it just takes a while to sink in.
This morning I wasted a good five minutes "troubleshooting" a query that wasn't working out the way I thought it should. I have a core table called Users (simplified for brevity):
CREATE TABLE dbo.Users ( UserID INT PRIMARY KEY, -- ... other columns ... );
Against this table, I was writing a custom report for a specific set of users, so I wrote something like this (again, simplified for brevity):
DECLARE @u TABLE (id INT PRIMARY KEY); INSERT @u(id) SELECT 1 UNION ALL SELECT 2; SELECT UserID FROM dbo.Users WHERE UserID IN (SELECT UserID FROM @u);
I ran the query, and of course got all of the users in the original Users table. Why? Because in the table variable definition, I got really lazy about the name of the ID column, and then forgot that I had been lazy when I referenced it later. In most cases this won't have any effect, but in a subquery, if the column is not found in the objects referenced there, the parser jumps scope and tries to bind to a column in the parent query. If it finds none, you get an error (invalid column name); if it finds more than one, you get a different error (ambiguous column name). But when it finds exactly one, then I may as well have written:
SELECT UserID FROM dbo.Users WHERE UserID IN (SELECT UserID FROM dbo.Users); -- which of course becomes: SELECT UserID FROM dbo.Users;
Note that if @u had no rows, then the outer query would have also returned no rows. But all it needs is one row to satisfy the condition for every row in the outer table… and in some cases, this can be more difficult to troubleshoot than my simple example here.
It is clear I'm not the first person to get tripped up by this behavior, but they aren't going to fix it:
According to Microsoft, this behavior is "correct." If it were me, I would "fix" the language rules so that a subquery that is not explicitly correlated to an outer query, should be able to be parsed and executed on its own without error. If it can't, then it shouldn't parse as part of a larger query, either.
But that's just my opinion. While I understand that the current behavior aligns with the T-SQL rules, that doesn't mean that the T-SQL rules make a whole lot of sense in this case.
In the meantime, if you have a situation where a subquery with IN() is not acting like the filter you expected, check your column names before you start re-writing your query and pulling your hair out.