What do BETWEEN and the devil have in common?
I'll make no bones about it: BETWEEN is evil. For one, the meaning of the word in English does not always match the meaning of the operator in T-SQL. In T-SQL, BETWEEN is an *inclusive* range – not everyone gets that. Sure, in casual conversation when someone says "between 3 and 6" the answer really could be 3, 4, 5 or 6; but other times, they really mean to restrict the set to only 4 or 5 (an *exclusive* range). And don't get me started on the cable company telling you they'll be there between noon and 3:00 – we know that means 4:30, at best.
People also make incorrect assumptions about BETWEEN in that it doesn't matter which order the parameters are listed – e.g. BETWEEN 3 AND 5 and BETWEEN 5 AND 3 should yield the same answer. However, as the documentation states:
BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression.
This means that you should be validating your inputs. If you are accepting user text for the boundaries, make sure that parameter A is less than or equal to parameter B. If this is not true, swap them (or scold the user).
Another incorrect assumption people make about BETWEEN, specifically when dealing with DATE/TIME data types, is that SQL Server can read their mind. If they say BETWEEN July 1 2011 AND July 31 2011, they usually mean they want the whole month. But that actually translates to something like the following (number of zeros dictated by precision of data type):
... BETWEEN '20110701 00:00:00.000' AND '20110731 00:00:00.000';
In some other platforms, or if your data is stored as DATE or as a DATETIME type but without time, this will work okay. But in most cases we are actually storing date and time data together, and this monthly report is going to fall short, by somewhere between 3 and 4%. (See what I did there?) I discussed this in a recent addition to my Bad Habits to Kick series.
I also talked recently about the new EOMONTH() function in
DenaliSQL Server 2012, which threatens to enable even more people to do the exact same thing. I suspect that many folks will assume that it will find the true "end of the month" depending on the data type. For example, they might expect the following results, given an input of July 21 2011:
SMALLDATETIME '20110731 23:59:00' DATETIME '20110731 23:59:59.997' DATETIME2 '20110731 23:59:59.9999999'
This is not the case; EOMONTH() will always return the last day of the given month, at midnight, regardless of the data type. The results here are actually:
SMALLDATETIME '20110731 00:00' DATETIME '20110731 00:00.000' DATETIME2 '20110731 00:00.0000000'
So you can't use EOMONTH() instead of tricks you are probably using today (but shouldn't be), such as subtracting time components from the beginning of the next month:
SMALLDATETIME DATEADD(MINUTE, -1, '20110801') DATETIME DATEADD(MILLISECOND, -3, '20110801') DATETIME2 DATEADD(NANOSECOND, -100, '20110801')
Doing all of this to foster the use of BETWEEN is not going to pay off in the long run. If the underlying data type changes, your queries are going to break – not in the good way, where users report error messages, but rather where the results are just slightly wrong. Let's build a very simple table with six rows and demonstrate several ways that BETWEEN combined with the above tricks and assumptions can mess up query results.
USE tempdb; GO CREATE TABLE dbo.Data ( OrderID INT IDENTITY(1,1), OrderDate DATETIME2 ); INSERT dbo.Data(OrderDate) VALUES ('20110630 23:59:59.9999999'), ('20110730 15:32:00.0000000'), ('20110730 23:59:59.9999999'), ('20110731 00:00:00.0000000'), ('20110731 23:59:59.9999999'), ('20110801 00:00:00.0000000'); GO SELECT OrderID, OrderDate FROM dbo.Data; GO
The following query produces three rows, while many expect four (assuming that SQL Server knows they want all data from the end range date):
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND '20110731';
Now what if the underlying data type changes? Rounding can cause some unexpected results:
SELECT OrderID, OrderDate FROM dbo.Data WHERE CONVERT(SMALLDATETIME, OrderDate) BETWEEN '20110701' AND '20110731';
Now what about some common tricks – such as subtracting a minute from the next day? While this might work if the underlying data type is SMALLDATETIME, with other data types the rounding causes rows to be missed:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(MINUTE, -1, '20110801');
Let's try 3 milliseconds. This might work for DATETIME, but not for DATETIME2 – order #5 is still missing:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(MILLISECOND, -3, '20110801');
How about a microsecond? Nope, still not right, order #5 is missing:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(MICROSECOND, -1, CONVERT(DATETIME2, '20110801'));
How about a nanosecond? Well, now order #5 is included, but order #6 is also included, because this rounded up (yes, also possible with DATETIME2):
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(NANOSECOND, -1, CONVERT(DATETIME2, '20110801'));
In this case, the answer is to use 100 nanoseconds:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(NANOSECOND, -100, CONVERT(DATETIME2, '20110801'));
But still, we can't rely on that because it shifts if the underlying data type changes. If we use SMALLDATETIME, order #1 incorrectly reappears, and order #5 goes missing again. Who wants to keep track of all these rules?
SELECT OrderID, OrderDate FROM dbo.Data WHERE CONVERT(SMALLDATETIME, OrderDate) BETWEEN '20110701' AND DATEADD(NANOSECOND, -100, CONVERT(DATETIME2, '20110801'));
The real answer? Use an open-ended date range. Always. Seriously, try to break it. This will always work, regardless of the underlying data types (I'll explicitly state that I assume that the data types are, in fact, in the date/time family, and not integers or strings):
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate >= '20110701' AND OrderDate < '20110801';
Also, these range boundaries are quite easy to determine, with or without the use of helper functions like EOMONTH(). If you're really excited about using EOMONTH(), please keep the above in mind. You can still use an open-ended range with EOMONTH(), just add a day. e.g.
... WHERE OrderDate >= '20110701' AND OrderDate < DATEADD(DAY, 1, EOMONTH('20110701'));
You might be tempted to use BETWEEN if you are using the DATE data type. While I think this is okay, as I've mentioned in the past, I would place consistency above brevity, and use open-ended ranges in all cases. After all, you never know when that DATE column may change to include more granular information.
Now, I hope that convinces some of you to stop using BETWEEN for date range queries. If the potential for incorrect results doesn't convince you, maybe it will help if you try to remember that when you use BETWEEN, you may see it color-coded in grey (or gray), but I see red. 🙂
Don't forget to clean up:
DROP TABLE dbo.Data; GO