What do BETWEEN and the devil have in common?
See the full index.
I'll make no bones about it: BETWEEN is evil. For one, the meaning of the word in English does not always match the meaning of the operator in T-SQL. In T-SQL, BETWEEN is an *inclusive* range – not everyone gets that. Sure, in casual conversation when someone says "between 3 and 6" the answer really could be 3, 4, 5 or 6; but other times, they really mean to restrict the set to only 4 or 5 (an *exclusive* range). And don't get me started on the cable company telling you they'll be there between noon and 3:00 – we know that means 4:30, at best.
People also make incorrect assumptions about BETWEEN in that it doesn't matter which order the parameters are listed – e.g. BETWEEN 3 AND 5 and BETWEEN 5 AND 3 should yield the same answer. However, as the documentation states:
BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression.
This means that you should be validating your inputs. If you are accepting user text for the boundaries, make sure that parameter A is less than or equal to parameter B. If this is not true, swap them (or scold the user).
Another incorrect assumption people make about BETWEEN, specifically when dealing with DATE/TIME data types, is that SQL Server can read their mind. If they say BETWEEN July 1st, 2011 AND July 31st, 2011, they usually mean they want the whole month. But that actually translates to something like the following (number of zeros dictated by precision of data type):
... BETWEEN '20110701 00:00:00.000' AND '20110731 00:00:00.000';
In some other platforms, or if your data is stored as DATE or as a DATETIME type but without time, this will work okay. But in most cases we are actually storing date and time data together, and this monthly report is going to fall short, by somewhere between 3 and 4%. (See what I did there?) I discussed this in a recent addition to my Bad Habits to Kick series.
I also talked recently about the new EOMONTH() function in SQL Server 2012, which threatens to enable even more people to do the exact same thing. I suspect that many folks will assume that it will find the true "end of the month" depending on the data type. For example, they might expect the following results, given an input of July 21 2011:
SMALLDATETIME '20110731 23:59:00' DATETIME '20110731 23:59:59.997' DATETIME2 '20110731 23:59:59.9999999'
This is not the case; EOMONTH() will always return the last day of the given month, at midnight, regardless of the data type. The results here are actually:
SMALLDATETIME '20110731 00:00' DATETIME '20110731 00:00.000' DATETIME2 '20110731 00:00.0000000'
So you can't use EOMONTH() instead of tricks you are probably using today (but shouldn't be), such as subtracting time components from the beginning of the next month:
SMALLDATETIME DATEADD(MINUTE, -1, '20110801') DATETIME DATEADD(MILLISECOND, -3, '20110801') DATETIME2 DATEADD(NANOSECOND, -100, '20110801')
Doing all of this to foster the use of BETWEEN is not going to pay off in the long run. If the underlying data type changes, your queries are going to break – not in the good way, where users report error messages, but rather where the results are just slightly wrong. Let's build a very simple table with six rows and demonstrate several ways that BETWEEN combined with the above tricks and assumptions can mess up query results.
USE tempdb; GO CREATE TABLE dbo.Data ( OrderID INT IDENTITY(1,1), OrderDate DATETIME2 ); GO INSERT dbo.Data(OrderDate) VALUES ('20110630 23:59:59.9999999'), ('20110730 15:32:00.0000000'), ('20110730 23:59:59.9999999'), ('20110731 00:00:00.0000000'), ('20110731 23:59:59.9999999'), ('20110801 00:00:00.0000000'); SELECT OrderID, OrderDate FROM dbo.Data;
The following query produces three rows, while many expect four (assuming that SQL Server knows they want all data from the end range date):
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND '20110731';
Now what if the underlying data type changes? Rounding can cause some unexpected results:
SELECT OrderID, OrderDate FROM dbo.Data WHERE CONVERT(SMALLDATETIME, OrderDate) BETWEEN '20110701' AND '20110731';
Now what about some common tricks – such as subtracting a minute from the next day? While this might work if the underlying data type is SMALLDATETIME, with other data types the rounding causes rows to be missed:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(MINUTE, -1, '20110801');
Let's try 3 milliseconds. This might work for DATETIME, but not for DATETIME2 – order #5 is still missing:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(MILLISECOND, -3, '20110801');
How about a microsecond? Nope, still not right, order #5 is missing:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(MICROSECOND, -1, CONVERT(DATETIME2, '20110801'));
How about a nanosecond? Well, now order #5 is included, but order #6 is also included, because this rounded up (yes, also possible with DATETIME2):
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(NANOSECOND, -1, CONVERT(DATETIME2, '20110801'));
In this case, the answer is to use 100 nanoseconds:
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate BETWEEN '20110701' AND DATEADD(NANOSECOND, -100, CONVERT(DATETIME2, '20110801'));
But still, we can't rely on that because it shifts if the underlying data type changes. If we use SMALLDATETIME, order #1 incorrectly reappears, and order #5 goes missing again. Who wants to keep track of all these rules?
SELECT OrderID, OrderDate FROM dbo.Data WHERE CONVERT(SMALLDATETIME, OrderDate) BETWEEN '20110701' AND DATEADD(NANOSECOND, -100, CONVERT(DATETIME2, '20110801'));
The real answer? Use an open-ended date range. Always. Seriously, try to break it. This will always work, regardless of the underlying data types (I'll explicitly state that I assume that the data types are, in fact, in the date/time family, and not integers or strings):
SELECT OrderID, OrderDate FROM dbo.Data WHERE OrderDate >= '20110701' AND OrderDate < '20110801';
I can't stress how much easier it is to find the beginning of the next period than the end of the current period, with or without the use of "helper" functions like EOMONTH().
You might be tempted to use BETWEEN if you are using the DATE data type. While I think this is okay, as I've mentioned in the past, I would place consistency above brevity, and use open-ended ranges in all cases. After all, you never know when that DATE column may change to include more granular information.
Now, I hope that convinces some of you to stop using BETWEEN for date range queries. If the potential for incorrect results doesn't convince you, maybe it will help if you try to remember that when you use BETWEEN, you may see it color-coded in grey (or gray), but I see red. 🙂
Don't forget to clean up:
DROP TABLE dbo.Data;
See the full index.
Found this article verifying that what I had learned the hard way. I was fetching records for one day using between and adding that data to the data for the next day. I surmised that the oddities of between would fade out since I was essentially taking everything. I failed to take into account midnight – records with the datetime at midnight appear in both datasets thanks to my improperly using between. >= and < are reliable friends that I am now back to using….
@Martin Surely all of those platforms support syntax like >= beginning of period and < beginning of next period, regardless of precision.
Thank you for the in-depth analysis for the mistake which I see from novice programmers again and again – now I'll have a link to redirect them to.
But now I'll have to find how to achieve correct inclusive BETWEEN with date endpoints compared against date&time columns in various SQL servers – Microsoft, MySQL, PostgreSQL, assuming that we always ignore milliseconds part leaving it to 000 (or 0000000, depends on particular SQL server datatype precision).
Very helpful and the time you took is appreciated
@Daniel no, two problems with that:
(1) The explicit conversion will force a table scan, even if there is an index on OrderDate that would be used when using a proper, open-ended range using the right data types.
(2) varchar without length? Please read:
What about write this:
SELECT OrderID, OrderDate
WHERE Convert(varchar,OrderDate,112) BETWEEN '20110701' AND '20110731';
About the "BETWEEN 5 AND 3" thing : the SQL standard has an option called SYMMETRIC. 4 does not match "BETWEEN 5 AND 3" but it does match "BETWEEN 5 AND 3 SYMMETRIC".
Optional conformance feature, so I'd be surprised if any engine really had this.
Nice work. You had me at Hello, meaning to say that you had me with the Blog Title.
Good extensive work. I guess you had a lot of time on your hands, and the Database Community in its entirely will benefit for many, many years to come.
I really enjoyed you pointing out that date arithmetic is not an exact science and one needs to be aware of the data type.
The one instance is where I really benefit from a deep discussion of Time is the one posted by "Kimball Group" @ http://www.kimballgroup.com/1997/07/10/its-time-for-time/.
Once again, I have a whole filling of Learning for the Day!
Thanks for sharing!
Sharon, that clause says "greater than or equal to October 1 at midnight *AND* less than or equal to October 1 at midnight." This means only data that happened to be timestamped at exactly midnight would come out. I think you meant:
>= '20131001' AND < '20131002'
i use the >= and <= operator for searcher date range for 1 date only (e.g dtdate >= '2013-10-1' and dtdate <= '2013-10-1') no data came out…
Thank you for the post. I want to add for people who use stored procedures to use the following parameter for the open ended approach:
WHERE Table.DateField >= @pStartDate AND Table.DateField < DATEADD(day,1,@pEndDate)
Thanks Chuck, my approach here mirrors my approach in several other aspects of programming – I'd rather just be consistent than have to keep a list of rules reminding me when it's okay to code a certain way and when it's not. I'm not saying there should be no rules or choices but consistency leads to simplicity IMHO. I tried to make the same point about the INFORMATION_SCHEMA views – why use them at all if there are scenarios where they don't work?
I wouldn't put the blame only on BETWEEN. I'd say it's the use of BETWEEN with datatypes having both date and time parts, and is nothing more and nothing less than the combination of two easily forgotten aspects (BETWEEN is INclusive, string dates w/o time default to 00:00) that when combined create situations that can easily trip people up, especially those under pressure, in a hurry, or when called in the wee hours to troubleshoot. What I appreciate most about this post though is the debunking of several commonly seen attempts to workaround the problem.
Well come on now, none of us was born knowing the basics, and it's kind of unfair to expect everyone to have figured this out before they're allowed to touch a database.
I like to blog about these things because I'd rather have folks read about these issues and prevent them than find out the hard way.
This is basic stuff. The best solution would be the first and most obvious impluse had BETWEEN never been created.
If you can't figure out the whole "less than greater than stuff," or even if you're too lazy to use it, then you shouldn't be working with databases anyway.
Agreed! BETWEEN is evil and I never use it.
Just found a bug in some software, likely due to the use of BETWEEN with dates.
I learned this one from Jeff Moden on SSC a few years ago and NEVER use BETWEEN with dates.
Aaron, thank you for this great post! Too often seen BETWEEN being used to select orders from a certain period resulting ok because there were no orders at midnight…
But as Antony and Jason say: this discussion will also never end…
After reading this article I can say for sure that I will think twice (at least) before using BETWEEN for date ranges ever again.
I have used BETWEEN in code and knew that it was an inclusive range – but have to admit to using the minus a second trick – which in my case was accurate at the time – but as you indicate could break if the underlying datatype were to be changed.
I was also not explicitly aware that the order of parameters mattered – which is good to know.
Aaron, I really appreciate this post. Using 'BETWEEN' is the first impulse, and people may not think of the solution that you posted, even though it seems obvious.
Anyone who has a good math background should never have problem with BETWEEN. Good post nevertheless!
Thanks Jason. I certainly agree that BETWEEN makes typing easier. However, except in the case of a single day, since you are always dealing with two points in a range, I don't know why it's necessary to need a single operator (except to avoid having to type the column name twice). When dealing with a single day, I guess an operator such as ON_DAY_OF(<date>) could work, which under the covers would be converted to >= <date> and < <date+1>. I've seen others suggest things like BETWEEN_INCLUSIVE and BETWEEN_EXCLUSIVE – but once you start needing those descriptors, you really offset the perceived benefit in the first place – less typing. I think open-ended ranges are quite intuitive, I just think people are less familiar with them because they're "so hard to type." 🙂
Anyway, thanks for the comments!
Aaron, this is a great blog post. I myself favor the between operator primarily because I'm lazy.
I also find the syntax to be much more expressive and concise(and misleading as you rightly point out).
I'm curious to hear your thoughts regarding suggesting a new operator that supports the more intuitive semantics. Do you think that would get any traction or even be a good idea?
Keep up the awesome work. You exemplify what it means to be an MVP in mind 8)
Antony, sure, there are many ways you can mis-handle date range queries, and they aren't exclusive to BETWEEN. However, IMHO, using BETWEEN makes them more likely. This is because, regardless of what it says on the tin (and whether or not you find it problematic), a lot of people don't read the tin, and bring assumptions along with them from other languages or other data types. I've seen it happen often enough that I feel compelled to warn people about its usage. Even if you understand the implications, you don't know who is going to inherit your code and whether they will understand it the same way you do, and you don't know whether your code will be reviewed when data type changes are made. Open-ended ranges are much more reliable, much more consistent, and much more self-documenting.
And again, this is a blog post, so IMHO goes without saying.
To be fair, incorrect assumptions about date handling and time portions of datetimes are nothing specific to BETWEEN, this issue is the same however you write your date range criteria.
BETWEEN does was it says on the tin – personally I've never found that to be problematic, at all.