October 19, 2011 | SQL Server

What do BETWEEN and the devil have in common?

I'll make no bones about it: BETWEEN is evil. For one, the meaning of the word in English does not always match the meaning of the operator in T-SQL. In T-SQL, BETWEEN is an *inclusive* range – not everyone gets that. Sure, in casual conversation when someone says "between 3 and 6" the answer really could be 3, 4, 5 or 6; but other times, they really mean to restrict the set to only 4 or 5 (an *exclusive* range). And don't get me started on the cable company telling you they'll be there between noon and 3:00 – we know that means 4:30, at best.

People also make incorrect assumptions about BETWEEN in that it doesn't matter which order the parameters are listed – e.g. BETWEEN 3 AND 5 and BETWEEN 5 AND 3 should yield the same answer. However, as the documentation states:

BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression.

This means that you should be validating your inputs. If you are accepting user text for the boundaries, make sure that parameter A is less than or equal to parameter B. If this is not true, swap them (or scold the user).

Another incorrect assumption people make about BETWEEN, specifically when dealing with DATE/TIME data types, is that SQL Server can read their mind. If they say BETWEEN July 1 2011 AND July 31 2011, they usually mean they want the whole month. But that actually translates to something like the following (number of zeros dictated by precision of data type):

 ... BETWEEN '20110701 00:00:00.000' AND '20110731 00:00:00.000';

In some other platforms, or if your data is stored as DATE or as a DATETIME type but without time, this will work okay. But in most cases we are actually storing date and time data together, and this monthly report is going to fall short, by somewhere between 3 and 4%. (See what I did there?)  I discussed this in a recent addition to my Bad Habits to Kick series.

I also talked recently about the new EOMONTH() function in DenaliSQL Server 2012, which threatens to enable even more people to do the exact same thing. I suspect that many folks will assume that it will find the true "end of the month" depending on the data type. For example, they might expect the following results, given an input of July 21 2011:

SMALLDATETIME  '20110731 23:59:00'
DATETIME       '20110731 23:59:59.997'
DATETIME2      '20110731 23:59:59.9999999'

This is not the case; EOMONTH() will always return the last day of the given month, at midnight, regardless of the data type. The results here are actually:

SMALLDATETIME  '20110731 00:00'
DATETIME       '20110731 00:00.000'
DATETIME2      '20110731 00:00.0000000'

So you can't use EOMONTH() instead of tricks you are probably using today (but shouldn't be), such as subtracting time components from the beginning of the next month:

SMALLDATETIME  DATEADD(MINUTE,      -1,   '20110801') 
DATETIME       DATEADD(MILLISECOND, -3,   '20110801')
DATETIME2      DATEADD(NANOSECOND,  -100, '20110801')

Doing all of this to foster the use of BETWEEN is not going to pay off in the long run. If the underlying data type changes, your queries are going to break – not in the good way, where users report error messages, but rather where the results are just slightly wrong. Let's build a very simple table with six rows and demonstrate several ways that BETWEEN combined with the above tricks and assumptions can mess up query results.

 USE tempdb;
GO
 
CREATE TABLE dbo.Data
(
    OrderID   INT IDENTITY(1,1),
    OrderDate DATETIME2
);
 
INSERT dbo.Data(OrderDate) VALUES
 ('20110630 23:59:59.9999999'),
 ('20110730 15:32:00.0000000'),
 ('20110730 23:59:59.9999999'),
 ('20110731 00:00:00.0000000'),
 ('20110731 23:59:59.9999999'),
 ('20110801 00:00:00.0000000');
GO
 
SELECT OrderID, OrderDate FROM dbo.Data; 
GO

The following query produces three rows, while many expect four (assuming that SQL Server knows they want all data from the end range date):

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate BETWEEN '20110701' AND '20110731';

Now what if the underlying data type changes? Rounding can cause some unexpected results:

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE CONVERT(SMALLDATETIME, OrderDate) BETWEEN '20110701' AND '20110731';

Now what about some common tricks – such as subtracting a minute from the next day? While this might work if the underlying data type is SMALLDATETIME, with other data types the rounding causes rows to be missed:

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate BETWEEN '20110701' AND DATEADD(MINUTE, -1, '20110801');

Let's try 3 milliseconds. This might work for DATETIME, but not for DATETIME2 – order #5 is still missing:

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate BETWEEN '20110701' AND DATEADD(MILLISECOND, -3, '20110801');

How about a microsecond? Nope, still not right, order #5 is missing:

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate BETWEEN '20110701' AND DATEADD(MICROSECOND, -1, CONVERT(DATETIME2, '20110801'));

How about a nanosecond? Well, now order #5 is included, but order #6 is also included, because this rounded up (yes, also possible with DATETIME2):

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate BETWEEN '20110701' AND DATEADD(NANOSECOND, -1, CONVERT(DATETIME2, '20110801'));

In this case, the answer is to use 100 nanoseconds:

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate 
    BETWEEN '20110701' AND DATEADD(NANOSECOND, -100, CONVERT(DATETIME2, '20110801'));

But still, we can't rely on that because it shifts if the underlying data type changes. If we use SMALLDATETIME, order #1 incorrectly reappears, and order #5 goes missing again. Who wants to keep track of all these rules?

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE CONVERT(SMALLDATETIME, OrderDate) BETWEEN '20110701' 
      AND DATEADD(NANOSECOND, -100, CONVERT(DATETIME2, '20110801'));

The real answer? Use an open-ended date range. Always. Seriously, try to break it. This will always work, regardless of the underlying data types (I'll explicitly state that I assume that the data types are, in fact, in the date/time family, and not integers or strings):

 SELECT OrderID, OrderDate 
    FROM dbo.Data
    WHERE OrderDate >= '20110701' 
      AND OrderDate <  '20110801';

 

Also, these range boundaries are quite easy to determine, with or without the use of helper functions like EOMONTH(). If you're really excited about using EOMONTH(), please keep the above in mind. You can still use an open-ended range with EOMONTH(), just add a day. e.g.

 ... WHERE OrderDate >= '20110701' 
      AND OrderDate <  DATEADD(DAY, 1, EOMONTH('20110701'));

You might be tempted to use BETWEEN if you are using the DATE data type. While I think this is okay, as I've mentioned in the past, I would place consistency above brevity, and use open-ended ranges in all cases. After all, you never know when that DATE column may change to include more granular information.

Now, I hope that convinces some of you to stop using BETWEEN for date range queries. If the potential for incorrect results doesn't convince you, maybe it will help if you try to remember that when you use BETWEEN, you may see it color-coded in grey (or gray), but I see red. 🙂 

Don't forget to clean up:

 DROP TABLE dbo.Data;
GO

 

25 comments on this post

    • Antony - October 19, 2011, 11:03 PM

      To be fair, incorrect assumptions about date handling and time portions of datetimes are nothing specific to BETWEEN, this issue is the same however you write your date range criteria.
      BETWEEN does was it says on the tin – personally I've never found that to be problematic, at all.

    • AaronBertrand - October 19, 2011, 11:11 PM

      Antony, sure, there are many ways you can mis-handle date range queries, and they aren't exclusive to BETWEEN. However, IMHO, using BETWEEN makes them more likely. This is because, regardless of what it says on the tin (and whether or not you find it problematic), a lot of people don't read the tin, and bring assumptions along with them from other languages or other data types. I've seen it happen often enough that I feel compelled to warn people about its usage. Even if you understand the implications, you don't know who is going to inherit your code and whether they will understand it the same way you do, and you don't know whether your code will be reviewed when data type changes are made. Open-ended ranges are much more reliable, much more consistent, and much more self-documenting.
      And again, this is a blog post, so IMHO goes without saying.

    • Jason Horner - October 19, 2011, 11:26 PM

      Aaron, this is a great blog post. I myself favor the between operator primarily because I'm lazy.
      I also find the syntax to be much more expressive and concise(and misleading as you rightly point out).
      I'm curious to hear your thoughts regarding suggesting a new operator that supports the more intuitive semantics. Do you think that would get any traction or even be a good idea?
      Keep up the awesome work. You exemplify what it means to be an MVP in  mind 8)

    • AaronBertrand - October 19, 2011, 11:32 PM

      Thanks Jason. I certainly agree that BETWEEN makes typing easier. However, except in the case of a single day, since you are always dealing with two points in a range, I don't know why it's necessary to need a single operator (except to avoid having to type the column name twice). When dealing with a single day, I guess an operator such as ON_DAY_OF(<date>) could work, which under the covers would be converted to >= <date> and < <date+1>. I've seen others suggest things like BETWEEN_INCLUSIVE and BETWEEN_EXCLUSIVE – but once you start needing those descriptors, you really offset the perceived benefit in the first place – less typing. I think open-ended ranges are quite intuitive, I just think people are less familiar with them because they're "so hard to type." 🙂
      Anyway, thanks for the comments!
      A

    • Harry - October 20, 2011, 4:40 AM

      Anyone who has a good math background should never have problem with BETWEEN. Good post nevertheless!

    • Dan G - October 20, 2011, 5:55 PM

      Aaron, I really appreciate this post.  Using 'BETWEEN' is the first impulse, and people may not think of the solution that you posted, even though it seems obvious.

    • Todd.D.Nelson - October 20, 2011, 6:03 PM

      After reading this article I can say for sure that I will think twice (at least) before using BETWEEN for date ranges ever again.
      I have used BETWEEN in code and knew that it was an inclusive range – but have to admit to using the minus a second trick – which in my case was accurate at the time – but as you indicate could break if the underlying datatype were to be changed.
      I was also not explicitly aware that the order of parameters mattered – which is good to know.

    • Henk - October 20, 2011, 10:56 PM

      Aaron, thank you for this great post! Too often seen BETWEEN being used to select orders from a certain period resulting ok because there were no orders at midnight…
      But as Antony and Jason say: this discussion will also never end…

    • unclebiguns - October 22, 2011, 11:33 PM

      Just found a bug in some software, likely due to the use of BETWEEN with dates.
      I learned this one from Jeff Moden on SSC a few years ago and NEVER use BETWEEN with dates.

    • Alex Friedman - October 23, 2011, 5:45 PM

      Agreed! BETWEEN is evil and I never use it.

    • Some Crazy Person - October 25, 2011, 5:30 PM

      This is basic stuff. The best solution would be the first and most obvious impluse had BETWEEN never been created.
      If you can't figure out the whole "less than greater than stuff," or even if you're too lazy to use it, then you shouldn't be working with databases anyway.

    • AaronBertrand - October 25, 2011, 6:18 PM

      Well come on now, none of us was born knowing the basics, and it's kind of unfair to expect everyone to have figured this out before they're allowed to touch a database.
      I like to blog about these things because I'd rather have folks read about these issues and prevent them than find out the hard way.

    • Chuck Rummel - November 14, 2011, 12:34 AM

      I wouldn't put the blame only on BETWEEN.  I'd say it's the use of BETWEEN with datatypes having both date and time parts, and is nothing more and nothing less than the combination of two easily forgotten aspects (BETWEEN is INclusive, string dates w/o time default to 00:00) that when combined create situations that can easily trip people up, especially those under pressure, in a hurry, or when called in the wee hours to troubleshoot.  What I appreciate most about this post though is the debunking of several commonly seen attempts to workaround the problem.

    • Aaron Bertrand - November 14, 2011, 2:37 AM

      Thanks Chuck, my approach here mirrors my approach in several other aspects of programming – I'd rather just be consistent than have to keep a list of rules reminding me when it's okay to code a certain way and when it's not. I'm not saying there should be no rules or choices but consistency leads to simplicity IMHO. I tried to make the same point about the INFORMATION_SCHEMA views – why use them at all if there are scenarios where they don't work?

    • Owen White - April 4, 2013, 7:35 PM

      Thank you for the post.  I want to add for people who use stored procedures to use the following parameter for the open ended approach:
      WHERE Table.DateField >= @pStartDate AND Table.DateField < DATEADD(d,1,@pEndDate)

    • Sharon - January 14, 2014, 4:07 PM

      i use the >= and <= operator for searcher date range for 1 date only (e.g dtdate >= '2013-10-1' and dtdate <= '2013-10-1') no data came out…
      thanks…

    • AaronBertrand - January 14, 2014, 5:34 PM

      Sharon, that clause says "greater than or equal to October 1 at midnight *AND* less than or equal to October 1 at midnight." This means only data that happened to be timestamped at exactly midnight would come out. I think you meant:
      >= '20131001' AND < '20131002'

    • Daniel Adeniji - April 3, 2014, 2:28 AM

      Nice work.  You had me at Hello, meaning to say that you had me with the Blog Title.
      Good extensive work.  I guess you had a lot of time on your hands, and the Database Community in its entirely will benefit for many, many years to come.
      I really enjoyed you pointing out that date arithmetic is not an exact science and one needs to be aware of the data type.
      The one instance is where I really benefit from a deep discussion of Time is the one posted by "Kimball Group" @ http://www.kimballgroup.com/1997/07/10/its-time-for-time/.
      Once again, I have a whole filling of Learning for the Day!
      Thanks for sharing!

    • Erwin Smout - July 11, 2014, 11:51 AM

      About the "BETWEEN 5 AND 3" thing : the SQL standard has an option called SYMMETRIC.  4 does not match "BETWEEN 5 AND 3" but it does match "BETWEEN 5 AND 3 SYMMETRIC".
      Optional conformance feature, so I'd be surprised if any engine really had this.

    • Daniel Melguizo - September 9, 2014, 7:43 PM

      Nice article,
      What about write this:
      SELECT OrderID, OrderDate
         FROM dbo.Data
         WHERE Convert(varchar,OrderDate,112) BETWEEN '20110701' AND '20110731';

    • AaronBertrand - September 9, 2014, 8:04 PM

      @Daniel no, two problems with that:
      (1) The explicit conversion will force a table scan, even if there is an index on OrderDate that would be used when using a proper, open-ended range using the right data types.
      (2) varchar without length? Please read:
      /blogs/aaron_bertrand/archive/2009/10/09/bad-habits-to-kick-declaring-varchar-without-length.aspx

    • donviti - September 18, 2014, 5:24 PM

      Very helpful and the time you took is appreciated

    • Martin - December 16, 2015, 1:35 AM

      Thank you for the in-depth analysis for the mistake which I see from novice programmers again and again – now I'll have a link to redirect them to.
      But now I'll have to find how to achieve correct inclusive BETWEEN with date endpoints compared against date&time columns in various SQL servers – Microsoft, MySQL, PostgreSQL, assuming that we always ignore milliseconds part leaving it to 000 (or 0000000, depends on particular SQL server datatype precision).

    • AaronBertrand - December 22, 2015, 12:12 AM

      @Martin Surely all of those platforms support syntax like >= beginning of period and < beginning of next period, regardless of precision.

    • RWatkins - February 8, 2016, 8:15 PM

      Found this article verifying that what I had learned the hard way.  I was fetching records for one day using between and adding that data to the data for the next day.  I surmised that the oddities of between would fade out since I was essentially taking everything.  I failed to take into account midnight – records with the datetime at midnight appear in both datasets thanks to my improperly using between.  >= and < are reliable friends that I am now back to using….

Comments are closed.