T-SQL Tuesday #146: Preconceived notions

For T-SQL Tuesday this month, Andy Yun (@SQLBek) asks us to talk about times where we've overcome preconceived notions. Having been in this industry for 25 years, and still learning every single day, let me tell you, I have plenty to choose from. But something happened just last week that I thought was perfect timing and fit right in with the theme.

Filtered Indexes

Confession time. For filtered indexes, I have long held the following impressions:

That the definition of the filter had to exactly match the predicate in the query.
That col IN (1, 2) was not supported in a filter definition, since it's the same as col = 1 OR col = 2, which is definitely not supported.

Now, our Posts table has too many indexes, many of which can likely be consolidated. And in a lot of cases, the queries that use these indexes are only after specific values of PostTypeId. Our application code injects values into inline queries like this:

 ... WHERE PostTypeId IN ( {=Question}, {=Answer} );

… which means SQL Server sees hard-coded values …

 ... WHERE PostTypeId IN ( 1, 2 );

I thought that filtered indexes WHERE PostTypeId IN (1, 2) would be beneficial and make some of the index consolidation I've started working on even more effective. However, because of preconceived notion #1, I suggested an index like:

 ... WHERE PostTypeId >= 1 AND PostTypeId <= 2;

I immediately discounted that as impractical – since I also had preconceived notion #2, my thinking was that we would have to change all of the queries to match the filter definition.

Andy (@AMtwo) was quick to point out that my memory was flawed, and that both of my preconceived notions were rubbish. A quick search reminded me of this post by Kendra Little:

Using “OR” and “IN” with SQL Server’s Filtered Indexes

This immediately corrected both of my preconceived notions. Filtered index matching has certainly improved over time, I had just been so careful since the original implementation all those years ago that I didn't notice. I ran some tests and confirmed that, yes, I could create an index using IN (), and I could get index matching even for predicates that didn't match the filter expression exactly (as long as I didn't try to use variables or parameters – see this post from Jeremiah for more details).

It is interesting to note, though, that a query asking for WHERE PostTypeId >= 1 AND PostTypeId <= 2 will not match an index on WHERE PostTypeId IN (1,2) BUT a query asking for WHERE PostTypeId IN (1,2) will match an index on WHERE PostTypeId >= 1 AND PostTypeId <= 2. At least if PostTypeId is the leading key; if not, that falls apart. Not confusing or inconsistent at all.

Anyway, it was all for naught. For a filtered index to be worthwhile, it has to be relatively narrow and it has to eliminate a lot of rows. Unfortunately, the data skew in the Posts table is almost 100% Questions and Answers:

PostType distribution in StackOverflow.dbo.Posts

Since most queries ask for both types, and that's practically the whole table, any index with a filter would not really be any better than the same index without one. So, in this case, they're out. Just not for the original reasons I thought they wouldn't be an option at all.

Filtered Indexes : The Ugly

This discussion ultimately came about because, just the week before, we did solve a couple of other nagging internal problems with filtered indexes. They can definitely be a great solution in the right use case, but many of us have talked about some of their known and lesser-known weaknesses:

A filtered index isn't always possible. And sometimes just moves problems. And even for filtered indexes you can create, you probably shouldn't bother, if:

The filter doesn't eliminate a substantial number of rows. The whole point of the filter is to both improve read performance and also to reduce storage and maintenance costs. Unless the index is really narrow (and can't then satisfy all that many queries), scanning 99% of the table is not much better than scanning 100% of the table, and hardly worth the maintenance costs on virtually all writes.
Using the filter requires changes to consuming queries; especially if those queries are generated in application code, or the changes involve forcing an index hint.
The presence of the filtered index introduces additional burdens - for example, a job started failing when we added a filtered index to a certain table, because that darned SQL Server Agent inexplicably sets QUOTED_IDENTIFIER to OFF.

By: Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a husband, a father of two, a huge hockey and football fan, and my pronouns are he/him.

Daniel Boucher says:

January 17, 2022 at 8:26 PM

I've known about filtered indexes for a long, however, I have never used one and found myself wondering if I wasn't "smart enough" to know I should be. Given the reasons you've given to use one, I now know better. Thank you for the post, Aaron.
Paul White says:

January 11, 2022 at 1:11 PM

A filtered index on X IN (1, 2) can't be matched with a predicate of X BETWEEN 1 AND 2 because the optimizer only reasons about the range of values.

A value of X of say 1.5 would match the predicate but not be contained within the filtered index.

That's not a problem when X is a whole-number type like integer, but the optimizer doesn't currently factor that logic in.

Anyway, that's the why.
- Aaron Bertrand says:
  
  January 11, 2022 at 2:11 PM
  
  Derp, of course, thanks Paul! I need to think beyond integers, but really the only filtered indexes I ever consider are =/IN <integer> or IS [NOT] NULL.
Brent Ozar says:

January 11, 2022 at 7:28 AM

Nice post. Filtered indexes are one of my favorite "so close, but yet so far" features.

Preconceived Notions with Filtered Indexes – Curated SQL

January 12, 2022

[…] Aaron Bertrand has learned a thing or two about filtered indexes: […]
T-SQL Tuesday #146 Round-up! Upending Preconceived Notions | Every Byte Counts

January 24, 2022

[…] Bertrand (@AaronBertrand) https://sqlblog.org/2022/01/11/t-sql-tuesday-146-preconceived-notions Filtered indexes could be SO GREAT but have their fair share of quirks. Aaron tackles two of those […]