SQL Server v.Next ("Denali") : How a columnstore index is not like a normal index
At the end of my Denali presentation at SQLSaturday #65 in Vancouver, a member of the audience asked, "What makes a columnstore index different from a regular nonclustered index?" At the end of a busy day, I was at a loss for an answer, and I'll explain why.
First, I'll briefly explain the basic, core, high-level functionality of a columnstore index (you can read a lot more details in this white paper). Basically, instead of storing index data together on a page, it divvies up the data from each column into its own set of pages. If you are after the data from only one column, this doesn't provide a great advantage over a traditional index, since you're not going to be able to store that many more rows on a page.
The reason I was having a hard time articulating the benefit is because I was thinking about a simple non-clustered index with one column, and comparing it to a columnstore index with only one column defined. The benefit of the columnstore index is that you can basically declare it against most or all of the columns in a table, effectively similar to creating an index on each column – and this is where column store indexes provide the largest benefit. The differences between a column store index and a handful of indexes for each column are (1) size due to the structure of the index and the fact that compression will likely work much better when you can fit more like values on a page, and (2) you only pay for the index maintenance on a columnstore index once, because of what I'll describe next.
The columnstore index has limitations that definitely make you feel like the grass is greener on the other side. Most importantly, in Denali, due to the cost of maintenance, a columnstore index will be read only… essentially, you will need to rebuild the index when data changes. Some other limitations: only one columnstore index per table, the index must be partition-aligned, can't be filtered, and the base object must be a table (so, no indexed views).
This feature is definitely geared to data warehouse scenarios, or where you are free to rebuild indexes nightly or on some other interval. I can't give you a good idea about the cost of maintaining this index manually compared to multiple traditional indexes, since the feature did not make the cut for the publicly available CTP1. But I hope to publish some blog posts with real numbers on both the read and write sides when the next public CTP is made available. I am fairly certain I will be able to demonstrate cases where this type of index will drastically increase read performance (without losing substantial write performance), provided that you are in a scenario where you can update the data on an infrequent schedule, such as nightly.
If you want some detailed background on this technology, please check out Dr. David DeWitt's PASS Summit 2009 keynote slide deck. And if you’re a PASS member, you can access the video at sqlpass.org (login required).
[I am currently at the MVP Summit, and I just want to clarify (in case anyone was wondering), none of the material I discuss above is in violation of NDA.]