Blogging from the PASS Keynote : 2009-11-05
Bill Graziano took the stage and promised us the shortest keynote yet. He started by giving thanks to outgoing board members Greg Low, Pat Wright and Kevin Kline. Wayne Snyder took over and gave an emotional homage to Kevin, who gave 10 solid years to PASS. Very touching. Kevin left the stage to a standing ovation.
Introduced Brian Moran, Jeremiah Peschka and Thomas LaRock as the new Directors-at-Large, and Rushabh Mehta as the new President. Reaffirmed commitment to the community, and announced the PASS European Conference in Neuss, Germany, April 21-23, 2010. The North American summit is already scheduled for November 8-11, 2010, again in Seattle. (And if you register early enough, you can get in for $995.) A lot of people who live in different time zones have expressed the desire to move the conference around, but I agree with Bill; having the conference this close to Microsoft provides enough benefit to offset the impact on travel. That's my opinion, of course.
Patrick Ortiz, Infrastructure Consulting Services, Dell
Patrick came on and talked about the structure of Dell's operations involving Microsoft architecture (Exchange, SQL Server, SharePoint, etc.). Then he jumped into how they approach the combination of consolidation, disaster recovery, and configuration management. Apologies to Patrick, but there wasn't really anything exciting about his presentation; it seemed more that it should have been an elective session as opposed to a keynote for everyone in attendance. But I guess you get this privilege when you are such a big supporting vendor, and I do hope we collectively appreciate that. Okay, about halfway through, he did have one funny line about typical disaster recovery behavior, which seemed to wake up about 10% of the audience. But this didn't redeem the segment; sorry.
David DeWitt, Data and Storage Platform Division, Microsoft
David came on and made some funny comments about past incidents on stage, including the 192-core server that seemed like it was going to catch on fire when the fans kicked in. David runs the Jim Gray Systems Lab in Madison, WI. He is working on SQL Server Parallel Data Warehouse (or, as David would like to name it, SQL*). He promises to overwhelm us with technical details as opposed to making a marketing-ish presentation.
He compared how things have changed since 1980, including a 1,000X improvement in CPU cache, memory capacity, and CPU performance, and a 10,000X increase in storage capacity. Transfer times, however, have only improved 65X, and seek times only 10X. It seems funny that we are worried about getting 32-, 64-, or 192-core machines when disk performance simply can't scale to keep those CPUs busy. In fact, when he measures transfer bandwidth per byte of storage, drives are actually 150X slower today than in 1980, in relative terms. In 1980, the ratio of sequential to random performance was 5 : 1. Today, it is 33 : 1, meaning we have to focus on sequential reads and move the disk heads as little as possible. He also explained that as much as 50% of the time, the CPU is sitting there, waiting for memory to deliver something into its L2 cache.
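A quick back-of-the-envelope check of that 150X figure, using only the two improvement factors quoted above. This is my own arithmetic rather than anything shown on a slide, and it assumes "bandwidth per byte" simply means the transfer improvement divided by the capacity improvement:

```python
# Improvement factors since 1980, as quoted in the keynote.
capacity_gain = 10_000   # storage capacity
transfer_gain = 65       # transfer performance

# Assumption: bandwidth available per stored byte scales as transfer / capacity.
relative_bandwidth_per_byte = transfer_gain / capacity_gain
slowdown = capacity_gain / transfer_gain

print(f"bandwidth per byte relative to 1980: {relative_bandwidth_per_byte:.4f}")
print(f"relative slowdown: roughly {slowdown:.0f}X")
```

The result lands a little above 150, which is consistent with the rounded number David gave.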
David's idea about improving the storage bottleneck problem is to use column-wise storage instead of row-wise. Essentially, imagine storing all the values for each column, instead of each row, on common pages. The example showed how you could store ~2,000 values for a BalanceDue column (INT) on a single page, as opposed to the page being crowded by the other columns, and therefore being able to store far fewer rows on each page. (You still have to worry about the I/O for the other column values you want to retrieve; however a subset of columns will be faster in this model. SELECT * will never be faster, of course. But we usually don't want SELECT *, right?) This is a really interesting concept, and at its core it is quite simple, but implementation in existing architectures is far from trivial.
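To put rough numbers on the page-density argument, here is a minimal sketch. The 8 KB page size, the 200-byte row width, and the million-row table are my own assumptions for illustration (page headers and per-row overhead are ignored); only the 4-byte INT column comes from the example in the talk:

```python
import math

PAGE_BYTES = 8 * 1024   # assumed page size; ignores page headers
INT_BYTES = 4           # BalanceDue stored as a 4-byte INT
ROW_BYTES = 200         # hypothetical width of a full row

def pages_needed(item_count: int, item_bytes: int) -> int:
    """Pages required to hold item_count fixed-size items."""
    per_page = PAGE_BYTES // item_bytes
    return math.ceil(item_count / per_page)

rows = 1_000_000
column_pages = pages_needed(rows, INT_BYTES)   # scan only BalanceDue
row_pages = pages_needed(rows, ROW_BYTES)      # scan entire rows

print(f"values per column page: {PAGE_BYTES // INT_BYTES}")
print(f"pages read, column store: {column_pages}")
print(f"pages read, row store:    {row_pages}")
```

Under these assumptions a query that touches only BalanceDue reads a small fraction of the pages a row store would. The advantage shrinks as the query asks for more columns.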
Since disk capacities have gotten 10,000X better, you can store redundant copies using different sort orders. This is especially practical because columnar storage compresses very well, leading to great reductions in storage requirements and leaving plenty of free space that would otherwise go to waste. With run length encoding, when the column is in a suitable sort order, you only need to store each value once, along with the offsets of the contiguous rows that share it. Bit-vector encoding and dictionary encoding can be combined with run length encoding to achieve really fantastic compression rates; David's research yields improvements from 3X to 10X over row store.
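A minimal sketch of run length encoding on a sorted column. The sample state codes and the (value, start offset, run length) layout are my own choices for illustration; I have no idea what on-disk format David's team actually uses:

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, start_offset, run_length)."""
    runs = []
    offset = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        runs.append((value, offset, length))
        offset += length
    return runs

def rle_decode(runs):
    """Expand the runs back into the original column."""
    out = []
    for value, _offset, length in runs:
        out.extend([value] * length)
    return out

# Sorting first is what makes the runs long and the encoding small.
states = sorted(["WA", "OR", "WA", "CA", "WA", "OR", "CA", "WA"])
runs = rle_encode(states)
print(runs)
```

Eight values collapse to three runs here. On an unsorted column the same data would produce far more runs, which is why the sort order matters so much.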
Compression makes a lot of sense in this case because (remember) CPU is 1,000X faster than it used to be, and disk is only 65X. So any time we can trade CPU cycles in exchange for less I/O, we should do it. Basically we are striving to move the majority of the work to the component(s) of the system that have improved the most over time (and continue to do so).
He explained the difference between early materialization and late materialization (where materialization is the process of turning the columns into rows) – queries with joins should use early materialization because they need to process against the whole table; queries without joins can use late materialization, which again pushes the materialization work to the CPU.
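Here is how I picture the two strategies, as a toy sketch. The three columns and the filter are hypothetical, and a real engine would work on compressed pages rather than Python lists:

```python
# Three columns of the same hypothetical table, stored separately.
customer_id = [101, 102, 103, 104, 105]
balance_due = [0, 250, 0, 75, 900]
region = ["W", "E", "W", "W", "E"]

def early_materialization(min_balance):
    """Stitch every row together first, then filter the rows."""
    rows = list(zip(customer_id, balance_due, region))
    return [(c, b) for c, b, _r in rows if b >= min_balance]

def late_materialization(min_balance):
    """Filter one column into a position list, then fetch only what qualifies."""
    positions = [i for i, b in enumerate(balance_due) if b >= min_balance]
    return [(customer_id[i], balance_due[i]) for i in positions]

print(early_materialization(100))
print(late_materialization(100))
```

Both return the same rows; the difference is that the late version never touches the region column and only reads customer_id at the qualifying positions.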
Updates are the big problem here… you have these very tightly packed columns, so you store deltas (which the queries must observe) and occasionally rebuild. Not suitable for OLTP or cases where reads occur against more than half of the columns of a table.
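The delta idea can be sketched roughly like this. The class name `DeltaColumn` and its methods are hypothetical, invented for illustration; the point is only that reads must consult the deltas until a rebuild folds them back into the packed column:

```python
class DeltaColumn:
    """Toy column with a packed base and a side store of pending updates."""

    def __init__(self, values):
        self.base = tuple(values)   # stand-in for the tightly packed column
        self.deltas = {}            # position -> updated value

    def update(self, position, value):
        """Record a change without touching the packed base."""
        if not 0 <= position < len(self.base):
            raise IndexError(position)
        self.deltas[position] = value

    def read(self, position):
        """Queries must observe the deltas before falling back to the base."""
        return self.deltas.get(position, self.base[position])

    def rebuild(self):
        """Occasionally merge the deltas into a fresh packed base."""
        self.base = tuple(self.read(i) for i in range(len(self.base)))
        self.deltas.clear()

col = DeltaColumn([10, 20, 30])
col.update(1, 25)
print(col.read(1), col.base)
col.rebuild()
print(col.read(1), col.base)
```

Every read pays for the extra lookup, and the rebuild rewrites the whole column, which is a fair hint at why this is a poor fit for update-heavy OLTP workloads.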
Microsoft is shipping VertiPaq, an in-memory column store, in SQL Server 2008 R2. So the hint is that there is definitely some work in this area underway for SQL11.
My mind is starting to hurt, but there are some very cool ideas here.
Note that throughout David's time on stage, Twitter was still abuzz with complaining about the Dell portion of the keynote. My favorite was from Steve Jones: "@BrentO Somebody tell the Dell guy PASS is in Orlando next year."
Wayne came on stage and announced that the keynote will be available on the DVDs. Just one more reason the $125 will be worth every penny.