SQL Server v.Next (Denali) : Using the OFFSET clause (paging)
A very common request over the past, oh, 10 years, maybe more, has been better support for paging within T-SQL (this Connect request dates back to 2005). Well, you will see some broader support for this within a year, give or take: the next version of SQL Server will offer OFFSET / FETCH syntax. Tobias Ternstrom spilled the beans during yesterday's keynote at SQL Connections, so there's not much to stop anyone from starting to publish code samples (and I posted a very brief and simple example in my post from the SQL Connections keynote). I thought I would just show a quick example using AdventureWorks2008R2, demonstrating both the shorter syntax as well as a slightly more efficient plan.
Here is the way we might handle paging today… several variations exist, but hopefully we can agree that typically the approach is similarly ugly and verbose:
DECLARE @PageSize TINYINT = 20, @CurrentPage INT = 1500; WITH o AS ( SELECT TOP (@CurrentPage * @PageSize) [RowNumber] = ROW_NUMBER() OVER (ORDER BY SalesOrderID), SalesOrderID /* , ... */ FROM Sales.SalesOrderHeader ) SELECT SalesOrderID /* , ... */ FROM o WHERE [RowNumber] BETWEEN ((@CurrentPage - 1) * @PageSize + 1) AND (((@CurrentPage - 1) * @PageSize) + @PageSize) ORDER BY [RowNumber];
Now with the new syntax, coming soon to a virtual machine near you, a lot of this messiness and redundancy goes away:
DECLARE @PageSize TINYINT = 20, @CurrentPage INT = 1500; SELECT SalesOrderID /* , ... */ FROM Sales.SalesOrderHeader ORDER BY SalesOrderID OFFSET (@PageSize * (@CurrentPage - 1)) ROWS FETCH NEXT @PageSize ROWS ONLY;
Isn't that nicer? I thought so.
But wait, what about the plans?
In Management Studio, running the actual execution plan yields very little difference. There is an additional filter (the WHERE clause) and a SEQUENCE project (the ROW_NUMBER()), but otherwise these plans seem pretty equivalent (click to embiggen):
In order to visualize some other facts about these plans, I save the plan XML to disk, and then load it up in Plan Explorer. The first thing I notice right off the bat is that tomayto <> tomatto: in SSMS, the cost % of each plan as a percentage of the batch was rounded off to 50%, but the old way of doing things is (not surprisingly) a wee bit less efficient:
Is it a huge difference? Absolutely not. And when we look at the graphical plans in the default mode in Plan Explorer, we see pretty much the same thing we see in Management Studio.
(Please note that these screen shots are doctored in order to save space. No, I didn't change any data, but I did pretend that Plan Explorer can visualize two execution plans at once.)
However, when we right-click the plan and change Data Widths By to Data Size (MB), something else becomes a little more apparent… nearly 10 MB of data gets passed between multiple operators using the old plan, whereas this transfer only seems to occur once in the new plan:
(And yes, looking back, this should have stuck out to me using the default Rows option as well, but for some reason 9 MB hit home a little harder. In my defense, the lines do get a little fatter when you show data vs. row counts.)
The fact that the "size" of the clustered index scan is passed between many more operators in the old version of the query makes me a little skeptical that it is doing more work than it needs to be, and this could be contributing to the overall cost difference (though I will confess that the I/O costs were actually slightly higher for the new version of the plan).
But by and large, it seems like the work done by the engine under the covers is relatively the same, with only a few minor differences. These differences are unlikely to cause any noticeable change in your existing applications, but keep in mind that this was an overly simplistic test, and that you should always try out alternatives on your own systems, with your own hardware, your own data, and your own load and concurrency. I plan to do some more testing of this feature against some horrific paging scenarios I've had to support in the past.
There are a couple of notes on the syntax that may be of interest:
- In order to support parameterized ordering, you will still have to use dynamic SQL or complex CASE expressions in the ORDER BY clause. Tobias said that it may be extended in future versions, but the bare minimum is all that made the cut for Denali. I'll try to adapt some old examples to the new syntax – while it will still represent less optimal methods, the code should still end up looking less sloppy.
- In order to use OFFSET, you *must* include an ORDER BY clause. If you try something like this (which essentially says we don't care about the order)…
SELECT * FROM sys.objects OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
…you will get the following error messages:
Msg 102, Level 15, State 1, Line 3 Incorrect syntax near '0'. Msg 153, Level 15, State 2, Line 4 Invalid usage of the option NEXT in the FETCH statement.
In my opinion, this is the way it should be, and it has always bothered me that you could use TOP in various "dirty" ways – without an ORDER BY clause, or simply to fool SQL Server into allowing an inconsequential ORDER BY clause into a view.
In fact, OFFSET and FETCH NEXT FOR are documented in Books Online under the ORDER BY topic.
Some may consider this a small victory, if a victory at all, but I suspect there is a large faction of people out there who are rejoicing in the introduction of a cohesive way to implement paging. It is probably not as flexible as some of you might have expected, but for a very first try I think we'll take what we can get.