SQL Server 2008 R2 : A quick experiment in Unicode Compression
Fellow MVP Simon Sabin blogged today about one of the few engine enhancements we'll be seeing in SQL Server 2008 R2 : Unicode compression. You can read more about the topic in Books Online, but basically what is going to happen is that NCHAR / NVARCHAR (but not NVARCHAR(MAX)) columns, in objects that are row- or page-compressed, can benefit from additional compression, where realistically you can cut your storage requirements in half, depending on the language / character sets in use.
But I'm the kind of guy who has to see it to believe it! So, I mocked up a quick test of storing some
Finnish Danish/Norweigan (?) characters, and ran it both on a SQL Server 2008 instance (SP1 + CU3, 10.0.2723), and a SQL Server 2008 R2 instance (10.50.1092).
USE tempdb; GO SET NOCOUNT ON; GO CREATE TABLE dbo.test(foo NVARCHAR(2048)) WITH (DATA_COMPRESSION = ROW); GO INSERT dbo.test(foo) SELECT RTRIM(RAND()) + REPLICATE(N'øååøæ', 300); GO 50000 EXEC sp_spaceused N'dbo.test'; GO DROP TABLE dbo.test; GO
SQL Server 2008 results:
SQL Server 2008 R2 results:
The difference is astounding: a space savings of roughly 60%, FOR FREE. That's right, this is an enhancement you get just by upgrading… I did not do anything differently about the creation of these tables except continue to use compression. (And note that I also performed this test with page compression, and the results were identical all around.) Keep in mind that if you upgrade at some point (you can't upgrade to R2 in its present form), you will need to rebuild indexes in order to implement this new compression method across the entire table.
And of course, by "free," I am not talking about the licenses. As with row and page compression, this feature is only available in Developer, Enterprise, and Enterprise Evaluation editions.
Now, I'll admit, the test is not super-realistic, and is biased toward good compression (since the same pattern is repeated over and over again on every row and on every page), and as such it demonstrates something pretty close to best-case scenario. But even worst-case scenario is not exactly "bad" — you may not see a gain at all, but you can't lose anything either, because the compression algorithm in SQL Server 2008 is smart enough to know when it is actually going to *lose* space by implementing compression, and won't do it in that case.
At some point I will test the performance of writing, reading and seeking against a Unicode compressed table (and I will come up with more plausible test data at that point). Because nothing is ever really free, is it? Stay tuned to find out.