Recently, Red Gate Software released a new version of their product called SQL Storage Compress 5.0. This is an alternative to using the native Data Compression and Backup Compression features introduced in SQL Server 2008. In order to utilize Data Compression, you must be running Enterprise Edition or above (and yes, for the nit-pickers, I know this includes Developer and Evaluation, but these aren't real production editions). For Backup Compression, this was relaxed in SQL Server 2008 R2 to also include Standard Edition, however Data Compression remains a big ticket item. So certainly on the surface, SQL Storage Compress seems to be a decent solution for people who:
(a) run SQL Server Standard Edition, and can't take advantage of the native data compression features;
(b) have a mix of editions, and want to use the same compression solution everywhere; or,
(c) commonly perform backup/restore operations between different editions.
Setting it up is quite easy – you run the installer, it installs a service and a configuration tool, and basically you only have to change a few things in your normal routines in order to take advantage of compression from the product.
At first I was surprised that nothing really happened at the end of setup, except that the activation screen was raised. To get to the configuration tool, go to Start > Programs > Red Gate > HyperBac > HyperBac Configuration Manager. This is where you can change settings, most commonly which extensions should actively be compressed by the service:
On the other tabs, you can also start and stop the HyperBac service, inspect its status, and configure locations for log and index files. Red Gate has published a detailed walk-through which helps you learn how to utilize the product once you have it up and running.
To keep things simple, I wanted to create new databases from scratch, mark them as compressed, and then populate them with data. I wanted to measure the following aspects:
Of course I wanted to compare these metrics using both native and Red Gate compression, and use a baseline where no compression was used at all.
Rather than repeat all of the boring test setup and sample data, I will point you to this post of mine from last year:
The tests I ran this week were essentially identical to the tests from the above blog post;there were a few exceptions, for example to set up the Red Gate compressed database, I had to restore an existing backup using WITH MOVE to . I haven't yet tested whether the service works equally on a database where the CREATE DATABASE statement uses the modified extensions; I am not 100% clear on whether the service only listens for RESTORE statements and doesn't pay attention to CREATE DATABASE. More tests later.
The machine, also, is the same – an 8GB VM with 4 cores. The hardware isn't an essential detail of the tests, because they're all running on the same box – but keep in mind that these are local virtual disks, so the I/O certainly doesn't represent what you might be dealing with on a real server with a decent disk subsystem. As always, I recommend you do your own performance tests in your own environment!
For the speed tests, I divided the timing of the results into the three major operational categories: initial load, read, and write, and then stacked these to show overall duration. From the below graph you can see that, as you might expect, the fastest overall speed for the workload was in the database with no compression. For the database using SQL Storage Compress, the initial data load was slower than native compression, but subsequent operations were faster.
<img src="https://sqlblog.org/wp-content/uploads/2018/01/28289_Screen20shot202010-08-2620at2011.29.2320AM.png" width="480" border="0" height="289"> <i>Duration, in seconds, of identical workload replay</i>
Next I compared the data and log file sizes. SQL Storage Compress certainly squeezed the same data into the smallest data file, by almost 50% compared to native data compression. However, this came at quite a cost in log space: the same workload produced almost 300 MB of log compared to about 40 MB of log in both the native compressed and non-compressed databases:
<img src="https://sqlblog.org/wp-content/uploads/2018/01/28289_Screen20shot202010-08-2620at2011.29.3420AM.png"> <i>Size, in MB, of compressed and non-compressed data and log files</i>
Finally, I ran some backups to test eventual backup size. I ran four different tests: simple backup with no compression, a simple backup with native compression, and the two HyperBac compression modes (normal and fast). The timings for these backups were irrelevant, because they all completed within one or two seconds (just not enough data to perform a meaningful comparison). But the resulting sizes had a few surprises:
<img src="https://sqlblog.org/wp-content/uploads/2018/01/28289_Screen20shot202010-08-2620at2011.29.0820AM.png" width="482" border="0" height="319"> <i>Backup size, in MB, of compressed and non-compressed data</i>
Note that even though the original size of the SQL Storage Compressed database was the smallest, taking a normal, non-compressed backup actually yields a much larger file – relatively equivalent to a non-compressed backup of non-compressed data. As of SQL Server 2008 R2, backup compression is now included in Standard Edition and above, so whether or not you have SQL Storage Compress installed, most of you will be able to compress your backups in one way or another – so this is unlikely to hit anyone in practice. Without thinking too deeply about what is in this backup file, the result is peculiar to me. Other than that, the chart speaks for itself – for compressed or non-compressed data, native backup compression and HyperBac compression provide almost identical compression ratios. Native compression has the edge if you are in a situation where you need to restore databases to servers without the SQL Storage Compress service running. I hope to do more tests in the future comparing speed and resource usage as well as size on much larger databases.
As mentioned before, this was a pretty basic test on low-end hardware. This is in a virtual machine with a single C:\, so the lack of I/O distribution probably impacted both solutions negatively when compared to a real-world scenario. Could the SQL Storage Compress performance been optimized? Possibly, especially if I had consulted Red Gate. In general, though, the test was apples-to-apples… and it was quite true to what a typical user might experience when trying out the product for the first time. I'll have to put some thought into making some higher-octane hardware available for further and more realistic performance tests.
As I was running the tests, some questions occurred to me. Three I have answers for; one I do not.
In the native Data Compression solution, you can pick and choose which objects you want to use compression, and you can turn these on and off more or less ad hoc as well. You can do this with SQL Storage Compress as well, but you need to be much better prepared up front: since the Red Gate solution works on a file basis, you would have to place objects on specific physical files within different filegroups in order to compress some objects but not others. This is another thing I plan to play with in future tests.
Here they explain how to recover your database if you have compressed it and the trial has expired:
If the background service stops, your database will become inaccessible. Personally I think there should be a much bigger warning on the Stop button in the Configuration Manager. If you stop the service and re-start it, you will get this error message if you try to use the database:
Msg 945, Level 14, State 2, Line 1
Database 'RGCompression' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.
Management Studio still shows the database in the tree, but it is offline (not explicitly, but for all intents and purposes):
A compressed database in SSMS if the SQL Storage Compress service has stopped
And as the error message suggests, there are plenty of errors in the SQL Server error log, all of which – at least on first glance – point to "database gone, no paddle" (click to embiggen):
This aspect looks pretty brittle to me so far – I'll have a follow-up post that explains how to recover your databases should this happen to you.
I would expect so, and if it isn't a requirement, it is probably still a good idea, especially given the above. But I did not have a cluster available to run these initial tests. If you have any information about this, please let me know.
I've either realized or further cemented a few ideas in the process of conducting these tests. In no particular order:
EDIT: I address these questions to some degree in a follow-up post.