More info on Red Gate SQL Storage Compress 5.0

The other day I wrote a blog post about my initial trials with SQL Storage Compress 5.0.  For the most part my experience with the product was fantastic; however, there were a few areas where I felt a bit uneasy, and I wanted to address those concerns.
 

Recoverability

Since I am a troublemaker by nature, one of the first things I tried in my initial round of testing was to access a compressed database while the HyperBac service was stopped.  I simply stopped the service through the HyperBac Configuration Manager, and quickly discovered that my compressed databases were no longer accessible.  Assuming there is no data loss, this situation is easy enough to recover from by following these steps (sketched in T-SQL after the list):

  1. Take the database offline (you can right-click the database in Object Explorer, or use ALTER DATABASE);
  2. Start the HyperBac service;
  3. Bring the database online.
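
For reference, a minimal T-SQL sketch of those steps, assuming a compressed database named CompressedDB (a placeholder name); the service itself is restarted from the HyperBac Configuration Manager, or via NET START with whatever service name your install uses:

    -- 1. Take the compressed database offline
    ALTER DATABASE CompressedDB SET OFFLINE WITH ROLLBACK IMMEDIATE;

    -- 2. Start the HyperBac service (Configuration Manager or NET START)

    -- 3. Bring the database back online
    ALTER DATABASE CompressedDB SET ONLINE;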

You can also use a more tedious technique, which decompresses each individual mdfx/ldfx file into its equivalent mdf/ldf file, and then simply attaches those files as if they belonged to a normal database.  This is similar to the technique you would have to follow if you were using the product and your license expired, or if you decommissioned a SQL Server and only had the .mdfx/.ldfx files available:

  1. Launch a command prompt in the HyperBac\bin directory (in my case, C:\Program Files (x86)\Red Gate\HyperBac\bin\);
  2. Run the following for each mdfx/ldfx file:

    hyperutil.exe /S"<path>\<file>.mdfx" /O"<path>\<file>.mdf" /E
    hyperutil.exe /S"<path>\<file>.ldfx" /O"<path>\<file>.ldf" /E

  3. Since these are no longer compressed mdfx/ldfx files but standard mdf/ldf files, use the following syntax to attach the database:

    CREATE DATABASE <db_name>
      ON PRIMARY (FILENAME = '<path>\<file>.mdf')
      LOG ON (FILENAME = '<path>\<file>.ldf')
      FOR ATTACH;

Of course, in this case you will no longer be able to automatically use compression against these databases, unless you back up and restore them as mdfx and/or ldfx files again.
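
If you do want the databases compressed again at that point, the route is the same one used to compress them in the first place: take a regular backup and restore it, moving the files onto the compressed extensions.  A rough sketch (the database name, paths and logical file names below are placeholders; check sys.database_files or sp_helpfile for your own logical names):

    -- Back up the now-uncompressed copy of the database
    BACKUP DATABASE MyDB TO DISK = 'C:\Backups\MyDB.bak' WITH INIT;

    -- Drop it, then restore onto the .mdfx/.ldfx extensions so the
    -- HyperBac service picks the files up and compresses them again
    DROP DATABASE MyDB;

    RESTORE DATABASE MyDB FROM DISK = 'C:\Backups\MyDB.bak'
      WITH MOVE 'MyDB'     TO 'C:\Data\MyDB.mdfx',
           MOVE 'MyDB_log' TO 'C:\Data\MyDB.ldfx';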

In a thread on the Red Gate Forum, Jeffrey Aven agreed that it might be beneficial to warn users of the impact when pressing the Stop button in the HyperBac Configuration Manager; so, they are going to consider addressing this in a future update to the product.  While this doesn't help people who might (even inadvertently) stop the service via Task Manager, the Services applet or NET STOP, at least the casual goof might be prevented.

Log Size / Duration

In my first tests, the log file for the HyperBac compressed database was roughly 10X larger than the log file for native or non-compressed data.  I checked multiple times that for each test everything was the same – identical workload, same isolation level, same autogrowth settings, same recovery model.  While I had no problem creating this scenario multiple times on the first day, I've had trouble re-creating it since the first time I stopped and re-started the HyperBac service.  Today I ran some tests with a little more data being pumped through to see if I could re-create it, and thought I would share more pretty graphs.  I conducted four different tests instead of three:

  1. No compression
  2. Native compression
  3. Red Gate compression on both data and log files
  4. Red Gate compression on data file only

Here are the comparisons of time required to perform data load and update workload tasks:

<img src="https://sqlblog.org/wp-content/uploads/2018/01/28388_RG_2_Time.png">
  <i>Time, in seconds, required to perform workload tasks</i>

From this we can see that, when the background service has to compress both data file and log file information, the time required to run the workload is dramatically affected.

Next are the comparisons of data, log, and overall size using the various compression methods:

<img src="https://sqlblog.org/wp-content/uploads/2018/01/28388_RG_2_Size.png">
  <i>Size, in MB, of data and log files after workload</i>

While the log is obviously larger when we don't have HyperBac compress it, it seems that for volatile workloads (at least in typical OLTP applications) it may be worth considering only compressing the data file(s) and leaving the log file in the default mode.  The duration of the workload when both log and data were being compressed by the HyperBac service was more than 3X longer than when only the data file was compressed.  And while the uncompressed log file was much larger in that case, the overall size was not much larger than native compression, and the time required was much more in line as well.
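
If you want to try that configuration yourself, it appears to come down to which extensions you hand to the restore: point the data file at .mdfx and leave the log file as a plain .ldf.  A rough sketch, with placeholder names and paths:

    -- Data file gets the .mdfx extension (compressed by the HyperBac service);
    -- the log file keeps a standard .ldf extension (left uncompressed)
    RESTORE DATABASE MyDB FROM DISK = 'C:\Backups\MyDB.bak'
      WITH MOVE 'MyDB'     TO 'C:\Data\MyDB.mdfx',
           MOVE 'MyDB_log' TO 'C:\Data\MyDB.ldf';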

(I'll be sharing my workload scripts with Red Gate, and can publish them here on request as well.)

Performance Overhead

Again I did not perform any tracing, but I did pay a little more attention to spot-checking Task Manager to see the impact on CPU.  During INSERT/UPDATE operations the CPU was running pretty hot; the biggest consumers were (predictably) sqlservr.exe and HyperBacSrv.exe.

I then discovered that the HyperBacSrv.exe process utilizes a substantial amount of CPU even when there is no activity against SSC-compressed databases; it utilized anywhere from 12 to 60% of the CPU while a workload was running against a non-compressed database.  I'm not saying this is alarming, just something to watch for when running your own tests.  The CPU overhead of native compression and how it affects sqlservr.exe still appears to be in the 5% range for my scenario – but again I will need to run more elaborate trace and perfmon tests to provide more accurate overhead analysis.
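
For what it's worth, a quick way to approximate that split without setting up a full trace is the scheduler monitor ring buffer, which records how much CPU SQL Server itself is using versus everything else on the box (on this machine, "everything else" would include HyperBacSrv.exe).  A sketch of that kind of query:

    -- Recent CPU snapshots from the scheduler monitor ring buffer (SQL Server 2008+):
    --   sql_cpu_pct   = CPU consumed by sqlservr.exe
    --   other_cpu_pct = CPU consumed by everything else (including HyperBacSrv.exe)
    SELECT TOP (10)
        DATEADD(ms, rb.[timestamp] - si.ms_ticks, GETDATE()) AS sample_time,
        rb.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
        100 - rb.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
            - rb.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS other_cpu_pct
    FROM
    (
        SELECT [timestamp], CONVERT(xml, record) AS record
        FROM sys.dm_os_ring_buffers
        WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
          AND record LIKE N'%<SystemHealth>%'
    ) AS rb
    CROSS JOIN sys.dm_os_sys_info AS si
    ORDER BY sample_time DESC;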

Conclusion

SQL Storage Compress 5.0 seems to be a candidate worthy of testing.  In environments with adequate CPU resources and where disk space and/or I/O are your primary bottlenecks, or where you don't have the ability to use native data compression, the product can fill a tangible gap.  But as always, to determine whether a product will be beneficial in your environment, I recommend that you perform your own benchmarks and make sure you understand the risks and overhead.
     

Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a father of two, a huge hockey and football fan, and my pronouns are he/him. If I've helped you out, consider thanking me with a coffee. :-)

4 Responses

1. DonaldC says:

  Just a quick note that the quirk with repeated restores of a database I noted in my earlier comment has now been fixed in release 5.1 of Storage Compress.

2. DonaldC says:

  Excellent pair of articles on SQL Storage Compress – we trialled HyperBac Online (v.4.3) and have made the decision to purchase for elements of our Dev and Test environments. Our testing showed it to be a robust and effective tool for compressing data files, but with certain idiosyncrasies. Performance testing within this virtualised environment (VMWare ESX 4.1, running on HP BL460C blades with storage on an EVA 6000) showed that running compression on data and log files caused a notable degradation in write and read operations (a factor of 2-3). This was much less marked with only data files compressed. On the other hand, compression ratios of between 5:1 and 3:1 were achieved, so our decision was a trade-off between cost of storage and accepting a performance hit.
  There is one oddity which users do need to be aware of if they are using SQL Storage Compress as an integral part of a regular (nightly or weekly, say) refresh of a test environment from backups of production data. If the database restore scripts use RESTORE with REPLACE, each restore adds what appears to be an initialisation overhead of around 2% of file size. After 30 increments, the restored, compressed database will be almost the same size on disk as the original, uncompressed file. In principle, this incremental growth will continue indefinitely until all available disk space is exhausted. I guess the immediate resolution is to use DROP and RESTORE in scripts, but this seems to be a foible of the product.
  Overall, this is another good HyperBac-based product, but I'd be cautious in using it in a production environment.

3. Ian Baber says:

  From our testing we've discovered that if you shrink the log file of a HyperBac-compressed database, the physical OS file doesn't shrink.
  Space is only released from the physical OS file when the database is detached / attached.

4. TheSQLGuru says:

  I need to chat with Jeff Aven about that 12-60% CPU utilization while workloads are running against NON-compressed databases.  One would think a file filter wouldn't do that when activity was happening to non-intercepted file types…