January 5, 2010 | SQL Server

My experiences upgrading 2005 => 2008

About two months ago, I began planning a cluster migration for one of our primary SQL Server 2005 clusters to newer hardware, and a simultaneous upgrade to SQL Server 2008 SP1.  The old system runs Windows Server 2003 R2 Enterprise, 32-bit, with 4 dual-core CPUs and 16GB of RAM.  The new system runs Windows Server 2008 R2 Enterprise, 64-bit, with 4 quad-core CPUs and 32GB of RAM.  We were also making substantial upgrades to our disk structure (more on this below).

There were a lot of little details to iron out, including validating that the upgrade would work, testing copy speeds, and compiling a checklist of the barrage of connection string updates that would need to be synchronized (too bad Linchi hadn't written his recent bad practices post 6 months ago; I probably would have spent more time focusing on that aspect of our environment).

In part to minimize overall downtime, in part to ensure there would be a fallback plan, and in part because I wanted to utilize this new server hardware, I opted to perform a side-by-side upgrade.  There were several aspects of the move that would introduce new variables to the server's setup, and unfortunately due to maintenance windows and looming vacations, we didn't have the luxury of making a hardware upgrade independent of upgrading SQL Server itself.

It sounds like a risky endeavor, but it actually went very smoothly.  Here are the general steps I took:

  • Upgrade Advisor
    This pointed out a few things that needed to be updated.  Well, actually, things that didn't really need to be updated.  We had an outdated stored procedure, no longer in use, that ran BACKUP LOG WITH TRUNCATE_ONLY periodically to shrink log files for less important databases.  And the advisor sent me on a few wild goose chases; for example, it complained that "ORDER BY aliases cannot use prefixes" in a few larger stored procedures, but after combing through them quite thoroughly, one by one, I could not find an offending statement, and I have not observed any kind of failure yet (knock on wood).  Of course, we had been running our QA/test environment on SQL Server 2008 for several months, so I wasn't really expecting any surprises here.  But I still think it is a good idea to run the Upgrade Advisor, just in case.
  • Preparing the disks
    The underlying disk subsystem is an EMC CLARiiON, and the LUNs in use by the system were initially set up by an IT person in a bubble about 5 years ago.  All of the LUNs were set up with 10K drives, RAID 5, without proper partition alignment, and with either 4K or 8K allocation unit sizes.  (I know, you can almost smell the I/O overhead, right?)  For the new system, we added shelves that had 15K drives, RAID 10, partition aligned, and with 64K allocation unit sizes.  The difference in I/O overhead has been huge (again, more on this below).  Most importantly, on the old system, we only had three LUNs of any substance; on the new system, there are seven.  This has allowed us to lay out our data and log files more appropriately and to get tempdb data files onto dedicated spindles.
  • Inventory of connection strings
    We had to go through all of our web applications, as well as 3rd party / vendor applications and one-off console applications, and identify all of the locations where we would have to point code at the new cluster.  Since we knew exactly when the migration would start, and that these apps would not run again until we turned them on, in cases where it made sense (e.g. classic ASP and VBScript code, where connection strings were not provided by config files), it was trivial to add IF/ELSE and/or CASE logic to check the current timestamp and connect to the new or old cluster accordingly.  This significantly reduced the number of lines of code and/or config files that had to be touched on the day of the migration, when we would certainly be under more pressure.  As I mentioned earlier, given more time, it probably would have made sense to use this opportunity to take Linchi's approach to connection management (though in some cases we don't want to rely on DNS, and having to change the hosts file on each machine doesn't really gain us much over changing a web.config file).  But at the very least, we were able to document all of these locations so we can revisit our connection strategy in the future.
  • Preparing the install
    Slipstreaming is the way to go in SQL Server 2008; Kendra Little has a great blog post about it.  Thanks to Peter Saddow and the rest of the setup team; this has absolutely been the best enhancement to SQL Server's setup module.  Ever.  You wouldn't think that saving the hassle of installing a service pack and then a cumulative update would be that big of a deal, but when dealing with a cluster install, the fewer variables and restarts, the better.
  • Setting up jobs
    One of the nice side effects of preparing for a move like this is that you get to do a little housekeeping.  Over the years, several SQL Server Agent jobs in our system had become multi-purpose: "Let's add a step to this job, instead of creating a new job."  The intention was to reduce clutter in the Job Activity Monitor and in 3rd party tools like SQL Sentry Event Manager.  In some cases, you do want jobs to have multiple steps, so that you can prevent different tasks from running over one another.  But quite often the reduction in clutter actually makes these tasks harder to manage and, in some cases, even to find.  This migration gave me an excuse (let's call it an opportunity) to reorganize our jobs, create additional categories for easier management, and clean up some obsolete stuff.
  • Preparing to move the databases
    With a side-by-side upgrade, you can choose to either (a) replicate/mirror then switch, (b) detach/attach, or (c) backup/restore.  Well, there are probably other ways we could have considered, but those were the primary three options on the table.  Since we had a maintenance window where we could afford downtime, and because there were literally hundreds of databases to transfer, I chose the detach/attach route; I couldn't possibly expect to synchronize that much replication (and it's far too demanding for mirroring), and I didn't want to wait for the window to start to kick off backups, and then wait again for them to finish.  So, well in advance of the actual migration day, I built a script that would detach all of the databases, and then a script for each destination LUN that would move the detached database and log files to the new drive letters on the new server.  These scripts would be kicked off independently, so the copies wouldn't run serially, but spread out just enough to avoid saturating the controllers and getting in each other's way.
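
For reference, here is the deprecated pattern that our old stored procedure (mentioned under Upgrade Advisor above) relied on, and the 2008-era replacement.  The database and log file names below are made up for illustration:

```sql
-- SQL Server 2005 and earlier (removed entirely in 2008):
-- BACKUP LOG LessImportantDB WITH TRUNCATE_ONLY;

-- On 2008, a database whose log you never intend to back up
-- should simply use the SIMPLE recovery model instead:
ALTER DATABASE LessImportantDB SET RECOVERY SIMPLE;

-- One-time cleanup of an already-bloated log file (target size in MB):
USE LessImportantDB;
DBCC SHRINKFILE (LessImportantDB_log, 256);
```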
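
The job reorganization I described above boils down to a couple of msdb calls per job; a sketch, with hypothetical category and job names:

```sql
USE msdb;

-- Create a new category to group related maintenance jobs:
EXEC dbo.sp_add_category
    @class = N'JOB',
    @type  = N'LOCAL',
    @name  = N'Nightly Maintenance';

-- Move an existing job into that category:
EXEC dbo.sp_update_job
    @job_name      = N'Nightly index maintenance',
    @category_name = N'Nightly Maintenance';
```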
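
The detach scripts were built along these lines.  This is a simplified sketch with hypothetical drive letters and database names, not the exact scripts we ran:

```sql
-- On the old server: generate a detach statement per user database
-- (review and run the output manually; SINGLE_USER first, so open
-- connections don't block the detach):
SELECT N'ALTER DATABASE ' + QUOTENAME(name)
     + N' SET SINGLE_USER WITH ROLLBACK IMMEDIATE; '
     + N'EXEC sp_detach_db @dbname = ' + QUOTENAME(name, '''')
     + N', @skipchecks = N''true'';'
FROM sys.databases
WHERE database_id > 4;  -- skip the system databases

-- On the new server, after the file copies: attach from the new LUNs
-- (paths are hypothetical):
CREATE DATABASE SalesDB
ON (FILENAME = N'H:\Data\SalesDB.mdf'),
   (FILENAME = N'L:\Log\SalesDB_log.ldf')
FOR ATTACH;
```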

With all that preparation in place, the day of the actual migration was very smooth.  SQL Server was already set up on the new hardware, and the new disks were in place; all we really had to do was kick off the detach / copy scripts, then hurry up and wait.  A few of the attach operations took a little longer than I anticipated, but only by a little and well within the fudge factor I had applied.  We used the full maintenance window to perform unit testing on all of the different applications and modules, and to clean up the minor but inevitable things that we missed (the biggest problem was a database that was left out of the manually tweaked attach scripts, and that was very easy to solve).  All told, we had shut down the system, moved the entire application and nearly 2TB of data, tested all of the apps, and turned everything back on, all in just under 8 hours.

So what did we get for all of our troubles?

In a word: plenty!

The first and most obvious improvement was in our regular backups.  Previously, we were using RedGate's SQL Backup, which is a fantastic product if you're on SQL Server 2005.  But with 2008's native backup compression, we are getting smaller backups, faster.  For example, a 300GB database that RedGate had been compressing to 77GB is natively compressed to 48GB.  And our backup jobs are completing about four times faster: our nightly full backups used to start at 7 PM and finish around 11:45 PM; now they are finished by about 8:15 PM (actually, tonight they were finished by 8:24 PM).  (Edit: I want to emphasize that the comparison here is RedGate on 2005 and old, crappy hardware against native on 2008 and new, shiny hardware – no true apples-to-apples comparison was performed.)  Restores (which we test nightly, and also use for scheduled historic reporting queries) are about four times faster as well, though that is on another server that did NOT receive hardware upgrades (still on x86, 10K RAID5, etc.), so that gain is down to compression alone.
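
For anyone curious, the native side is just a WITH COMPRESSION clause (database name and path below are hypothetical), and you can also make compression the server-wide default:

```sql
-- Per-backup compression (Enterprise Edition feature in 2008):
BACKUP DATABASE BigDB
TO DISK = N'B:\Backup\BigDB.bak'
WITH COMPRESSION, INIT, STATS = 10;

-- Or make it the default for all backups on the instance:
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
```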

My next observation was improved I/O.  Due to all of the differences in our hardware, as well as introducing page compression on the largest tables in our busiest database, our I/O numbers are orders of magnitude better.  And this has led to less blocking, since we now have far lower waits on I/O (all of our blocking on the old system was due to PAGEIOLATCH_SH).  In our busiest database, we would often see I/O waits exceeding 5 minutes (I eventually became desensitized to these alert conditions because the blocking, while annoying, was mostly affecting background, non-user processes).  Now we see virtually no I/O blocking at all, and PAGEIOLATCH waits have disappeared from our top waits list.
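
The page compression itself was a one-time rebuild per table.  A sketch, with hypothetical object names (on 2008 you can estimate the savings before committing to the rebuild):

```sql
-- Estimate how much page compression would save on a given table:
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'BigTable',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';

-- If the estimate looks good, rebuild with page compression:
ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE);
```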

And finally, we are simply observing far better overall system stability.  In addition to the improved I/O, CPU usage is much lower, cache hit ratios are better, and the previously sluggish supporting applications have returned to the snappiness they enjoyed in their infancy.  Just to give an idea of how similar workloads affect the system, I'll leave you with a couple of screen shots from SQL Sentry Performance Advisor.  In both cases, the top image is before the migration, and the bottom image is after the migration.

CPU, before and after (click to embiggen):

[screenshot: CPU usage, before (top) and after (bottom)]

Memory & Cache Hit Ratios, before and after (click to embiggen):

[screenshot: memory and cache hit ratios, before (top) and after (bottom)]

What's next?

I obviously still have some page life expectancy issues to deal with, involving ad hoc plan cache (unfortunately the nature of the system lends itself to a LOT of ad hoc queries).  I also plan to add more data compression, add filtered indexes, and implement partitioning on this system.  And I hope that I will be able to present a strong enough case to convince those above me to gradually migrate all of our systems to x64, 15K RAID10, and SQL Server 2008.  The sooner the better!
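
As a teaser for the filtered index work, the 2008 syntax looks like this; the table, columns, and predicate here are hypothetical, just to show the shape of the feature:

```sql
-- Index only the small, frequently-queried slice of a large table:
CREATE NONCLUSTERED INDEX IX_Orders_Open
ON dbo.Orders (CustomerID, OrderDate)
WHERE Status = 'Open';
```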

13 comments on this post

    • Ekrem Önsoy - January 5, 2010, 12:56 PM

      Thanks for this article Aaron.
      We are also in the process of moving our databases from SQL Server 2000\2005 to 2008… We have already moved two of our most critical database servers to SQL Server 2008, but Windows on that hardware is still Windows Server 2003; Windows Server 2008 was not accepted because of the hardware validation process…
      I just wanted to stress that updating statistics is not necessary when upgrading from SQL Server 2005 to 2008; however, it is needed if someone is upgrading from 2000 to 2005\2008.  I'm pointing this out because it really takes time, and it's not necessary when going from 2005 to 2008.
      We will be moving our top critical databases from 2005 to 2008 in the first quarter of this year, and we're excited because it's our core banking system.
      So far so good, even though we have not started using new features of SQL Server 2008 on our prod databases yet.

    • JR Shortall - January 5, 2010, 4:48 PM

      Great article. How much lead time did you have preparing for the big migration?

    • AaronBertrand - January 5, 2010, 5:34 PM

      Ekrem, good point about the statistics.  I didn't think to mention them because I didn't need to do it (and the biggest tables were rebuilt anyway because I could finally add compression).  But for those upgrading from SQL Server 2000, it is definitely one of several things that should be done after upgrading.  More here:
      http://technet.microsoft.com/en-us/library/bb933942%28SQL.105%29.aspx
      (Note that this article is from R2 documentation but it is pretty much identical with the 2008 docs.  Also note that this talks about in-place upgrades but much of the material is also relevant if you use backup/restore or detach/attach methodologies.)
      JR, thanks, to be honest I probably knew four years ago that the day was coming, so mentally at least I had been preparing for it for that long.  However some events around Thanksgiving rushed the process, and the only available weekend to pull off the move was a short two weeks away from that point, so in practical terms that was the lead time (I had a bit of a head start, again mentally, but most of the others involved did not).

    • Glenn Berry - January 5, 2010, 7:10 PM

      Nice writeup, Aaron. You might want to write another post that goes into more detail about your new hardware, your sp_configure settings, etc.

    • josantia - January 5, 2010, 10:17 PM

      Nice Article, Can you elaborate on the configuration on tempdb?  How many disks did you configure for it?

    • Giammarco Schisani - January 6, 2010, 2:59 AM

      Impressive. Thanks for sharing.

    • AaronBertrand - January 6, 2010, 5:50 PM

      Josantia, right now tempdb is split into 4 x 8 GB files, but due to limited shelf space, leftover drives, and a reluctance to manage a growing number of LUNs, they are on a single LUN (T:\) that is 133 GB, made up of 2 x 15K drives (in RAID1, again due to limited drives left after allocation to the rest of the system).  Tempdb is not a bottleneck for us, so I could not justify more LUNs and drives to the bean counters, but I still fought very hard to get it on its own dedicated set of spindles, since nothing else is on this LUN.  Previously, tempdb files were striped across drives used for data/log, wherever I could afford to put one.  If it becomes a performance issue and I can get the additional LUNs, it will be easy enough to move 2 of those files to a new LUN, or 3 files to three new LUNs if it comes to that.  But like I said, tempdb is commonly a sore point, though so far for us it hasn't been.

    • Chris Wood - January 6, 2010, 9:00 PM

      Aaron,
      Can you elaborate a bit more on Red Gate SQL Backup?  What version were you using, and do you know if Red Gate have responded to make things better?  We are running 6.3 and hope to upgrade from SQL 2005 SP2/SP3 builds to SQL 2008 some time.  In many ways, dropping extra software can save time and money (upgrading 5.4 to 6.3 currently takes time and effort, and licence fees for Red Gate could be cut).  In other words, does having Enterprise SQL 2008 pay for not having Red Gate SQL Backup?
      Thanks
      Chris

    • AaronBertrand - January 6, 2010, 10:28 PM

      Funny, RedGate asked some of the same questions.  I will admit that my comparison is not really fair, since I did not test RedGate on the new hardware (my suspicion is that most if not all of my performance gains were from the hardware situation and not the software).  I was using 5.4.0.55, IIRC.  I would definitely test in your environment with RG (something I didn't have the luxury to do).  For us, since we were already within our maintenance window on the old system, and since the new timings place us very, very, very comfortably within our maintenance windows, there was no need to seek a 3rd party solution on the new hardware.  That wasn't for cost reasons (obviously the SQL Backup licensing costs are a drop in the bucket compared to hardware and SQL licenses), but simply to have one less piece of software to manage (though if the native solution had performed *worse* on the new system than RG did on the old, I definitely would have installed it to try it out).
      Again, my main point is: my experience may not reflect yours, so you should test the improvements you see in 2008 with both native backup and a 3rd party tool.  My wording was probably poor, and painted an unfair picture of what happened anyway.  It would be ideal if I eventually installed the latest version of RG on the new hardware and performed a true apples-to-apples test where everything else is the same.  But that's not something I'm going to tackle today.

    • Linchi Shea - January 7, 2010, 1:40 AM

      'Embiggen' — nice choice of the word!

    • Brian - January 12, 2010, 10:09 PM

      Now that you are on 2008, you can use the new "Optimize for ad hoc workloads Option" in the SQL 2008 engine to address your ad hoc plan cache issue. You can apply the option without having to restart SQL Server too. We've been using it for about a month now with excellent results.
      http://msdn.microsoft.com/en-us/library/cc645587.aspx
      Brian

    • AaronBertrand - January 12, 2010, 11:04 PM

      Thanks Brian,
      I did set that option (as well as parameterization forced) and, while this combination seems to have collectively helped PLE (right now PLE > 4000), ad hoc plan cache is still bouncing from 30% to 60%… I think the churn is inevitable due to the nature of the system.  Thankfully there are no performance symptoms, it's just something that stands out in the monitoring tool.

    • Bhargav - May 6, 2015, 1:48 PM

      I am a tester and have been given the job of testing the UI after a migration from 2005 to 2008 R2.
      I am not sure how the migration is going to impact the UI, or the data shown in the UI.
      So I need help: what errors can I expect, and what should the focus areas be?

Comments are closed.