Troubleshooting Service Broker through SQL Server error logs
Last week I was trying to troubleshoot some activation stored procedures in Service Broker. One nice thing about this framework is that it logs activation procedure errors in the SQL Server error log, since activation is not an interactive experience. You will see error messages like this:
(And yes, I could have disabled activation and debugged manually, but since the errors were few and far between and depended upon user input, this was not trivial. You can see this article for a tutorial on this, as well as some of the limitations on actively debugging a Service Broker queue. There is also useful some information in this page from Books Online.)
On the down side, loading these logs through SQL Server Management Studio is cumbersome, especially if your instance has been up for some time. You have to wait for the entire set of log data to load, and only then can you filter to restrict the results to only include error messages like the above. So I started thinking about a better way of reviewing the error logs without relying on the GUI within SSMS.
My first attempt, of course, was to load the error log files into a work table through BCP or BULK INSERT, then filter and query against it. This was troublesome because (a) I had to dynamically determine the location of the log files for each instance, and (b) the most current log file cannot be accessed in this way because it is currently in use by SQL Server. So, I could get at it this way by shutting down SQL Server and accessing the log files from another instance (or copying the most current log file and then restarting SQL Server), but this is not exactly ideal.
My next attempt was to use xp_cmdshell 'type <path>\ERRORLOG[.x]' with the same intent. This got around the "in use" problem, but it still had the issue that I needed to determine <path> on each instance; also, it introduced a new problem: the output is truncated at 255 characters, which made troubleshooting very difficult indeed, since I could not see a single complete error message.
So my third and final attempt was to use the undocumented stored procedures sp_enumerrorlogs and sp_readerrorlog. Two things you should know about these procedures: (1) they are undocumented, and therefore unsupported; and (2) the behavior can change between SQL Server releases. In fact, the output of sp_readerrorlog DID change between SQL Server 2000 and SQL Server 2005. When I first tried this procedure in SQL Server 2005, I was quite puzzled, and am amazed to still find that most references on the web still show its former behavior. However, I feel fairly confident using both procedures in SQL Server 2005 and SQL Server 2008 instances, since that is what SSMS uses to generate the log viewer within the GUI. Something to keep in mind, though, when there is a new release of Management Studio in a few years.
I wrote a stored procedure that basically provides the same output as the SQL Server error log viewer, but allows me to filter on content and on date range. On one of my more interesting systems I have 500+ databases, so another important thing I filter out of the output by default is database/log backup activity. You can add your own filters to leave out information having to do with replication, or mirroring, or cluster failovers, etc.
One surprise was realizing that the LogDate is stored in UTC and not local time. This is logical of course, but led to some finger-scratching when I was searching for errors that happened in the last 5 minutes. It would be tricky to convert all log data to local time, especially if your log files straddle a daylight savings time boundary; so just remember that when passing datetime values into the stored procedure. If you use your local time for boundaries, you might unknowingly leave out relevant events, or include irrelevant ones. Anyway, here is the procedure:
And here is the syntax I used to discover all service broker activation errors over the past few days:
Hope you find the procedure useful. I'm all ears, of course, if you spot problems or have enhancement suggestions.