November 28, 2008 | SQL Server

Troubleshooting Service Broker through SQL Server error logs

Last week I was trying to troubleshoot some activation stored procedures in Service Broker.  One nice thing about this framework is that it logs activation procedure errors in the SQL Server error log, since activation is not an interactive experience.  You will see error messages like this:

The activated proc dbo.procedure_name running on queue database_name.dbo.queue_name output the following: '<error message>.'

(And yes, I could have disabled activation and debugged manually, but since the errors were few and far between and depended upon user input, this was not trivial.  You can see this article for a tutorial on this, as well as some of the limitations on actively debugging a Service Broker queue.  There is also useful some information in this page from Books Online.) 

On the down side, loading these logs through SQL Server Management Studio is cumbersome, especially if your instance has been up for some time.  You have to wait for the entire set of log data to load, and only then can you filter to restrict the results to only include error messages like the above.  So I started thinking about a better way of reviewing the error logs without relying on the GUI within SSMS.

My first attempt, of course, was to load the error log files into a work table through BCP or BULK INSERT, then filter and query against it.  This was troublesome because (a) I had to dynamically determine the location of the log files for each instance, and (b) the most current log file cannot be accessed in this way because it is currently in use by SQL Server.  So, I could get at it this way by shutting down SQL Server and accessing the log files from another instance (or copying the most current log file and then restarting SQL Server), but this is not exactly ideal.

My next attempt was to use xp_cmdshell 'type <path>\ERRORLOG[.x]' with the same intent.  This got around the "in use" problem, but it still had the issue that I needed to determine <path> on each instance; also, it introduced a new problem: the output is truncated at 255 characters, which made troubleshooting very difficult indeed, since I could not see a single complete error message.

So my third and final attempt was to use the undocumented stored procedures sp_enumerrorlogs and sp_readerrorlog.  Two things you should know about these procedures: (1) they are undocumented, and therefore unsupported; and (2) the behavior can change between SQL Server releases.  In fact, the output of sp_readerrorlog DID change between SQL Server 2000 and SQL Server 2005.  When I first tried this procedure in SQL Server 2005, I was quite puzzled, and am amazed to still find that most references on the web still show its former behavior.  However, I feel fairly confident using both procedures in SQL Server 2005 and SQL Server 2008 instances, since that is what SSMS uses to generate the log viewer within the GUI.  Something to keep in mind, though, when there is a new release of Management Studio in a few years.

I wrote a stored procedure that basically provides the same output as the SQL Server error log viewer, but allows me to filter on content and on date range.  On one of my more interesting systems I have 500+ databases, so another important thing I filter out of the output by default is database/log backup activity.  You can add your own filters to leave out information having to do with replication, or mirroring, or cluster failovers, etc.

One surprise was realizing that the LogDate is stored in UTC and not local time.  This is logical of course, but led to some finger-scratching when I was searching for errors that happened in the last 5 minutes.  It would be tricky to convert all log data to local time, especially if your log files straddle a daylight savings time boundary; so just remember that when passing datetime values into the stored procedure.  If you use your local time for boundaries, you might unknowingly leave out relevant events, or include irrelevant ones.  Anyway, here is the procedure:

USE [MyUtilityDatabase];
GO
IF OBJECT_ID('dbo.SearchSQLErrorLogs', 'P') IS NOT NULL
   DROP PROCEDURE dbo.SearchSQLErrorLogs;
GO
CREATE PROCEDURE dbo.SearchSQLErrorLogs
   @SearchString   NVARCHAR(255) = '',
   @StartDate      DATETIME      = NULL,
   @EndDate        DATETIME      = NULL,
   @IgnoreBackups  BIT           = 1,
   @CurrentLogOnly BIT           = 0
AS
BEGIN
   SET NOCOUNT ON;

   IF OBJECT_ID('tempdb..#Log', 'U') > 0
       DROP TABLE #Log;
       
   IF OBJECT_ID('tempdb..#Logs', 'U') > 0
       DROP TABLE #Logs;

   DECLARE
       @FirstLog   INT,
       @LastLog    INT,
       @CurrentLog INT;

   IF @StartDate IS NULL
       SET @FirstLog = 6;
       
   IF @EndDate IS NULL
       SET @LastLog = 0;
       
   SELECT 
       @StartDate = COALESCE(@StartDate, '19000101'),
       @EndDate = COALESCE(@EndDate, GETUTCDATE()),
       @SearchString = '%' + COALESCE(@SearchString, '') + '%';

   IF @CurrentLogOnly = 0 OR COALESCE(@FirstLog, @LastLog) IS NULL
   BEGIN
       CREATE TABLE #logs
       (
           ArchiveNo TINYINT,
           dt DATETIME,
           sz BIGINT
       );

       INSERT #logs
           EXEC [master].dbo.sp_enumerrorlogs
               @p1 = 1;

       SELECT
           @FirstLog = MAX(ArchiveNo),
           @LastLog = MIN(ArchiveNo)
       FROM
       (
           SELECT 
               ArchiveNo,
               [Start] = COALESCE(
                   (
                       SELECT dt FROM #Logs
                       WHERE ArchiveNo = L.ArchiveNo + 1
                   ), '19000101'),
               [End] = dt
           FROM
               #Logs AS L
       ) AS x
       WHERE
           [Start] BETWEEN @StartDate AND @EndDate
           OR [End] BETWEEN @StartDate AND @EndDate;

       IF OBJECT_ID('tempdb..#Logs', 'U') > 0
           DROP TABLE #Logs;
   END

   IF @CurrentLogOnly = 1
   BEGIN
       SELECT @FirstLog = 0, @LastLog = 0;
   END

   CREATE TABLE #Log 
   (
       LogDate     DATETIME,
       ProcessInfo VARCHAR(64),
       MessageText NVARCHAR(MAX)
   );
   CREATE CLUSTERED INDEX d ON #Log(LogDate);

   SELECT @CurrentLog = @FirstLog;

   WHILE @CurrentLog >= @LastLog
   BEGIN
       INSERT #Log
           EXEC [master].dbo.sp_readerrorlog
               @p1 = @CurrentLog;
               
       SET @CurrentLog = @CurrentLog - 1;
   END

   IF @IgnoreBackups = 1
   BEGIN
       DELETE #Log
           WHERE ProcessInfo = 'Backup'
           AND (
               MessageText LIKE 'Database backed up%'
               OR MessageText LIKE 'Log backed up%'
           );
   END
   
   SELECT LogDate, MessageText
       FROM #Log
       WHERE LogDate BETWEEN @StartDate AND @EndDate
       AND MessageText LIKE @SearchString
       ORDER BY LogDate;

   IF OBJECT_ID('tempdb..#Log', 'U') > 0
       DROP TABLE #Log;
END
GO

And here is the syntax I used to discover all service broker activation errors over the past few days:

DECLARE @start SMALLDATETIME;

SET @start = DATEADD(DAY, -3, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));

EXEC [MyUtilityDatabase].dbo.SearchSQLErrorLogs
   @StartDate = @start,
   @SearchString = 'activated proc%myActivationProc%on queue%myQueueName';

Hope you find the procedure useful.  I'm all ears, of course, if you spot problems or have enhancement suggestions.

2 comments on this post

    • Jimmy May, Aspiring Geek - November 29, 2008, 7:28 PM

      Excellent post, Aaron.
      If you get really ambitious, Ward Pond blogged about "Slaying The Time Zone Conversion Beast (or at least rendering it comatose)"
      http://blogs.technet.com/wardpond/archive/2006/06/09/database-programming-slaying-the-time-zone-conversion-beast-or-at-least-rendering-it-comatose.aspx

    • AaronBertrand - November 29, 2008, 11:49 PM

      Thanks Jimmy, I use a technique similar to Ward's in a couple of our end user applications so that reports are rendered in the time zone of the user rather than what's stored in the database (the database only holds UTC values for date/time).  I use a TimeZones table, a TimeZoneID in the users table, and then a calendar table a la http://www.aspfaq.com/2519 … however with this case it is tough to provide such a wrapper around it because it is a generic stored procedure meant for anyone's use.  They just need to understand that the error log stores data in UTC format, that's all…

Comments are closed.