Playing with CDC in Katmai
Change Data Capture (CDC) is something DBAs around me have been screaming about for ages. For auditing and other reasons, we have needed to rely on 3rd party tools from companies like Lumigent, whose motto for their auditing product previously named Entegra went something like, "Who did what to which data when?" It's a slogan I don't think I'll ever forget, and not just because I have it plastered on a T-Shirt I grabbed at PASS or SQL Connections that year…
Much to the delight of SQL Server DBAs everywhere (and surely to the chagrine of 3rd party vendors like Lumigent), there is now CDC functionality coming in Katmai (sorry, SQL Server 2008), and you can play with it in the June CTP. (If you haven't read in one of the hundreds of blog posts on this site and elsewhere, the June CTP is available here.)
I started playing with this feature, and it is very easy to set up.
CREATE DATABASE CDCTest; GO USE CDCTest; GO EXEC sys.sp_cdc_enable_db_change_data_capture; GO CREATE TABLE dbo.Test ( id INT PRIMARY KEY, tx NVARCHAR(MAX), dt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ); EXEC sys.sp_cdc_enable_table_change_data_capture @source_schema = N'dbo', @source_name = N'Test', @role_name = NULL; — there are many more options to this procedure; this is the default — behavior, including capturing all columns in the table.
Now, in two out of three tries from scratch, I've found that I needed to restart SQL Server Agent for data changes to start being captured (I filed a suggestion on Connect). So, to be sure, before we put any data into the table, let's go to SQL Server Agent in Object Explorer, expand jobs, right-click "cdc.CDCTest_capture" and choose "Start Job at Step…" and get the job running once. Now:
INSERT dbo.Test(id,tx) SELECT 1, 'foo'; INSERT dbo.Test(id,tx) SELECT 2, 'bar'; WAITFOR DELAY '00:00:05'; UPDATE dbo.Test SET tx = 'splunge' WHERE id = 2;
In order to capture the data changes, there is a new table created in a schema called 'cdc', named <schema>_<table>_CT. In this case, it contains all of the columns from the table, as well as other information including pointers to the log and the type of operation that was captured.
SELECT * FROM cdc.dbo_test_CT;
You should see something like this:
__$start_lsn __$end_lsn __$seqval __$operation __$update_mask id tx dt 0x0000001A000001360013 NULL 0x0000001A000001360012 2 0x07 1 foo 2007-06-21 11:29:43.670 0x0000001A0000013A0003 NULL 0x0000001A0000013A0002 2 0x07 2 bar 2007-06-21 11:29:43.670 0x0000001A0000013B0004 NULL 0x0000001A0000013B0002 3 0x02 2 bar 2007-06-21 11:29:48.670 0x0000001A0000013B0004 NULL 0x0000001A0000013B0002 4 0x02 2 splunge 2007-06-21 11:29:48.670
For the __$operation column, 1 is a delete, 2 is an insert, 3 is the before image of an update, and 4 is the after image of an update. (SELECTs are not captured.) So, we can see my initial two inserts there in the first two rows, and then the last two rows show the before and after image of the update (this is just like the deleted and inserted columns available within a DML trigger).
[An important item to note here is that if you make schema changes to the table, you need to disable CDC at the table level and then re-enable it in order to capture changes to the new columns. This "feature" is documented but I don't think it will be very intuitive.]
This gives you a bit of information, but for me, it wasn't quite enough. What if I want to track (easily!) what user performed that update, changing 'bar' to 'splunge', and when it happened? Well, I created my own logging table, with a representation of the three columns from the unique index on dbo_test_CT, and two additional columns (username and eventDate):
CREATE TABLE cdc.dbo_test_CT_MoreInfo ( startlsn BINARY(10), seqval BINARY(10), operation INT, username SYSNAME NOT NULL DEFAULT SUSER_SNAME(), eventDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (startlsn, seqval, operation) );
To capture my own data (essentially, lumping CDC on top of CDC) I created the following trigger on the capture table:
CREATE TRIGGER cdc.LogMoreCDCInfo ON cdc.dbo_test_CT FOR INSERT AS BEGIN IF @@ROWCOUNT > 0 BEGIN INSERT cdc.dbo_test_CT_MoreInfo(startlsn,seqval,operation) SELECT __$start_lsn, __$seqval, __$operation FROM inserted; END END GO
And I can retrieve the information using a query like this:
SELECT operation = CASE m.operation WHEN 1 THEN 'delete' WHEN 2 THEN 'insert' WHEN 3 THEN 'update (before)' WHEN 4 THEN 'update (after)' ELSE 'unknown' END, c.id, c.tx, c.dt, m.username, m.eventDate FROM cdc.dbo_test_CT_MoreInfo m INNER JOIN cdc.dbo_test_CT c ON c.__$start_lsn = m.startlsn AND c.__$seqval = m.seqval AND c.__$operation = m.operation ORDER BY m.eventDate;
[Note that change data is only kept around for 72 hours by default. This retention time is stored in minutes (4320) in the msdb.dbo.cdc_jobs.retention column. I've had reasonable (but not perfect) success in changing this value and having the cleanup job obey.]
Now, you can make the trigger get more information, e.g. host_name from sys.dm_exec_sessions or client_net_address from sys.dm_exec_connections. I just wanted to open the door.
Note that I did not stress test this solution… not that a stress test on an early CTP would be very realistic, anyway. I'm just playing around here, so far at least.