Playing with CDC in Katmai
Change Data Capture (CDC) is something DBAs around me have been screaming about for ages. For auditing and other reasons, we have needed to rely on 3rd party tools from companies like Lumigent, whose motto for their auditing product previously named Entegra went something like, "Who did what to which data when?" It's a slogan I don't think I'll ever forget, and not just because I have it plastered on a T-Shirt I grabbed at PASS or SQL Connections that year…
Much to the delight of SQL Server DBAs everywhere (and surely to the chagrine of 3rd party vendors like Lumigent), there is now CDC functionality coming in Katmai (sorry, SQL Server 2008), and you can play with it in the June CTP. (If you haven't read in one of the hundreds of blog posts on this site and elsewhere, the June CTP is available here.)
I started playing with this feature, and it is very easy to set up.
CREATE DATABASE CDCTest;
CREATE TABLE dbo.Test
id INT PRIMARY KEY,
dt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
EXEC sp_cdc_enable_table_change_data_capture 'dbo', 'Test', @role_name = NULL;
— there are many more options to this procedure; this is the default
— behavior, including capturing all columns in the table.
INSERT dbo.Test(id,tx) SELECT 1, 'foo';
INSERT dbo.Test(id,tx) SELECT 2, 'bar';
WAITFOR DELAY '00:00:05';
UPDATE dbo.Test SET tx = 'splunge' WHERE id = 2;
In order to capture the data changes, there is a new table created in a schema called 'cdc', named <schema>_<table>_CT. In this case, it contains all of the columns from the table, as well as other information including pointers to the log and the type of operation that was captured.
SELECT * FROM cdc.dbo_test_CT;
You should see something like this:
__$start_lsn __$end_lsn __$seqval __$operation __$update_mask id tx dt
0x0000001A000001360013 NULL 0x0000001A000001360012 2 0x07 1 foo 2007-06-21 11:29:43.670
0x0000001A0000013A0003 NULL 0x0000001A0000013A0002 2 0x07 2 bar 2007-06-21 11:29:43.670
0x0000001A0000013B0004 NULL 0x0000001A0000013B0002 3 0x02 2 bar 2007-06-21 11:29:48.670
0x0000001A0000013B0004 NULL 0x0000001A0000013B0002 4 0x02 2 splunge 2007-06-21 11:29:48.670
[An important item to note here is that if you make schema changes to the table, you need to disable CDC at the table level and then re-enable it in order to capture changes to the new columns. This "feature" is documented but I don't think it will be very intuitive.]
This gives you a bit of information, but for me, it wasn't quite enough. What if I want to track (easily!) what user performed that update, changing 'bar' to 'splunge', and when it happened? Well, I created my own logging table, with a representation of the three columns from the unique index on dbo_test_CT, and two additional columns (username and eventDate):
CREATE TABLE cdc.dbo_test_CT_MoreInfo
username SYSNAME NOT NULL DEFAULT SUSER_SNAME(),
eventDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (startlsn, seqval, operation)
CREATE TRIGGER cdc.LogMoreCDCInfo
IF @@ROWCOUNT > 0
SELECT __$start_lsn, __$seqval, __$operation FROM inserted;
And I can retrieve the information using a query like this:
operation = CASE m.operation
WHEN 1 THEN 'delete'
WHEN 2 THEN 'insert'
WHEN 3 THEN 'update (before)'
WHEN 4 THEN 'update (after)'
ELSE 'unknown' END,
FROM cdc.dbo_test_CT_MoreInfo m
INNER JOIN cdc.dbo_test_CT c
ON c.__$start_lsn = m.startlsn
AND c.__$seqval = m.seqval
AND c.__$operation = m.operation
ORDER BY m.eventDate;
[Note that change data is only kept around for 72 hours by default. This retention time is stored in minutes (4320) in the msdb.dbo.cdc_jobs.retention column. I've had reasonable (but not perfect) success in changing this value and having the cleanup job obey.]
Now, you can make the trigger get more information, e.g. host_name from sys.dm_exec_sessions or client_net_address from sys.dm_exec_connections. I just wanted to open the door.
Note that I did not stress test this solution… not that a stress test on an early CTP would be very realistic, anyway. I'm just playing around here, so far at least. 🙂