Keeping sysdepends up to date in SQL Server 2008
sysdepends, sp_depends and sys.sql_dependencies
Since my very first experience with SQL Server (version 6.5), I have used the system table sysdepends and the system stored procedure sp_depends to track dependency information in my databases. The purpose of tracking dependency information is to allow me to determine which objects are required; this can prevent me from deploying only parts of a solution, and not realizing it until runtime. It is also a quick way to get a visual picture of a database's organization.
A new catalog view was introduced with SQL Server 2005, called sys.sql_dependencies. Other than the fact that this uses the new sys schema and has a different column set, I have yet to stumble upon any case where dependency information is tracked differently in this new view, when compared to sysdepends.
As you can imagine, based on the volume of activity in newsgroups and other forums involving sysdepends, this has not always been a smooth exercise. Sadly, these system objects are not perfect, due to several complications; I will first discuss some of these limitations, and then I will provide a better solution that will take care of a couple of them, at least.
Dropping / Renaming Dependent Objects
A problem that has been around since the 6.5 days (see KB #115333) is that when you create a table, then create a stored procedure that references the table, and then drop the table; the dependency information remains in sysdepends, but points to an object_id that no longer exists. The table may later get re-created, but it will almost certainly have a different associated object_id. Furthermore, in older versions of SQL Server, the old reference could remain orphaned in sysdepends permanently; thankfully, this has been fixed in modern versions. But even in SQL Server 2008, because dependency information is not refreshed during DDL events such as CREATE TABLE, a new reference is not created unless you force it to happen – by dropping and re-creating the procedure, or calling sys.sp_refreshsqlmodule.
This problem can also affect the ability of the system to track dependency on columns. Consider the case where you write a stored procedure that references columns 'a' and 'b' of table 't', then you drop column 'a' and rename column 'b' to 'c'. Some of the mess is left in sysdepends and sys.sql_dependencies. Thankfully, the system will not let you create a procedure that references a column that does not exist (unless the table also does not exist), but once the stored procedure is created, there is nothing stopping you from changing the column (such as renaming it) or dropping it from the table altogether. In this case there is nothing in sysdepends or sys.sql_dependencies that indicates the stored procedure is referencing a column that no longer exists… the user is left to find that out at runtime.
Cross-Database and Cross-Server Dependencies
The system table sysdepends does not maintain dependency information for any objects located outside of the local database. This means that if you want to check the validity of objects with three- or four-part names in SQL Server 2005 or earlier, you are going to have to manually parse the definitions of all of your objects. Native tools are never going to report the validity of cross-database or cross-server dependencies, but in SQL Server 2008, you will be able to get a report of the dependencies that are expected – without any nasty parsing.
Deferred Name Resolution
In SQL Server 7.0, Microsoft started a practice that many of us regret today: deferred name resolution. This allows you to create an object (such as a stored procedure) that references another object (such as a view or table) that does not yet exist. The assumption here is that you will create the procedure now, but won't actually call it until later, after the view or table has been created. The problem is that, by default, the dependency information is not updated when the view or table is created; so, even though your structure is intact, the dependency checks cannot be made (so, the table could later be dropped, and nothing would be checked). Deferred name resolution also allows you to create objects with circular references; for example, procedure A calls procedure B, B calls C, and C might conditionally call A.
Deferred name resolution does not allow you to reference an object at a linked server, or reference an invalid column for an object that exists on the current instance. But it does allow you to reference an object in any database on the current instance that does not yet exist. (To understand better why it works this way, you can read more about deferred name resolution here: http://msdn.microsoft.com/en-us/library/ms190686(SQL.90).aspx (SQL 2005) and here: http://msdn.microsoft.com/en-us/library/ms190686(SQL.100).aspx (SQL 2008).)
One of the greatest adversaries of deferred name resolution is fellow SQL Server MVP, Erland Sommarskog. He calls for a much more strict set of checks, similar to OPTION EXPLICIT in Visual Basic, which would prevent deferred name resolution all. You can read about this in his article at the following URL: http://www.sommarskog.se/strict_checks.html and a Connect item at the following URL:
To correct dependencies for a table that has been created after objects that reference it, you can manually call the system stored procedure sys.sp_refreshsqlmodule for each referencing object. The trick here still is to determine which objects might reference it; this part, at least, is solved to some extent in SQL Server 2008, and I will talk about this further on.
Another problem is when you have a stored procedure that refers to an object in dynamic SQL. The system does not parse the contents of dynamic SQL that is executed via EXEC() or sp_executeSQL, even if it is passed or created in its entirety in a variable; so you can have a stored procedure like this, and as of SQL Server 2008, it will never be caught by system dependency checking:
In this case, you are pretty much on your own to ensure that you reference valid objects. Nothing native in SQL Server will inspect this code for you, and surely there are ways you can write dynamic SQL that will prevent your own parsing routines or third party applications from getting the information correctly and reliably (for example, in many cases, dynamic SQL is used when you are passing in table or database names, so they could only be parsed at runtime anyway).
What do people do now to find dependencies not reported by sysdepends?
In addition to the information that they get from sysdepends, parsing is also often the route of choice for local dependency tracking, however this can be unreliable as well. For example, let's assume I are looking for references to '%dbo.table_name%', and I search the object_definition of all of your objects for this text. A few of cases where I will get incorrect results:
1) dynamic SQL that interrupts the name, e.g. EXEC('SELECT * FROM dbo.table' + '_name');
2) dynamic SQL that uses passed object names, e.g. EXEC('SELECT * FROM ' + @table);
3) poor but accepted format, e.g. SELECT * FROM dbo . table_name; and,
4) false positives, e.g. SELECT * FROM dbo.table_name_2.
So, parsing the definitions of all of your SQL modules will get you a little bit closer, but is still only a small part of the picture. If you want to see a very thorough T-SQL parsing routine, that not only finds many missing dependencies, but also generates a script for the objects in correct dependency order, see this article in SQL Server Magazine by Omri Bahat:
A Dependency Checker You Can Depend On
Third Party Tools
If you are not a do-it-yourself kind of DBA, there are several tools available that will do a slightly better job than SQL Server will do natively. Ironically, though, some of the 3rd party tools designed to compensate for SQL Server's weaknesses get much of their information from these same tables. The two that seem to receive the best praise are made by ApexSQL and RedGate.
Red-Gate SQL Dependency Tracker
New capabilities in SQL Server 2008
In SQL Server 2008, there is a new catalog view, called sys.sql_expression_dependencies. Before I begin, I will tell you first that, while this view is better at solving a few of the problems outlined earlier, it is not the answer to dependency problems. For example, it will store the name of a table referenced in a stored procedure, even if that table does not exist yet. And when the table is created, this view will be updated with the correct referenced_object_id. However, sysdepends is still out of date; for this, you still need to manually run sp_refreshsqlmodule against the stored procedure(s) that reference the table.
Note: when using SQL Server 2008's new feature in SQL Server Management Studio, IntelliSense, to write stored procedures against a SQL Server 2008 database, unintentional deferred name resolution is going to be less of a problem. This is because invalid objects will be highlighted with a red squiggly line:
Another great feature of sys.sql_expression_dependencies that is an improvement over sysdepends and sys.sql_dependencies is that it will also store the names of references to three- or four-part names, so you can now get a list of dependency objects to check, without having to parse the contents of each module. And while deferred name resolution will allow a procedure to be created even though it references a non-existent object in a different database on the same instance, it will not allow this for an object on a linked server; the only problems you will find with objects found on a linked server are that the server is down or otherwise unreachable, or the referenced object has been dropped, moved or renamed since the dependency information was recorded.
There are also two new dynamic management functions, called sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities, which will parse referenced or referencing objects for dependencies on demand, instead of looking them up statically in sysdepends. There are problems with this too, though; in the case of deferred name resolution, the views generate errors if you reference a column explicitly in a table that does not yet exist. Here is an example:
The result of the second call to the function:
Msg 207, Level 16, State 1, Procedure GetFooBar, Line 6
Invalid column name 'bar'.
Msg 2020, Level 16, State 1, Line 1
The dependencies reported for entity "dbo.GetFooBar" do not include references to columns. This is either because the entity references an object that does not exist or because of an error in one or more statements in the entity. Before rerunning the query, ensure that there are no errors in the entity and that all objects referenced by the entity exist.
Fellow SQL Server MVPs Simon Sabin and Louis Davidson filed bugs about this on Connect:
The Next Step
While SQL Server 2008 definitely gives us a better dependency story, we still do not have a foolproof method. My aim here is to give you a more automated way to compensate for some of these deficiencies. With that said, I am going to restrict the discussion to tracking object dependencies (not column dependencies) within a single SQL Server instance. With a little bit of code, I will help reduce the amount of time you spend tracking down this dependency information.
The main problem with the missing dependency information is that, in order to restore the information in sysdepends (which various tools rely on), you need to manually call sys.sp_refreshsqlmodule. But why does that need to be the case? As I was researching the new dependency objects earlier in the SQL Server 2008 beta cycle, I thought to myself, "There has to be a better way." It did not take long for me to come up with one; in fact, it took longer to write the first prototype.
My idea here is to create a DDL trigger that refreshes any SQL modules for which rows are found in sql_expression_dependencies and are without a match in sysdepends. The goal is to do as much as we can to help the system repair itself when a DDL event leaves an orphaned reference, and when it cannot fix itself, it can at least tell us it uncovered a problem … either through a custom log, raising a SQL Server error, sending an e-mail, etc. With a custom logging solution you can prevent the same error from notifying you too often.
This will catch pairs of actions that have been completed, such as:
- drop a table and re-create it;
- alter a procedure to point to a new name or a new schema, and then change the object; or,
- create a stored procedure that references an invalid object, and then create the object.
However, if you have something that is still incomplete, for example if you change a module to point to an invalid object and have not yet created that object, or if a referenced object in another database has since been dropped, the only thing the DDL trigger can do is write a log entry and/or alert you, since the dependency information cannot be corrected without further action.
Why search for all objects? You could write the trigger in such a way that it limits the search to the object that caused the DDL trigger to fire, but if you perform a global search, you can also be sure to catch changes that happened while the DDL trigger was disabled (which is often performed during deployments, either manually or by schema comparison / synchronization tools).
You could do something similar in SQL Server 2005, using a DDL trigger, and blindly execute sp_refreshsqlmodule for every object in the entire database. But, this would not be very efficient, and would miss something that DDL triggers can capture in SQL Server 2008: sp_rename. So, in SQL Server 2008, we can find the missing dependencies using sys.sql_expression_dependencies, and we can also capture renames, making the solution better targeted and a little more complete.
The following DDL trigger and stored procedure will refresh any sql module that is found with a valid reference in sys.sql_dependencies, but without a match in sysdepends, whenever any of the defined DDL events occurred. It will also log or alert you if it finds any references that do not exist, whether they are in the current database or in another database on the server. I have moved much of the logic to a stored procedure, so you can call it on demand as well; you should also change the logic to log to a table or send an e-mail in relevant cases, because this demonstration assumes that the one who is performing the DDL actions (you) is also the one who wants to know about any dependency problems those DDL actions might be leaving behind.
You can set up a different monitor to validate the entries in sys.sql_expression_dependencies that reference another server, and in addition, you can check which objects in another database are depended upon but which may have been dropped since the last time the local DDL trigger fired. If the local database is relatively stable, changes in the other database might have occurred which break your local dependencies, but there is not an easy way to do this, unless you want to create a server-wide trigger, or have certain databases use DDL triggers that check on each others' dependency views.
Having said that, I guess it could just as easily be a scheduled job, but again, I guess it depends on how often you want it to run, compared to how often you expect changes to be made to the local database, and compared still to how likely you expect it to find anything that needs to be fixed. If you make DDL changes once a month, it doesn't make much sense to run this every five minutes.
As I said before, the solution is far from perfect, and is only meant to get you a step closer to the "holy grail" of dependency tracking.
Note that we still cannot do anything about dynamic SQL, since sys.sql_expression_dependencies does not capture information that it doesn't parse correctly. This is why you may still need to rely on parsing and proper coding practices. If you can't rely on these, you may be down to grunt work.
Another case that is not easy to handle just yet is the case of the dropped column. Fellow SQL Server MVP Alex Kuznetsov filed a bug about this in the following feedback item, but the issue was incorrectly closed as fixed:
And finally, a case that I am not treating here is #temp table dependencies, which can cause problems for dependency tracking as well. Erland has some advice in this thread about how to deal with this issue, but you are probably not going to be too eager to try to implement it…