SQL Server v.Next (Denali) : Metadata enhancements
In my previous job, we had several cases where schema changes or incorrect developer assumptions in the middle tier or application logic would lead to type mismatches. We would have a stored procedure that returns a BIT column, but then change the procedure to have something like CASE WHEN <condition> THEN 1 ELSE 0 END. In this case SQL Server would return an INT as a catch-all, and if .NET was expecting a boolean, BOOM. Wouldn't it be nice if the application could check the result set of the stored procedure, and construct its data types using that information? Another case would be where the schema has changed, but the applications can't all be updated at once. Wouldn't it be great to be able to tell SQL Server what the column name should be, what data type you want, or in what collation, as part of the stored procedure call? How often have you cursed the bizarre behavior of SET FMTONLY ON? And finally, how many times have you been frustrated by the output of sp_who2, which returns SPID twice – and worse yet, as a CHAR(5) column instead of an INT?
You can deal with each of these scenarios with some new procedures and dynamic management functions introduced in Denali. We will have new functionality to either discover or control the metadata of queries, without actually having to run them, store temporary results elsewhere, or use SET FMTONLY ON.
<b>Inspect output of queries without piecing together catalog views</b> Picture a very simple table like this: <blockquote> CREATE TABLE dbo.x ( foo INT NOT NULL, bar DECIMAL(12, 2), blat VARCHAR(20), mort NVARCHAR(32), splunge NVARCHAR(MAX), whence DATETIME2(4) NOT NULL );
And then a query like this:
SELECT * FROM dbo.x;
Today, if I want the application to understand the data types that are coming back from this query, I would have to create queries against the catalog views sys.columns and sys.types, for example:
SELECT c.name, t.name, c.max_length, c.[precision], c.scale, c.is_nullable FROM sys.columns AS c INNER JOIN sys.types AS t ON c.system_type_id = t.system_type_id AND c.user_type_id = t.user_type_id WHERE c.[object_id] = OBJECT_ID('dbo.x') ORDER BY c.column_id;
This leaves some serious gaps, since I have to perform all kinds of manipulations to put these outputs into true data type definitions. For example, I have to check max_length for -1 and switch it to MAX, I have to check for nchar/nvarchar and cut max_length in half, and I have to piece together the precision and scale for *some* numeric- and date-based types.
There is a new dynamic management function in Denali, sys.dm_exec_describe_first_result_set, which will take a query and produce a much more concise description of the data types coming back in the query. While the function returns other columns, I'll focus on the ones that are most useful for describing resultsets and creating new tables based on them.
SELECT name, system_type_name, is_nullable FROM sys.dm_exec_describe_first_result_set ( N'SELECT * FROM dbo.x;', NULL, 0 ) AS f ORDER BY column_ordinal;
Clearly this is a lot less work and manipulation to get the data types consistent – all of the "compound" data types come back exactly as you need to define them. So let's imagine you want to generate CREATE TABLE scripts to store the data from a bunch of DMVs or your own views or other queries; again, you can do this without writing very complex queries against the catalog views or with piecing together the data types manually. A very simple example that generates a CREATE TABLE statement to store the data from sys.dm_exec_sessions in your own repository table:
SET NOCOUNT ON; DECLARE @sql NVARCHAR(MAX); SET @sql = N'SELECT * FROM sys.dm_exec_sessions;'; SELECT 'CREATE TABLE dbo.SessionsDMV('; SELECT CASE column_ordinal WHEN 1 THEN '' ELSE ',' END + name + ' ' + system_type_name + CASE is_nullable WHEN 0 THEN ' not null' ELSE '' END FROM sys.dm_exec_describe_first_result_set ( @sql, NULL, 0 ) AS f ORDER BY column_ordinal; SELECT ');';
The results (not exactly how I would want them formatted, but you can play with the query above so the code conforms better to your standards):
----------------------------- CREATE TABLE dbo.SessionsDMV( ------------------------------------------------------------------ session_id smallint not null ,login_time datetime not null ,host_name nvarchar(128) ,program_name nvarchar(128) ,host_process_id int ,client_version int ,client_interface_name nvarchar(32) ,security_id varbinary(85) not null ,login_name nvarchar(128) not null ,nt_domain nvarchar(128) ,nt_user_name nvarchar(128) ,status nvarchar(30) not null ,context_info varbinary(128) ,cpu_time int not null ,memory_usage int not null ,total_scheduled_time int not null ,total_elapsed_time int not null ,endpoint_id int not null ,last_request_start_time datetime not null ,last_request_end_time datetime ,reads bigint not null ,writes bigint not null ,logical_reads bigint not null ,is_user_process bit not null ,text_size int not null ,language nvarchar(128) ,date_format nvarchar(3) ,date_first smallint not null ,quoted_identifier bit not null ,arithabort bit not null ,ansi_null_dflt_on bit not null ,ansi_defaults bit not null ,ansi_warnings bit not null ,ansi_padding bit not null ,ansi_nulls bit not null ,concat_null_yields_null bit not null ,transaction_isolation_level smallint not null ,lock_timeout int not null ,deadlock_priority int not null ,row_count bigint not null ,prev_error int not null ,original_security_id varbinary(85) not null ,original_login_name nvarchar(128) not null ,last_successful_logon datetime ,last_unsuccessful_logon datetime ,unsuccessful_logons bigint ,group_id int not null ,authenticating_database_id int ---- );
Now, you can copy the entire result from the messages pane, and run it to create a table that will accept an INSERT statement like "INSERT dbo.SessionsDMV SELECT * FROM sys.dm_exec_sessions;"
I'll leave it as an exercise to the reader to try this out against different values of @sql.
Best guess at undeclared parameters
There is also a new stored procedure that will allow you to inspect a T-SQL batch and determine the data types of any undeclared parameters. As an example, what are the data types of the variables that will ultimately be passed into this query text? (This is often something an application developer will have to determine in order to program around these queries.)
SELECT * FROM sys.objects WHERE [object_id] = @ObjectID OR (name LIKE @ObjectName);
On first glance, most of us will know that the first parameter, @ObjectID, should be declared as an INT, and the second parameter is likely going to be an NVARCHAR(4000). However there is now way for the system to tell us these exact answers with a lot less uncertainty:
EXEC sys.sp_describe_undeclared_parameters @tsql = N'SELECT * FROM sys.objects WHERE [object_id] = @ObjectID OR (name LIKE @ObjectName);';
This returns a lot of other columns as well, but even the above should show how the stored procedure will be able to examine large and complicated T-SQL batches and describe all of the parameters that are used. Now, I say "best guess" because the parameter inspection is not perfect – but for most cases it is pretty spot on.
Convert data types explicitly
In this example I am showing a simple stored procedure. If we have two applications that call this stored procedure, we can call the procedure with two different WITH RESULT SETS options in order to return the data differently for the applications, allowing us to slowly change schema across our applications instead of changing them all at once. Imagine the following batch where the first column name is changing from 'Email' to 'Username' (the system no longer requires an e-mail address as the user identifier) and the second column, 'Status,' is changing from an INT to a BIT. The current procedure looks like this:
CREATE PROCEDURE dbo.Users_GetActiveList AS BEGIN SET NOCOUNT ON; SELECT Username, [Status] = CASE WHEN IsActive = 1 THEN 1 ELSE 0 END FROM dbo.Users ORDER BY Username; END GO
As we prepare for the schema changes, we could change the code above to just say:
CREATE PROCEDURE dbo.Users_GetActiveList AS BEGIN SET NOCOUNT ON; SELECT Email = Username, [Status] = CONVERT(BIT, CASE WHEN IsActive = 1 <and other conditions> THEN 1 ELSE 0 END) FROM dbo.Users ORDER BY Email; END GO
But of course unless we update all of our apps at the same time, this change will break at least one of them. Instead, we can use WITH RESULT SETS to change the shape of the resultset conditionally. From the application where we can change the app immediately to use the new column names, we can say:
EXECUTE dbo.Users_GetActiveList WITH RESULT SETS ( ( Email VARCHAR(320), [Status] BIT ) );
In the results, you can't really tell from the screen shot whether Status is a BIT or an INT, but you can certainly see that the Username column has been changed to Email without changing the stored procedure and without having to change all of the consuming apps at once:
The old application can remain unchanged, and continue calling the stored procedure in the default manner – it will still see the Username column, and it will still get Status back as an Int32 instead of a Boolean (we are talking .Net data types here, not SQL). If there really are only two applications, then the WITH RESULT SETS code can be removed from the first application when the schema has changed and the second application has been updated. Otherwise, each app can make use of WITH RESULT SETS until they are all synchronized. At my previous job, I could have used this technique in many scenarios.
Fix output of system stored procedures
Every time I run sp_who2, I cringe. There are two SPID columns for some reason, and both are CHAR(5). Arguably, these could be made SMALLINT, but I think INT plays better with the .NET layer in most cases. There is also a column called BlkBy, which - unlike the rest of the columns - has a name which is unnecessarily truncated. I'll leave this as a CHAR(5) because I know the "." result is appreciated by a lot of people (it lets the true numeric values stand out better than 0 or NULL would). That all said, here is an example of using WITH RESULT SETS to make the output more consumable: <blockquote> EXECUTE [master]..sp_who2 WITH RESULT SETS ( ( [SPID] INT, -- <-- fixed the data type [Status] NVARCHAR(32), [Login] SYSNAME, HostName SYSNAME, Blocker CHAR(5), -- <-- renamed this column [Database] SYSNAME, Command NVARCHAR(32), CPUTime VARCHAR(30), DiskIO VARCHAR(30), LastBatch VARCHAR(48), ProgramName NVARCHAR(255), Redundant_SPID INT, -- <-- renamed this column RequestID INT ) );
Make more usable output in Management Studio
In Denali there is a new system stored procedure called sys.sp_server_diagnostics, which returns some core system health metrics in XML format. Unfortunately, the implementation returns this data as string rather than XML data, which means the output is pretty useless in Management Studio – you either have to expand the result column and scroll horizontally forever in results to grid mode, or do all kinds of juggling to figure out where each row ends in results to text mode. (I complained about this in Connect #625262, but they didn't seem to agree with my reasoning – though I'm curious what other data type they think the output may change to in the future.) In order to make the output more useful, we can do the following to convert the last column to XML, making it clickable within SSMS grid results:
EXECUTE sys.sp_server_diagnostics WITH RESULT SETS ( ( create_time DATETIME, component_name VARCHAR(20), [state] INT, state_desc VARCHAR(20), data XML ) );
In the results, we can now click on the data column in any row, and it will open a new XML document that is much easier to read and parse:
Changing the output collation
This one is late to the party but since publishing I felt this nagging feeling that the article was incomplete. I can't think of an obvious and practical use for this off the top of my head, as thankfully I have not been exposed to too many collation issues over the years. But if a query is outputting a column in a certain collation, you can override that collation using WITH RESULT SETS. Here is an example that changes Albanian_BIN to Albanian_100_BIN, and I pass this batch to the DMV just to show that the output collation is obeyed:
DECLARE @sql NVARCHAR(MAX); SET @sql = N'EXEC(''SELECT N''''foo'''' COLLATE Albanian_BIN'') WITH RESULT SETS ( ( foo NVARCHAR(32) COLLATE Albanian_100_BIN ) );'; SELECT name, collation_name FROM sys.dm_exec_describe_first_result_set ( @sql, NULL, 0 );
Easier translation in SSIS
Finally, here is a blog post from fellow MVP James Rowland-Jones, where he demonstrates how EXECUTE … WITH RESULT SETS can be very handy in SSIS:
The DMVs and stored procedures for describing result sets are currently written to only inspect the *first* result set. So for something like sp_help, it will only assist in determining the very first set of output. I suspect they will correct this in the future, and am thankful they named the objects in an unambiguous way.
In addition, there are certain types of batches that will simply return an error when you try to examine them. For example, while a batch with a table variable can be passed to sys.sp_describe_first_result_set, a batch with a temp table cannot:
EXEC sys.sp_describe_first_result_set @tsql = N'CREATE TABLE #y (j INT);SELECT j FROM #y;';
Msg 11525, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1
The metadata could not be determined because statement 'SELECT j FROM #y;' uses a temp table.
WITH RESULT SETS requires a double-nesting of parentheses. If you try the following:
EXECUTE ('SELECT foo = N''bar'';') WITH RESULT SETS ( -- ( foo NVARCHAR(3) -- ) );
Msg 102, Level 15, State 1, Line 4
Incorrect syntax near 'foo'.
To get around this, you will need to un-comment the commented parentheses. The reason is that WITH RESULT SETS is plural; you can dictate the shape of multiple result sets, as demonstrated above.
Also note that you can't use WITH RESULTS to add or remove columns from the output; you can only mess with the column names, data types, nullability and collation. So for example, if you try to eliminate columns from the result set as follows:
EXECUTE ('SELECT foo = N''bar'', x = 1;') WITH RESULT SETS ( (foo NVARCHAR(3)) );
Msg 11537, Level 16, State 1, Line 1
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 column(s) for result set number 1, but the statement sent 2 column(s) at run time.
And if you try to convert to an incompatible data type, it will fail at runtime. For example, the following code will compile, but if you run it:
EXECUTE ('SELECT foo = N''bar'';') WITH RESULT SETS ( (foo INT) );
Msg 8114, Level 16, State 2, Line 1
Error converting data type varchar to int.
There are many other limitations and caveats to using these metadata enhancements; if you can see usefulness in some of these capabilities, I strongly recommend reviewing the official documentation and playing with the features for yourself.
Here are links to the official documentation about these new features: