I spent 20 minutes the other night thinking about possible ways to remove an ntext
column from a 28TB table, and I did all of this thinking for nothing.
I was investigating the table for other reasons. And because I'm lazy, I scanned the output of the system procedure sys.sp_columns
(instead of writing a proper query against more modern metadata). I spotted the legacy type in the output, and dropped everything. When I suggested to a colleague one approach, he replied:
What NTEXT column? Are you looking at the right table? Did you smoke your lunch?
I scratched my head. I was definitely looking at the right table. In the right schema, in the right database, on the right instance. I won't comment on lunch but when I checked the catalog views, sure enough, no trace of ntext
.
Was I insane?
Nah, there's just a little thing hiding in here that the docs don't tell you about. Let's take a look at what sys.sp_columns
shows, and compare that to the catalog views, using a simple table with three LOB columns:
CREATE TABLE dbo.blat ( x nvarchar(max), y varchar(max), z text ); GO EXEC sys.sp_columns @table_name = N'blat', @table_owner = N'dbo'; SELECT name, [type_name] = TYPE_NAME(user_type_id), max_length FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.blat'); SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = N'blat' AND TABLE_SCHEMA = N'dbo';
Some peculiarities in the output here, for sure:
Why does the system procedure tell me all of my types are legacy and deprecated? It takes some unpacking. First, some information from the sys.sp_columns
documentation, emphasis mine:
DATA_TYPE |
Integer code for ODBC data type. If this is a data type that cannot be mapped to an ODBC type, it is NULL. The native data type name is returned in the TYPE_NAME column. |
TYPE_NAME |
String representing a data type. The underlying DBMS presents this data type name. |
@ODBCVer |
ODBCVer Is the version of ODBC that is being used. ODBCVer is int, with a default of 2. This indicates ODBC Version 2. Valid values are 2 or 3. For the behavior differences between versions 2 and 3, see the ODBC SQLColumns specification. |
If we look at the SQLColumns topic (no link is provided in the previous topic, btw):
DATA_TYPE | Returns SQL_VARCHAR, SQL_VARBINARY, or SQL_WVARCHAR for the varchar(max) data types. |
TYPE_NAME | Returns "varchar", "varbinary", or "nvarchar" for the varchar(max), varbinary(max), and nvarchar(max) data types. |
Something doesn't line up here. The procedure doc says it uses SQLColumns, and the SQLColumns doc says it knows what max types are. This other SQLColumns topic is even less helpful; it says, "For a list of valid ODBC SQL data types, see SQL Data Types in Appendix D: Data Types." This document doesn't even have an Appendix A, never mind D, and I don't see text
or ntext
in any of the pages mentioned so far.
Next, I thought, let's try passing the @ODBCVer
argument in and see if there is any different behavior using 2 or 3.
EXEC sys.sp_columns @table_name = N'blat', @table_owner = N'dbo', @ODBCVer = 2; EXEC sys.sp_columns @table_name = N'blat', @table_owner = N'dbo', @ODBCVer = 3;
Nope, no difference, at least in my case:
I mean, what the hell? I need to go spelunking. I grab the definition for sys.sp_columns
and, after trying to ignore all the horrible things in there, I see this:
from sys.spt_columns_odbc_view s_cov
This is a hidden system view (and is obviously undocumented). How do I know?
SELECT TOP (1) * FROM sys.spt_columns_odbc_view; GO SELECT id = OBJECT_ID(N'sys.spt_columns_odbc_view');
Results:
Invalid object name 'sys.spt_columns_odbc_view'.
id
– – – –
NULL
This thing doesn't even have an object_id
. Same result in master
and in a restored copy of the resource database.
You can, however, query this view under the Dedicated Administrator Connection (DAC). (If you're having troubles, you're not alone; see this article.) Under the DAC you can now do this:
SELECT ODBCVER, COLUMN_NAME, TYPE_NAME, TYPE_NAME_28, LENGTH_28 FROM sys.spt_columns_odbc_view WHERE TABLE_NAME = N'blat' AND TABLE_OWNER = N'dbo' ORDER BY ODBCVer, COLUMN_NAME;
Results:
ODBCVER COLUMN_NAME TYPE_NAME TYPE_NAME_28 LENGTH_28 ------- ----------- --------- ------------ ----------- 2 x nvarchar ntext 2147483646 2 y varchar text 2147483647 2 z text text 2147483647 3 x nvarchar ntext 2147483646 3 y varchar text 2147483647 3 z text text 2147483647
Notice that the view does seem to know the true types of these columns. But why the procedure doesn't expose them properly is a great question, and I wish I had an answer. I can see what the code does; it literally says here, take this valid type name, and throw it out the window; instead, show this other type name that isn't accurate:
SELECT ... TYPE_NAME = s_cov.TYPE_NAME_28,
You can see the definition of the view under the DAC using:
SELECT OBJECT_DEFINITION(object_id('sys.spt_columns_odbc_view'));
For brevity and readability I'll tidy up and leave out a lot of the irrelevant bits:
create view sys.spt_columns_odbc_view as select ... TYPE_NAME = case when (t.system_type_id = 240 or t.user_type_id > 255) then -- CLR UDTs t.name else d.TYPE_NAME collate catalog_default end, TYPE_NAME_28 = case when (t.system_type_id = 240 or t.user_type_id > 255) then -- CLR UDTs t.name when (c.max_length = -1 and d.ss_dtype = 167) then -- varchar(max) N'text' when (c.max_length = -1 and d.ss_dtype = 231) then -- nvarchar(max) N'ntext' when (c.max_length = -1 and d.ss_dtype = 165) then -- varbinary(max) N'image' else d.TYPE_NAME end, ... LENGTH_28 = case when d.ss_dtype IN (108,106) then -- decimal/numeric types OdbcPrec(c.system_type_id,c.max_length,c.precision)+2 when (c.max_length = -1 and d.ss_dtype = 240) then -- Large UDT => image for non-SNAC clients 2147483647 when d.ss_dtype = 240 then -- Small UDT isnull(u.max_length, c.max_length) when d.ss_dtype = 241 then -- XML 2147483646 when (c.max_length = -1 and d.ss_dtype = 167) then -- varchar(max) 2147483647 when (c.max_length = -1 and d.ss_dtype = 231) then -- nvarchar(max) 2147483646 when (c.max_length = -1 and d.ss_dtype = 165) then -- varbinary(max) 2147483647 when d.ss_dtype IN (40, 41, 42, 43) then -- DATE/TIME/DATETIME2/DATETIMEOFFSET OdbcPrec(c.system_type_id,c.max_length,c.precision)*2 else isnull(d.length, c.max_length) end ... from sys.all_columns c inner join sys.all_objects o on o.object_id = c.object_id inner join sys.types t on t.user_type_id = c.user_type_id inner join sys.spt_datatype_info d on d.ss_dtype = c.system_type_id and d.AUTO_INCREMENT = c.is_identity left join sys.xml_schema_collections xsc on xsc.xml_collection_id = c.xml_collection_id left join sys.assembly_types u on u.user_type_id = c.user_type_id
You can read through that and see that there are arbitrary decisions about names and lengths, and they explicitly force type names to ntext
, text
, and image
. But this doesn't answer the $64,000 question: Why? The only answer I have today is:
Do not trust sys.sp_columns.
My colleague suggested this was an attempt at backward compatibility at some point, and that may be true, but it shouldn't have been abandoned. I feel like the documentation should warn that this isn't the best way to get information about the columns for table, and Microsoft should probably mark it as deprecated.
Alternatives?
The output of sys.columns
is not ideal, and I do not recommend INFORMATION_SCHEMA
at all, so what should you use? I like the new metadata functions introduced in SQL Server 2012 (which I first blogged about here back in 2010):
SELECT name, system_type_name FROM sys.dm_exec_describe_first_result_set (N'SELECT * FROM dbo.blat', N'', NULL);
Results:
name system_type_name ---- ---------------- x nvarchar(max) y varchar(max) z text
That's not easy to remember, but it's much better than all the CASE
logic and math you need to do to get the right output from sys.columns
, sys.types
, etc.
1 Response
[…] Aaron Bertrand went down a rabbit hole: […]