Please don't use sys.sp_columns!
September 21st, 20201
Please don't use sys.sp_columns!
September 21st, 20201
 
 

I spent 20 minutes the other night thinking about possible ways to remove an ntext column from a 28TB table, and I did all of this thinking for nothing.

I was investigating the table for other reasons. And because I'm lazy, I scanned the output of the system procedure sys.sp_columns (instead of writing a proper query against more modern metadata). I spotted the legacy type in the output, and dropped everything. When I suggested to a colleague one approach, he replied:

What NTEXT column? Are you looking at the right table? Did you smoke your lunch?

I scratched my head. I was definitely looking at the right table. In the right schema, in the right database, on the right instance. I won't comment on lunch but when I checked the catalog views, sure enough, no trace of ntext.

Was I insane?

Nah, there's just a little thing hiding in here that the docs don't tell you about. Let's take a look at what sys.sp_columns shows, and compare that to the catalog views, using a simple table with three LOB columns:

CREATE TABLE dbo.blat
(
  x nvarchar(max), 
  y varchar(max), 
  z text
);
GO
 
EXEC sys.sp_columns @table_name = N'blat', @table_owner = N'dbo';
 
SELECT name, [type_name] = TYPE_NAME(user_type_id), max_length
  FROM sys.columns 
  WHERE [object_id] = OBJECT_ID(N'dbo.blat');
 
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH 
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_NAME = N'blat'
    AND TABLE_SCHEMA = N'dbo';

Some peculiarities in the output here, for sure:

Why does the system procedure tell me all of my types are legacy and deprecated? It takes some unpacking. First, some information from the sys.sp_columns documentation, emphasis mine:

Output column:
DATA_TYPE
Integer code for ODBC data type. If this is a data type that cannot be mapped to an ODBC type, it is NULL. The native data type name is returned in the TYPE_NAME column.
Output column:
TYPE_NAME
String representing a data type. The underlying DBMS presents this data type name.
Input parameter:
@ODBCVer
ODBCVer Is the version of ODBC that is being used. ODBCVer is int, with a default of 2. This indicates ODBC Version 2. Valid values are 2 or 3. For the behavior differences between versions 2 and 3, see the ODBC SQLColumns specification.

If we look at the SQLColumns topic (no link is provided in the previous topic, btw):

DATA_TYPE Returns SQL_VARCHAR, SQL_VARBINARY, or SQL_WVARCHAR for the varchar(max) data types.
TYPE_NAME Returns "varchar", "varbinary", or "nvarchar" for the varchar(max), varbinary(max), and nvarchar(max) data types.

Something doesn't line up here. The procedure doc says it uses SQLColumns, and the SQLColumns doc says it knows what max types are. This other SQLColumns topic is even less helpful; it says, "For a list of valid ODBC SQL data types, see SQL Data Types in Appendix D: Data Types." This document doesn't even have an Appendix A, never mind D, and I don't see text or ntext in any of the pages mentioned so far.

Next, I thought, let's try passing the @ODBCVer argument in and see if there is any different behavior using 2 or 3.

EXEC sys.sp_columns @table_name = N'blat', @table_owner = N'dbo', @ODBCVer = 2;
EXEC sys.sp_columns @table_name = N'blat', @table_owner = N'dbo', @ODBCVer = 3;

Nope, no difference, at least in my case:

I mean, what the hell? I need to go spelunking. I grab the definition for sys.sp_columns and, after trying to ignore all the horrible things in there, I see this:

from sys.spt_columns_odbc_view s_cov

This is a hidden system view (and is obviously undocumented). How do I know?

SELECT TOP (1) * FROM sys.spt_columns_odbc_view;
GO
SELECT id = OBJECT_ID(N'sys.spt_columns_odbc_view');

Results:

Msg 208, Level 16, State 1
Invalid object name 'sys.spt_columns_odbc_view'.

id
– – – –
NULL

This thing doesn't even have an object_id. Same result in master and in a restored copy of the resource database.

You can, however, query this view under the Dedicated Administrator Connection (DAC). (If you're having troubles, you're not alone; see this article.) Under the DAC you can now do this:

SELECT ODBCVER, COLUMN_NAME, TYPE_NAME, TYPE_NAME_28, LENGTH_28
  FROM sys.spt_columns_odbc_view 
  WHERE TABLE_NAME = N'blat'
    AND TABLE_OWNER = N'dbo'
  ORDER BY ODBCVer, COLUMN_NAME;

Results:

ODBCVER  COLUMN_NAME  TYPE_NAME  TYPE_NAME_28  LENGTH_28
-------  -----------  ---------  ------------  -----------
2        x            nvarchar   ntext         2147483646
2        y            varchar    text          2147483647
2        z            text       text          2147483647
3        x            nvarchar   ntext         2147483646
3        y            varchar    text          2147483647
3        z            text       text          2147483647

Notice that the view does seem to know the true types of these columns. But why the procedure doesn't expose them properly is a great question, and I wish I had an answer. I can see what the code does; it literally says here, take this valid type name, and throw it out the window; instead, show this other type name that isn't accurate:

SELECT ... TYPE_NAME = s_cov.TYPE_NAME_28,

You can see the definition of the view under the DAC using:

SELECT OBJECT_DEFINITION(object_id('sys.spt_columns_odbc_view'));

For brevity and readability I'll tidy up and leave out a lot of the irrelevant bits:

create view sys.spt_columns_odbc_view
as
select
  ...
  TYPE_NAME    = case
                 when (t.system_type_id = 240 or t.user_type_id > 255) then -- CLR UDTs
                     t.name
                 else
                     d.TYPE_NAME collate catalog_default
                 end,
  TYPE_NAME_28 = case
                 when (t.system_type_id = 240 or t.user_type_id > 255) then -- CLR UDTs
                     t.name
                 when (c.max_length = -1 and d.ss_dtype = 167) then -- varchar(max)
                     N'text'
                 when (c.max_length = -1 and d.ss_dtype = 231) then -- nvarchar(max)
                     N'ntext'
                 when (c.max_length = -1 and d.ss_dtype = 165) then -- varbinary(max)
                     N'image'
                 else
                     d.TYPE_NAME
                 end,
  ...
  LENGTH_28    = case
                 when d.ss_dtype IN (108,106) then    -- decimal/numeric types
                     OdbcPrec(c.system_type_id,c.max_length,c.precision)+2
                 when (c.max_length = -1 and d.ss_dtype = 240) then -- Large UDT => image for non-SNAC clients
                     2147483647
                 when d.ss_dtype = 240 then -- Small UDT
                     isnull(u.max_length, c.max_length)
                 when d.ss_dtype = 241 then -- XML
                     2147483646
                 when (c.max_length = -1 and d.ss_dtype = 167) then -- varchar(max)
                     2147483647
                 when (c.max_length = -1 and d.ss_dtype = 231) then -- nvarchar(max)
                     2147483646
                 when (c.max_length = -1 and d.ss_dtype = 165) then -- varbinary(max)
                     2147483647
                 when d.ss_dtype IN (40, 41, 42, 43) then -- DATE/TIME/DATETIME2/DATETIMEOFFSET
                     OdbcPrec(c.system_type_id,c.max_length,c.precision)*2
                 else
                     isnull(d.length, c.max_length)
                 end
  ...
  from  sys.all_columns c inner join
        sys.all_objects o 
          on o.object_id = c.object_id
        inner join sys.types t 
          on t.user_type_id = c.user_type_id
        inner join sys.spt_datatype_info d 
          on d.ss_dtype = c.system_type_id 
          and d.AUTO_INCREMENT = c.is_identity
        left join sys.xml_schema_collections xsc 
          on xsc.xml_collection_id = c.xml_collection_id
        left join sys.assembly_types u 
          on u.user_type_id = c.user_type_id

You can read through that and see that there are arbitrary decisions about names and lengths, and they explicitly force type names to ntext, text, and image. But this doesn't answer the $64,000 question: Why? The only answer I have today is:

Do not trust sys.sp_columns.

My colleague suggested this was an attempt at backward compatibility at some point, and that may be true, but it shouldn't have been abandoned. I feel like the documentation should warn that this isn't the best way to get information about the columns for table, and Microsoft should probably mark it as deprecated.

Alternatives?

The output of sys.columns is not ideal, and I do not recommend INFORMATION_SCHEMA at all, so what should you use? I like the new metadata functions introduced in SQL Server 2012 (which I first blogged about here back in 2010):

SELECT name, system_type_name
  FROM sys.dm_exec_describe_first_result_set
       (N'SELECT * FROM dbo.blat', N'', NULL);

Results:

name    system_type_name
----    ----------------
x       nvarchar(max)
y       varchar(max)
z       text

That's not easy to remember, but it's much better than all the CASE logic and math you need to do to get the right output from sys.columns, sys.types, etc.

By: Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a husband, a father of two, a huge hockey and football fan, and my pronouns are he/him.

1 Response

  1. September 21, 2020

    […] Aaron Bertrand went down a rabbit hole: […]