February 27, 2012 | SQL Server

Bad Habits to Kick : Believing everything you hear or read

I read a statement on a forum recently that said something like:

IDENTITY columns are the primary key, and primary keys are clustered.

Wow. My first thought was, "I hope people aren't learning from this." Unfortunately, I suspect a few already have, as no matter what you write on the Internet or say in passing, at least one person is going to believe you. I suspect the statement above came from someone's opinion, based on seeing a handful of tables in their life, which always had an IDENTITY column that was also the PK. While you may be able to attribute this to correlation (e.g. there are many IDENTITY columns that are also the clustered primary key), this is not necessarily causality (e.g. they are the clustered primary key because they are the IDENTITY columns, or vice versa). Besides, it is not a universally true statement anyway (since there are many variations possible where an IDENTITY column is not the primary key, where the primary key is not clustered, and so on).

This is just an exercise of how to dispel such a myth, and battle the blatant disregard for how someone's limited experience might differ from the true spectrum of reality, since they may need to be convinced with actual examples. (As an aside, I remember several years ago when I interviewed for a position at Microsoft, where the interviewer told me that you couldn't create a self-referencing foreign key. I still laugh about it that he refused to believe me until I grabbed his laptop and showed him. He wasn't the reason I didn't take the job, by the way.)

I'm not trying to be mean here, but statements intended to be perceived as fact should not be made so casually and without any background in the topic. And I've talked here before about why generalizations are dangerous – even when they're not about issues that can easily be proven one way or another, or where those generalizations appear to be true in all the cases you've seen. Whenever you hear someone say "always" or "never" when it comes to SQL Server, you should at least have your tin foil hat within reach, because it's possible someone is trying to brain-wash you. There are very few scenarios that are absolute, very few statements that are 100% true 100% of the time, and a large majority of statements can and should be suffixed with, "…but it depends on x, y and z." I know how much everyone hates "it depends" but, quite frankly, it almost always does.

Anyway, back to the specific issue at hand. Let's examine the example this person mentioned, and a few counter-examples:

A table with a clustered primary key on the identity column:

 CREATE TABLE dbo.foo
(
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    name NVARCHAR(32)
    -- ,... other columns
);

You can even leave out the CLUSTERED keyword, as the default implementation of a primary key in SQL Server is CLUSTERED:

 CREATE TABLE dbo.foo
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    name NVARCHAR(32)
    -- ,... other columns
);

So this is the table the person above based their conclusion. But what about tables that violate his definition?

A table with an IDENTITY column but no primary key or clustered index:

 CREATE TABLE dbo.foo
(
    ID INT IDENTITY(1,1),
    name NVARCHAR(32)
    -- ,... other columns
);

A table with an IDENTITY column but the primary key is on another column and is clustered:

 CREATE TABLE dbo.foo
(
    ID INT IDENTITY(1,1),
    name NVARCHAR(32) PRIMARY KEY --CLUSTERED
    -- ,... other columns
);

A table with an IDENTITY column but the primary key is on another column and the primary key is not clustered:

 CREATE TABLE dbo.foo
(
    ID INT IDENTITY(1,1),
    name NVARCHAR(32) PRIMARY KEY NONCLUSTERED
    -- ,... other columns
);

A table with an IDENTITY column represented by a unique clustered index, but the primary key is on another column:

 CREATE TABLE dbo.foo
(
    ID INT IDENTITY(1,1),
    name NVARCHAR(32) PRIMARY KEY NONCLUSTERED
    -- ,... other columns
);
CREATE UNIQUE CLUSTERED INDEX x ON dbo.foo(ID);

So clearly there are plenty of variants where IDENTITY columns are not necessarily the clustered primary key.

Conclusion

If you want your IDENTITY column to be the primary key, say so. If you want the primary key to be clustered, say so. But most importantly, understand that just because you see something in one place, don't assume that it is always true, or that it is the rule you have to follow – even if you've only ever seen it to be true. Some other areas where this comes up: the "natural order" debate (where people assume that because some query without an ORDER BY returns in a specific order today, that it will continue to order that way tomorrow) and the "short-circuit" debate (where people believe that SQL Server will always process your query in the order it was written). I hope I've dispelled both of those myths in a previous Bad Habits topic and the short-circuiting one more recently in this question on dba.stackexchange; but if you believe any of these are universally true, please feel free to leave a comment. šŸ™‚
 

5 comments on this post

    • GrumpyOldDBA - February 28, 2012, 10:44 AM

      One of my stock questions for a DBA is around identities and clustered primary keys. You'd be amazed at how many I've interviewed, some with many years of "experience", who get this wrong.
      I guess it can be tricky but I remember a long post about SQL2005 just after it had been released and I was confused by some of the answers to an engine question, not mine, eventually I managed to get most of those posting "solutions" to admit they hadn't actually got any sql 2005 servers. Obviously that's an extreme example and not every opinion/post etc. falls under that – but you're right you do read some interesting statements and I also find they tend to reappear about every 3 months or so.
      Good post.

    • Rick Willemain - February 28, 2012, 7:37 PM

      Nicely formed quick review that dispells funny notions, I think. Thank you.
      Mingle in the word "constraints", and feel the screens go blank !
      šŸ™‚

    • Ian Stirk - March 3, 2012, 2:05 PM

      I suspect if the line was changed to "IDENTITY columns are typically used as the primary key, and primary keys are clustered BY DEFAULT." it would be (generally) acceptable.
      As an aside, generalizations are typically very useful (there's one there!) A book or article would be perhaps 50+ times longer if it catered for all conditions. Then who would want to read it? or have the time? We live in a time of learn quickly, use, drop, move on to the next thing.
      Aaron, I do wonder if you are making something out of nothing… Yes the initial statement can be deemed as wrong, sometimes.
      Thanks
      Ian

    • AaronBertrand - March 3, 2012, 5:40 PM

      Ian, my point was that people can go years basing their understanding of how a certain part of SQL Server works based on one sentence or observation that was incorrect. Since we are sharing ideas in public places with a wide variety of audience, we need to be careful about our wording when we say something that is a generalization – not everyone is going to see the implied "usually" or "most of the time." I'm not saying that we need to document every single counter-example, but we shouldn't be hiding that they exist, either.

    • unclebiguns - March 9, 2012, 9:25 PM

      My favorite thing that a lot people believe about IDENTITY columns is that they are unique.  That isn't true unless you pur an unique constraint on it.

Comments are closed.