Bad Habits to Kick: Believing everything you hear or read

This is part of more than a decade of posts involving Bad Habits and Best Practices.
See the full index.

I read a statement on a forum recently that said something like:

Identity columns are the primary key and are always clustered.

Wow. My first thought was, "I hope people aren't learning from this." Unfortunately, I suspect a few already have, as no matter what you write on the Internet or say in passing, at least one person is going to believe you. I suspect the statement above came from someone's opinion, based on seeing a handful of tables in their life, which always had an IDENTITY column that was also the PK. While you may be able to attribute this to correlation (e.g. there are many IDENTITY columns that are also the clustered primary key), this is not necessarily causality (e.g. they are the clustered primary key because they are the IDENTITY columns, or vice versa). Besides, it is not a universally true statement anyway (since there are many variations possible where an IDENTITY column is not the primary key, where the primary key is not clustered, and so on).

This is just an exercise of how to dispel such a myth, and battle the blatant disregard for how someone's limited experience might differ from the true spectrum of reality, since they may need to be convinced with actual examples. (As an aside, I remember several years ago when I interviewed for a position at Microsoft, where the interviewer told me that you couldn't create a self-referencing foreign key. I still laugh about it that he refused to believe me until I grabbed his laptop and showed him. He wasn't the reason I didn't take the job, by the way.)

I'm not trying to be mean here, but statements intended to be perceived as fact should not be made so casually and without any background in the topic. And I've talked here before about why generalizations are dangerous – even when they're not about issues that can easily be proven one way or another, or where those generalizations appear to be true in all the cases you've seen. Whenever you hear someone say "always" or "never" when it comes to SQL Server, you should at least have your tin foil hat within reach. Very few scenarios are absolute, nothing is 100% true 100% of the time, and most statements should be suffixed with, "…but it depends on x, y, and z." I know how much everyone hates "it depends" but, quite frankly, it almost always does.

Anyway, back to the specific issue at hand. Let's look at the example this person mentioned, and a few counter-examples:

CREATE TABLE dbo.foo
(
    ID   int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    name nvarchar(32)
    -- ,... other columns
);

This type of table was the basis for their conclusion. But what about tables that violate their definition? Some examples:

A table with an IDENTITY column but no primary key or clustered index:

CREATE TABLE dbo.foo
(
    ID   int IDENTITY(1,1),
    name nvarchar(32)
    -- ,... other columns
);

A table with an IDENTITY column, but the clustered primary key is on another column:

CREATE TABLE dbo.foo
(
    ID   int IDENTITY(1,1),
    name nvarchar(32) PRIMARY KEY CLUSTERED
    -- ,... other columns
);

A table with an IDENTITY column, but the non-clustered primary key is on another column:

CREATE TABLE dbo.foo
(
    ID   int IDENTITY(1,1),
    name nvarchar(32) PRIMARY KEY NONCLUSTERED
    -- ,... other columns
);

A table with an IDENTITY column represented by a unique clustered index, but the non-clustered primary key is on another column:

CREATE TABLE dbo.foo
(
    ID   int IDENTITY(1,1),
    name nvarchar(32) PRIMARY KEY NONCLUSTERED
    -- ,... other columns
);
 
CREATE UNIQUE CLUSTERED INDEX x ON dbo.foo(ID);

So clearly there are plenty of variants where IDENTITY columns are not necessarily the clustered primary key.

Conclusion

If you want your IDENTITY column to be the primary key, say so. If you want the primary key to be clustered, say so. But most importantly, understand that just because you see something in one place, don't assume that it is always true, or that it is the rule you have to follow – even if you've only ever seen it to be true. Some other areas where this comes up: the "natural order" debate (where people assume that because some query without an ORDER BY returns in a specific order today, that it will continue to order that way tomorrow) and the "short-circuit" debate (where people believe that SQL Server will always process your query in the order it was written). I hope I've dispelled both of those myths in a previous Bad Habits topic and the short-circuiting one more recently in this question on dba.stackexchange.

This is part of more than a decade of posts involving Bad Habits and Best Practices.
See the full index.

By: Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a husband, a father of two, a huge hockey and football fan, and my pronouns are he/him.

unclebiguns says:

March 9, 2012 at 9:25 PM

My favorite thing that a lot people believe about IDENTITY columns is that they are unique. That isn't true unless you pur an unique constraint on it.
AaronBertrand says:

March 3, 2012 at 5:40 PM

Ian, my point was that people can go years basing their understanding of how a certain part of SQL Server works based on one sentence or observation that was incorrect. Since we are sharing ideas in public places with a wide variety of audience, we need to be careful about our wording when we say something that is a generalization – not everyone is going to see the implied "usually" or "most of the time." I'm not saying that we need to document every single counter-example, but we shouldn't be hiding that they exist, either.
Ian Stirk says:

March 3, 2012 at 2:05 PM

I suspect if the line was changed to "IDENTITY columns are typically used as the primary key, and primary keys are clustered BY DEFAULT." it would be (generally) acceptable.
As an aside, generalizations are typically very useful (there's one there!) A book or article would be perhaps 50+ times longer if it catered for all conditions. Then who would want to read it? or have the time? We live in a time of learn quickly, use, drop, move on to the next thing.
Aaron, I do wonder if you are making something out of nothing… Yes the initial statement can be deemed as wrong, sometimes.
Thanks
Ian
Rick Willemain says:

February 28, 2012 at 7:37 PM

Nicely formed quick review that dispells funny notions, I think. Thank you.
Mingle in the word "constraints", and feel the screens go blank !
🙂
GrumpyOldDBA says:

February 28, 2012 at 10:44 AM

One of my stock questions for a DBA is around identities and clustered primary keys. You'd be amazed at how many I've interviewed, some with many years of "experience", who get this wrong.
I guess it can be tricky but I remember a long post about SQL2005 just after it had been released and I was confused by some of the answers to an engine question, not mine, eventually I managed to get most of those posting "solutions" to admit they hadn't actually got any sql 2005 servers. Obviously that's an extreme example and not every opinion/post etc. falls under that – but you're right you do read some interesting statements and I also find they tend to reappear about every 3 months or so.
Good post.