I read a statement on a forum recently that said something like:
IDENTITY columns are the primary key, and primary keys are clustered.
Wow. My first thought was, "I hope people aren't learning from this." Unfortunately, I suspect a few already have, as no matter what you write on the Internet or say in passing, at least one person is going to believe you. I suspect the statement above came from someone's opinion, based on seeing a handful of tables in their life, which always had an IDENTITY column that was also the PK. While you may be able to attribute this to correlation (e.g. there are many IDENTITY columns that are also the clustered primary key), this is not necessarily causality (e.g. they are the clustered primary key because they are the IDENTITY columns, or vice versa). Besides, it is not a universally true statement anyway (since there are many variations possible where an IDENTITY column is not the primary key, where the primary key is not clustered, and so on).
This is just an exercise of how to dispel such a myth, and battle the blatant disregard for how someone's limited experience might differ from the true spectrum of reality, since they may need to be convinced with actual examples. (As an aside, I remember several years ago when I interviewed for a position at Microsoft, where the interviewer told me that you couldn't create a self-referencing foreign key. I still laugh about it that he refused to believe me until I grabbed his laptop and showed him. He wasn't the reason I didn't take the job, by the way.)
I'm not trying to be mean here, but statements intended to be perceived as fact should not be made so casually and without any background in the topic. And I've talked here before about why generalizations are dangerous – even when they're not about issues that can easily be proven one way or another, or where those generalizations appear to be true in all the cases you've seen. Whenever you hear someone say "always" or "never" when it comes to SQL Server, you should at least have your tin foil hat within reach, because it's possible someone is trying to brain-wash you. There are very few scenarios that are absolute, very few statements that are 100% true 100% of the time, and a large majority of statements can and should be suffixed with, "…but it depends on x, y and z." I know how much everyone hates "it depends" but, quite frankly, it almost always does.
Anyway, back to the specific issue at hand. Let's examine the example this person mentioned, and a few counter-examples:
A table with a clustered primary key on the identity column:
CREATE TABLE dbo.foo ( ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, name NVARCHAR(32) -- ,... other columns );
You can even leave out the CLUSTERED keyword, as the default implementation of a primary key in SQL Server is CLUSTERED:
CREATE TABLE dbo.foo ( ID INT IDENTITY(1,1) PRIMARY KEY, name NVARCHAR(32) -- ,... other columns );
So this is the table the person above based their conclusion. But what about tables that violate his definition?
A table with an IDENTITY column but no primary key or clustered index:
CREATE TABLE dbo.foo ( ID INT IDENTITY(1,1), name NVARCHAR(32) -- ,... other columns );
A table with an IDENTITY column but the primary key is on another column and is clustered:
CREATE TABLE dbo.foo ( ID INT IDENTITY(1,1), name NVARCHAR(32) PRIMARY KEY --CLUSTERED -- ,... other columns );
A table with an IDENTITY column but the primary key is on another column and the primary key is not clustered:
CREATE TABLE dbo.foo ( ID INT IDENTITY(1,1), name NVARCHAR(32) PRIMARY KEY NONCLUSTERED -- ,... other columns );
A table with an IDENTITY column represented by a unique clustered index, but the primary key is on another column:
CREATE TABLE dbo.foo ( ID INT IDENTITY(1,1), name NVARCHAR(32) PRIMARY KEY NONCLUSTERED -- ,... other columns ); CREATE UNIQUE CLUSTERED INDEX x ON dbo.foo(ID);
So clearly there are plenty of variants where IDENTITY columns are not necessarily the clustered primary key.
If you want your IDENTITY column to be the primary key, say so. If you want the primary key to be clustered, say so. But most importantly, understand that just because you see something in one place, don't assume that it is always true, or that it is the rule you have to follow – even if you've only ever seen it to be true. Some other areas where this comes up: the "natural order" debate (where people assume that because some query without an ORDER BY returns in a specific order today, that it will continue to order that way tomorrow) and the "short-circuit" debate (where people believe that SQL Server will always process your query in the order it was written). I hope I've dispelled both of those myths in a previous Bad Habits topic and the short-circuiting one more recently in this question on dba.stackexchange; but if you believe any of these are universally true, please feel free to leave a comment. 🙂