Bad Habits to Kick : Using alias types
See the full index.
To clarify what I am talking about here, alias types (more formally, "user-defined data type aliases") used to be called "user-defined data types." Then, when CLR came around in SQL Server 2005, they gave us the ability to create our own CLR user-defined types (UDTs) – a different beast altogether. I won't get into the differences, but to avoid confusion, they came up with a new name (an alias, if you will) for the old-style user-defined data types, which better reflects what they actually do, I guess.
In general, the thinking behind
user-defined data types user-defined data type aliases is that you can use this central definition to declare the properties of a column type across your entire database. For example, if you wanted to always store e-mail addresses in a
varchar(320) column, you could do this:
CREATE TYPE [email] FROM varchar(320) NOT NULL;
Then you could create a table and procedure like this:
CREATE TABLE dbo.Contacts ( ContactID int PRIMARY KEY, EmailAddress [email] UNIQUE ); GO CREATE PROCEDURE dbo.Contact_Create @ContactID int, @EmailAddress [email] AS BEGIN SET NOCOUNT ON; INSERT dbo.Contacts ( ContactID, EmailAddress ) SELECT @ContactID, @EmailAddress; END GO
This would ensure that your developers always use a
varchar(320) column for e-mail addresses, because they are instructed to use the [email] data type instead of picking their own at random. This provides consistency throughout your tables, and prevents you from losing data when someone accidentally declares a variable or parameter as
varchar(255). Of course someone can still do that, if they "forget" that they are supposed to use the
[email] data type… but SQL Server can't help you solve that problem.
One issue with this approach is that the definition of e-mail address (or at least how you understood it at the time you created the type) can change over time. I know that many people use
varchar(255) for e-mail addresses, even though those are far too short (at least according to the standard) and should be Unicode. In this example, if you picked a big enough size (e.g.
nvarchar(320)), it is unlikely to affect you, as the standard hasn't really changed.
Do you want a real example of a supposedly fixed size data element that has changed? ISBN. I would bet that at one point Barnes & Noble thought it might be a good idea to create an alias type called
char(10) so that people wouldn't be using
nvarchar(max), and other silly ways to define the ISBN column. Then when ISBN-13 codes became standard, they would have regretted doing so. Why? Well, let's try modifying an alias type once it's in use; let's say we need to expand our email type from above to support 350 characters. Our instinct would be to use DDL like this:
ALTER TYPE [email] FROM varchar(350) NOT NULL;
But we can't:
Incorrect syntax near 'TYPE'.
In fact, there is no DDL for
ALTER TYPE. (And no,
sp_rename or using synonyms won't help you here, nor is there a system procedure called
sp_altertype.) So can't we just drop the type and re-create it? You would think you could do the following:
DROP TYPE [email]; GO CREATE TYPE [email] FROM varchar(350) NOT NULL;
But you get this error message:
Cannot drop type 'email' because it is being referenced by object 'Contacts'.
There may be other objects that reference this type.
You need to follow this process instead:
- find *all* of the objects that use this alias type (this includes parameters definitions and variable declarations);
- modify them all so that they point at the "regular" type (or a different alias type);
- drop the alias type using DROP TYPE;
- create the alias type with the new definition;
- modify all of your objects to point back at the original alias type again.
Now take the case with ISBN, which might also serve as the primary key and have related tables scattered throughout the database. In addition to the above steps, before you start step 2 above, you will also have to temporarily drop all of the constraints, and drop them in the correct order.
Once you have changed all the references back to the native type, why not just leave them that way? You can point your users at your data dictionary and/or documentation to ensure that they choose the correct data type. At least then when you have data type changes next time, you can phase the new type definition in gradually, instead of an all-or-nothing change.
Further to this, what if you need to use the same custom data type in multiple databases? Now, in addition to the above complications, you must also synchronize the alias types common to all of your databases, or just hope and pray that their definitions will never change.
There are other complications with using alias types that you won't come across when using the built-in types.
CONVERT / CAST
You cannot use CONVERT() or CAST() to change a column, variable or literal to an alias type; if you do, you get:
Type email is not a defined system type.
You also don't get syntax highlighting in Management Studio, so unlike other data type names, your alias type names will appear black instead of blue in query editor windows.
You may also experience issues when you use external or 3rd party providers or drivers to access stored procedures with these alias types, or try to data-bind, or implement Entity Framework, Linq, or other ORM solutions. I'll admit I haven't tried it, but I envision the integration will not be 100% seamless.
Another anomaly with alias types is that you cannot declare #temp tables that use them, unless the alias type also exists in tempdb (but you can use them in @table variables, which also use tempdb; go figure):
Column, parameter, or variable #1: Cannot find data type email.
For further clarification, you can select *into* a new #temp table from a source that includes a column with an alias type, however the #temp table inherits the properties of the base data type (so in this example, the column would be defined as
I hope I have demonstrated that, while tempting, the benefits of alias types are far outweighed by the potential problems they can cause down the road. In fact, I am hard-pressed to think of a single implementation of an alias type that would provide any tangible benefits at all.
CLR UDTs are a slightly different story, but personally I would stay away from those too.
See the full index.
If the email column has changed from nvarchar(128) to varchar(128) then you have to change all the objects referring the "email" anyway.
Having it as an alias simply forces you to do so which is good.
For large scale developments we apply datatype aliases wherever we can to make sure all the developers are using the required data type.
And when an alias has to change then all the developers are forced to review their code.
It does take time, and yes, it is annoying but it helps troubleshooting.
It may not be convenient on small scale projects where you can do changes on the fly, but it is a must on the large scale ones.
Thanks for taking the time to write it, I found the information useful.
I wouldn't say it's a bad habit but I agree with the rest of the comments. UDTs seem like such a great idea but the lack of support in SQL Server make them a pain to use long-term in big production databases.
For example I can personally attest to how requirements often change on a UDT. It's not because the initial requirement gathering was poor, it's that while software stagnates the business (and the world) continues to move on and before you know it there's a new requirement.
However the checklist that the author provided about changing a UDT really only scratches the surface of the pain involved. Go Google for free scripts to change a UDT. Didn't find any? There's a good reason.
If you're using UDTs in the first place it's because you need to standardise on a data type a LOT, so it's going to end up in dozens or hundreds of tables. Add in constraints referencing those columns (indexes, foreign keys, etc).
In some cases you also need to include all views, procedures and functions that reference the data type as well. Of course the DMVs to identify which of those objects use that data type aren't very accurate, you will probably need to repeatedly try dropping the UDT and pick up the blocking object name from the error message, rinse wash and repeat in a loop.
So by the end of this effort you have a list of hundreds of things to "touch" and you're going to have to save, drop, and recreate each one in the correct order. Many will have references that throw innocuous warning messages, which need to be weeded from the real errors.
Even with a lot of fancy scripting (or PowerShell) it's a nightmare. And I don't care how awesome your QA process is, if you even have one, deploying a mass change touching hundreds of things over 5% of the database on possibly dozens of databases are stomach ulcer inducing events.
Cool that some people use diagramming tools that can handle it. I wonder if they can deal with all those edge cases – the ones I dealt with were particularly tricky and so it might not be so difficult for all of them.
Still, I have not had fun with them. I'm uncertain in my confidence of any tool really being able to handle all the edge cases.
The actual "bad habit" is NOT using alias types, it is simply using MSSQL!!! Come on, what were the product designers thinking about ?
Every other database product makes it possible to define data types and change them!
And this is not the only terrible flaw.
Functions ? No side effect allowed!
Arrays ? What is it ???
Table variable instead of arrays ? No way to use them in a expression!
Oh and in MSSQL 2012, they JUST discovered the notion of date formatting, rather than the grotesque and scarse list of allowed date formats still used in MSSQL 2008.
We use alias types all over our data model. (4000+ tables). To change them we just cooked up a procedure that does it.
With sp_refreshsqlmodule it is not so hard to get right. I agree that MS should ship functionality to change alias types out of the box though.
Right, Aaron, but it's frustrating that Alias data types are *this close* to being very useful. If we could redefine them (realizing that this may cause lots of I/O, page splits, etc.) then they would be terrific.
Besides, you don't expect to redefine an alias data type every day.
Yes David, but like a lot of things, we have to tailor our behavior for how the product works, not for how it probably should work.
The problem here, as I see it, is that we SHOULD be allowed to change the definition of an Alias Type even when it is being used by tables. If SQL is smart enough to see that an Alias Type is in use in one or more tables, it should be able to propagate that change.
Of course, this could fail for secondary reasons, such as making an alias type smaller which would cause truncation. But there is no intrinsic reason that changing, say, ISBN from Char(10) to Char(13) or Varchar(13) or NVarchar(13) should fail. The functionality to propagate the change to all affected tables just hasn't been implemented (yet) in SQL Server.
If this were done, alias types would be a WHOLE, WHOLE lot more useful. I think I created a Connect item to suggest this, but I can't find it now.
But David, the difference is that with an alias type you must change *everything* at once… in fact you must change everything first before you can even start to make the actual change. With a native type, you can change things as needed. For example, let's say your UI already imposes a limit on 10-character strings. Now you don't need to edit the UI code immediately, or the stored procedures it passes data to, in order to support 12 characters, until after all the back-end edits are in place. Searching for instances of the data type is the first benefit I've heard that actually makes a little sense. But for me this still isn't worth the potential pain points. All the up-front design in the world isn't going to save you from a manager or client who can't make up his/her mind. Even without that, things change that are beyond your control, such as ISBN.
Let's look at the subject from a different angle.
When making a change of a column from varchar(10) to varchar(12) how do you find out all of the implications of changing it?
The implications will be there whether or not you have used an alias.
You will need to change the column wherever it is used. You will need to make sure all foreign keys and indexes still work. You will need to cover all of the items listed as problems.
From this point of view, having an alias for a data type makes it easier to find each place the column is used. Instead of searching for all the possible names for the column itsel, you just search for all columns with the alias as data type.
Ricardo's comment 'If you are skilled in database design and use the right tools' flies in the face of the real requirement for doing what he and JJEugene have done . . . _having_ the right tools. If your organization has not realized that there is a _need_ for those tools, then you won't _have_ them; if you don't _have_ them, you can't _use_ them.
In other words, it may be fine and dandy to say that the proper way to deal with the problem is to have a neat set of tools that let you easily rebuild your entire database when needed; however, not everyone is blessed with the budget that lets them _get_ the tools, much less letting them have their staff _trained_ on the tools.
For my money, until MS fixes the "Alias" types so that one can redefine them as needed in a convenient manner, I see them as more of a "bad habit" than a "boon to development".
I must concur with JJEugene's and dbaBill's comments. If you are skilled in database design and use the right tools (I also use the ERwin platform) most of the negatives regarding user-defined data types go away. I agree with Aaron that the implementation of this feature is incomplete and should be dealt with in the next version of SQL Server, but if you understand that you won't try to CAST to a user-defined data type, etc.
Most of the problems I've seen with UDTs are rooted in poor up-front analysis, resulting in rework.
Follow Up: Just because we found a way to make it work for our agency doesn't mean that I don't 100% agree with others who have posted that MS should fix this feature. The current implementation *is* shameful. And this post is great because it so clearly explains to people the drawbacks of the current implementation.
We've been using alias types for years without problems. It's been great. The difference for us is that we have no problem re-generating databases a couple times a year/as needed from ERwin/data modeling software. So, if we need to change the definition of an alias type, it is no big deal. We DO change it in one place and it automatically gets applied through the whole model/database.
In other words, I'm suggesting that there is a different way to make changes to alias types than the list of 6 or so steps listed above. Works for us.
Note: Our databases are less than 5 gigabites. Maybe this functionality is only relevant to relatively small databases that can have down time over weekends. But also note that while the dbs are relatively small, they are vital for our agency. They are the important stuff that we live on (our fiscal system, etc.). So, using alias types can be viable for major business needs.
dbaBill, I guess the bad habit I'm pointing out, is that a lot of people blindly implement alias types without being fully aware of the complications. The title can only be so many characters. 🙂
Thanks for the article. I've often wodered if there were any pitfalls to using UDTs and have searched on occassion for any such information without any success other than explaining the 'benefits of UDTs. Thank you for pointing out the other side of the picture,
Because of the fumctionality afforded, I wouldn't call using alias types a bad habit, but one that should be used with full awareness of the complications that you point out. The functionality promised by "alias" types can be extremely valuable. Alias types was pushed heavily by Microsoft back in the day as rules and defaults could be defined once in a single place for a specified type and be enforced throughout the db. It would be nice if MS could fix the issues so its customers could take advantage of needed functionality instead of relying on base types with a proliferation of redundant constraint definitions. One helpful addition might be to copy Oracle's functionality of defining a type by referencing the way it's stored in a table.
I agree that this type of functionality would be useful, if it worked well. What I meant in my last paragraph is that there isn't an implementation case I could think of that makes sense, given the way the functionality currently works. 🙂
That's a great series, Aaron!
Actually we badly need the ability to define a type once and use it many times. To accomplish that, I use macros and run a standard C++ preprocessor against all my SQL code to expand them.
For example, to convert in my include file I used to define
#define MAX_VARCHAR VARCHAR(8000)
In my code, I can write this all over my system:
which will expand to
Converting to 2005, I replaced the macro with:
#define MAX_VARCHAR VARCHAR(MAX)
and let SQL Compare take care of all the details
noeldr, understood why you *want* to put it in model (and I stated as much in my response). Just clarifying that you do not need the type to exist in model or in tempdb at all to handle only the "select into" case, at least in 2008.
Not sure about 2000, and I don't have an instance handy to test on.
This behavior was introduced in 2000.
I stand corrected with 2005/2008
The reason that model should have it is that once you start using tempdb you don't want to check for the existence of it every time.
Yeesh! What a can o' worms!
Excellent post… Excellent series!
noeldr, that is not my experience. Here is a simple repro where I create a type called noeldr, then select into a #temp table, without first creating that type in either tempdb or model. This is SQL Server 2008 SP1.
CREATE DATABASE noeldr;
CREATE TYPE noeldr FROM VARCHAR(10) NOT NULL;
CREATE TABLE dbo.noeldr(noeldr noeldr);
INSERT dbo.noeldr SELECT 'noeldr';
SELECT * INTO #noeldr FROM dbo.noeldr;
SELECT * FROM #noeldr;
DROP DATABASE noeldr;
Now, if you want to use CREATE TABLE #foo(noeldr noeldr); the type will have to exist in tempdb. And if you want to ensure it will be in tempdb tomorrow, and next week, and next year, you should probably put it in model too, but strictly speaking that isn't required (e.g. you could always create the type in tempdb, create the #temp table, then drop the type when you are done and have dropped the #temp table).
Let me add one more to the pile:
If you use select into #t from … You will need those types in MODEL and tempdb… any failure to do that and those statements just won't work.
I think we were investigating this at one point, used it once for some minor field in one or two tables, and hit this exact same problem. At that point we converted those columns to base types and stopped using user defined types altogether. I've avoided them since.
If making a single change to the UDT reflected that change everywhere, that would be cool, but as they work now they are not very useful.
Wow, I'm sold.
I never realized how chiseled-in-stone alias datatypes could be.