Bad Habits to Kick : Using dashes and spaces in entity names
In my last post in this series, I talked about defining varchar columns, parameters, or variables without length. Next I want to talk about using "bad" characters, like spaces or dashes, in entity names.
Every once in a while, I see people who have issues with T-SQL code, or the Management Studio UI, or 3rd party applications – because their server / instance / database / object / column name has a dash, or a space, or starts with a number. Why do we keep doing this to ourselves?
Of course, the easy workaround is to use a square bracket around the offending name. Double quotes are also acceptable, depending on your QUOTED_IDENTIFIER setting, but I prefer square brackets – both because the double quotes look like string delimiters to me, and because you can't always control settings like QUOTED_IDENTIFIER in everyone else's environments.
Using square brackets might be fine for your own T-SQL – though your co-workers will likely curse you every time they have to code around it. But what if it's someone else's code that you can't control? There have been several bugs in the SSMS UI where functionality was broken when it came across a non-standard name. Most of these have been fixed (let me know if you want me to dig a few of these up for you), but who's to say we won't come across another one tomorrow?
I typically add [square brackets] around any questionable names I end up using anyway, even when they are not necessary. For example, the word "Domains" lights up green in SSMS, and "Description" lights up blue, but these are both names that I prefer to use over others that don't cause this visual issue. I also add square brackets around column names that have special meaning, for example [object_id] is present in a lot of DMVs, and lights up pink if you don't use square brackets; I don't want it to be visually confused with the OBJECT_ID() function.
So try to avoid object names with dashes or spaces (including trailing spaces, see #483553), and in general follow the rules for identifiers. Even though this Books Online topic states that you can use them in delimited identifiers, I would strongly recommend staying away from hyphen, tilde, percent, etc. The benefits of using these characters in entity names are dwarfed by the associated costs.
I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code. Up next: using SELECT or RETURN instead of OUTPUT.
Good luck being consistent. It doesn't work in a place where dogs wag more than tails. Lovely observations though.
Meant to add, I'm enjoying reading your articles, Aaron.
Hmmm. I come from an Oracle background and am learning SQL Server for my current project. Interesting discussion, camel case vs. underscore…and both sides make sense but not for cross-over. I am slowly realizing camel case works for SQL Server, and will always express my horror at finding it in Oracle.
My horror comes from Oracle's habit of storing all object names in upper case, with the major exception of Java objects, unless objects are named with double-quotes–then anything goes. The unfortunate result is that SQL Server-based developers will code DDL with table names like 'myVeryLongTableNameThatILike' but that is stored in the data dictionary as 'MYVERYLONGTABLENAMETHATILIKE', which is of course, very unreadable. No GUI client can help with auto-complete formatting as the object name actually is upper case.
Case formatting rules can be like trying to use a scimitar to cut bread and spread the butter; best to use a bread knife for the former and a butter knife for the latter.
MSDN is the last place where I'd expect to learn best practices, but YMMV. 🙂 Whether you should keep the trailing $ or not, I don't know, I guess it depends on whether having it in the column name just because it was the CSV makes sense. To me it is cumbersome to introduce non-alpha (or alphanumeric at worst) characters into column and other entity names. But that is not because it violates some best practice, just that it is more tedious to type (and to talk about in a conversation – "hey do you know why total space value dollar sign is negative for some rows?").
Thanks for the extremely quick answer!
I saw it in a CSV file like "Total Value$" and I was wondering if I want to use it or just replace with the Dollar word.
Perhaps it would be useful if was specified in the column name the currency of the data. (I know that that goes against normalization and beside that you can rename it in a report as well)
Normally I wouldn't use it but I wanted to know the best practice, and hardly found information related to the topic. For instance MDSN says that you are allowed to use it.
Rudy no, I don't think so. Can you explain the benefit of using a dollar sign in an entity name? Also note that it could be confused with things like $partition functions.
Aaron, what about dollar sign? Do you advice to use it and why?
RE: Upper case
While upper case *letters* mark the start of a sentence, upper case words do not. Thus, your argument speaks to using Pascal case rather than all upper case or camel case.
Further, your are arguing against your own point. Upper case words all have similar shapes where as Pascal case words have more unique shapes. Beyond conservation of space, lower case and Pascal cased words are easier to read.
IMO, the reasons for using upper cased keywords has long since past. We have color monitors, color IDEs and even color in our posting of code.
Too many times have I run into a situation where the underscore was obscured by some other element on the screen. Furthermore, the lined paper angle would argue *against* underscores. Try writing words with underscores on lined paper; especially lined paper with dark lines. Can't see them can you? As developers we do not want to simply "pass over" the underscore and knowing of its existence is crucial to understanding the code. Is this a single variable or something else? As Aaron implied, anyone that has written a lot of code knows that writing underscores is a pain in the butt.
"he is right"? About a preference? Give me a break, Celko. If he prefers the Rolling Stones, should I go throw out all my Pink Floyd albums? This is exactly the kind of crap I did *not* want to get into when I decided to put this series together. I don't need you telling me how I should name my databases objects. My point is, and always has been, pick a convention that makes sense for YOU, and stick with it. Don't profess that some convention is "better" than another because it serves your interests in selling more copies of your book. I have a copy but that doesn't mean I have to agree with every single rule you have somehow determined is the one and only way to do something.
>> Aaron, for the lengthy object names, better readability is with the usage of underscore than the camel case <<
Aaron, he is right. Look at the research and not your own habits. Users of Latin, Greek and Cyrillic alphabets have certain reading habits that are incredibly strong.
1) We read Left-to-right, then top-to-bottom. This ordering of 2D symbols shows up in comic strip panels (Western versus Manga versus Chinese), circuit diagrams, maps, etc. and you cannot see it any more than a fish thinks about water.
2) The eye jumps to UPPERCASE letters. They mark the start of a sentence, a proper noun, etc. This means that you can measure the eye movements when someone reads "someThing" — the eye twitches to the "T", jerks to the initial space and tries to continue. Random capitalization is classic WWII readability experiment. It stinks.
3) A Bouma is word-shape that we automatically correct to what we think it should be. In a programming language, we need to UPPERCASE keywords (the compiler will correct spelling) so we benefit from Boumas.
4) Underscores work because we learn to read and write on LINED PAPER. The eye movement is fluid over them and the brain puts them into a whole.
I cover this in more detail in my book SQL PROGRAMMING STYLE. It is based on my years at AIRMICS (US Army Institute for Research in Management Information, Communications, and Computer Sciences).
>> How difficult will it be for Microsoft to make SQL skip non-alphanumeric chars to compare names? <<
Since a data element name is part of a global data model, appears in a data dictionary and is used by ALL other software tools — PRESENT AND FUTURE — how do you suggest that they re-write all that software to accommodate your bad programming practices?
ISO-11179 uses alphas, digits and underscores. That is all. A data element name is "[<role>_]<attribute>_<property>" so you can parse it easily.
Theses rules work with all the X3J languages. So do all the other programming languages I know. I think that most compilers are at 128 characters these days, but I would have to check that to be sure.
Madhivanan, that's your opinion, and you are more than welcome to it! A recurring theme in these posts is, neither is "right" or "the only choice", just be consistent. I will use what I prefer (which is not to have to type underscores); whether that is less readable to you is irrelevant (unless we are working on the same project), because you will use what you prefer. I'm trying to *avoid* religious, emotional and my-syntax-can-kick-your-syntax's-butt types of conversations here, and am not trying to tell anyone what type of syntax is more readable to them. Maybe re-read my entire comment about underscores vs. camel case.
Aaron, for the lengthy object names, better readability is with the usage of underscore than the camel case
dd, I don't like that idea at all. Do you really want all of your code to be interpreted with fuzzy logic? I don't.
Again this is a feature stuck in the last century. The database should allow us to use OrderDetails to refer to a table called "Order details" or Order-details. How difficult will it be for microsoft to make sql skip non alphanumeric chars to compare names? duh?
This is a great series, Aaron! I can feel your pain with each and every one of your suggestions…
I think that using Unicode characters as the separator has to be worse. It has to fall into one of those "just because you can" topics.
We have a slew of customers who name their columns using some unicode character that LOOKS like a space in ANSI, but actually is not. Man was that every frustrating to track down.
Good series Aaron. Convincing people to write good, clean T-SQL is a worthy cause.
I acquired a database that has '-' in the name. I dealt with this for sometime and I have had enough and planning to rename the database by the end of the year.
Chad, I think that follows more under a subjective heading than one of bad habits. Personally, if I have a relational table for customer addresses, I would rather see:
Than any of these:
All in one case makes it very hard to read. The underscore is an unnatural character to type into an entity name. It all depends on your background, so YMMV. For things like this, whatever works for you (and for those around you who also have to maintain your code) is fine with me. Paramount to what convention you use, is whether or not you are logical in your choice, and consistent about using it. I think naming conventions could easily become a huge religious type of battle, whereas avoiding dashes is a relative no-brainer.
I don't like seeing mixed case for database object names either. I have several case sensitive servers where mixed case objects are a problem. Make a standard either all lower case or all upper case and stick to it.
Definitely, using keywords, reserved words, type names etc. is a big topic all on its own.
Another thing about identifier choice, it's best to avoid choosing column names that show up in this query:
from INFORMATION_SCHEMA.COLUMNS c
join sys.types t on c.COLUMN_NAME = t.name
What do you think?
I absolutely agree with this sentiment. The list client I was at had this issue and it made life so much more 'interesting'. And by interest I sure don't mean fun.