See the full index.
When developing stored procedures, there seems to be a lot of emphasis on "get it done fast." Which means type all lower case, pay little attention to formatting, and sometimes throw best practices out the window. Personally, I would rather front-load my development time; I think that the costs I pay in initial development are much lower than what I might have paid in maintenance down the road. Making readable and maintainable code that also performs well and is delivered in a timely manner is something that a lot of us strive for, but we don't always have the luxury. But I have found that it is very easy to fall into the good kind of development habits.
A popular adage:
You can have it good, cheap, or fast. Pick two.
I contend that if you develop habits like these and use them in all of your database programming, the time difference between following those methods and doing it the "lazy" way will be negligible at most; and so, fast and good go hand in hand, rather than trade off for one another.
Once in a while this "disorder" slows me down. I come across code that someone else wrote, and I can't even bear to look at it without first re-writing it. Here is a realistic example of the kinds of procedures I see:
create proc foo(@i int,@bar int=null,@hr int output,@xd datetime) as declare @c varchar declare @s nchar(2) declare @x int set @grok='Beverly' set @korg='MA' set @x=5 select customers.customerid,firstname,lastname,orderdate from customers join orders on customers.customerid=orders.customerid where status=@i or status<=@bar and orderdate<=@xd set @hr = @@rowcount select customers.customerid,count(*) from customers left join orders on customers.customerid=orders.customerid where customers.city=@c and customers.state=@s group by customers.customerid having count(*)>=@x return (@@rowcount)
This kind of feels like the 5th grade all over again, but when I get handed code like this, I start immediately visualizing one of those "what's wrong with this picture" exercises, and feel compelled to fix them all. What is wrong with the above sample, you may ask? Well, let me go through my own personal (and quite subjective) subconscious checklist of best practices when I write my own stored procedures. I have never tried to list these all at once, so I may be all over the place, but hopefully I will justify why I choose to have these items on my checklist in the first place.
Upper casing keywords
I always use CREATE PROCEDURE
and not create procedure
or Create Procedure
(and definitely not CREATE PROC
). Same goes for all of the code throughout my objects… you will always see SELECT, FROM, WHERE
and not select, from, where
. I just find if much more readable when all of the keywords are capitalized. It's not that hard for me to hold down the shift key while typing these words, and there are even IDEs that will do this kind of replacement for you. This is probably one of the few areas where Celko and I actually agree. 🙂
Now, however, I try to use lower case for all data types, and talk about my reasons in this post.
Sensible, consistent naming scheme
Obviously "foo" is a horribly ridiculous name for a procedure, but I have come across many that were equally nondescript. I like to name my objects using {target}_{verb}
. For example, if I have a dbo.Customers
table, I would have procedures such as:
dbo.Customer_Create dbo.Customer_Update dbo.Customer_Delete dbo.Customer_GetList dbo.Customer_GetDetails
This allows them to sort nicely in Object Explorer / Object Explorer Details, and also narrows down my search quickly in an IntelliSense (or SQL Prompt) auto-complete list. If I have stored procedures named in the style dbo.GetCustomerList
, they get mixed up in the list with dbo.GetClientList
and dbo.GetDentistDetails
.
I NEVER name stored procedures using the sp_
prefix; see my post here. Or just ask anybody. 🙂 I also avoid other identifying object prefixes (like usp_
). I don't know that I've ever been in a situation where I couldn't tell whether some object was a procedure, or a function, or a table, and where the name really would have helped me more than the context. This is especially true for the silly (but common) tbl
prefix on tables. I don't want to get into that here, but I've always scratched my head at that one. Views may be the only place where I think this is justified, but then it should be a v
or View_
prefix on the views only; no need to also identify tables… if it doesn't have a v
or View_
prefix, it's a table!
More important than coming up with a proper naming scheme (because that is mostly subjective), it is much more important that you apply your naming scheme consistently. Nobody wants to see procedures named dbo.Customer_Create
, dbo.Update_Customer
, and dbo.GetCustomerDetails
.
Specifying schema
I always specify the schema prefix when creating stored procedures. This way I know that it will be dbo.procedure_name
no matter who I am logged in as when I create it or call it. Similarly, my code always has the schema specified when referencing any object bound to a schema (except #temp tables). This prevents the database engine from checking for an object under my schema first, and also avoids the issue where multiple plans are cached for the exact same statement/batch just because they were executed by users with different default schemas. See this post and this post for more details.
Parentheses around parameter lists
I am not a big fan of using parentheses around the parameter list. I can't really explain it, as I am a proponent of consistency, and this is the syntax required when creating user-defined functions. But I wanted to mention it because you will not see any of my stored procedures using this syntax (it's just extra fluff that harms my readability). I'm open to change if you can suggest a good enough reason for me to do so.
Lining up parameters, types, values
For a little more work up front, I find this much easier to read:
CREATE PROCEDURE dbo.User_Update @CustomerID int, @FirstName nvarchar(64) = NULL, @LastName nvarchar(64) = NULL, @EmailAddress nvarchar(320), @Active bit = 1, @LastLogin smalldatetime = NULL AS ...
…than this:
CREATE PROCEDURE dbo.User_Update @CustomerID int, @FirstName nvarchar(64) = NULL, @LastName nvarchar(64) = NULL, @EmailAddress nvarchar(320), @Active bit = 1, @LastLogin smalldatetime = NULL AS ...
Liberal use of whitespace
This is a simple one, but in all comparison operators I like to see spaces between column/variable and operator. So instead of @foo int=null
or where @foo>1
I would rather see @foo int = NULL
or WHERE @foo > 1
.
I also tend to place at least a carriage return between individual statements, especially in stored procedures where many statements spill over multiple lines.
Both of these are just about readability, nothing more. While in some interpreted languages like JavaScript or parsed text like CSS, size is king, and compressing / obfuscating code to make it as small as possible does provide some benefit, in T-SQL you would be hard-pressed to find a case where this comes into play (I talk about how an obnoxious amount of comments can affect performance here, but how little tabs vs. spaces can matter here). So, I lean to the side of readability.
Prefixing names with data type
I often see prefixes like @iCustomerID
, @prmInputParameter
, @varLocalVariable
, and @strStringVariable
. I realize why people do it, I just think it muddies things up. It also makes it much harder to change the data type of a column when not only do you have to change all the variable/parameter type declarations but you also have to change the actual name of the parameter as well (or leave it with an incorrect data type prefix). This filters down to the data layer, too.
Just name the variable for what it means, and not how it is implemented. If you have a column EmailAddress nvarchar(320)
then make your variable/parameter declaration @EmailAddress nvarchar(320)
. No need to use @strEmailAddress
or @varchar320_EmailAddress
. If you need to find out the data type, just go to the declaration line (or the metadata)!
Specifying lengths on parameters
I occasionally see people define parameters and local variables as char or varchar, without specifying a length. This is very dangerous, as in some situations you will get silent truncation at 30 characters, and in others you will get silent truncation at 1 character. This can mean data loss, which is not very good at all; I talk more about this in this post. I have asked that this silent truncation at least become consistent throughout the product (see Connect #267605), but nothing has happened yet. Fellow MVP Erland Sommarskog has gone so far as to ask for the length declaration to become mandatory (see Connect #244395) and, failing that, feels that this should be something that raises a warning when using his proposed SET STRICT_CHECKS ON
setting (more details).
Listing output parameters last
My habit is to list OUTPUT
parameters last. I am not sure why that is exactly, except that it is the order that I conceptually think about the parameters… in then out, not the other way around.
Using BEGIN / END liberally
I have seen many people write procedures like this:
CREATE PROCEDURE dbo.ProcedureA AS SELECT * FROM foo; GO SELECT * FROM bar; GO
They create the procedure, and then wonder why they only get results from foo
when they run the procedure. If they had done this:
CREATE PROCEDURE dbo.ProcedureA AS BEGIN SELECT * FROM foo; GO SELECT * FROM bar; END GO
GO
is not a T-SQL keyword, but rather a batch separator for tools like Query Analyzer and SSMS. So with the BEGIN/END
wrapper, they would have received these error messages, one from each batch:
Incorrect syntax near ';'.
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'END'.
Yes, errors are bad, and all that, but I would rather have this brought to my face when I try to compile the procedure, than later on when the first user tries to call it.
Another variation of this is when someone initially writes this code:
IF ([some condition]) EXEC dbo.SomeProcedure;
Later, someone may want to add another call to the outcome of this branch:
IF ([some condition]) EXEC dbo.SomeProcedure; EXEC dbo.SomeOtherProcedure;
Then they wonder why dbo.SomeOtherProcedure
gets called no matter the result of [some condition]
. It may seem tedious to write this for one line of code, but it's worth it in the long run IMHO:
IF ([some condition]) BEGIN EXEC dbo.SomeProcedure; END
Using statement terminators
I have quickly adapted to the habit of ending all statements with proper statement terminators (;
). This was always a habit in languages like JavaScript (where it is optional) and C# (where it is not). But as T-SQL gets more and more extensions (e.g. CTEs) that require it, I see it becoming a requirement eventually. Maybe I won't even be working with SQL Server by the time that happens, but if I am, I'll be ready. It's one extra keystroke, and guarantees that my code will be forward-compatible. More details here.
Using SET NOCOUNT ON
I always add SET NOCOUNT ON;
as the very first line of the procedure (after BEGIN
, of course). This prevents DONE_IN_PROC
messages from needlessly being sent back to the client after every row-affecting statement, which increases network traffic and can fool some applications into believing there is an additional recordset available for consumption.
I do not advocate blindly throwing SET NOCOUNT ON
into all of your existing stored procedures (in fact, in modern versions of SQL Server, I struggled to demonstrate any gain). Existing applications might actually already be working around the "extra recordset" problem by explicitly skipping past it, or may even be using its output. If you code with SET NOCOUNT ON
from the start, and keep track of rows affected in output parameters when necessary, this should never be an issue. Roy Ashbrook got beat up about this topic at a Tampa Bay code camp, and wrote about it here.
Declaring local variables
When possible, I always use a single DECLARE
statement to initialize all of my local variables. Similarly, I try to use a single SELECT
to apply values to those variables that are being used like local constants. I see code like this:
declare @customerid INT declare @orderid INT set @CustomerId = 5 set @OrderID = 6 declare @Status INT set @status = -1
And then some more declare and set statements later on in the code. I find it much harder to track down variables in longer and more complex procedures when the declarations and/or assignments can happen anywhere… I would much rather have as much of this as possible occurring in the beginning of the code, and also would really expect to see the case being consistent. So for the above I would rather see:
DECLARE @CustomerID int, @OrderID int, @Status tinyint; SELECT @CustomerID = 5, @OrderID = 6, @Status = -1;
As a bonus, in SQL Server 2008, the syntax now supports changing the above into a single statement:
DECLARE @CustomerID int = 5, @OrderID int = 6, @Status tinyint = -1;
So much nicer.
Also, some people like listing the commas at the beginning of each new line, e.g.:
DECLARE @CustomerID int ,@OrderID int, ,@Status int;
Not just in variable declarations, but also in parameter lists, columns lists, etc. While I will agree that this makes it easier to comment out individual lines in single steps, I find the readability suffers greatly.
Using table aliases
I use aliases a lot. Nobody wants to read (never mind type) this, even though I have seen *many* examples of it posted to the public SQL Server newsgroups, Stack Overflow, and other forums:
SELECT dbo.table_X_with_long_name.column1, dbo.table_X_with_long_name.column2, dbo.table_X_with_long_name.column3, dbo.table_X_with_long_name.column4, dbo.table_X_with_long_name.column5, dbo.table_H_with_long_name.column1, dbo.table_H_with_long_name.column2, dbo.table_H_with_long_name.column3, dbo.table_H_with_long_name.column4 FROM dbo.table_X_with_long_name INNER JOIN dbo.table_H_with_long_name ON dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1 OR dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1 OR dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1 WHERE dbo.table_X_with_long_name.column1 >= 5 AND dbo.table_X_with_long_name.column1 < 10;
As long as you alias sensibly, you can make this a much more readable query:
SELECT X.column1, X.column2, X.column3, X.column4, X.column5, H.column1, H.column2, H.column3, H.column4 FROM dbo.table_X_with_long_name AS X INNER JOIN dbo.table_H_with_long_name AS H ON X.column1 = H.column1 OR X.column2 = H.column2 OR X.column3 = H.column3 WHERE X.column1 >= 5 AND X.column1 < 10;
The AS
when aliasing tables is optional; I have been trying very hard to make myself use it (only because the standard defines it that way). When writing multi-table queries, I don't give tables meaningless shorthand like a, b, c or t1, t2, t3. This might fly for simple queries, but if the query becomes more complex, you will regret it when you have to go back and edit it. (Also, keep in mind you should always qualify column names with their table alias if your query does (or may someday) reference more than one table.)
Using column aliases
I buck against the trend here. A lot of people prefer to alias expressions / columns using this syntax:
[column expression] AS alias
I much prefer:
alias = [column expression]
The reason is that all of my column names are listed down the left hand side of the column list, instead of being at the end. It is much easier to scan column names when they are vertically aligned.
In addition, I always use column aliases for expressions, even if right now I don't need to reference the column by an alias. This prevents me from having to deal with multiple errors should I ever need to move the query into a subquery, CTE, or derived table.
Using consistent formatting
I am very fussy (some co-workers use a different word) about formatting. I like my queries to be consistently readable and laid out in a predictable way. So for a join that includes a CTE and a subquery, this is how it would look:
WITH cte AS ( SELECT t.col1, t.col2, t.col3 FROM dbo.sometable AS t ) SELECT cte.col1, cte.col2, cte.col3, c.col4 FROM cte INNER JOIN dbo.Customers AS c ON c.CustomerID = cte.col1 WHERE EXISTS ( SELECT 1 FROM dbo.Orders o WHERE o.CustomerID = c.CustomerID ) AND c.Status = 'LIVE';
Keeping all of the columns in a nice vertical line, and visually separating each table in the join and each where clause. Inside a subquery or derived table, I am less strict about the visual separation, though I still put each fundamental portion on its own line. And I always use SELECT 1 in this type of EXISTS() clause, instead of SELECT * or SELECT COUNT(*), to make it immediately clear to others that the query inside does NOT retrieve data.
Matching case of underlying objects / columns
I always try to match the case of the underlying object, as I can never be too certain that my application will always be on a case-sensitive collation. Going back and correcting the case throughout all of my modules will be a royal pain, at best. Going forward, IntelliSense makes this really easy, but to see why it's important, see this post.
Qualifying column names with table/alias prefix
I mentioned this above but, to reiterate, I always qualify column names when there is more than one table in the query. Heck, sometimes I even use aliases when there is only one table in the query, to ease my maintenance later should the query become more complex. I won't harp on this too much, as Alexander Kuznetsov has treated this subject quite well.
Using RETURN and OUTPUT appropriately
I never use RETURN
to provide any data back to the client (e.g. the SCOPE_IDENTITY()
value or @@ROWCOUNT
). This should be used exclusively for returning stored procedure status, such as ERROR_NUMBER()
. If you need to return data to the caller, use an existing resultset or, better yet, an OUTPUT
parameter. See this post for more details.
Avoiding keyword shorthands
I always use full keywords as opposed to their shorthand equivalents. BEGIN TRAN
and CREATE PROC
might save me a few keystrokes, and I'm sure the shorthand equivalents are here to stay, but something just doesn't feel right about it.
Same with the parameters for built-in functions like DATEDIFF()
, DATEADD()
, DATENAME()
, and DATEPART()
. Why use WK
or DW
when you can use WEEK
or WEEKDAY
? Why use D
or YYYY
when you can use DAY
or YEAR
? Make your code self-documenting, and prevent careless bugs – see this video and this post for more details.
Finally, I always explicitly say INNER JOIN
, LEFT OUTER JOIN
, and CROSS JOIN
… never just JOIN
, LEFT JOIN
, or a comma. Again, no real good reason behind that, just habit and the latter is more descriptive of intent (if I write out CROSS JOIN
, the reader – who may be future me – knows I did that on purpose).
Parentheses around AND / OR logic
I always group my clauses when mixing AND
and OR. Leaving it up to the optimizer to determine what "x=5 AND y = 4 OR b = 3" really means is not my cup of tea. I wrote a very short article about this a few years ago.
Summary
So, after all of that, given the procedure I listed at the start of the article, what would I end up with? Assuming I am using SQL Server 2008, and that I can update the calling application to use the right procedure name, to use sensible input parameter names, and to stop using return values instead of output parameters:
CREATE PROCEDURE dbo.Customer_GetOlderOrders @OrderStatus int, @MaxOrderStatus int = NULL, @OrderDate smalldatetime, @RC1 int OUTPUT, @RC2 int OUTPUT AS BEGIN SET NOCOUNT ON; DECLARE @City varchar(32) = 'Beverly', @State char(2) = 'MA', @MinOrderCount int = 5; SELECT c.CustomerID, c.FirstName, c.LastName, c.OrderDate FROM dbo.Customers c INNER JOIN dbo.Orders o ON c.CustomerID = o.CustomerID WHERE ( o.OrderStatus = @OrderStatus OR o.OrderStatus >= @MaxOrderStatus ) AND o.OrderDate <= @MaxOrderDate; SET @RC1 = @@ROWCOUNT; SELECT c.CustomerID, OrderCount = COUNT(*) FROM dbo.Customers c LEFT OUTER JOIN dbo.Orders o ON c.CustomerID = o.CustomerID WHERE c.City = @City AND c.State = @State GROUP BY c.CustomerID HAVING COUNT(*) >= @MinOrderCount; SET @RC2 = @@ROWCOUNT; END GO
Okay, so it LOOKS like a lot more code, because the layout is more vertical. But you tell me. Copy both procedures to SSMS or Azure Data Studio, and which one is easier to read / understand? And is it worth the three minutes it took me to convert the original query? It took me a few hours to convert this list from my subconscious to you, so hopefully I have helped you pick up at least one good habit. And if you think any of my suggestions are BAD, please drop a line and let me know why!
See the full index.
I agree with Rajen. This is a great reference even 8 years out. We recently had a problem come up that would go into your "Matching case of underlying objects / columns" paragraph. In the stored procedures (SProcs), the underlying column name case should be used and is done easily using inteli-sense (at least in 2012/2014).
When importing an SProc using entity framework (EF), it uses the names (AND CASE) of the select statement to create the entities for complex types. In SSMS, you can have a select on mytable.id when the column in the table is "ID"…it works all the same. Of course c# cares – "id", "ID", "Id" are all different. So if someone else along the way decides to "fix" the SProc with the correct case, it gets reimported into EF in a project, any code referencing property.id (what it was) now gets the squigglies because it is now property.ID.
The ultimate fix would be EF doing more of the work and looking at the underlying column properties including the name and creating the complex types for the edmx using that capitalization instead of going with what is in the SProc.
Well, that was longer than expected. Point being…good resource.
Agree totally with Rajen, even after all this time! Find it very interesting that I've been writing my sql almost exactly like in the checklist. Specify the type of join, capitalization and even commas not in front.
Great minds think alike… or fools never differ!
Hi Aaron
Even after all these years this is an ace article as a reference.
Shows how well it is written.
Thanks
Excellent post! I do almost everything in this article except I don't like to capitalize keywords (like SELECT). I also don't use begin and end in stored procedures (though I always do with looping), and I don't use the words "inner" and "outer" in joins. Your code looks exactly like mine. I really dig this article.
Cheers and Happy Holidays,
Armen Alexanian
Las Vegas, NV
Great article
Nice article! Thank you.
@Ed I agree, 6 of one, half dozen of the other. The argument I get back though is that people are more commonly adding/removing columns from the end of the list than the beginning, but *shrug* it's just not as common such as to override readability for me, anyway.
"Also, some people like listing the commas at the beginning of each new line… While I will agree that this makes it easier to comment out individual lines in single steps,"
It doesn't make it easier to comment out the first line. So it's not really an improvement IMHO. If you have trailing commas on each line but the last, then you can comment out any line except the last one. Either way there is one annoying line which is different.
Hi Aaron,
I am using MySQL from the command line (Korn shell on RHEL 6.2) to execute the stored proc. The user that needs to execute runs it from cron and posts the results to a web page – they want to see the time it takes for each of the 4 queries. I *could* split the queries into 4 separate stored procedures, but this seems like over kill.
Thanks for the quick response!
Joe
@Sebastien, I don't think there is any value in having a bunch of stored procedures that just in turn call the big procedure – after all, if you give your parameters sensible names and actually use them explicitly when calling stored procedures, there's not much difference between:
EXEC dbo.GetCompetitionsForSeason @SeasonID = 10;
…and…
EXEC dbo.GetCompetitions @SeasonID = 10;
As for the big multi-function stored procedure, I call that the kitchen sink.
Do you think it is better to have one single stored procedure; for example GetCompetitions with many many parameters that are also dependent on each other and also mutually exclusive (for example if you give it a parameter season and competition type and year, there will be no results)? Or is it better to create many more descriptive stored procedures that call this huge one (not to repeat the code) and have a smaller set of parameters in each. For example; GetCompetitionsForSeason where season is the only parameter?
Kind regards,
Sebastijan
Joe, that looks like you are using some tool other than Management Studio to run your queries. Can you provide some more details?
Enjoyed the blog – great points! I found it while searching for an answer that I never found after over an hour of searching. If you are so inclined and know the answer, I'd appreciate the help! I have a stored proc that executes four queries (2 deletes and 2 updates). After all three complete, I only see the status from the LAST query:
Query OK, 5 rows affected (0.00 sec)
Rows matched: 5 Changed: 5 Warnings: 0
If anyone knows how to get the status message from each query, I'd be very grateful!
Thanks Aaron, this is really good stuff. Looks to me like you should have been an Oracle PL/SQL guy as most of the things that you recommend are standard practice on this platform 😉
Hi guys,
I agree with most of your suggestions and actually use them daily. However, my point (suggestion) is that I line up the keywords sort of from the right as below.
SELECT col1, col2, col3, col4
FROM Table1 da
INNER JOIN Table2 db
ON col3 = col4
AND col5 = col6
INNER JOIN (SELECT pk
FROM Table99 xc
INNER JOIN Table77 xd
ON xc.col4 = xd.col5) v
ON da.col3 = v.col1
WHERE da.col99 = 'abc123'
AND v.col88 = 'whatever'
This makes it very easy for me to quickly analyse the joins and especially the "AND"'s and what line they actually belong to (note the AND in the first inner join is clearly visible to go with the JOIN and nothing else.
The indentation is the most important to me, because I have to be able to very quickly analyse existing sp's with 1000's of lines, hence legibility is of utmost importance to me.
nice post
Great article, thanks for sharing some great tips! One more thing I'd like to add:
I used to use the ALL CAPS convention for keywords. However, I've had to work with some really complicated queries involving many layers of subqueries (not by choice, editing other people's queries). So I found it useful to switch case to denote which level of subquery I'm working on, for example:
SELECT –1st level ALL CAPS
a.col1
, b.col2
FROM
TableA AS a
INNER JOIN (
Select –2nd level Initial Caps
x.col1
, y.col2
From
TableX As X
Inner Join (
select –3rd level lower case
d.col1
, e.col2
from
TableD as D
inner join (
SELECT –4th level back to ALL CAPS, and so on
z.col1
FROM
TableZ AS z
WHERE
z.col1 IS NOT NULL
) as e on d.id = e.id
where
e.id IS NOT NULL
) As y On y.id = z.id
Where x.col2 IS NOT NULL
) AS b ON a.id = b.id
WHERE
a.col2 > 2
Good one to learn..
<html><ahref="www.oracletrainingchennai.in">Oracle Training</a>
</html>
Nice artical…
Very good article Aaron.
I want to ask if I want to create three stored procedures and they all return the same data but accept different parameters for example
dbo.EmployeeAccountResetCodeRead(@EmployeeId int)
dbo.EmployeeAccountResetCodeRead2(@ResetCode nvarchar(100)
dbo.EmployeeAccountResetCodeRead3(@Email nvarchar(100),@ResetCode nvarchar(100),UserNameId int)
so in that case what naming convention would you recommend as suffix 2,3,4… for stored procedure is a kind of mess and i don't think in SQL Server databases we can create overloaded stored procedures?
Thanks
Here it is 2014 and this is the first time I've ever seen this article, then again this is the first time the topic has ever been apropos to my work situation. Excellent article and as true today as ever!
Table aliases: "But as long as you alias sensibly, you can make this a much more readable query".
I disagree. I find it MUCH harder to mentally juggle two sets of identifiers: The real table/view names, and the aliases that are only temporary for this expression.
Also, the alias that's given to each table can vary from proc to proc; from programmer to programmer in the same database; and from section to section in the same proc, while the actual table name stays constant.
I typically don't use any table aliases in my expressions, even when there are multiple JOIN Statements. Code without table aliases is NOT any less "readable", even when the table names are long. Especially when working a lot with the same database, we KNOW what the table names mean. We can all read pretty fast, can't we? Table names are not usually as "fake long", and as similar, as the example shows.
If typing speed is the issue, use copy and paste. (Is anyone's speed of creating quality SQL code actually limited to how fast they can type? I doubt it.)
vvv
Great work and a great to learn
@Disappointed, SQL is a low level language and "pretty" is very important when it comes to maintainable code. Well written! Still getting hits after 5 years!
Charlie, I don't understand your comment. (1) There are plenty of reasons to use stored procedures that have absolutely nothing to do with performance. (2) In modern versions of SQL Server, I'm not sure how switching to stored procedures helps fix performance problems. Maybe you could blog about this to elaborate?
All good advice, but you forgot the most important rule of all: Don't use stored procedures unless you're trying to solve a performance issue.
nice
Great Post!
Very nice article! Thanks Aaron!
Thanks Aaron Bertrand, your post is really nice and helpful. We are applying the important things in our project.
Great post, I might as well show my JOIN syntax as well:
SELECT
columns
FROM
table1 AS t1
INNER JOIN table2 AS t2 ON
t1.column = t2.column
AND t1.anotherColumn = t2.anotherColumn
INNER JOIN table3 AS t3 ON
t2.column = t2.column
…
I like having the JOIN clauses indented, it seems to make it easier to zero in on ts happening.
I like to keep my JOIN clauses indented.
Great Post! Thank you!
Epic thread: comments still going strong after 5 years!
Great stuff, Aaron, thank you. Here's one tip that I gleaned from a talk once; though it's at odds with most published code, I like it b/c it visually aligns all the table names that have been joined.
Of course, you have to look to the end of the lines for INNER vs. LEFT vs RIGHT etc., but to me it's worth it to quickly have all the tables lined up in a column:
SELECT columnlist
FROM
FirstTable ft INNER JOIN
SecondTable st ON ft.col=st.col INNER JOIN
ThirdTable tt ON st.col2=tt.col2
etc.
I'd hoped this would be best practices for procedures that run better, not that look pretty – and you've shown us all that "pretty" is in the eye of the beholder. I've got enough RMD from scrolling through procs already, I certainly don't need to stretch them out any more. Like you, I have strong feelings about consistency and readability, but I think there's also a lot to be said for flexibility and your rigid conformity to your standards is a little too obsessive/compulsive.
Thanks Aaron. My path is clear, my conscience is still.
🙂
Carl,
I would name them both CustomerID. The same entity should have the same name (and same data type, same precision, same constraints, etc.) throughout the model whenever possible, that way there is never any confusion.
That said, there are exceptions. For example, let's say you have a refer-a-friend program, you would potentially have a table like this, where you couldn't call both CustomerID, and it actually doesn't make sense to call either of them CustomerID:
CREATE TABLE dbo.Referrals
(
ReferringCustomerID INT NOT NULL FOREIGN KEY …,
ReferredCustomerID INT NOT NULL FOREIGN KEY …,
PRIMARY KEY(ReferringCustomerID, ReferredCustomerID)
);
But those exceptions should be relatively limited, or not present at all, in a large number of typical models. In most cases you should be able to call a CustomerID a CustomerID no matter where it appears…
Point taken 🙂
So, for example, if I have a Customers table and a Sales table, how would you name the CustomerID column in each of those tables? I ask not to start a holy war (that started as soon as there were two programmers in the world) but rather to use best practices as I continue to learn.
Thanks very much Aaron,
Carl
Thanks Carl. I'd argue that leaving the code that way is a terrible justification for wanting different column names depending on the table. Customer.Id = Sales.CustomerId makes me want to vomit. Id = CustomerId makes me want to jump off a cliff. Column references in multi-entity queries should have table/alias prefixes in all cases, even when the relationship is obvious to you. This is even more important in the select list and where clause, where it is even less obvious which column(s) belong to which table(s).
Great article Aaron. Thanks for taking the time.
One question and an observation.
I didn't see mention of column naming conventions; since this was a stored procedures checklist, that kind of makes sense. 🙂
I have been a proponent of using the same column name in every instance of the same data across tables; i.e., Table1.CustomerID = Table2.CustomerID = Table3.CustomerID, etc. That way, I can know that the column I need to join to in Table2 is the same name as the column in Table1.
That is, until we were handed a project that required looking for a specific instance of a specific column across 4 servers and hundreds if not thousands of stored procedures, views, triggers, functions, etc. The kicker was that most of these objects did not reference the table where the column exists; the code (for example) was just col1 instead of table1.col1. Anyway, I am now rethinking my philosophy of matching column names across tables.
Your thoughts about that?
And the observation:
Thank you sincerely for your input, advice, suggestions, help, encouragement, etc. on the Microsoft SQL Server newsgroups through the years. Very deeply appreciated!
Carl
Cheers Aaron.
I don't use prefixes for column names. I see these are 'public' or 'on show' for for non-tech folk. So need to be obvious/English.
But in code I do like to see what I am doing, especially when setting one variable to another variables value, so know if conversions are needed etc. So (respectfully) I will keep on doing it in T-SQL. ha ha!
@amedo depends on the length of the expressions I suppose. Sometimes in one line, sometimes indented, etc.
@Daryl no I don't, for the same reason I don't use column names like intFoo, intBar, etc. If the data type changes later, why go back and refactor all that code? @CustomerID is a customer ID. If the underlying data type is important, the person writing the code should be aware of the data model as opposed to relying on a variable name (which may or may not be accurate at the time).
As a newbie to coding SQL, I am looking to see what conventions other are using to make for good clean, readable code. I'm from a VBA background and quite strict on myself & my code, so this article is great. Really like it. Good tips. Thanks!
One question. Do you use variable naming conventions? So that the prefix for a variable denotes its type. This is something I use & like in VBA. What say you?
eg
DECLARE
@intFoo INT,
@intBar INT,
@vcName varchar(100);
etc?
What are you doing (outlining) in case of using WHEN CASES and subqueries?
Don, thanks for the comments. Just one nit – alias = column/expression syntax has not been deprecated. It is only when alias is enclosed in string delimiters that it is deprecated. Microsoft themselves say:
<snip>
The only syntax we are deprecating is 'column_alias' = expression. All other syntaxes are still valid:
column_alias = expression
expression [as] column_alias
expression [as] "column_alias"
expression [as] [column_alias]
expression [as] 'column_alias'
</snip>
From:
http://web.archive.org/web/*/http://web.archive.org/web/*/https://connect.microsoft.com/SQLServer/feedback/details/125027/unclear-deprecation-of-column-alias-in-select-list
I talk a lot more about it here:
/blogs/aaron_bertrand/archive/2012/01/23/bad-habits-to-kick-using-as-instead-of-for-column-aliases.aspx
This confuses a lot of people. This guy said on his blog that he has to switch to AS:
http://thepursuitofalife.com/string-literals-as-column-aliases-deprecated-in-sql-server/
His blog doesn't allow me to comment, but I want to say, no you don't, you just need to remove the single quotes!
Hah, I was just browsing around google in relation to an SSMS issue I am having, and "Aaron Bertand's stored procedure best practices" happened to appear in the list.
I just had to read it. You and I think in very similar ways, so I was not surprised to see you advocating just about all the same things that I advocate to the devs at my work.
I do use parentheses for procedure parameters, but I can't give a compelling reason why someone else should. I just do it because I started as a C programmer many years ago. And I don't uppercase keywords. Purely aesthetic preference.
I also prefer the alias = column syntax for the same reason you do. Too bad it's been deprecated (yeah, I know, major necro…)
I think I might link this page to the guys in my team.
perfect even i was following this naming convention for stored procedures.
Hi Aaron,
Thanks for this.
I am new to T-SQL and as I learn I'm trying to teach myself strict formatting so it becomes habit. I'm not happy with a query, procedure, etc until it is correct formatted and easily readable.
I had wondered if I was being to fussy as a lot of code I've seen tends to be in all lowercase, messy or just all over the place, but it's good to know a lot of people agree with neat organised code.
Great read!
Cheers
Jason
Hey Aaron,
I must say your article is really great and have really great stuff which most of developer realize when then feel the pain of debugging someone else code without proper formatting.
Thanks for sharing this.
One more thing in the initial comments you mentioned about defensive database programming. It would be great if you can share some link for it.
Thanks.
Vijay
@sandeep just about everything I've written about here is not about performance, but about creating readable, maintainable and consistent code.
Maintainability and readability wise it is better. Great Read!
great stuff, aaron, just want to know what is the performance advantage of all these formatting stuff
I love to do everything you have specified. I am glad to see someone else that is very meticulous on all these points.
You made a comment with Vern Rabe above that I agree with where you said that with the new line terminator, commas at the beginning of the line to help commenting kinda goes out the window; for though, I tried to switch back (to comma at the end), but quickly went back to commas at the front because for me, since I learned Speed-reading and the concept of Skim and Scan, commas at the front help me to know that THAT particular comma is line break among all the other commas in a complex [column expression] where I can't quickly identify which one is the comma that breaks the line in a multi-line statement.
I also agree with Manoj Pandey when he said that it makes Source Control differences much easier to read.
I also like the [alias] = [field] to match programming styles as an assignment and I like seeing what the output will be first, like this …
SELECT alias = [column expression]
rather than this way
SELECT [column expression] as alias
Few differences I agree with readers, like aligning the datatype and default value. I used to do it, and found it tedious. I wish RedGate did something with that and I would be a happier clam, but that's my OCD.
My priorities are to skim and scan consistently for high-speed reviews and to achieve high code understanding quickly. The cost of time to use a standard should be lower compared to the cost of time to review and achieve understanding.
One more thing to mention:
Matter of fact is that for an inexperienced developer like me, there are tons of various tutorials and classic How-tos on the Web for any given subject. However, when it comes to professional tips for performing a certain task, and how it is done in a professional/production environment, a shortage is felt. This article and any other notes like this, REALLY help and are appreciated.
Thanks for sharing your very useful experience & thoughts.
Hi friends,
You should check this one too.
http://www.bigator.com/2012/03/14/best-practices-tips-while-writing-stored-procedures/
Thanks, ron
Very nice and to the point article.
Thanks. Really very useful information.
COMMAS:
I prefer to add commas at the start of line while specifying columns in SELECT list. Otherwise if you add comma at the end and when some day you add new columns to the SELECT list the Source Control assumes that the existing last column is also modified.
EXISTS():
Even if you use "SELECT * FROM " inside EXISTS() it is not going to retrieve data, will just check existence of records. It ignores the columns list if provided or *. That's why "SELECT 1/0 FROM" also works inside EXISTS().
I've seen many people using "SELECT 1 FROM" or even "SELECT '1' FROM", some some add "TOP 1" to it. So for simplicity I use plain "SELECT * FROM" when using EXISTS().
Hi,
Are there any performance benefits for most of the points mentioned in the article?
Regards,
Ganesh
Great article, thx a lot.
JamieC, I think you took my formatting here way too literally. I took real examples of code I've seen, and corrected only the item mentioned in that section. So for example, in the long table alias section, I corrected the alias but I didn't correct any of the other formatting issues. I tried to do the same thing we like to do when modifying applications and testing them – only change one thing at a time. As for the three and four spaces, that could be a combination of where the tabs were produced, where the original code was taken from, and how Community Server interpreted and displayed them when I wrote this article three years ago. The AS missing from the final procedure is just a miss – that was a new habit at the time that I'm much more consistent about now. BEGIN / END don't get semi-colons for the same reason that BEGIN TRY / END TRY / BEGIN CATCH / END CATCH don't – they're obvious visual block separators (at least when indented and used correctly) and in at least one of those cases a semi-colon as a termintator brings about a syntax error.
The point here was to present a set of individual concepts, not to try to make Community Server turn <pre> HTML into a "you must format exactly this way" doctrine.
Stumbled upon this article while searching for Stored Procedure best practices this morning.
Great article. I couldn't agree more about keeping everything consistent and readable. I think we all sometimes do, or have fallen in the "doing it quickly" mode….
But readability also makes for some much preferable maintenance time if one ever needs to make code changes. I think taking extra time just to read through spaghetti lines is where a waste of ressources often happens.
Great article!
I enjoy reading such style guides and thinking "I do that" and "Ooh no, not that!" etc. As ever with style, no one is ever right or wrong. However, I feel I must point out that you've failed to use consistent formatting in your article 🙂 Sometimes you left-align the `ON` clause with its `INNER JOIN` and other times it is indented; ditto when left-aligning `FROM` and `WHERE` with `SELECT`. Sometimes you indent by four spaces, other times by three. You seem to start a new line after `WHERE` and `ON` only when more than one predicate is involved but single-predicate `HAVING` clauses are always on a new line. Sometimes you use `<table_name> AS <correlation_name>` and sometimes you omit `AS` (though I think you effectively confessed up to this 🙂 p.a. any reason for omitting semicolon terminators on `BEGIN` and `END`?
Naomi, do you mean where should you put a comment block at the beginning of a stored procedure (e.g. to describe what the procedure does)? I think that is quite subjective. Some people prefer it before the CREATE PROCEDURE line but I think that is too prone to be left out accidentally, so I would say it should go somewhere after that. Whether it be before or after the parameters, or within the actual procedure body… I'm not sure. I don't tend to have a big comment block describing a procedure, but rather comment fragments describing hairier pieces of logic within the procedure itself. If it isn't obvious from the name alone what a procedure generally does, then there is probably some other kind of failure going on. 🙂
Aaron,
Do you have a sample of a good comment block in the beginning of the SP? Also, I scanned this article briefly and I didn't notice you mention commenting (I saw it in the comments to this article only).
Thanks so much, extremely useful hints for an SP newbie!
I used to work with someone who put the whole SELECT query on one line due to cutting and pasting to/from one line of code in VB – horrid!! My own queries were so much easier to read, as I did similar layouts to yours.
Could you update us on what you now do differently, please?
There are practical reasons for ending all statments with a semi-colon, prefixing objects with schema name, and sticking to a consistent naming convention. I also agree with the advice about noun/verb naming conventions and not using Hungarian style prefixes.
However, I don't like to think about capitalization rules while I type. Proper Casing object names and applying different capitalization rules to different types of words would be tiresome to enforce, especially if there are multiple T-SQL developers on the project. Sticking to ALL UPPER CASE looks retro '80s. Therefore, I prefer to do everything lower case.
There's lots of good advice in this, but I personally don't recommend this one:
Lining up parameter names, data types, and default values
That looks real pretty until you rename or add one parm that doesn't fit, at which point you have to reformat a lot of lines of code. For me, the benefit just isn't worth it. If someone likes it, that's fine, but they should understand the pitfalls. We have sprocs here with several dozen parms. Reformatting them all is very time consuming for a marginal readability benefit.
Isn't the direct usage of parameter values to be avoided? To avoid parameter sniffing that is.
gold, Gold, GOLD!
Thanks so much for the insights, there's GOLD in them thar hills.
Emtucifor,
For every convention of mine you don't like, you will find ten other users with a different convention that you also don't like. You are never going to get everyone to universally agree on your standards, and that wasn't what I was trying to achieve with this post, either. Conventions are subjective, and different people have different reasons for their preference (it's not always about scan speed, "too-far-to-the-right"ness, etc). The nice thing about T-SQL is that unlike a few archaic languages it is actually very flexible about indenting, whitespace, etc. so you can write your stored procedures your way, and I can write them my way. My conventions are evolving all the time. I wrote this a year ago and I already spot a few things I do slightly differently today. They're probably still not in line with your standards, but since you don't review and approve my code, I'm cool with that.
1. I don't like the stray BEGIN/END in stored procedures any more than you like stray parentheses around the parameter list (which I agree with). I don't care if someone *could* some day get an extra GO in there without noticing–that's an easily correctable problem. What I do care about is not typing more unnecessary junk and not having an extra blank column for every single line in my stored procedure, pushing everything to the right just that much more.
2. Why do you put your JOINs all the way to the left? They don't start a new clause with the same weight as the FROM clause–if they did then your first table ought to be all the way to the left as well. Plus, I think that using up extra vertical space to put the table on the next line ultimately reduces the scannability of the proc, not increases it. Finally, the ON clause should not line up with the table name because it is again another sub-clause that does not have the same weight as the table. So DO indent JOIN, put table name on same line, and indent ON again. Try it in some huge query you have and get used to it a little, and I think you'll like it. One more thought: would you put AND on a line by itself in the WHERE clause? Then why do it for JOIN?
3. Why type OUTER when that's the only brand that LEFT, RIGHT, and FULL come in? I believe in keeping things short when they aren't needed… no one will get confused that it was a LEFT INNER JOIN or a RIGHT UPPER JOIN or something. I do use INNER JOIN and never just JOIN because that one is not unambiguous.
4. I follow similar rules to your AND and OR parentheses preferences, but my parentheses always share the line with the next conjunction like so:
WHERE
(
Condition1
OR Condition2
) AND (
Condition3
OR Condition4
)
With appropriate indents of course. I do the same for joins to derived tables:
5. For single-line main query sections I do move the second line up to join the first line, and move them down to a new line if a second item gets added. I like saving space when it's still clear enough.
WHERE OnlyOneCondition
GROUP BY OnlyOneGroupBy
ORDER BY OnlyOneOrderBy
6. I prefer DELETE TableName and INSERT TableName because again, INTO and FROM are just extra words that reduce scanning speed.
Great piece.
Personally I think what’s important is acknowledge regardless of the little things that are closer to preferences then more commonly accepted Best Practices, is to acknowledge that the wide range use of SQL today makes it impossible to have any 1 set of formatting protocols that is best for all scenarios.
As long as you’ve made a good effort to be as clear as possible with your code, using indentation and case along with proper use of whitespace to make the code more legible and understandable as well providing comments when necessary (and to err on the side of too many comments then too few if you have to err on one side or the other) then you should have produced something that most if not all will be able to read/use even if they have to make a few format changes first to meet their personal preferences.
What is bad is to take the attitude of “that’s dumb” towards something that is more of a personal preference then something that could cause problems. For example if you prefer Tabs but someone else prefers using spaces instead then neither is “dumb” unless done in some environment where using Tab verses Spaces could actually cause some kind of problem.
Several have commented about the use of commas at the beginning of a line verses at the end and how they find commas at the start of the line much easier to work with the code, I like Aaron prefer placing the commas at the end. I know it’s just a personal preference and fault no one for taking the opposite stance so long as they aren’t harsh towards me for mine. As to the point about commas at the end being harder to use because commenting out the last item in the SELECT clause then requires once to remove or move the comma before, I have overcome that by always adding on a new line and at the end of my SELECT clause the text ‘END’ . By doing this I can comment out any one line in my SELECT and not have to make any other edits. Granted if the actual number of columns in my query can change the results then I skip this but more often than not while in a development/testing environment the addition of ‘END’ at the end works fine.
There is another SQL Formatter at http://www.sqlinform.com
Dennis, you still have to change all the tables whether you used VARCHAR(30) or ProductDescriptionType. So what have you gained, except you need developers to now look up how ProductDescriptionType is defined if they want to know the constraints around it (is that VARCHAR or NVARCHAR? does that allow 30 characters, or 32, or 64, or 255, or 2048, or (MAX)?).
Alias types seem like a shortcut from certain angles, but I don't see the benefit at all, sorry.
I don't use types to define the table columns themselves, just in code.
You can delete types that are "in use" by stored procedures. The procedure will have a runtime error until you recreate the type. It's a lot easier than changing a lot of code.
But Dennis, I still don't understand why you think it's a good idea to use a user-defined type for this. Have you ever tried to change a user-defined type that is in use? You can't. You need to change all the tables, procedures, functions etc. that declare the type, change them to use something else, and only then can you drop the type and re-create it. Then go back and change all your code again. Refactoring can be a pain but most of the pain can be avoided with consistent naming and formatting (refactoring tools can help you easily identify all uses, if you can clearly identify what they need to find). But refactoring with a user-defined type adds an order of magnitude of pain, IMHO. I used them in a project once and I vowed to never use them again.
Aaron, there's a typo in my original post.
I meant to say that the typical situation is to write a lot of statements like "DECLARE @desc varchar(30);" – I left the parenthsis out. Were you pointing out that creating a user defined type of "varchar30" is no better? If so, I agree.
In this example, a better name for the type would be something like "ProductDescriptionType".
Dennis, did you mean "Worst Practice"? Or are you being sarcastic? You seem to describe the situation (which is all too common) that makes *not* doing what you suggest the "best practice."
I propose a new Best Practice: Create a user defined type for every column (except foreign keys) in the database and then use them in variable declarations and anywhere else they can be used.
A typical situation: You have a [description] column that is 30 columns long and a lot of T-SQL statement like "declare @descr varchar30;". After a year of using the table requirements change an you need to increase the size of the column. Now the developrs have to find each and every statement and change it. Not my idea of fun.
Great article, I agree with most points, but is is most important to follow a consistant standard that is readable.
I like to use all lower case for keywords and mixed case for tables, columns etc. Also, I prefer the use of as for column aliases.
Example:
select
a.ColumnOne as One,
b.ColumnTwo as Two
from
TableA a
left inner join
TableB b
on
a.ID = b.ForeignID
where
a.Value = @X and
b.Value in ('A', 'B') and
(a.Xyz = 1 or b.Abc = 2)
Great article, Personally I prefer periods as separators rather than underscores. It gives a more familiar feel when I switch between SQL and .NET.
Also, brackets brackets brackets. I always use proper brackets as separators(i.e. [schema].[tablename].[column]) this prevents from any possibility of running into a some obscure keyword and also makes using periods as separators possible. 🙂
All SSMS needs now is the formatting hotkey like Visual Studio has, I just love when I get messy code from a coworker because I just hit ctrl+ED and wala, its just how I like it. If there's a plugin for SSMS that does this please let me know.
-Russ
Interesting read.
1) Didn't realize that the parentheses around proc parameters was a non-standard construct. I've always used it. Not sure I'm going to stop, but nice to know. 🙂
2) oddly enough, I prefer all my built in sql functions to be all capital, but I prefer my datatypes to be lowercase. I also as a matter of practice always have my sql objects be propercase with words ran together such as CustomerStatusCode, but on the other hand all my variables and parameters are all lower case with underscores between words such as @customer_status_code. I agree, that as long as it's consistent and legible for the majority of people, then it probably doesn't matter too much.
3) I'm also a fan of the three line join syntax, yet indented…
<br>
FROM [Schema].[Customer] [c]<br>
INNER JOIN [Schema].[Address] [a]<br>
ON [c].[CustomerID] = [a].[CustomerID]<br>
4) Obviously, I'm not a devout AS alias user. I also do not like and would not want (as Ted T mentions) to maintain code without the INNER, OUTER statements in the joins.
5) I have become a fan of Alexanders not allowing objects in the dbo schema, making schema declaration a lot more important.
6) I have a naming convention of s_ for procs, f_ for functions, v_ for views. I also believe that all table names should be singular. It's a database for crying out loud. That's the purpose of a table, to hold data (meaning plural) So a table would be Customer, not Customers. Now on the other hand, a view can be v_Customers. This naming convention allows me to distinguish easily between a view, table, proc, or function name.
7) I do have exceptions to the rules. Sometimes we have systems which we created, but then have to import source data from somewhere else. That source data is not modified, manipulated, or reformatted at the table level. A lot of times these tables come from Oracle, DB2, Teradata, etc. I will usually mimic the exact table names and field names and case for those tables. But then I will create a view of my own with my own conventions and use that within other code.
8) I like your recommendation against using RETURN for anything except for success/errors.
9) It's going to be a long while before you get me to use statement terminators though…just haven't quite got there yet.
Another Tip re comments
Commenting the start and end of large loops and conditional structures makes the logic more readable. Indenting alone is not as easy to read when the structure is long, nested and complex.
e.g.
IF If @complete = 1
BEGIN
…
END — If @complete = 1
Cracking article!
Must admit, I used to be one of those tblX, txtY, intZ people – maybe it's my age and a hangup from my early days and less visualy pleasant data repositories…
It's funny how simple things like this seem to be missed by the most intelligent of colleagues…
:o)
A best practice I also use is catching null paramneter values as well as setting defaults on values:
The layout is something like this:
CRTEATE PROC usp_example @test int,
@test2 varchar(20)='default'
AS
BEGIN
DECLARE @i int,
@value float
Select @test=isnull(@test,0),
@test2=isnull(@test2,'default'),
@i=0,
@value=0.0
…..
PS: All the indents in the code examples above got striped so now it looks like crud. (So here is 2nd attempt, with extra '.' chars to attempt to keep the whitespace.
.WITH cte AS (
. SELECT t.col1
. ,t.col2
. ,t.col3
. FROM dbo.sometable AS t
.)
.SELECT cte.col1
. ,cte.col2
. ,cte.col3
. ,c.col4
.FROM cte
.INNER JOIN dbo.Customers AS c ON c.CustomerID = cte.col1
.WHERE EXISTS (
. SELECT 1
. FROM dbo.Orders o
. WHERE o.CustomerID = o.CustomerID
.)
.AND c.Status = 'LIVE';
Great Article.
My personal preference is "Commas at the start of line". Not for commenting out lines, as you point out the benefit is marginal. But because it forces me to realise that this line is a continuation from the line above.
Sure, this is less of an issue if you space out your code "extremely vertically" as you've done here. However, to me that reduces readibility as I need to scroll to understand the code.
So to me this is easier to comprehend
WITH cte AS (
SELECT t.col1, t.col2, t.col3
FROM dbo.sometable AS t
)
SELECT cte.col1, cte.col2, cte.col3, c.col4
FROM cte
INNER JOIN dbo.Customers AS c ON c.CustomerID = cte.col1
WHERE EXISTS (
SELECT 1
FROM dbo.Orders o
WHERE o.CustomerID = o.CustomerID
)
AND c.Status = 'LIVE';
====
And if the select lists aren't trivial, then break them up to a column per line ie:
WITH cte AS (
SELECT t.col1
,t.col2
,t.col3
FROM dbo.sometable AS t
)
SELECT cte.col1
,cte.col2
,cte.col3
,c.col4
FROM cte
INNER JOIN dbo.Customers AS c ON c.CustomerID = cte.col1
WHERE EXISTS (
SELECT 1
FROM dbo.Orders o
WHERE o.CustomerID = o.CustomerID
)
AND c.Status = 'LIVE';
I do use the tbl prefix for my tables but it's not because I can't recognize a table for what it is, nor am I afraid anyone else will be scratching their heads.
I use the tbl prefix as a validation hook for RegEx. I figure any string passed in that contains tbl is an injection attempt and I wipe it. I have never yet needed to pass a string value that had a legitimate word with tbl in it.
I know the arguments against allowing strings at all, but sometimes it happens and I'd rather have another hedge available.
great post, I plan to use some of these suggestions at work.
Great list Aaron. Thanks for taking the time to put this down – it's well reasoned and explained (and has generated some useful discussion).
Vern, I guess it goes back to the days where statement terminators were not necessary and very seldom used, so the last line is no longer a concern in that case (and often when testing we add parameters to the end and not the middle of the list). The statement terminator is still not a concern to many people using the current version, but as Michael reminded me, they better start caring soon…
Aside from that, I will confess that I am just guessing as to whether or not that is in fact the most compelling reason for the "leading comma" syntax. Maybe Tom Moreau or others can share their reasoning and enlighten us all? We must be missing something. 🙂
Good list, Aaron. I have never understood, however, why it was considered easier to comment out individual lines when listing the commas at the beginning instead of the end. When at the beginning, you can usually easily comment out all but the first. When at the end, you can usually easily comment out all but the last. And when listing variables you cannot easily comment out two of the columns when the comma is at the beginning if you want to retain the semicolon statement teminator. For instance, in your example of variable declarations/assignments (below), only the @bar line can be easily comment out while retaining the terminator.
DECLARE
@foo INT = 5
,@bar INT = 6
,@x INT = -1;
Vern Rabe
Rob H,
Commenting is probably one of the more subjective topics. Personally I don't think there can be too much commenting unless you have very basic code that really shouldn't need any support. I try my best to make my code as self-documenting as possible (part of this comes from using descriptive object names, parameter names, variable names, etc). I use — when I have one or two lines of comments, and /* */ for anything more.
Nigel,
Again, it is all about preference. I think lower-case keywords look lazy, and I appreciate the upper-case when I am on a server without a proper editor (some of my clients refuse to put client tools on their servers).
I have to say that while I agree with most of what Aaron says here, I have to add my voice to the lowercase keyword camp. While upper case was useful while when we only had monochrome editors, I think it has lost its attraction with colour editors. It just seems like the equivelant of usenet shouting to me. And with all the keywords in lower, I find the Camel case of TheRestOfTheVariables not so overwhelmed.
Nice article.
Any comments about making comments w/in code? How much is to much and when do you use the double-dash (–) line comments as opposed to using the start/stop comment delimiters (/* blah blah… */)?
The next version of SQL Server (that is, after 2008) will not support T-SQL statements that are not semicolon-terminated.
http://msdn.microsoft.com/en-us/library/ms143729.aspx
you might like this…I did
http://www.wangz.net/cgi-bin/pp/gsqlparser/sqlpp/sqlformat.tpl
i must chime in here and do so in all lower case. the point of much of what you speak of in this article centers around readability. personally i find that sql keywords are MORE readable when they are lowercase. because reading sql for the last 15 years has made things more readable to me when keywords take a back seat to object names and typically criteria which i represent most often in mixed case.
okay and there's a critical point you're missing on the tbl prefix. where this really comes in handy for me is in separating fact tables from code or lookup tables. fact tables get a tbl prefix and code tables get a tlkp prefix. this groups them together in mgmt studio and whenever i see a tlkp table embedded deep in a from clause i know exactly what its role is.
we could split hairs here about personal preference all day. like how you structure your from clause. i find your method much less readable than putting the table name first on every line of a from clause unless it's a carry over from the previous line and it would be indented in that case. i can look down the left hand side after my from clause and instantly know exactly what tables are used in that sql statement.
so, again, i think most of what you argue for here is personal preference and i appreciate you sharing your opinion but don't say that the other ways are wrong because they're not. beauty is in the eye of the beholder so whatever an experienced developer finds most readable for him (or within his team) is what should be used.
you do make very valid points on the use of table aliases but that point is a bit elementary. i thinking using the inner and as keywords are a waste of time but if it helps you read code better than you should use it. again, all personal preference but i like a healthy debate….keep 'em coming!!!!
I like the leading comma, I am pretty sure I was the first to start using it several years ago in posts to the usenet (anonymously).
I also don't like the join in three lines. I will break the join into extra lines if there are extra conditions on the join with proper indentation (yes I like Javascript too!).
Case is not important to me because keywords are prettized but I do try to maintain case with the referenced object names.
The object naming standard is mandatory IMHO. TableName_CRUDType is the only way to go.
I am sold on the use of ; as a statement delimiter though.
I review thousands of scripts a month prior to release to production for over a dozen different applications and for the most part they are produced from the Query Designer which means they are formatted for heck. Is there any hope of seeing progress on this front?
Thanks,
Walter
Aaron,
I agree with all your points except your preference of using
SELECT alias = [column expression]
than
SELECT [column expression] as alias
I have two reasons
(1) I am not sure if forst method is supported in all RDBMSs
(2) It looks like update statement where columns are updated
I also pointed out using this
http://sqlblogcasts.com/blogs/madhivanan/archive/2007/11/14/should-alias-names-be-preceded-by-as.aspx
Jeff, my whole point is, use what makes sense. What might be unreadable to you is what someone else likes, and I certainly could say that your stuff is unreadable as well, for my own reasons (I don't like table and column names in all caps, for example, as "MYOTHERTABLE" is really hard to read). I was trying to explain why I do it the way I do it, and highlight some things you might want to think about when you're not the only one maintaining the code that you write.
As some others have echoed, consistency is king; it's not about whether your formatting can beat up my formatting, but rather whether it makes sense to you (and the people you work with).
I agree, SQL Prompt and SQL Refactor can save a lot of time (of typing and formatting). Now the problem is that what to do when the product is not installed on the server. SSMS 2008 IntelliSense can only do so much
Consistency is the key!!
Aaron, great post!
I use RedGate's SQLReformatter whenever I have to look at anyone else's code just to get it into a consistent layout. Because I'm essentially lazy (as I often state in my sessions) I'll write out queries I'm building in a less formal way, and when I get it working I'll use SQLReformatter to lay it out properly. I then paste that code into my stored procedures.
Jeff, 'SELECT *' is bound to get you in trouble when the underlying object changes. Listing out all columns you need in your procedure will prevent problems like that, and also minimize the time necessary to transfer the data to your procedure, because you're only requesting the data you need.
Allen
Jeff:
With a simple example like that it probably makes no difference. Show us some of your more complex code that uses CASE expressions, derived tables, etc.
Some of this code is unreadable to me. why not just issue statements like this
SELECT *
FROM MYTABLE M1
INNER JOIN MYOTHERTABLE M2 ON M1.ID = M2.ID
LEFT OUTER JOIN ANOTHERTABLE M3.ID = M2.ID
WHERE M3.SOMECOLUMN IS NULL
good grief i think this fomatting business is overkill and anal retentive.
I agree that you should have a standard. I don't necessarily agree with everything you do. I don't do keywords in all caps, I do first letter in caps. I rarely spend time in anything but SSMS or Visual Studio and they colorize the key words which, IMO, eliminates the need for caps.
I also prefer the column AS alias instead of alias = column as I prefer to have the actual column names lined up instead of the alias lined up with column names.
I like to do joins this way:
table1 Join
table2 On
pk1 = fk2 Join
table3 On
pk2 = fk3
Although I do see the reason you do it on multiple lines to keep all the key words left aligned.
Lastly I just like I like commas at the end of the line I also like AND's and OR's at the end of the line instead of the beginning. I prefer that my first column names in a Where or Join clause be left aligned.
In reality, I could learn to do it your way. The key is to have a standard and stick with it.
Great article.
There's an on-line sql formatter that has saved me a lot of time: http://www.wangz.net/cgi-bin/pp/gsqlparser/sqlpp/sqlformat.tpl
Of course it won't enforce all coding standards, but it's a good start.
Hi Aaron,
You make a very goood point that can't be repeated often enough: making your code readable is one of the most important -if not THE most important- habit for all developers to get into. And that includes SQL developers.
I agree with most of your points. Some loose thoughts:
* Avoiding parentheses around the parameter list for a stored procedure isn't that hard to explain. Just check BOL and you'll see that they are not even allowed by the documented syntax. 🙂
* I understand your reasons for using "Alias = Expression" instead of "Expression AS Alias" for column aliases. But the other side of the equation is that the latter is more consistent within the query since the "… AS alias" syntax is also used in the FROM clause.
* I do use the shorthand for some keywords. Mainly for outer joins. I find that all my joins line up veryy nicely if I qualify them all with one of "INNER", "LEFT ", "RIGHT", of "FULL ". (Note the extra space after LEFT and FULL!)
* I'm sure you know that I have a different style of query formatting than you do. But I must admit that yours is much more wide-spread – however, I still prefer to have my code laid out as a few neatly aligned columns. But the most important point is of course to have SOME standard and then follow that religiously!
Finally – the link to Alex Kusnetsov's article is broken. There is an extra space (%20) at the end that I had to remove to get to his post.
Thanks for a great post!
Best, Hugo
Yes Whitney, I find it much easier to read. If you have long table names and column names then putting it all on one line makes you read (and sometimes scroll!) horizontally. # of actual lines does not really scare me too much, but I like the visual consistency this convention provides me.
Great post Aaron…
Do you really make all join conditions three lines? I think that might be my only difference of opinion.
I completely agree with the dislike of the "tbl" table prefix. Having worked on a project that was originally coded that way, if I never see it again I will be a happy consultant.
BTW, I didn't mean to sound so harsh, I do agree that declaring a variable just to assign a constant value and then only use it once is silly and pointless. However, if the variable is used many times in a complex stored procedure, or is likely to change the next time you modify / recompile the procedure, I would be less willing to hard-code that constant everywhere, and instead will investigate workarounds if and when they cause a performance problem. Search and replace is easy enough, but still tedious.
Well Jason, unless you are talking about moving the declaration and default value assignment to the parameter list instead of in the body of the procedure, which isn't even always possible (e.g. optional date parameters that default to something relative to CURRENT_TIMESTAMP, or other values that are populated by functions that can't be placed on the default value line), I think that is more of a business requirements question than a "best practices" question. I have seen cases where taking a parameter out of the param list and defining it locally has also had similar effects (e.g. improved performance due to changes in estimation).
Anyway, my point wasn't really to say or imply, "you should use local variables!" But rather, "when I use local variables, here is what I do with them."
"
And if you think any of these are BAD habits, please drop a line and let me know why!
"
Local variables and\or changing the values of parameters could be considered a bad practice. http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx#EHKAE
Lot of times it doesn't matter but when it does, it really does.
Thanks Alex, and I have to say, your series on defensive database programming is what inspired me…
Aaron,
This is great! Thank you for sharing. A cannot agree more.
Also I just do not allow objects in dbo schema, so all objects must be qualified. Besides, I did some benchmarking, and apparently a single SELECT even works faster that a series of SETs. Also I think enforcing that all aliases must be preceded by AS prevents another error
SELECT col1 col2 –comma omitted
–so col2 is interpreted as an alias
Thanks again!