T-SQL Tuesday #156 : Production Code
It's hard to come up with a precise definition of what production code looks like. As with a lot of things, "I know it when I see it," but that isn't a very useful thing to share. This month, Tom Zika invites us to talk about production code for T-SQL Tuesday #156. Specifically, Tom asked:
Which quality makes code "production grade"?
In a lot of programming languages, efficiency is almost always the guidepost. Sometimes, minimizing character count or line count is a "fool's gold" measure of the quality of the code itself. Other times, unfortunately, engineers are judged not by quality at all, but rather the sheer number of lines of code produced – where more is, somehow, obviously better. Over my career, "how much code there is" has never been a very meaningful measure in any language.
But I'm here to talk about T-SQL, where certainly efficiency is a good thing to measure – though there are some caveats to that:
- What efficiency means can differ – does it run fast? does it use a minimal amount of memory? does it cause the least I/O? does it take advantage of parallelism? does it minimize logging? does it allow for high concurrency? Speed is often important, but not always – each of these might be the most important measure for any given query.
- Does it do what's expected? – of course code that makes it to production should do what it's supposed to. This is really only an issue where you develop in production.
- Efficiency and accuracy aren't the only thing to measure – for me, production-grade code must be readable and self-documenting, and I can share many examples (and counter-examples) that can make it so (or not). Styles can vary so don't use my "better" example as what you have to do; just take a subjective look at the differences between these 5 examples of "bad" and "better."
Is it smart about white space?
Nobody wants to scroll 14 screen widths to the right to find a table name or a join condition. Make it easy to read!
SELECT a.OrderID, ProductID FROM OrderDetails a INNER JOIN Orders b ON a.OrderID = b.OrderID WHERE OrderDate >= '2022-01-01' AND Quantity >= 32
SELECT o.OrderID, od.ProductID FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = od.OrderID WHERE o.OrderDate >= '20220101' AND od.Quantity >= 32;
I talk about being generous with whitespace in My stored procedure "best practices" checklist, way back in 2008.
Does it use unreferenced columns?
I often see people use aliases like
c instead of something that reflects the entities they point to, and I also often see aliases only included on columns that require them. Always using sensible aliases makes it much easier for the next reader to parse and understand your query.
SELECT a.OrderID, ProductID FROM dbo.OrderDetails AS a INNER JOIN dbo.Orders AS b ON a.OrderID = b.OrderID WHERE a.Quantity > 32;
SELECT o.OrderID, od.ProductID FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = o.OrderID WHERE Quantity > 32;
I bring this up over and over again because I see it every day in questions on on Stack Overflow, and it always requires further prodding from the user (which table does
columnX come from?). I talk about aliasing in more detail in Bad Habits to Kick : Using table aliases like (a, b, c) and Bad Habits to Kick : Inconsistent table aliasing.
Does it use ambiguous date formats?
Regional date formats are bad news. When I see this kind of thing in code, I remind folks that it might work in their current language or regional settings, but it can break for others (even if, currently, nobody using your code lives outside of your region or speaks any other language). Typically I see American (
M/d/y) or British (
d/M/y) formats, but even
yyyy-MM-dd is a problem – only
yyyyMMdd is safe. This doesn't affect the
date data type, but if you can't trust it for one data type, why use it for any?
SET LANGUAGE British; GO -- fails: DECLARE @d datetime = '1999-05-13'; SELECT DATEPART(MONTH, @d); GO -- wrong month: DECLARE @d datetime = '1999-05-09'; SELECT DATEPART(MONTH, @d); GO
SET LANGUAGE British; GO /* succeeds: */ DECLARE @d datetime = '19990513'; SELECT DATEPART(MONTH, @d); GO /* succeeds: */ DECLARE @d datetime = '19990509'; SELECT DATEPART(MONTH, @d); GO
This too I see every day in questions on Stack Overflow, and when someone says "range from 01/01/2022 to 02/01/2022" it can make the requirement confusing. Do you want Jan 1st to Jan 2nd, or Jan 1st to Feb 1st? Do I need to check your profile to figure out where you live? Lots more on date handling in Dating Responsibly.
Does it use shorthand to save characters?
My favorite examples of this are the abbreviations for date parts. People will type
YYYY instead of
YEAR, and it makes no sense to me – if you're going to type four characters, why not type the word that self-documents itself? Or
DD instead of
DAY, less readable for the sake of one or two whole characters. Anyway, this example should illuminate why shorthand is a problem here (and if you can't use shorthand for
WEEK, why use it anywhere else?):
SELECT DATEPART(Y, '20221108'), DATEPART(W, '20221108');
SELECT DATEPART(YEAR, '20221108'), DATEPART(WEEK, '20221108');
I see this in answers every day on Stack Overflow, and I feel like it's just the wrong message to send to people reaching out for help. Even seasoned pros who love shortcuts don't always understand the shorthand:
Is it cute or clever for the sake of it?
PARSENAME – a handy function against actual metadata and IP addresses, for sure – misused all the time. People use it to split strings in other forms as well, by replacing some delimiter with
. first. It is dangerous to recommend without disclaimers about the length supported; as soon as one of the elements exceeds 128 characters,
PARSENAME returns NULL. I'm guilty of shoehorning
PARSENAME where it doesn't belong to be clever, but it's actually only useful in a very limited number of cases, and wouldn't put it in production without comments/disclaimers.
Basically, if you have to explain a piece of T-SQL to someone who isn't brand new to T-SQL, it's probably too cute, too clever, or too complicated. If it's complicated out of necessity – or you're doing something unpopular, like starting a CTE with a statement terminator – comments are your friend.
These are just a few of the kinds of things I consider during code review before letting T-SQL head to production. I have plenty more of these that, admittedly, not everyone agrees with; for further reading, see my Bad Habits index.
That ambiguous dates section was an eye opener! I had no idea that using the separator in the date would make it behave that way. I tried your code with datetime2 to see if it would behave in the same way, but it appeared to fix the issue (same with the dynamic SQL in your dating responsibly article), do you know why?
I feel vaguely vindicated that I use yyyyMMdd regardless, at least. But now I have an actual reason for why I do! 🙂
Thanks for a great article.
Back in the before times, when SQL Server only had
smalldatetime, Microsoft made the curious choice to interpret string literals in ISO format as
yyyy-dd-MM, depending on language, with the latter being the case for the majority of the languages supported by the product. I can't explain why they did that, because none of the people in any of those countries have ever used
yyyy-dd-MMas a valid date format. But this was relatively unknown because
When they introduced the new date-related types (
datetimeoffset), they seized the opportunity to fix that inconsistency. They made the interpretation invariant to culture, and always
But this is a double-edged sword because, while I should be pushing people to using the more modern types (and by extension allowing the more readable format), the truth is that people will continue using the older types for at least the rest of my lifetime. I don't like writing code that needs asterisks or will break for some people, so I always use
yyyyMMddor, when time is required,
yyyy-MM-ddThh:mm:ss[.nnnnnnn]. It's hard to justify any other format in production code.
It is often difficult to get developers to document their code, and even more difficult to get any style "buy in" for SQL DBAs. I come from a 40 year, MIL-STD indoctrinated background, where every line of code was reviewed, tested, and nothing longer than 50-60 lines was allowed. Multi million dollar aircraft bring that sort paranoia, and rightfully so. So much of software today is deployed without rigorous testing, since the quickest to deploy a new solution gains the higher profits. I find it refreshing that more and more often I am seeing commitment to these types of guides.
Some very nice topics and examples here, thank you!
In many ways, what we're really talking about here is… maintainability of code.
I think when looking at "Does it use ambiguous date formats?", in both the Bad and Better examples, the comments (e.g.
--fails) are in the wrong place. It looks like the comment belongs to the
GO(or perhaps to the statement above the
GO). It would be better to have the comment on a separate line, and for there to be a line space above it. Also it might be better to let the
GOlive with the statement it applies to. Finally, the last
GOis missing (it's important to be consistent, people might copy this template approach).
I know it's very easy to criticize, but hopefully these suggestions are helpful.