T-SQL Tuesday #156 : Production Code
It's hard to come up with a precise definition of what production code looks like. As with a lot of things, "I know it when I see it," but that isn't a very useful thing to share. This month, Tom Zika invites us to talk about production code for T-SQL Tuesday #156. Specifically, Tom asked:
Which quality makes code "production grade"?
In a lot of programming languages, efficiency is almost always the guidepost. Sometimes, minimizing character count or line count is a "fool's gold" measure of the quality of the code itself. Other times, unfortunately, engineers are judged not by quality at all, but rather the sheer number of lines of code produced – where more is, somehow, obviously better. Over my career, "how much code there is" has never been a very meaningful measure in any language.
But I'm here to talk about T-SQL, where certainly efficiency is a good thing to measure – though there are some caveats to that:
- What efficiency means can differ – does it run fast? does it use a minimal amount of memory? does it cause the least I/O? does it take advantage of parallelism? does it minimize logging? does it allow for high concurrency? Speed is often important, but not always – each of these might be the most important measure for any given query.
- Does it do what's expected? – of course code that makes it to production should do what it's supposed to. This is really only an issue where you develop in production.
- Efficiency and accuracy aren't the only thing to measure – for me, production-grade code must be readable and self-documenting, and I can share many examples (and counter-examples) that can make it so (or not). Styles can vary so don't use my "better" example as what you have to do; just take a subjective look at the differences between these 5 examples of "bad" and "better."
Is it smart about white space?
Nobody wants to scroll 14 screen widths to the right to find a table name or a join condition. Make it easy to read!
SELECT a.OrderID, ProductID FROM OrderDetails a INNER JOIN Orders b ON a.OrderID = b.OrderID WHERE OrderDate >= '2022-01-01' AND Quantity >= 32
SELECT o.OrderID, od.ProductID FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = od.OrderID WHERE o.OrderDate >= '20220101' AND od.Quantity >= 32;
I talk about being generous with whitespace in My stored procedure "best practices" checklist, way back in 2008.
Does it use unreferenced columns?
I often see people use aliases like
c instead of something that reflects the entities they point to, and I also often see aliases only included on columns that require them. Always using sensible aliases makes it much easier for the next reader to parse and understand your query.
SELECT a.OrderID, ProductID FROM dbo.OrderDetails AS a INNER JOIN dbo.Orders AS b ON a.OrderID = b.OrderID WHERE a.Quantity > 32;
SELECT o.OrderID, od.ProductID FROM dbo.OrderDetails AS od INNER JOIN dbo.Orders AS o ON od.OrderID = o.OrderID WHERE Quantity > 32;
I bring this up over and over again because I see it every day in questions on on Stack Overflow, and it always requires further prodding from the user (which table does
columnX come from?). I talk about aliasing in more detail in Bad Habits to Kick : Using table aliases like (a, b, c) and Bad Habits to Kick : Inconsistent table aliasing.
Does it use ambiguous date formats?
Regional date formats are bad news. When I see this kind of thing in code, I remind folks that it might work in their current language or regional settings, but it can break for others (even if, currently, nobody using your code lives outside of your region or speaks any other language). Typically I see American (
M/d/y) or British (
d/M/y) formats, but even
yyyy-MM-dd is a problem – only
yyyyMMdd is safe. This doesn't affect the
date data type, but if you can't trust it for one data type, why use it for any?
SET LANGUAGE British; GO -- fails: DECLARE @d datetime = '1999-05-13'; SELECT DATEPART(MONTH, @d); GO -- wrong month: DECLARE @d datetime = '1999-05-09'; SELECT DATEPART(MONTH, @d); GO
SET LANGUAGE British; GO /* succeeds: */ DECLARE @d datetime = '19990513'; SELECT DATEPART(MONTH, @d); GO /* succeeds: */ DECLARE @d datetime = '19990509'; SELECT DATEPART(MONTH, @d); GO
This too I see every day in questions on Stack Overflow, and when someone says "range from 01/01/2022 to 02/01/2022" it can make the requirement confusing. Do you want Jan 1st to Jan 2nd, or Jan 1st to Feb 1st? Do I need to check your profile to figure out where you live? Lots more on date handling in Dating Responsibly.
Does it use shorthand to save characters?
My favorite examples of this are the abbreviations for date parts. People will type
YYYY instead of
YEAR, and it makes no sense to me – if you're going to type four characters, why not type the word that self-documents itself? Or
DD instead of
DAY, less readable for the sake of one or two whole characters. Anyway, this example should illuminate why shorthand is a problem here (and if you can't use shorthand for
WEEK, why use it anywhere else?):
SELECT DATEPART(Y, '20221108'), DATEPART(W, '20221108');
SELECT DATEPART(YEAR, '20221108'), DATEPART(WEEK, '20221108');
I see this in answers every day on Stack Overflow, and I feel like it's just the wrong message to send to people reaching out for help. Even seasoned pros who love shortcuts don't always understand the shorthand:
Is it cute or clever for the sake of it?
PARSENAME – a handy function against actual metadata and IP addresses, for sure – misused all the time. People use it to split strings in other forms as well, by replacing some delimiter with
. first. It is dangerous to recommend without disclaimers about the length supported; as soon as one of the elements exceeds 128 characters,
PARSENAME returns NULL. I'm guilty of shoehorning
PARSENAME where it doesn't belong to be clever, but it's actually only useful in a very limited number of cases, and wouldn't put it in production without comments/disclaimers.
Basically, if you have to explain a piece of T-SQL to someone who isn't brand new to T-SQL, it's probably too cute, too clever, or too complicated. If it's complicated out of necessity – or you're doing something unpopular, like starting a CTE with a statement terminator – comments are your friend.
These are just a few of the kinds of things I consider during code review before letting T-SQL head to production. I have plenty more of these that, admittedly, not everyone agrees with; for further reading, see my Bad Habits index.