T-SQL Tuesday #151 : Controversial Coding Conventions
For this month's T-SQL Tuesday, Mala Mahadevan asks us to talk about coding standards:
I decided to take it one step further, and talk about controversial coding standards. I could take the easy way and direct you at existing content expressing my opinions about tabs vs. spaces, table and column aliases, starting CTEs with
;WITH, or leading commas. I haven't talked about that last one, but Benn Stancil seems to have written exactly what I would have, albeit in a much more poetic way.
Instead, let's talk about…
CONVERT vs. CAST
I wrote recently about how T-SQL lets you write things very differently and still get the same answer – without compromising logic or performance.
I see a lot of people using
CAST(col AS something) and it makes me wonder why they choose that syntax over
CONVERT(something, col). In this twitter thread, I mentioned how
CAST doesn't let you influence the conversion in any way, with vague hand-waving about this very specific example. Greg Low responded:
I use CAST always unless I need a style value
The thought that immediately came to mind is, "I wear two left shoes unless I need to dance."
Maybe there are reasons other than the loose "
CAST is more standard" arguments we sometimes cling to, but I think that, for the most part, we cling to those artificially. More on that in a bit.
A realistic example
Imagine I have dozens or hundreds of procedures that say:
CREATE PROCEDURE dbo.something @date_string varchar(30) AS SELECT CAST(@date_string AS datetime);
They're this way because they accept a string parameter, using
yyyy-MM-dd format, from one of many datepickers or calendar controls in some JS or C# library I can’t change. The
CAST is there because the author learned from some other part of the codebase (or from online examples or from a different platform) to use
CAST instead of
CONVERT. This works fine:
EXEC dbo.something @date_string = '2022-05-31';
But yyyy-MM-dd isn't universally safe
A new user comes onboard, and starts writing unit tests against those procedures. They are based in Germany, or France, or England, or wherever, and have their language settings configured accordingly. So they run:
SET LANGUAGE British; EXEC dbo.something @date_string = '2022-05-31';
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
Whoops! This language happens to interpret
yyyy-dd-MM (as do more than half of the languages SQL Server supports).
No, it doesn't make any sense, because no human would ever look at, say,
2022-06-12 and think, "December 6th!" But this is how SQL Server works. And it could happen in other scenarios, too, like failing over to a different data center, building a new server, or someone adding a contradictory
SET DATEFORMAT command somewhere.
Use datetime instead?
I bet you're thinking, "why not just change the parameter to
datetime?" Arguably, sure, the argument never should have been a string in the first place:
ALTER PROCEDURE dbo.something @date_string datetime AS SELECT CAST(@date_string AS datetime);
But remember, we can't change the caller; it's still going to pass its unfortunate
SET LANGUAGE British; EXEC dbo.something @date_string = '2022-05-31';
Whoops! Now it just gets interpreted incorrectly at a different spot:
Error converting data type varchar to datetime.
We could take advantage of the fact that
date isn't subject to this bizarre interpretation, but that doesn't help us if the application ever passes in time along with the date. Or
datetime2, but maybe the library doesn't support the newer types or just can't be updated to pass a different type. So we may be stuck with a string parameter anyway.
Instead of these fixes, if I know the incoming format is
yyyy-MM-dd, I can keep the incoming string parameter as is, and use the much more versatile
CONVERT with a style number:
ALTER PROCEDURE dbo.something @date_string varchar(30) AS SELECT CONVERT(datetime, @date_string, 23);
If the datepicker or library changes and passes a different format, I can just update the style number (see the full list here). For example, if time is included, I can use
120 instead of
23. If the input changes to
MM/dd/yyyy, I can use
101. If it's
dd/MM/yyyy, I can use
103. And so on.
In the meantime, I should be seeking out a new datepicker that uses an unambiguous string (or can use proper, strongly-typed parameters), or submitting a pull request to the one I'm using. Because ultimately I want to stop passing strings I have to convert at all.
CONVERT avoids chaos
CAST to interpret values consistently is a sloppy mess, and not very friendly to people who don’t yet know when they should use
CONVERT and when they can get away with its pretty cousin. So, my suggestion: just always use
CONVERT from the start, even in cases where you won't potentially need a style number later. Then your code is consistent and you're not suddenly changing styles on the fly or only in specific scenarios.
And about the artificial clinging…
Someone once told me that they preferred
CAST because it is fewer characters to type. While generally I think these arguments are totally bogus anyway, I immediately reminded them they also have to type
AS. There went that theory.
Greg in the thread above actually likes that it stands out when a style had to be used:
I also rarely use a style and I don't mind at all that the situations where I do so, stand out as different. Every time I use convert, I know a style was used.
Okay, fair enough; to each their own. Personally, I prefer consistency.
Everyone else claims portability:
"What if we switch to PostgreSQL or MySQL or Oracle or <NoSQL du jour>? I'll have to change all of my code that uses proprietary SQL Server syntax!"
CASTis also unsupported in several destination platforms anyway.
- In my career I have only seen two true migrations away from SQL Server to something incompatible.
- In both of those cases, syntax like
CONVERTwas the least of their worries; the application was entirely rewritten anyway.
I'd love to hear about a real scenario where you have avoided any and all proprietary syntax in SQL Server just in case you decided to switch platforms later (or where you used proprietary syntax and this was the blocker). It's just not that common, and when it happens, there's a lot to tackle in terms of updating all the queries (and metadata, and monitoring tools, and drivers, and so on). I feel like
IDENTITY columns and
EOMONTH are much bigger and more widespread blockers than
TO_CHAR) is probably the simplest among them, and avoiding that work simply isn't going to change the project timeline.
Just one example from recent memory: Have you ever tried to get WordPress up and running with SQL Server as the backend? I tried a while back and the experience convinced me I will never try again. Those are all allegedly simple CRUD-style queries, but even that is a middleware nightmare. Now imagine something even remotely complex. Shudder.