For the first T-SQL Tuesday in 2023, Raul Gonzalez invites us to talk about cases where we have knowingly implemented worst practices.
Well, I have done it a lot. Most of the posts in my bad habits series are cautionary tales based on my own "learning the hard way." There are always trade-offs with doing something correctly – maybe proper design is less efficient, or takes longer to write, or has to pass more checks. Over time, though, you start getting a feel for where it makes sense to cut these corners, and where it doesn't.
At Stack Overflow, engineers spent a decade optimizing for one thing: page load time. This isn't a knock, and it's hard to argue with the results. David Fullerton talks about creative shortcuts in this keynote and Roberta Arcoverde talks about how the architecture just works in this HanselMinutes podcast.
But from the database side, this definitely seems like a one-sided goal that can potentially introduce technical debt.
For example, in nearly famous fashion, the core application supporting our Q & A sites does not use a single stored procedure. We are still tied to this Jeff Atwood advice, circa 2004. Sure, embedding all queries in the application code means quicker delivery and less gatekeeping and, in modern versions of SQL Server, doesn't perform any worse than a stored procedure would. But it can make troubleshooting very complex from our perspective, since the code is generated dynamically in C# code that is often less grok-able to database folk. And when we find opportunities for improvement, all we can do is pass them on to another team and wait for them to check the changes in and deploy.
As another example, and something that actually just came up this week, we don't always have foreign keys in our Q & A schema where you might expect them. The schema was designed long before any of us arrived, so I don't know the actual reason(s) behind not implementing explicit relationships in some cases. But I can imagine some of the objections:
- overstatement of the performance impact of write-time validation
- understatement of the performance benefit of trusted constraints
- unwillingness to consider additional indexes to better support joins (which is constant whether there is an explicit key or not, I'm just repeating what I've heard)
- overconfidence in the application's ability to maintain perfect state and handle exceptions
But this means referential integrity needs to be handled by the app, or triggers, or both – which leads to splintering of business logic. And as I said in the (internal Slack) thread:
Personally, I want the constraint as close to the data as possible.
I can never trust the app to manage and validate a relationship, because I know the app isn’t always the thing that’s touching the data.
And someone asked if we were going to go back and add the constraints. My simple response:
If it ain't broke, don't break it.
I'm not against adding them, of course. But I know that doing so now carries some risk, and it doesn't necessarily solve a tangible performance problem we currently have. One thing Andy mentioned we could do is to define the foreign keys in the schema, and enable them in dev and local testing environments, and just have them be disabled in production. This means that at least during development the constraint will actively identify flaws in logic, without risking stability in production. But then production doesn't get to benefit from the upside.
It's hard, right? This is why nothing in SQL Server is 100%. And sometimes you just smile and nod, and live with justifiable worst practices, because fixing them might be more painful.