Bad habits to kick : using SELECT * / omitting the column list
In my last post in this series, I talked about inappropriately using SELECT, OUTPUT and RETURN in stored procedures. Today I wanted to talk about using SELECT * or omitting the column list entirely.
Using SELECT *
This is a typical operation when developing, debugging or testing, and I have no qualms about its use there. But there are several reasons why you should avoid SELECT * in production code:
- You can be returning unnecessary data that will just be ignored, since you don't usually need every single column. This is wasteful in I/O, since you will be reading all of that data off of the pages, when perhaps you only needed to read the data from the index pages. It is also wasteful in network traffic and in many cases the memory required by the consuming application to hold the results.
- When you use SELECT * in a join, you can introduce complications when multiple tables have columns with the same name (not only on the joined columns, such as OrderID, which are typically the same, but also peripheral columns like CreatedDate or Status). On a straight query this might be okay, but when you try to order by one of these columns, or use the query in a CTE or derived table, you will need to make adjustments.
- While applications should *not* be relying on ordinal position of columns in the resultset, using SELECT * will ensure that when you add columns or change column order in the table, the shape of the resultset should change. Ideally, this should only happen intentionally.
A major roadblock to using the explicit column list is
laziness efficiency. For those of you using IntelliSense in SQL Server 2008, or 3rd party tools like SQL Prompt, this is much less likely to be a good excuse, since the columns can appear for you in a drop down list (and in some cases you can set up keystrokes that will change * to the explicit column list). For those of you that don't use 3rd party tools, are not using SSMS 2008 against SQL Server 2008 instances, or have become so frustrated with IntelliSense that you've turned it off, here's another little tip that is seemingly obscure: you can grab the column list easily within SSMS, and drag it into the query window. Just open Object Explorer, expand the table or view name, click on the "Columns" node and drag it onto the query window:
Voila! The column list appears for you… it is probably not in the format you want, since they are just separated by commas and listed out horizontally. But reformatting that list sure beats typing it out yourself.
Using SELECT * in a view
Some people adopt the misconception that they can avoid maintenance down the road by simply using SELECT * in a view. Then they can change the base table and not worry about updating the view. Let's try this at home, shall we? Assuming SQL Server 2008:
USE [tempdb]; GO CREATE TABLE dbo.foo ( a INT, b INT, c INT ); GO INSERT dbo.foo(a, b, c) SELECT 1, 2, 3; GO CREATE VIEW dbo.view_foo AS SELECT * FROM dbo.foo; GO ALTER TABLE dbo.foo ADD d INT; GO SELECT d FROM dbo.view_foo; GO -- Error message : Invalid column name 'd'. UPDATE dbo.foo SET d = 4; GO ALTER TABLE dbo.foo DROP COLUMN c; GO SELECT a, b, c FROM dbo.view_foo; GO -- Even though c is no longer in dbo.foo, this works! -- But it returns 4 (the data for d) under column c. EXEC sys.sp_refreshsqlmodule N'dbo.view_foo'; GO -- you can use sys.sp_refreshview on older versions. SELECT * FROM dbo.view_foo; GO -- Now this returns columns a, b and d as we expect. DROP VIEW dbo.view_foo; DROP TABLE dbo.foo;
So obviously, it is not a good idea to use SELECT * in a view, and rely on changes to the underlying table(s) to be reflected in the view and its results. The person making the changes has to know and understand that the view must be refreshed in order to ensure correct results. When you use an explicit column list, there is no ambiguity caused by metadata changes, and when you drop a column, the view will break immediately instead of potentially returning bogus data in the wrong column.
Using INSERT … SELECT
A lot of times we build copies of our tables or otherwise populate a table with data from another table with the same structure. Depending on property columns, it can be tempting to use code like this:
INSERT dbo.foo SELECT a, b, c FROM dbo.bar; -- or even worse: INSERT dbo.foo SELECT * FROM dbo.bar;
However, if this kind of syntax is used in production code, take care to populate the column lists once. This will prevent you from being bitten later when the source table has a column added, or column order is changed in either table. By explicitly defining what you are selecting and where you are putting it, you insulate yourself from these types of metadata changes:
INSERT dbo.foo ( a, b, c ) SELECT a, b, c FROM dbo.bar;
Now, the code will break if you drop or rename a column, but since that is less frequent and because you *should* be notified of that problem immediately, I think that's okay.
Using SELECT INTO
Similar to the above case, we often build copies of our tables using syntax like:
SELECT * INTO dbo.foo FROM dbo.bar;
I'm actually okay with this usage, mostly because it is used in an ad hoc fashion, and because it will take into account metadata changes that have happened in the meantime anyway (so if the source table has changed, the destination table will have changed the exact same way). Just keep in mind a few things when you use SELECT INTO: you *will* get certain table elements such as IDENTITY property; you will *not* get extensions to the table such as indexes and foreign keys; and, you cannot dictate which filegroup or partition scheme the destination table will belong to.
As demonstrated above, it is very easy to avoid SELECT *, and there are plenty of good reasons to do so. All of which trump the "efficiency" factor, IMHO.
I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code. Up next: avoiding the schema prefix.