Bad Habits to Kick: Using loops to populate large tables
See the full index.
Today I'd like to talk about using primitive loops to generate a large number of rows. Okay, I'll admit, in most cases you are doing this for a demo or proof of concept, and so speed / performance is not all that important. But using a loop to populate a table with 1 million rows, one row at a time, is not fun at all, no matter how much thumb-twiddling you are willing to do.
I often see people building up test cases like this:
CREATE TABLE dbo.foo(bar int);

DECLARE @i int, @UpperLimit int;
SELECT @i = 1, @UpperLimit = 1000000;

WHILE @i <= @UpperLimit
BEGIN
    INSERT dbo.foo(bar) VALUES(@i);
    SET @i = @i + 1;
END
For 500 or 50,000 rows, this probably wouldn't be a big deal. But when you get into the bigger numbers, you are going to be spending a long time waiting for the inserts to finish. So how do you get around it?
I have long been an advocate of an auxiliary numbers table. It makes quick work of generating dummy data, does not require a lot of storage space, and is easy to join against for various reporting-type functions. Let's say you create a numbers table with 1,000,000 rows, then you can change the above code to:
SET NOCOUNT ON;

CREATE TABLE dbo.foo(bar int);

DECLARE @UpperLimit int = 1000000;

INSERT dbo.foo(bar)
SELECT [Number]
FROM dbo.Numbers
WHERE [Number] BETWEEN 1 AND @UpperLimit;
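(The snippet above assumes a dbo.Numbers table is already in place. If you need to build one, here is a sketch of a one-time setup; the table name and [Number] column are just my naming conventions, and any sufficiently wide row source will do in place of the cross join:)

```sql
-- One-time setup (a sketch; dbo.Numbers / [Number] are naming assumptions).
-- On most systems sys.all_objects has a couple of thousand rows, so a
-- self cross join comfortably yields 1,000,000 rows.
CREATE TABLE dbo.Numbers
(
    [Number] int NOT NULL PRIMARY KEY CLUSTERED
);

INSERT dbo.Numbers([Number])
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2;
```

You only pay this cost once; after that, any demo that needs up to a million rows can simply select from the table.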
However, there are some who fundamentally object to adding a table to their schema, even if they put it in some utility database.
Thankfully, there are some clever workarounds to this that let you construct a numbers table on the fly. Here is one that I have used in previous posts:
SET NOCOUNT ON;

DECLARE @UpperLimit int = 1000000;

;WITH n AS
(
    SELECT x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
    CROSS JOIN sys.all_objects AS s3
)
SELECT [Number] = x
INTO #Numbers
FROM n
WHERE x BETWEEN 1 AND @UpperLimit;
GO

CREATE UNIQUE CLUSTERED INDEX n ON #Numbers([Number]);
And here is one using a set of CTEs and no system views, based on code I stole from Itzik Ben-Gan at some point:
SET NOCOUNT ON;

DECLARE @UpperLimit int = 1000000;

;WITH n5 (x) AS (SELECT 1 UNION SELECT 0),
n4 (x) AS (SELECT 1 FROM n5 CROSS JOIN n5 AS x),
n3 (x) AS (SELECT 1 FROM n4 CROSS JOIN n4 AS x),
n2 (x) AS (SELECT 1 FROM n3 CROSS JOIN n3 AS x),
n1 (x) AS (SELECT 1 FROM n2 CROSS JOIN n2 AS x),
n0 (x) AS (SELECT 1 FROM n1 CROSS JOIN n1 AS x),
Nbrs (x) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY x) FROM n0
)
SELECT [Number] = x
INTO #Numbers
FROM Nbrs
WHERE x BETWEEN 1 AND @UpperLimit;
GO

CREATE UNIQUE CLUSTERED INDEX n ON #Numbers([Number]);
Another alternative is the recursive CTE.
SET NOCOUNT ON;

DECLARE @UpperLimit int = 1000000;

;WITH n(x) AS
(
    SELECT x = 1
    UNION ALL
    SELECT x + 1 FROM n WHERE x < @UpperLimit
)
SELECT [Number] = x
INTO #Numbers
FROM n
OPTION (MAXRECURSION 0); -- 0 lifts the default limit of 100; use a value <= 32767 if you need fewer rows
GO

CREATE UNIQUE CLUSTERED INDEX n ON #Numbers([Number]);
If you are going to continue using a loop, at the very least, please get in the habit of adding SET NOCOUNT ON; to your code. That way, at least, the big loop won't be wasting all that effort passing a "1 row(s) affected" message to the client n times, and the client won't be wasting all that effort printing it out for you n times.
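For illustration, here is the loop from the top of the post with that one line added (same logic, just without the chatter back to the client):

```sql
SET NOCOUNT ON; -- suppress the "1 row(s) affected" message after each insert

CREATE TABLE dbo.foo(bar int);

DECLARE @i int, @UpperLimit int;
SELECT @i = 1, @UpperLimit = 1000000;

WHILE @i <= @UpperLimit
BEGIN
    INSERT dbo.foo(bar) VALUES(@i);
    SET @i = @i + 1;
END
```

It is still a million singleton inserts, so it won't be fast, but at least the time goes into the work rather than the messages.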
Now, don't get me wrong; I am not saying that loops are a bad thing. In fact, for a lot of things, they are the right answer, and for some things they are the only answer. But in cases where things don't have to happen one at a time, they shouldn't happen one at a time.