Processing a list of integers : my approach
It is a common problem… you need to pass multiple values into a stored procedure and, as a result, update multiple rows. Let's pretend we have a table of students:
CREATE DATABASE School; GO USE School; GO CREATE TABLE dbo.Students ( StudentID INT PRIMARY KEY, FullName NVARCHAR(64), ModifiedDate SMALLDATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ); INSERT dbo.Students(StudentID, FullName) SELECT 1, 'Aaron Bertrand' UNION ALL SELECT 2, 'Kalen Delaney' UNION ALL SELECT 3, 'Adam Machanic' UNION ALL SELECT 4, 'Denis Gobo' UNION ALL SELECT 5, 'Louis Davidson';
Now we let the user click different checkboxes or select multiple items in a multi-select on a web form, and as a result we can pass a comma-separated list back to the database quite easily. (Imagine the procedure is doing other things to this set, but to keep it simple, we are just going to update the ModifiedDate column.)
But once you have the comma-separated list in the database, what do you do next? Most users' first attempt is something like this:
CREATE PROCEDURE dbo.UpdateStudents @List VARCHAR(MAX) AS BEGIN SET NOCOUNT ON; UPDATE dbo.Students SET ModifiedDate = CURRENT_TIMESTAMP WHERE StudentID IN (@List); END GO EXEC dbo.UpdateStudents @List = '1,3,5';
But because IN() is expecting the predicate data type to match the column (or to be a column name or independent variable), you get this error:
Conversion failed when converting the varchar value '1,3,5' to data type int.
Next, people will often try to use dynamic SQL, but then you will have to go to extra lengths to thwart SQL injection, and you won't know for sure if your values are all integers, or even if other invalid characters crept into your list:
ALTER PROCEDURE dbo.UpdateStudents @List VARCHAR(MAX) AS BEGIN SET NOCOUNT ON; DECLARE @sql NVARCHAR(MAX); SET @sql = 'UPDATE dbo.Students SET ModifiedDate = CURRENT_TIMESTAMP WHERE StudentID IN (' + @List + ');'; EXEC master.dbo.sp_executeSQL @sql; END GO EXEC dbo.UpdateStudents @List = '1,3,5,junk_foobar';
Since there is no column named
junk_foobar, you get this error:
Invalid column name 'junk_foobar'.
Folks like –CELKO– will tell you that you should pass each value individually into independent parameters. His reasoning: you get data type validation "for free," and the only string splitting mechanisms he has been exposed to do not handle validation (though many do). The problem with Celko's solution is that it is neither scalable nor maintainable. Sure, it might be okay with only a handful maximum StudentIDs supported, as follows:
ALTER PROCEDURE dbo.UpdateStudents @Student1 INT, @Student2 INT = NULL, @Student3 INT = NULL, @Student4 INT = NULL, @Student5 INT = NULL, @Student6 INT = NULL, @Student7 INT = NULL, @Student8 INT = NULL, @Student9 INT = NULL, @Student10 INT = NULL AS BEGIN SET NOCOUNT ON; UPDATE dbo.Students SET ModifiedDate = CURRENT_TIMESTAMP WHERE StudentID IN ( @Student1, @Student2, @Student3, @Student4, @Student5, @Student6, @Student7, @Student8, @Student9, @Student10 ); END GO EXEC dbo.UpdateStudents @Student1 = 1, @Student2 = 3, @Student3 = 5;
Even at 10 StudentIDs, that code is really ugly. But what if you want to be able to update 100, 500, or 5,000 at a time? Do you want me to post a code sample that shows a stored procedure with 500 of those parameters defined? You probably don't even want to see it, never mind manage it. The technical limit for the maximum number of parameters for a stored procedure in SQL Server 2005 and SQL Server 2008 is 2,100 (reference). The practical limit, meaning the most you would *want* to have, I can easily presume is much lower (at least among the sane). Who wants to maintain a stored procedure with 2,100 parameters? And what about the application code that now has to know that the maximum number of parameters is n, and then has to destruct the full list and make multiple calls to the stored procedure, one for each full or partial set of n parameters that has been broken out? Yes, validation is free in this case; but everything else about this solution is expensive. In any case, in spite of Celko's delusions of the contrary, you can split a set of comma-separated integers in a much more elegant fashion, and have validation as well. The rest of this post will describe the method I favor.
Populate a Numbers table
First things first; if you don't already have a numbers table in place, you should create one, even if you don't plan to implement anything else in this post. (This auxiliary table can be immensely useful; I've been writing about its virtues for years, and still have a hard time convincing people that one extra table is not the end of the world as we know it.) The code below will create a numbers table and populate it for you. On my system, one cross join yields 16,129 rows, and two cross joins yield 2,048,383 rows. If you need < 16,000 rows, then you can comment out the second CROSS JOIN statement. If you need more than 2 million rows in your numbers table, good luck — I added a 4th instance of sys.objects, and killed it after several minutes. I'm not sure if you'd want to go over the limit of the INT data type for something like this, but you know your requirements better than I do. For the situation we're talking about here, let's assume that the practical upper bound for the number of selected students is 10,000.
SET NOCOUNT ON; DECLARE @UpperLimit INT; SET @UpperLimit = 10000; WITH n AS ( SELECT rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id]) FROM sys.objects AS s1 CROSS JOIN sys.objects AS s2 CROSS JOIN sys.objects AS s3 ) SELECT [Number] = rn - 1 INTO dbo.Numbers FROM n WHERE rn <= @UpperLimit + 1; CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);
Create a splitting function
There are many ways to skin this cat. Most involve cursors (or loops in disguise); this one doesn't. It uses a table-valued function and the numbers table to find each item in the list (and uses DISTINCT to avoid extra work from negligent duplicates).
CREATE FUNCTION dbo.SplitInts ( @List VARCHAR(MAX) ) RETURNS TABLE AS RETURN ( SELECT DISTINCT [Value] = CONVERT(INT, LTRIM(RTRIM(CONVERT( VARCHAR(12), SUBSTRING(@List, Number, CHARINDEX(',', @List + ',', Number) - Number))))) FROM dbo.Numbers WHERE Number <= CONVERT(INT, LEN(@List)) AND SUBSTRING(',' + @List, Number, 1) = ',' ); GO SELECT [Value] FROM dbo.SplitInts('1,3,5');
The explicit CONVERT to INT gives us our "free" validation. If we pass in a non-integer value or other garbage, we get an error message:
SELECT [Value] FROM dbo.SplitInts('1,3,5,junk_foobar');
Conversion failed when converting the varchar value 'junk_foobar' to data type int.
Fix the procedure
Now that we have a function that will return a table of values that we KNOW are integers, the stored procedure is quite easy to code. I could have gone for a proprietary UPDATE FROM here, but I figured it would be best to use a more straigthforward IN() clause.
DROP PROCEDURE dbo.UpdateStudents GO CREATE PROCEDURE dbo.UpdateStudents @List VARCHAR(MAX) AS BEGIN SET NOCOUNT ON; UPDATE dbo.Students SET ModifiedDate = CURRENT_TIMESTAMP WHERE StudentID IN ( SELECT [Value] FROM dbo.SplitInts(@List) ); END GO EXEC dbo.UpdateStudents @List = '1,3,5';
We could wrap a TRY/CATCH around that UPDATE statement to give the user a more friendly message (something we can't do from inside the function, and which we wouldn't want to anyway, because the function can be used for other entities besides Students):
ALTER PROCEDURE dbo.UpdateStudents @List VARCHAR(MAX) AS BEGIN SET NOCOUNT ON; DECLARE @msg VARCHAR(MAX); BEGIN TRY UPDATE dbo.Students SET ModifiedDate = CURRENT_TIMESTAMP WHERE StudentID IN ( SELECT [Value] FROM dbo.SplitInts(@List) ); END TRY BEGIN CATCH SELECT @msg = ERROR_MESSAGE(), @msg = SUBSTRING ( @msg, CHARINDEX('varchar value ''', @msg) + 14, LEN(@msg) - (CHARINDEX('varchar value ''', @msg) + 14) - CHARINDEX('''', REVERSE(@msg)) + 2 ); RAISERROR('Value for StudentID %s was invalid.', 11, 1, @msg); END CATCH END GO EXEC dbo.UpdateStudents '1,3,junk_foobar';
Now when we run the faulty code, we get this error message:
Value for StudentID 'junk_foobar' was invalid.
If we want to simply ignore any garbage that is passed in, and only return values that are actually present in our numbers table, we can change the function like this:
CREATE FUNCTION dbo.SplitInts ( @List VARCHAR(MAX) ) RETURNS TABLE AS RETURN ( SELECT DISTINCT [Value] = [Number] FROM dbo.Numbers WHERE CHARINDEX ( ',' + RTRIM([Number]) + ',', ',' + REPLACE(@List,' ','') + ',' ) > 0 ); GO SELECT [Value] FROM dbo.SplitInts('1,3,5,junk,200000,2');
If we want to support more than 10,000 items in @List, we don't have to touch the code, we just have to make sure that we add enough rows to the dbo.Numbers table.
You may find value in Tony Rogerson's solution, which has a couple of nice features, such as indicating which positional value failed validation. Erland Sommarskog also does a very thorough treatment of the topic in this article. If you want to process sets of delimited values (e.g. 'foo|bar,blat|splunge,goo|ber'), you can see my post on Google groups. Finally, if you are using SQL Server 2008 only, then I also suggest you take a look at Table-Valued Parameters.