December 16, 2008 | SQL Server

SQL Server does not care about the order of your bulk file!

I have seen several people make this assumption over the past few weeks, and thought I should write up a little something to drive the point home.  Primarily people are expecting to take a flat file, like a log file, bulk insert it into SQL Server, and expect SQL Server to "remember" the original ordering of the file.  Some even go to the trouble of using an IDENTITY column, and rigging up a format file to skip this column, assuming that SQL Server will assign the IDENTITY values in the same order that the rows are listed in the file.  Well, this may be true if you don't have a clustered index, but it is not guaranteed.  Let me show you what happens when you do have a clustered index, since a lot of the time you will.  First, let's have a very simple CSV file, which I will save as C:\blat.txt:

1,2,1
2,1,2
1,2,3

Then, a very simple format file, which I will save as C:\format.fmt:

8.0
3
1    SQLCHAR    0    12    ","    2    A    ""
2    SQLCHAR    0    12    ","    3    B    ""
3    SQLCHAR    0    12    "\r\n"    4    C    ""

Now, given those two files, a repro script:

CREATE DATABASE BulkTest_Aaron; 
GO
USE BulkTest_Aaron; 
GO
 
CREATE TABLE dbo.A 
( 
   id INT IDENTITY(1,1), 
   A INT, 
   B INT, 
   C INT 
); 
GO 
CREATE CLUSTERED INDEX A ON dbo.A(A); 
GO
 
CREATE TABLE dbo.B 
( 
   id INT IDENTITY(1,1), 
   A INT, 
   B INT, 
   C INT 
); 
GO 
CREATE CLUSTERED INDEX B ON dbo.B(B); 
GO
 
CREATE TABLE dbo.C 
( 
   id INT IDENTITY(1,1), 
   A INT, 
   B INT, 
   C INT 
); 
GO 
CREATE CLUSTERED INDEX C ON dbo.C(C); 
GO
 
BULK INSERT dbo.A FROM 'c:\blat.txt' 
WITH (ROWTERMINATOR = '\r\n', FORMATFILE = 'c:\format.fmt'); 
GO 
BULK INSERT dbo.B FROM 'c:\blat.txt' 
WITH (ROWTERMINATOR = '\r\n', FORMATFILE = 'c:\format.fmt'); 
GO 
BULK INSERT dbo.C FROM 'c:\blat.txt' 
WITH (ROWTERMINATOR = '\r\n', FORMATFILE = 'c:\format.fmt'); 
GO
 
SELECT [t] = 'A',[id],A,B,C FROM dbo.A 
UNION ALL 
SELECT 'B',[id],A,B,C FROM dbo.B 
UNION ALL 
SELECT 'C',[id],A,B,C FROM dbo.C 
ORDER BY [t],[id]; 
GO
 
USE [master]; 
GO
 
DROP DATABASE BulkTest_Aaron; 
GO

Result:

Result 1 

Note that only in table C, which has a clustered index on the third column (which also happens to represent the row number in the file), do we see that SQL Server has assigned IDENTITY values in the same order as the rows in the file.

People also assume that if you have data in the table that CAN be used to identify the order (e.g. a sequence number of some kind, or date/time data), and you specify the ORDER option in the BULK INSERT statement, that this will fix the "problem."  Not so, as shown if you simply change the BULK INSERT statements above:

BULK INSERT dbo.A FROM 'c:\blat.txt' 
WITH (ROWTERMINATOR = '\r\n', FORMATFILE = 'c:\format.fmt', ORDER(C)); 
GO 
BULK INSERT dbo.B FROM 'c:\blat.txt' 
WITH (ROWTERMINATOR = '\r\n', FORMATFILE = 'c:\format.fmt', ORDER(C)); 
GO 
BULK INSERT dbo.C FROM 'c:\blat.txt' 
WITH (ROWTERMINATOR = '\r\n', FORMATFILE = 'c:\format.fmt', ORDER(C)); 
GO

Result:

 Result 2

The results are the same: only the table with the clustered index that happens to match a column in the file that dictates the order do we see SQL Server obey the original ordering in the data file. 

So, the moral of the story is, do not expect SQL Server to order your bulk data, logically or physically, exactly as it appears in the file.  If the order in the file is important, then consider including information in the file, such that the order can be reassembled using a query against the data — regardless of how SQL Server decided to store it.

1 comment on this post

    • Armando Prato - December 17, 2008, 8:33 PM

      Nice FYI, Aaron. I've never had the need to expect data to be bulked in a certain order but it's a nice to know, nonetheless.

Comments are closed.