Two BULK INSERT issues I worked around recently
Since I am still afraid of SSIS, and because I am dealing mostly with CSV files and table structures that are relatively simple and require only one of the three letters in the acronym "ETL," I find myself using BULK INSERT a lot. I have been meaning to switch to using CLR, since I am doing a lot of file system querying using xp_cmdshell, but I haven't had the chance to really explore it yet. I know, a lot of you are probably thinking, wow, look at all those bad habits. But for every person thinking that way, I am sure there is someone thinking, yes, I am still using BULK INSERT as well. So, let me tell you about two little problems I came across recently, and how I worked around them.
The Phantom Carriage Return
In one system, we get log files from a bunch of application servers running Linux. The other day we built a new log file output mechanism using the same code (we are removing, as much as we can, the applications' dependency on the database). However, the same code, outputting the same type of log file (just containing different data), was generating the following error when I tried to BULK INSERT any file with more than one row:
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 1, column <last column number> (<last column name>).
I asked the developer if anything had changed with the constant he was using for carriage returns, and he said no, "\n" was still in place. No matter how I changed the code, I was still getting the error:
... WITH (ROWTERMINATOR = '\r\n', ... );
... WITH (ROWTERMINATOR = '\n\r', ... );
... WITH (ROWTERMINATOR = '\n\n', ... );
... WITH (ROWTERMINATOR = '\r', ... );
Nothing would work. I had half a mind to go and look at the Perl code on the Linux box, but before I got to that point, and on a total lark, I tried this version of the code instead:
DECLARE @b NVARCHAR(2048);

SET @b = 'BULK INSERT ... WITH (ROWTERMINATOR = ''' + CHAR(10) + ''', ... );';
Lo and behold: if I used the CHAR() code from SQL embedded in the string, it worked! For some of you, the addition of dynamic SQL will make you wag your finger at me even more. But if you are polling for new files to load, you are probably already using dynamic SQL to construct these statements anyway. I haven't really dug into why this worked while "\n" didn't; I just knew it was working and could move on to the next most important task.
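If you hit the same wall, it can help to look at the raw bytes of the file to see what the row terminator really is before fiddling with escapes. The sketch below shows both the SINGLE_BLOB peek and the CHAR(10) pattern together; the file path, the target table name (dbo.AppLog), and the pipe delimiter are all stand-ins, not the real system's names:

```sql
-- Peek at the raw bytes of the file; look at where each row ends
-- (0x0A = LF, 0x0D 0x0A = CR/LF). Path is a placeholder.
SELECT BulkColumn
  FROM OPENROWSET(BULK '\\FileServer\file.log', SINGLE_BLOB) AS f;

-- The pattern that worked for me: embed CHAR(10) (a literal line feed)
-- into the dynamic statement instead of relying on the '\n' escape.
DECLARE @sql NVARCHAR(2048);

SET @sql = N'BULK INSERT dbo.AppLog FROM ''\\FileServer\file.log'' '
         + N'WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = '''
         + CHAR(10) + N''');';

EXEC sys.sp_executesql @sql;
```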
Using BULK INSERT Against a Linked Server
When you try to use BULK INSERT against a four-part name, e.g.:
BULK INSERT [OtherServer].[MyDatabase].[dbo].[foo] ... ;
You get the following error:
Msg 208, Level 16, State 82, Line 1
Invalid object name 'foo'.
I complained about the misleading error message in Connect #525287: BULK INSERT to linked server gives misleading error message. In that Connect item, I mentioned that I had an easy workaround, and had a follow-up from "Pbaggett" asking me about my workaround (though I just noticed the comment today), so I thought it was a perfect opportunity to make this post a little meatier.
In the system I am working in, there is a central table that describes the various tables that exist on various systems in the environment. Most of the tables that are a target for BULK INSERT are local, but a few are remote; for those, the central table stores the linked server name, and for local tables it is NULL. As described above, all BULK INSERT statements are constructed dynamically anyway, since the target tables aren't always inserted to, and the file names and number of files are volatile. The trick here is to execute the statement *on* the remote server when necessary (using a dynamically built, and then nested, sp_executesql call). I'll illustrate what I mean with some sample code (for brevity, I've left placeholders for error handling, loops, etc.):
DECLARE @LinkedServer VARCHAR(32),
        @Database     NVARCHAR(128),
        @Table        NVARCHAR(255),
        @FilePath     VARCHAR(2048),
        @sql          NVARCHAR(MAX);

-- this would actually be in two loops: one to get all of the target tables,
-- and the other to get all of the relevant files for each target table.
SELECT @LinkedServer = '[OtherServer]',
       @Database     = '[MyDB]',
       @Table        = '[dbo].[foo]',
       @FilePath     = '\\FileServer\file.csv';

SET @sql = N'BULK INSERT ' + @Database + '.' + @Table
         + ' FROM ''' + @FilePath + ''' '
         + 'WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'');';

SET @sql = N'sp_executesql N''' + REPLACE(@sql, '''', '''''') + ''';';

BEGIN TRY
    -- sp_testlinkedserver goes here

    SET @sql = N'EXEC ' + COALESCE(@LinkedServer + '...', '') + @sql;

    PRINT @sql;

    -- I've commented out the EXEC here; PRINT should
    -- give you the gist of what's going on.
    --EXEC master..sp_executesql @sql;
END TRY
BEGIN CATCH
    -- error handling
END CATCH
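For what it's worth, with the sample values above, the PRINT should emit something along these lines (one statement, wrapped here for readability): the remote server's own sp_executesql is invoked via the four-part name, and it runs the BULK INSERT locally on that server.

```sql
EXEC [OtherServer]...sp_executesql N'BULK INSERT [MyDB].[dbo].[foo]
    FROM ''\\FileServer\file.csv''
    WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'');';
```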
Of course there are some complications here… doubling up the apostrophes and trying to debug the final SQL statement can be a pain; you also need to be sure that all of the remote servers are able to see the same network shares using the context that the linked server uses. But so far this workaround has allowed me to keep my log polling / loading process central, even while the distribution of data has scaled out to other servers.
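On the "sp_testlinkedserver goes here" placeholder: that procedure raises an error when the target is unreachable, and pushing the call through its own sp_executesql defers name resolution to run time, so a dead server lands in the CATCH block instead of aborting the batch. A minimal sketch, assuming a linked server named OtherServer (the server name and messages are illustrative):

```sql
-- Verify the linked server is reachable before building the remote
-- BULK INSERT; the nested sp_executesql call lets TRY/CATCH trap
-- a connection failure at run time.
DECLARE @srv SYSNAME = N'OtherServer';

BEGIN TRY
    EXEC sys.sp_executesql N'EXEC master.dbo.sp_testlinkedserver @s;',
                           N'@s SYSNAME',
                           @s = @srv;
    -- linked server is up: safe to build and EXEC the remote statement
END TRY
BEGIN CATCH
    PRINT 'Linked server ' + @srv + ' unreachable: ' + ERROR_MESSAGE();
END CATCH
```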