September 10, 2010 | SQL Server

Updating a table with HTML data from a file

On the #SQLHelp hash tag today, @zippy1981 asked the following question:

Is there an easy one-liner (powershell,sqlcmd/SSMS/etc) to load a single html file on the local filesystem to a field in the DB?

While typing with one raised eyebrow (wanting to ask, but it wouldn't fit in 140 characters, "why would you want to do that?"), my quick answer was yes.  Well, there isn't necessarily a one-liner, and I'm not sure about doing this through PowerShell, SQLCMD or from within SSMS.  I do know you can do this with a stored procedure call, but there are a few of caveats to the solution below.

One caveat is that my old-school approach requires xp_cmdshell to be enabled, which may not be an option in some shops. When you try to use xp_cmdshell, by default, you should get this error:

Msg 15281, Level 16, State 1, Procedure xp_cmdshell, Line 1
SQL Server blocked access to procedure 'sys.xp_cmdshell' of component 'xp_cmdshell' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'xp_cmdshell' by using sp_configure. For more information about enabling 'xp_cmdshell', see "Surface Area Configuration" in SQL Server Books Online.

To enable xp_cmdshell, you can run the following code…

   EXEC sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE WITH OVERRIDE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE WITH OVERRIDE;

Another caveat is that, because the type command will break the HTML file up line by line, we rely on undocumented string concatenation – so you need to make assumptions that the SELECT will pull the "rows" out in the same order.  I tested with rather small files, and if you publish minimized HTML files where everything is all on a single line, it may not be an issue.  But I can't guarantee that this will work for all files in all environments.  For these reasons, I strongly recommend that you don't consider this a long-term solution, but rather investigate using an alternate solution.  One would be to use CLR inside the database to handle filesystem-related tasks; another would be to have an external application read the file contents and then call a stored procedure which passes the data along; yet another would be to investigate storing your HTML files using the new FILESTREAM feature introduced in SQL Server 2008.

With those disclaimers out of the way, let's pretend I have an HTML file on my hard drive, called C:\test.html.  Here is how I would go about updating an existing row in an arbitrary table with the HTML contents from that file (and no, single quotes in the file will not screw anything up with this, however single quotes in the path or filename might):

CREATE TABLE dbo.Foo
(
   FooID INT PRIMARY KEY,
   HTML  NVARCHAR(MAX)
);
GO
 
INSERT dbo.Foo(FooID, HTML) SELECT 1, NULL;
GO
 
CREATE PROCEDURE dbo.Foo_UpdateHTML
   @FooID        INT,
   @HTMLFilePath VARCHAR(8000)
AS
BEGIN
   SET NOCOUNT ON;
 
   DECLARE 
       @txt NVARCHAR(MAX),
       @cmd VARCHAR(8000) = 'type "' + @HTMLFilePath + '"';
 
   CREATE TABLE #f(t NVARCHAR(MAX));
 
   BEGIN TRY
 
       INSERT #f EXEC master..xp_cmdshell @cmd;
 
       IF @@ROWCOUNT > 0  
       BEGIN
           DELETE #f WHERE t IS NULL;
 
           IF EXISTS 
           (
               SELECT 1
                   FROM #f
                   WHERE t LIKE '%The system cannot find the file specified%'
           )
           BEGIN
               RAISERROR('File not found.', 11, 1);
           END
 
           SELECT @txt = COALESCE(@txt + CHAR(13) + CHAR(10), '') + t
               FROM #f;
 
           UPDATE dbo.Foo 
               SET   HTML  = @txt
               WHERE FooID = @FooID;
       END
 
   END TRY
   BEGIN CATCH
       SET @txt = ERROR_MESSAGE();
       RAISERROR('HTML file error (%s)', 11, 1, @txt);    
   END CATCH
 
   DROP TABLE #f;
END
GO 
 
EXEC dbo.Foo_UpdateHTML
   @FooID        = 1,
   @HTMLFilePath = 'C:\test.html';
GO
 
SELECT FooID, HTML FROM dbo.Foo;
GO
 
DROP PROCEDURE dbo.Foo_UpdateHTML;
DROP TABLE dbo.Foo;
GO

In this case if the file is not valid you will receive the following error message:

 Msg 50000, Level 11, State 1, Procedure Foo_UpdateHTML, Line 44
HTML file error (File not found.)

Of course you are more than welcome to adjust the error handling to your liking.  But I will recommend again that, while you may want to use this as a stop-gap solution, you are going to be better off in the long run using a solution that requires several assumptions to be correct.

EDIT: From Adam Machanic's suggestion, here is a different version of the procedure which uses BULK INSERT, avoiding both the xp_cmdshell requirement as well as the quirky string concatenation:

 CREATE PROCEDURE dbo.Foo_UpdateHTML
   @FooID        INT,
   @HTMLFilePath VARCHAR(8000)
AS
BEGIN
   SET NOCOUNT ON;
 
   DECLARE @txt NVARCHAR(MAX);
 
   CREATE TABLE #f(t NVARCHAR(MAX));
 
   BEGIN TRY
       SET @txt = 'BULK INSERT #f FROM ''' + @HTMLFilePath + '''
            WITH
            (
                ROWTERMINATOR   = ''!@#$%^&*()'', 
                FIELDTERMINATOR = ''!@#$%^&*()''
            );';
 
       EXEC sp_executeSQL @txt;
 
       UPDATE dbo.Foo
           SET HTML = (SELECT t FROM #f)
           WHERE FooID = @FooID;
   END TRY
   BEGIN CATCH
       DECLARE @txt NVARCHAR(MAX) = ERROR_MESSAGE();
       RAISERROR('HTML file error (%s)', 11, 1, @txt);    
   END CATCH
 
   DROP TABLE #f;
END
GO

6 comments on this post

    • cinahcaM madA - September 11, 2010, 2:57 AM

      How about using BULK INSERT with field and row terminators that won't exist in the file (e.g. "!!!—THISISTHETERMINATOR—!!!")?

    • AaronBertrand - September 11, 2010, 6:27 AM

      Yeah, I didn't even think of doing it that way, so used to using BULK INSERT for log files that this didn't occur to me.  Makes perfect sense and I'll update the post.

    • Brad Schulz - September 11, 2010, 9:03 AM

      How about using OPENROWSET(BULK)?:
      UPDATE Foo
      SET HTML=(SELECT * FROM OPENROWSET(BULK 'C:\Test.HTML', SINGLE_NCLOB) X)
      WHERE FooID=1
      (If the HTML file is not actually in Unicode, then you'd declare it above as a SINGLE_CLOB).
      Or, in the procedure that accepts @HTMLFilePath as a parameter:
      DECLARE @sql NVARCHAR(MAX)
      SET @sql =
      'UPDATE Foo
      SET HTML=(SELECT * FROM OPENROWSET(BULK "'+@HTMLFilePath+"', SINGLE_NCLOB) X)
      WHERE FooID=@FooID'
      EXEC sp_executesql @sql, N'@FooID INT', @FooID
      –Brad

    • Justin Dearing - September 11, 2010, 4:30 PM

      Aaron,
      Thanks for the answer. Gotta love the SQL server community. The short answer to "why would I want to do that" is I store email templates in the database. The email text is stored in SVN. I'm trying to automate deployment of updated email texts.
      I'll definitely report back my progress..

    • AaronBertrand - September 11, 2010, 4:41 PM

      Justin,
      As you can see there are many ways to skin this cat.  I might suggest though, since you are already protecting your e-mail content with SVN, that you change the logic in your database to pull the HTML contents from the file at runtime (just reference the path the HTML file, instead of storing the HTML data).  Then you don't need to deploy anything, and it will also keep your database much slimmer.

    • Justin Dearing - September 11, 2010, 9:21 PM

      Aaron,
      Your suggestion duly noted. Filesystem is probably ideal for this. I wold not rule out FILESTREAM backed varbinary columns either.
      I also think this might be more of a mongodb job. This database is only used by a service that sends confirmation emails It is called by other applications that have their own, more complex, databases. Since the data will be readonly, I don't have to worry about the durability issues with mongo.
      With all of that taken into consideration. The service ain't broke now. A little script-fu would be the quickest way to skin this cat.

Comments are closed.