Illogical errors can be a factor of the plan

I've talked about illogical errors before. In several answers on Database Administrators (one, two, three), I show how you can use a CASE expression or TRY_CONVERT to work around an error where a non-numeric value that should have been filtered out by a join or other clause still leads to a conversion error. Erland Sommarskog raised a Connect item over a decade ago, still unaddressed, called "SQL Server should not raise illogical errors."
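As an illustration of that workaround pattern (the table and column names here are hypothetical, not taken from those answers), TRY_CONVERT returns NULL instead of raising an error when a value can't be converted, so bad rows are harmless no matter when the optimizer evaluates the expression:

```sql
-- Hypothetical example: dbo.staging holds mixed values in a varchar column.
-- CONVERT(int, val) can fail if the optimizer evaluates it before the
-- predicate that was supposed to remove non-numeric rows; TRY_CONVERT
-- yields NULL for those rows instead, so no error is raised.
SELECT val
  FROM dbo.staging
  WHERE TRY_CONVERT(int, val) > 100;
```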

Recently we had a scenario where a query was failing on one server but not another. But this was slightly different; there were no numerics involved. Imagine this scenario: a source table has a column that is varchar(20). A query creates a table variable with a column that is varchar(10), and inserts rows from the source table, with a filter in place that only exposes values that are 10 characters or less.

In a lot of cases, of course, this scenario is perfectly fine, and everything works as expected.

CREATE TABLE dbo.src(i int, a varchar(20));
 
INSERT dbo.src VALUES(1,'aaaaa'),(2,'bbbbbbbbbbbbbbb');
 
DECLARE @t table(a varchar(10));
 
INSERT @t SELECT a FROM dbo.src WHERE i = 1;
 
DROP TABLE dbo.src;

If you change that to WHERE i = 2;, though, you would get this* error message:

Msg 2628, Level 16, State 1
String or binary data would be truncated in table 'tempdb.dbo.#A60B3C91', column 'a'. Truncated value: 'bbbbbbbbbb'.

No surprises here; this is how it should work.

* As this answer explains, certain versions may display a more generic message (see this feedback item), unless you have this fix and/or have enabled trace flag 460.
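As a quick aside, here is a sketch of how to enable the more detailed Msg 2628 message explicitly (check version support and permissions before relying on either option):

```sql
-- Enable the detailed truncation message globally via trace flag 460,
-- available once the relevant fix is installed:
DBCC TRACEON (460, -1);

-- Or, on SQL Server 2019 and later, at the database scope
-- (where the detailed message is the default):
ALTER DATABASE SCOPED CONFIGURATION SET VERBOSE_TRUNCATION_WARNINGS = ON;
```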

A more complex example

When the schema and the queries get more complicated, the plan becomes important. Here's a slightly more complicated scenario, but kept as simple as possible, to demonstrate the problem:

CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int NOT NULL);
 
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id)    VALUES (1), (3), (4), (5);
GO
 
DECLARE @t table(dest varchar(3) NOT NULL);
 
INSERT @t(dest) SELECT t1.s 
  FROM dbo.t1 
  INNER JOIN dbo.t2 ON t1.id = t2.id;
GO
 
DROP TABLE dbo.t1, dbo.t2;

This yields an error similar to the one we saw earlier:

Msg 2628, Level 16, State 1
String or binary data would be truncated in table 'tempdb.dbo.#AC65D70E', column 'dest'. Truncated value: 'len'.

While we should only have retrieved rows with values that fit in the destination column (id is 1 or 3, since those are the only two rows that match the join criteria), the error message indicates that the row where id is 2 was also evaluated, even though we know it couldn't possibly have been returned. Let's look at the estimated plan (with apologies for that terminology to Grant, Hugo, and Erin):

Plan derived from "Explain" in Azure Data Studio

You can see the CONVERT_IMPLICIT there, which you might think should have made this error impossible. You can also see that the hash match emits only two rows, but one thing plan diagrams don't show is how many rows go *into* an operator; in this case, the operator's property sheet reveals that the source table is used to materialize the build side. This is where the error occurs, because the implicit conversion has to be evaluated for every row on the build side, including the one with a five-character value.

I can show a few ways to make the error disappear (or reappear), some without changing the query at all.
 

Different Data

With just one change to the source data:

CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int NOT NULL);
 
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id)    VALUES (1), (3), (4);--, (5);
GO
 
DECLARE @t table(dest varchar(3) NOT NULL);
 
INSERT @t(dest) SELECT t1.s 
  FROM dbo.t1 
  INNER JOIN dbo.t2 ON t1.id = t2.id;
GO
 
DROP TABLE dbo.t1, dbo.t2;

This time, the hash match yields no error

We still have a hash match, but notice that the build side and probe side have swapped, a side effect of slightly different data. Because the hash table no longer has to materialize the result of the implicit conversion, no error occurs.
 

Different Schema

Another "solution" is to add a clustered index to dbo.t2 (and revert to inserting the original 4 rows):

CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int PRIMARY KEY CLUSTERED);
 
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id)    VALUES (1), (3), (4), (5);
GO
 
DECLARE @t table(dest varchar(3) NOT NULL);
 
INSERT @t(dest) SELECT t1.s 
  FROM dbo.t1 
  INNER JOIN dbo.t2 ON t1.id = t2.id;
GO
 
DROP TABLE dbo.t1, dbo.t2;

Here is how the plan changes:

Now we get a nested loops join and no error

The CONVERT_IMPLICIT is still there on the compute scalar, but the evaluation isn't forced before the bad row is eliminated, because the nested loops join doesn't have to materialize anything.
 

Different Query

You could also change the query; for example, you could apply functions like LEFT, SUBSTRING, or an explicit CONVERT (or CAST) to the column:

CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int NOT NULL);
 
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id)    VALUES (1), (3), (4), (5);
GO
 
DECLARE @t table(dest varchar(3) NOT NULL);
 
INSERT @t(dest) SELECT s = LEFT(t1.s, 3) -- or SUBSTRING(t1.s, 1, 3)
                                         -- or CONVERT(varchar(3), t1.s)
  FROM dbo.t1                            -- or CAST(t1.s AS varchar(3))
  INNER JOIN dbo.t2 ON t1.id = t2.id;
GO
 
DROP TABLE dbo.t1, dbo.t2;

The underlying transformation here is implemented as a SUBSTRING, and this gives the optimizer enough information to know that no value could be longer than 3 characters:

No error when applying LEFT()

You could also apply a WHERE clause to eliminate the rows explicitly. This works in this specific case but, as demonstrated in the posts above, this won't always work (nor will using a CTE or derived table):

CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int NOT NULL);
 
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id)    VALUES (1), (3), (4), (5);
GO
 
DECLARE @t table(dest varchar(3) NOT NULL);
 
INSERT @t(dest) SELECT t1.s
  FROM dbo.t1 
  INNER JOIN dbo.t2 ON t1.id = t2.id
  WHERE LEN(t1.s) <= 3;
GO
 
DROP TABLE dbo.t1, dbo.t2;

The plan is now:

No error with an explicit filter

In this case, the bad row is filtered out early enough that it becomes irrelevant which way the hash match operates.

You could also try hinting a loop join to avoid the hash match, and there are probably a half dozen other ways to change the query to either force earlier filtering or eliminate materialization.
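As a sketch of the hint approach (same tables and data as above), a join hint steers the optimizer away from the hash match, so nothing from t1 has to be materialized before the join eliminates the bad row:

```sql
DECLARE @t table(dest varchar(3) NOT NULL);

-- Forcing a nested loops join means no hash table is built from t1,
-- so the implicit conversion is only evaluated for rows that survive
-- the join (id 1 and 3), and no truncation error should occur:
INSERT @t(dest) SELECT t1.s
  FROM dbo.t1
  INNER JOIN dbo.t2 ON t1.id = t2.id
  OPTION (LOOP JOIN);
```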

Some hints may bring the error back, though. For example, in this case, OPTION (FORCE ORDER) (or any hint that preserves join order) puts t1 back on the build side of the hash match, and the error returns:

CREATE TABLE dbo.t1(id int NOT NULL, s varchar(5) NOT NULL);
CREATE TABLE dbo.t2(id int NOT NULL);
 
INSERT dbo.t1 (id, s) VALUES (1, 'l=3'), (2, 'len=5'), (3, 'l=3');
INSERT dbo.t2 (id)    VALUES (1), (3), (4), (5);
GO
 
DECLARE @t table(dest varchar(3) NOT NULL);
 
INSERT @t(dest) SELECT t1.s
  FROM dbo.t1 
  INNER JOIN dbo.t2 ON t1.id = t2.id
  OPTION (FORCE ORDER);
GO
 
DROP TABLE dbo.t1, dbo.t2;

 

An ounce of prevention…

A more logical solution is to give the table variable matching data types or, if longer data doesn't make sense in the source table, to fix the problem at the source. This can have trickle-down effects, but generally the problem stems from the source allowing values that are too large. Fix the source, and oversized declarations elsewhere are unlikely to cause any issues unless they are populated from other sources.
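To sketch the first option using the same tables as above: simply declaring the destination with the source column's type makes the plan shape irrelevant, because no value can be truncated:

```sql
-- Match the source column's type (varchar(5)) so truncation is
-- impossible, regardless of which rows the plan happens to evaluate:
DECLARE @t table(dest varchar(5) NOT NULL);

INSERT @t(dest) SELECT t1.s
  FROM dbo.t1
  INNER JOIN dbo.t2 ON t1.id = t2.id;
```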

In the meantime, make sure there isn't some errant data present that is explicitly causing the error, making you chase your tail. It wouldn't be the first time the code was blamed when bad data (or, rather, a schema too permissive) was actually at fault.
 

Conclusion

My point here isn't to tell you how to work around the error; it's that you may or may not see the truncation error, even when it doesn't make sense. And the reverse can happen: happily working code may suddenly break when someone adds data, updates statistics, creates or rebuilds an index, or when a failover or service restart occurs.

The schema and the underlying data can play a part in the plan you get, and other factors can yield different plans on different servers, or even on the same server at different points in time, when the schema and data are identical.

Huge shout out to Paul White, whose invaluable input made this post way better than it might have been otherwise.

Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, speak frequently at conferences, and write at SQLPerformance and MSSQLTips. In real life I am an architect at Wayfair.
