Advanced JSON Techniques in SQL Server - Part 2

By:   |   Comments (1)   |   Related: 1 | 2 | 3 | > SQL Server 2016


Problem

In my previous tip, Advanced JSON Techniques in SQL Server 2016 - Part 1, I gave examples of three new functions to support JSON features: OPENJSON(), ISJSON(), and JSON_VALUE(). In this tip, I wanted to show how you can use JSON_VALUE() to index particular data points within a JSON string.

Solution

Let's say you have a simple table to store data about customers, and because different customers can have different attributes, you've chosen to store those attributes in an NVARCHAR column using JSON structure. The original table is pretty simple:

CREATE TABLE dbo.Customers
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000)
  CONSTRAINT PK_Cust_A PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_A CHECK (ISJSON(Attributes) = 1)
);

A sample row would have the following JSON structure in the Attributes column:

{"HairColor" : "red", "EyeColor : "green", "YearJoined" : 1974 }

Now, if I want to find all the customers with red hair, I would run a query like this:

SELECT CustomerID, JSON_VALUE(Attributes, '$.HairColor')
  FROM dbo.Customers_A
  WHERE JSON_VALUE(Attributes, '$.HairColor') = N'red'; 

This, of course, would have to scan the entire table, according to SQL Sentry Plan Explorer:

Table scan when searching for a particular value with JSON

A little trick I've learned from Jovan Popovic, though, is that you can create a computed column from JSON_VALUE(), and index the computed column. So let's create a new copy of the table:

CREATE TABLE dbo.Customers_B
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000),
  HairColor AS JSON_VALUE(Attributes, '$.HairColor'),
  CONSTRAINT PK_Cust_B PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_B CHECK (ISJSON(Attributes) = 1)
);

And then we can create this index:

CREATE INDEX HairColor ON dbo.Customers_B(HairColor);

Note that you'll get a warning here, because SQL Server has no way to know how long a specific JSON attribute will be - it could, theoretically, be almost the entire column:

Warning! The maximum key length for a nonclustered index is 1700 bytes. The index 'HairColor' has maximum length of 8000 bytes. For some combination of large values, the insert/update operation will fail.

Now we can make the query simpler:

SELECT CustomerID, HairColor 
   FROM dbo.Customers_B
   WHERE HairColor = N'red';

And now we are likely to get a much more efficient index seek:

Index Seek with an index created on the JSON_VALUE column

We also get the same plan for the original query, so SQL Server is already doing a great job of recognizing the JSON_VALUE() arguments and matching it to our indexed, computed column.

We could also choose to make the column persisted, which can mean less work for queries at the cost of more work for DML:

CREATE TABLE dbo.Customers_C
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000),
  HairColor AS JSON_VALUE(Attributes, '$.HairColor') PERSISTED,
  CONSTRAINT PK_Cust_C PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_C CHECK (ISJSON(Attributes) = 1)
);
GO

CREATE INDEX HairColor ON dbo.Customers_C(HairColor);

This leads again to the same plan and, at least at my small data volumes, similar performance. At scale, though, persisted computed columns are worth considering:

Query plan with a computed column with the JSON data

If you want to avoid the warning from above, or - more importantly - failures from inserting JSON strings with extremely long values for the HairColor attribute - you can create this index instead:

CREATE TABLE dbo.Customers_D
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000),
  HairColor AS JSON_VALUE(Attributes, '$.HairColor') PERSISTED,
  CONSTRAINT PK_Cust_D PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_D CHECK (ISJSON(Attributes) = 1)
);
GO

CREATE INDEX HairColor ON dbo.Customers_D(CustomerID) INCLUDE(HairColor);

This works, however since the HairColor value is no longer in the key for the index, there is no way for SQL Server to satisfy this specific query without performing a scan - though it chooses this skinnier index instead of the wider clustering key:

Index Scan on JSON computed column when the data is an INCLUDED column

Also note that I tried creating a filtered index, since, wouldn't it be nice to be able to create a smaller index that only has the rows we might care about? Alas, I tried this:

CREATE INDEX HairColor2 ON dbo.Customers_B(HairColor) WHERE HairColor = N'red';

And was promptly told:

Msg 10609, Level 16, State 1
Filtered index 'HairColor2' cannot be created on table 'dbo.Customers_B' because the column 'HairColor' in the filter expression is a computed column. Rewrite the filter expression so that it does not include this column.
Summary

JSON structures can be stored in SQL Server, but by default they just look like big blobs of text to the optimizer. You can use computed columns and indexes to make operations for extracting and searching specific JSON elements more efficient.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, November 4, 2015 - 7:58:35 PM - Ian Yates Back To Top (39023)

Out of curiosity, what if you tried to create the filtered index against table _C rather than _B?  The difference being that "C" has the column persisted.  I don't have SQL 2016 installed at the moment to try it out unfortunately.  

Thanks for the post - the new JSON stuff looks like it'll be quite handy.  Do you know if it has full support in the Express edition too?















get free sql tips
agree to terms