Building DBML syntax from metadata
April 13th, 2022
Building DBML syntax from metadata
April 13th, 2022
 
 

I recently came across dbdiagram.io, a free, web-based solution for displaying entity relationship diagrams (ERDs) using database markup language (DBML). I like this approach compared to the long-standing database diagrams feature in Management Studio, simply because it's much easier to share and collaborate.

DBML expects syntax like the following very basic example:

Table "dbo.Users" 
{
  UserID int [pk]
  Username nvarchar(128)
}
 
Table "dbo.Posts" 
{
  PostID int [pk] 
  UserID int
}
 
Ref: "dbo.Posts".(UserID) > "dbo.Users".(UserID)

This will produce the following ER diagram:

Simple ER Diagram

But this syntax isn't exactly something that SQL Server has been designed to produce for you.

I thought it would be a fun exercise to dynamically build the syntax that could be used to show the relationships between most of the tables on Stack Exchange Data Explorer (SEDE), since those relationships are not always obvious.

Now, I say most because, for various reasons, some of the real tables in our production databases are transformed and/or combined before being published to SEDE. Also, foreign key relationships are not implemented, since the database is read only. Luckily, I happen to have access to a version of this database that you don't, and I can query the metadata to help build the output needed for such a diagram.

Because there isn't always a 1:1 mapping between what you see and what I see internally, I am going to work with the following subset of tables (20 of the 28 tables currently presented in SEDE):

Badges
PostFeedback
PostHistory
PostHistoryTypes
PostLinks
PostNotices
PostNoticeTypes
Posts
PostTags
PostTypes
ReviewRejectionReasons
ReviewTaskResults
ReviewTaskResultTypes
ReviewTasks
ReviewTaskStates
ReviewTaskTypes
SuggestedEdits
Tags
TagSynonyms
Users

The point isn't so you can visually learn our schema but so you can use this example to generate DBML from your own schema.

Baseline : Defining the tables I care about

I started by dumping the list of tables I'm interested in into a #temp table:

CREATE TABLE #Tables
(
  TableName nvarchar(520), 
  ObjectID  int
);
 
INSERT #Tables(TableName, ObjectID)
SELECT t, OBJECT_ID(t) FROM 
( 
  VALUES (N'dbo.Badges'),                 (N'dbo.PostFeedback'),                  
         (N'dbo.PostHistory'),            (N'dbo.PostHistoryTypes'),
         (N'dbo.PostLinks'),              (N'dbo.PostNotices'),
         (N'dbo.PostNoticeTypes'),        (N'dbo.Posts'), 
         (N'dbo.PostTags'),               (N'dbo.PostTypes'),       
         (N'dbo.ReviewRejectionReasons'), (N'dbo.ReviewTaskResults'),
         (N'dbo.ReviewTaskResultTypes'),  (N'dbo.ReviewTasks'),
         (N'dbo.ReviewTaskStates'),       (N'dbo.ReviewTaskTypes'), 
         (N'dbo.SuggestedEdits'),         (N'dbo.Tags'),
         (N'dbo.TagSynonyms'),            (N'dbo.Users')
) AS t(t);

Once I have the tables, I can join to catalog views like sys.columns, sys.key_constraints, sys.foreign_keys, and sys.foreign_key_columns to build out the code to represent the relationships. Thankfully, this specific tool doesn't require the tables to be presented in any sort of dependency order, so there is one less complexity that is usually a part of building scripts around foreign keys.

Getting all the columns + data types

Ever since SQL Server 2012 I've absolutely loved being able to build columns and data types from sys.dm_exec_describe_first_result_set. This makes it really easy to derive things like CREATE TABLE scripts, CRUD parameter lists, or documentation. We can use our #temp table and grab all of their columns and data types in one swoop:

SELECT t.TableName, 
    ColumnName = c.name,
    DataType   = c.system_type_name,
    Ordinal    = c.column_ordinal
  INTO #Columns
  FROM #Tables AS t 
  CROSS APPLY
  (
    SELECT name, system_type_name, column_ordinal
      FROM sys.dm_exec_describe_first_result_set
      (N'SELECT * FROM ' + t.TableName, NULL, 1)
  ) AS c;
 
SELECT * FROM #Columns;

This produces a result like the following (truncated):

TableName    ColumnName   DataType       Ordinal
----------   ----------   ------------   -------
dbo.Badges   Id           int            1
dbo.Badges   Class        tinyint        2
dbo.Badges   Name         nvarchar(50)   3
...

Getting the primary key columns

Next we can grab all of the columns that participate in each table's primary key constraint:

SELECT t.TableName, ColumnName = c.name
INTO #PrimaryKeys
FROM #Tables AS t
INNER JOIN sys.index_columns AS ic 
      ON ic.[object_id] = t.ObjectID 
INNER JOIN sys.columns AS c
      ON ic.column_id = c.column_id
      AND ic.[object_id] = c.[object_id]
INNER JOIN sys.key_constraints AS pk
      ON pk.parent_object_id = c.object_id
      AND pk.type = N'PK'
      AND pk.unique_index_id = ic.index_id;
 
SELECT * FROM #PrimaryKeys;

In our case, this is pretty simple, because the PK is almost always a column called Id – no compound PKs or even data types outside the int family. Output:

TableName                    ColumnName
--------------------------   ----------
dbo.Badges                   Id
dbo.PostFeedback             Id
dbo.PostHistory              Id
dbo.PostHistoryTypes         Id
dbo.PostLinks                Id
dbo.PostNotices              Id
dbo.PostNoticeTypes          Id
dbo.Posts                    Id
dbo.PostTypes                Id
dbo.ReviewRejectionReasons   Id
dbo.ReviewTaskResults        Id
dbo.ReviewTaskResultTypes    Id
dbo.ReviewTasks              Id
dbo.ReviewTaskStates         Id
dbo.ReviewTaskTypes          Id
dbo.SuggestedEdits           Id
dbo.Tags                     Id
dbo.TagSynonyms              Id
dbo.Users                    Id

Getting the foreign key references

The last piece of the puzzle is linking the tables together via referential integrity so that the relationship diagram knows how to draw the arrows. While dbdiagram.io does support inline shorthand for single-column relationships, I'm just going to opt out of it and always use the longer form, so there is less conditional logic for anyone with compound keys.

This join is a little more intense (and could be much simpler with STRING_AGG(), but I know not everyone is on 2017+ just yet):

SELECT 
  ConstraintTable = ct.TableName,
  ConstraintColumns = STUFF((SELECT N',' + c.name
    FROM sys.columns AS c 
    INNER JOIN sys.foreign_key_columns AS fkc 
    ON fkc.parent_column_id = c.column_id
    AND fkc.parent_object_id = c.[object_id]
    WHERE fkc.constraint_object_id = fk.[object_id]
    ORDER BY fkc.constraint_column_id 
    FOR XML PATH(N''), 
    TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, N''),
  ReferencedTable = rt.TableName,
  ReferencedColumns = STUFF((SELECT N',' + c.name
    FROM sys.columns AS c 
    INNER JOIN sys.foreign_key_columns AS fkc 
    ON fkc.referenced_column_id = c.column_id
    AND fkc.referenced_object_id = c.[object_id]
    WHERE fkc.constraint_object_id = fk.[object_id]
    ORDER BY fkc.constraint_column_id 
    FOR XML PATH(N''), 
    TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, N'')
INTO #ForeignKeys
FROM sys.foreign_keys AS fk
INNER JOIN #Tables AS rt -- referenced table
  ON fk.referenced_object_id = rt.ObjectID
INNER JOIN #Tables AS ct -- constraint table
  ON fk.parent_object_id = ct.ObjectID;
 
SELECT * FROM #ForeignKeys;

Output (truncated):

ConstraintTable   ConstraintColumns  ReferencedTable       ReferencedColumns
----------------  -----------------  --------------------  -----------------
dbo.PostFeedback  PostId             dbo.Posts             Id
dbo.PostFeedback  UserId             dbo.Users             Id
dbo.PostHistory   UserId             dbo.Users             Id
dbo.PostHistory   PostId             dbo.Posts             Id
dbo.PostHistory   PostHistoryTypeId  dbo.PostHistoryTypes  Id
dbo.PostNotices   PostNoticeTypeId   dbo.PostNoticeTypes   Id
dbo.Posts         PostTypeId         dbo.PostTypes         Id
...

Putting it all together

With all this data tucked away in #temp tables, much of the logic to generate the output for dbdiagram.io relatively simple. But first, in order to output data in SSMS should our strings get extremely large, we are going to need to create some objects to help with that – a numbers table and an ordered string splitting function, which I proposed in the tip, "Validate the contents of large dynamic SQL strings."

CREATE TABLE dbo.Numbers
(
  n int PRIMARY KEY
);
 
;WITH n(n) AS 
(
  SELECT TOP (1000) number FROM master.dbo.spt_values
    WHERE [type] = N'P' AND number BETWEEN 1 AND 1000
    ORDER BY number
)
INSERT dbo.Numbers(n)
  SELECT TOP (1000000) n = ROW_NUMBER() OVER (ORDER BY n1.n)
  FROM n AS n1 CROSS JOIN n AS n2 -- CROSS JOIN n AS n3
  ORDER BY n;
GO
 
CREATE FUNCTION dbo.SplitStrings_Ordered
(
    @List       nvarchar(max),
    @Delimiter  nvarchar(255)
)
RETURNS TABLE
AS
    RETURN ( SELECT n, s = SUBSTRING(@List,n,
        CHARINDEX(@Delimiter,@List+@Delimiter,n)-n)
      FROM ( SELECT n FROM dbo.Numbers WHERE n <= LEN(@List)
      AND SUBSTRING(@Delimiter+@List,n,LEN(@Delimiter)) = @Delimiter
      ) AS y );
GO

With those in place, and with results to grid, we can run this query:

DECLARE @output nvarchar(max) = N'';
 
SELECT @output += char(13) + char(10) + N'Table ' 
  + QUOTENAME(TableName, char(34)) 
  + char(13) + char(10) + N'{'
  + char(13) + char(10) + STUFF((SELECT char(13) + char(10) 
        + space(4) + c.ColumnName + N' ' + c.DataType 
        + CASE WHEN pk.ColumnName IS NOT NULL THEN N' [pk]' ELSE N'' END
      FROM #Columns AS c
      LEFT OUTER JOIN #PrimaryKeys AS pk
        ON c.TableName = pk.TableName
        AND c.ColumnName = pk.ColumnName
      WHERE c.TableName = t.TableName
      ORDER BY c.Ordinal FOR XML PATH(N''),
        TYPE).value(N'./text()[1]', N'nvarchar(max)'),2,1,'')
  + char(13) + char(10) + N'}' + char(13) + char(10)
FROM #Tables AS t;
 
SELECT @output += char(13) + char(10) + N'Ref: ' 
  + QUOTENAME(ConstraintTable, char(34))
  + N'.(' + ConstraintColumns + N') > '
  + QUOTENAME(ReferencedTable, char(34))
  + N'.(' + ReferencedColumns + N')'
FROM #ForeignKeys;
 
SELECT s FROM dbo.SplitStrings_Ordered(@output, char(13) + char(10)) ORDER BY n;

The output will look something like this (and remember, order doesn't matter):

Output in SSMS

Now you can select that entire grid and copy and paste the contents (without the column header) into the sidebar on dbdiagram.io, and end up with at least a starting point.

For posterity, a screenshot of the diagram produced on the site, as well as the entire DBML output, are in the appendix. In the meantime, try it out! DBML is a bit gnarly but hopefully the scripts above will help you convert your existing schema to the required syntax with little difficulty.

Appendix

Here is a screenshot and the verbose final output (click to enlarge):

Our creation on dbdiagram.io

Table "dbo.Posts"
{

    Id int [pk]
    PostTypeId tinyint
    CreationDate datetime
    Score int
    Body nvarchar(max)
    OwnerUserId int
    LastEditorUserId int
    LastEditDate datetime
    LastActivityDate datetime
    LastActivityUserId int
    ParentId int
    AcceptedAnswerId int
    Title nvarchar(250)
    Tags nvarchar(250)
    CommunityOwnedDate datetime
    HistorySummary nvarchar(150)
    AnswerScore int
    AnswerCount int
    CommentCount int
    FavoriteCount int
    DeletionDate datetime
    ClosedDate datetime
    LockedDate datetime
    OwnerDisplayName nvarchar(40)
    LastEditorDisplayName nvarchar(40)
    BountyAmount int
    BountyCloses datetime
    BountyClosed datetime
    LastOwnerEmailDate datetime
    ProtectedDate datetime
    MigrationDate datetime
    IsAcceptedAnswer bit
    LastChange timestamp
    IsAnswered bit
    HasNotices bit
    IsFrozen bit
    QualityScore smallint
    IsClosedAsDuplicate bit
    LastActivityTypeId tinyint
    OwnerIsNewPoster bit
    HasOneboxes bit
    LicenseVersion tinyint
    ReactionCount int
    Reaction1Count smallint
    Reaction2Count smallint
    Reaction3Count smallint
    Reaction4Count smallint
    Reaction5Count smallint
    Reaction6Count smallint
    Reaction7Count smallint
    Reaction8Count smallint
    PostState tinyint
}

Table "dbo.Badges"
{

    Id int [pk]
    Class tinyint
    Name nvarchar(50)
    Description nvarchar(max)
    Single bit
    Secret bit
    TagBased bit
    AwardFrequency int
    BadgeReasonTypeId tinyint
    IsCode bit
    AwardedCount int
}

Table "dbo.PostFeedback"
{

    Id int [pk]
    PostId int
    IPAddress varchar(15)
    UserId int
    VoteTypeId tinyint
    CreationDate datetime
}

Table "dbo.PostHistory"
{

    Id int [pk]
    PostHistoryTypeId tinyint
    PostId int
    RevisionGUID uniqueidentifier
    CreationDate datetime
    IPAddress varchar(15)
    UserId int
    Comment nvarchar(400)
    Text nvarchar(max)
    UserDisplayName nvarchar(40)
    ApplicationId int
}

Table "dbo.PostHistoryTypes"
{

    Id tinyint [pk]
    Name nvarchar(50)
    Description nvarchar(300)
}

Table "dbo.PostLinks"
{

    Id int [pk]
    CreationDate datetime
    PostId int
    RelatedPostId int
    LinkTypeId tinyint
    Rank tinyint
}

Table "dbo.PostNotices"
{

    Id int [pk]
    PostId int
    PostNoticeTypeId int
    CreationDate datetime
    DeletionDate datetime
    ExpiryDate datetime
    Body nvarchar(max)
    OwnerUserId int
    DeletionUserId int
    LastChange timestamp
}

Table "dbo.PostNoticeTypes"
{

    Id int [pk]
    ClassId tinyint
    Name nvarchar(80)
    Body nvarchar(max)
    IsHidden bit
    Predefined bit
    PostNoticeDurationId int
}

Table "dbo.PostTags"
{

    PostId int
    TagId int
}

Table "dbo.PostTypes"
{

    Id tinyint [pk]
    Name nvarchar(50)
    Description nvarchar(300)
}

Table "dbo.ReviewRejectionReasons"
{

    Id tinyint [pk]
    Name nvarchar(100)
    Description nvarchar(300)
    PostTypeId tinyint
    IsHidden bit
}

Table "dbo.ReviewTaskResults"
{

    Id int [pk]
    ReviewTaskId int
    ReviewTaskResultTypeId tinyint
    UserId int
    CreationDate datetime
    DeletionDate datetime
    PostCommentId int
    RejectionReasonId tinyint
    Comment nvarchar(150)
    ResultSubtypeFlags int
    DurationMs int
    AuditPassed bit
    ReviewTaskTypeId tinyint
}

Table "dbo.ReviewTaskResultTypes"
{

    Id tinyint [pk]
    Name nvarchar(100)
    Description nvarchar(300)
}

Table "dbo.ReviewTasks"
{

    Id int [pk]
    ReviewTaskTypeId tinyint
    CreationDate datetime
    DeletionDate datetime
    ReviewTaskState tinyint
    IsAudit bit
    PostId int
    SuggestedEditId int
    AuditPositivity bit
    AuditTitle nvarchar(max)
    AuditBody nvarchar(max)
    AuditComment nvarchar(max)
    ValidResultCount tinyint
    CreationReason tinyint
}

Table "dbo.ReviewTaskStates"
{

    Id tinyint [pk]
    Name nvarchar(50)
    Description nvarchar(300)
}

Table "dbo.ReviewTaskTypes"
{

    Id tinyint [pk]
    Name nvarchar(50)
    Description nvarchar(300)
}

Table "dbo.SuggestedEdits"
{

    Id int [pk]
    PostId int
    CreationDate datetime
    ApprovalDate datetime
    RejectionDate datetime
    OwnerUserId int
    OwnerUserIPAddress varchar(15)
    OwnerUserCookieGuid uniqueidentifier
    Comment nvarchar(800)
    Text nvarchar(max)
    Title nvarchar(250)
    Tags nvarchar(250)
    RevisionGUID uniqueidentifier
    RejectionReasonId tinyint
    ApplicationId int
    IsSignificant bit
}

Table "dbo.Tags"
{

    Id int [pk]
    Name nvarchar(35)
    Count int
    UserId int
    CreationDate datetime
    IsModeratorOnly bit
    IsRequired bit
    WikiPostId int
    IsProtected bit
    ExcerptPostId int
    CodeLanguage nvarchar(50)
    LastActivityDate datetime
    AnswerScoreTop1Percent numeric(18,2)
    AnswerScoreTop5Percent numeric(18,2)
    AnswerScoreTop10Percent numeric(18,2)
    AnswerScoreTop20Percent numeric(18,2)
    AnswerScoreTop30Percent numeric(18,2)
    FriendlyName nvarchar(50)
    IsDiverged bit
    IsArticleRequired bit
}

Table "dbo.TagSynonyms"
{

    Id int [pk]
    SourceTagName nvarchar(35)
    TargetTagName nvarchar(35)
    CreationDate datetime
    OwnerUserId int
    AutoRenameCount int
    LastAutoRename datetime
    Score int
    ApprovedByUserId int
    ApprovalDate datetime
}

Table "dbo.Users"
{

    Id int [pk]
    UserTypeId tinyint
    Reputation int
    Views int
    CreationDate datetime
    LastAccessDate datetime
    HasReplies bit
    OptInEmail bit
    LastLoginDate datetime
    LastEmailDate datetime
    LastLoginIP nvarchar(15)
    Email nvarchar(100)
    DisplayName nvarchar(40)
    WebsiteUrl nvarchar(200)
    RealName nvarchar(100)
    Location nvarchar(100)
    PreferencesRaw nvarchar(max)
    TimedPenaltyDate datetime
    LastDailySiteAccessDate datetime
    DisplayNameLower nvarchar(40)
    VerifiedEmail bit
    DaysVisitedConsecutive int
    DaysVisitedTotal int
    LastModifiedDate datetime
    IsVeteran bit
    ReputationWeek int
    ReputationMonth int
    ReputationQuarter int
    ReputationYear int
    ReputationSinceLastCheck int
    AcceptRateAsked smallint
    AcceptRateAccepted smallint
    ReputationToday int
    AccountId int
    ProfileImageUrl nvarchar(200)
    HasAboutMeExcerpt bit
    SignupStarted bit
    Title nvarchar(225)
    TeamId int
    TeamName nvarchar(150)
    BronzeBadges int
    SilverBadges int
    GoldBadges int
    JobSearchStatus tinyint
    Flags bigint
    QuestionCount int
    AnswerCount int
    HasQuestionLists bit
    ChannelInlineOnboardingFlags bigint
}

Ref: "dbo.PostFeedback".(PostId) > "dbo.Posts".(Id)
Ref: "dbo.Posts".(ParentId) > "dbo.Posts".(Id)
Ref: "dbo.Posts".(AcceptedAnswerId) > "dbo.Posts".(Id)
Ref: "dbo.PostHistory".(PostId) > "dbo.Posts".(Id)
Ref: "dbo.ReviewTasks".(PostId) > "dbo.Posts".(Id)
Ref: "dbo.PostHistory".(PostHistoryTypeId) > "dbo.PostHistoryTypes".(Id)
Ref: "dbo.PostNotices".(PostNoticeTypeId) > "dbo.PostNoticeTypes".(Id)
Ref: "dbo.Posts".(PostTypeId) > "dbo.PostTypes".(Id)
Ref: "dbo.ReviewRejectionReasons".(PostTypeId) > "dbo.PostTypes".(Id)
Ref: "dbo.ReviewTaskResults".(RejectionReasonId) > "dbo.ReviewRejectionReasons".(Id)
Ref: "dbo.ReviewTaskResults".(ReviewTaskResultTypeId) > "dbo.ReviewTaskResultTypes".(Id)
Ref: "dbo.ReviewTaskResults".(ReviewTaskId) > "dbo.ReviewTasks".(Id)
Ref: "dbo.ReviewTasks".(ReviewTaskState) > "dbo.ReviewTaskStates".(Id)
Ref: "dbo.ReviewTasks".(ReviewTaskTypeId) > "dbo.ReviewTaskTypes".(Id)
Ref: "dbo.ReviewTasks".(SuggestedEditId) > "dbo.SuggestedEdits".(Id)
Ref: "dbo.Posts".(OwnerUserId) > "dbo.Users".(Id)
Ref: "dbo.Tags".(UserId) > "dbo.Users".(Id)
Ref: "dbo.Posts".(LastEditorUserId) > "dbo.Users".(Id)
Ref: "dbo.Posts".(LastActivityUserId) > "dbo.Users".(Id)
Ref: "dbo.PostFeedback".(UserId) > "dbo.Users".(Id)
Ref: "dbo.PostHistory".(UserId) > "dbo.Users".(Id)
Ref: "dbo.ReviewTaskResults".(UserId) > "dbo.Users".(Id)
By: Aaron Bertrand

I am a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. I am a long-time Microsoft MVP, write at Simple Talk, SQLPerformance, and MSSQLTips, and have had the honor of speaking at more conferences than I can remember. In non-tech life, I am a husband, a father of two, a huge hockey and football fan, and my pronouns are he/him.