I recently came across dbdiagram.io, a free, web-based solution for displaying entity relationship diagrams (ERDs) using database markup language (DBML). I like this approach compared to the long-standing database diagrams feature in Management Studio, simply because it's much easier to share and collaborate.
DBML expects syntax like the following very basic example:
Table "dbo.Users" { UserID int [pk] Username nvarchar(128) } Table "dbo.Posts" { PostID int [pk] UserID int } Ref: "dbo.Posts".(UserID) > "dbo.Users".(UserID)
This will produce the following ER diagram:
But this syntax isn't exactly something that SQL Server has been designed to produce for you.
I thought it would be a fun exercise to dynamically build the syntax that could be used to show the relationships between most of the tables on Stack Exchange Data Explorer (SEDE), since those relationships are not always obvious.
Now, I say most because, for various reasons, some of the real tables in our production databases are transformed and/or combined before being published to SEDE. Also, foreign key relationships are not implemented, since the database is read only. Luckily, I happen to have access to a version of this database that you don't, and I can query the metadata to help build the output needed for such a diagram.
Because there isn't always a 1:1 mapping between what you see and what I see internally, I am going to work with the following subset of tables (20 of the 28 tables currently presented in SEDE):
The point isn't so you can visually learn our schema but so you can use this example to generate DBML from your own schema.
Baseline : Defining the tables I care about
I started by dumping the list of tables I'm interested in into a #temp table:
CREATE TABLE #Tables ( TableName nvarchar(520), ObjectID int ); INSERT #Tables(TableName, ObjectID) SELECT t, OBJECT_ID(t) FROM ( VALUES (N'dbo.Badges'), (N'dbo.PostFeedback'), (N'dbo.PostHistory'), (N'dbo.PostHistoryTypes'), (N'dbo.PostLinks'), (N'dbo.PostNotices'), (N'dbo.PostNoticeTypes'), (N'dbo.Posts'), (N'dbo.PostTags'), (N'dbo.PostTypes'), (N'dbo.ReviewRejectionReasons'), (N'dbo.ReviewTaskResults'), (N'dbo.ReviewTaskResultTypes'), (N'dbo.ReviewTasks'), (N'dbo.ReviewTaskStates'), (N'dbo.ReviewTaskTypes'), (N'dbo.SuggestedEdits'), (N'dbo.Tags'), (N'dbo.TagSynonyms'), (N'dbo.Users') ) AS t(t);
Once I have the tables, I can join to catalog views like sys.columns
, sys.key_constraints
, sys.foreign_keys
, and sys.foreign_key_columns
to build out the code to represent the relationships. Thankfully, this specific tool doesn't require the tables to be presented in any sort of dependency order, so there is one less complexity that is usually a part of building scripts around foreign keys.
Getting all the columns + data types
Ever since SQL Server 2012 I've absolutely loved being able to build columns and data types from sys.dm_exec_describe_first_result_set
. This makes it really easy to derive things like CREATE TABLE
scripts, CRUD parameter lists, or documentation. We can use our #temp table and grab all of their columns and data types in one swoop:
SELECT t.TableName, ColumnName = c.name, DataType = c.system_type_name, Ordinal = c.column_ordinal INTO #Columns FROM #Tables AS t CROSS APPLY ( SELECT name, system_type_name, column_ordinal FROM sys.dm_exec_describe_first_result_set (N'SELECT * FROM ' + t.TableName, NULL, 1) ) AS c; SELECT * FROM #Columns;
This produces a result like the following (truncated):
TableName ColumnName DataType Ordinal ---------- ---------- ------------ ------- dbo.Badges Id int 1 dbo.Badges Class tinyint 2 dbo.Badges Name nvarchar(50) 3 ...
Getting the primary key columns
Next we can grab all of the columns that participate in each table's primary key constraint:
SELECT t.TableName, ColumnName = c.name INTO #PrimaryKeys FROM #Tables AS t INNER JOIN sys.index_columns AS ic ON ic.[object_id] = t.ObjectID INNER JOIN sys.columns AS c ON ic.column_id = c.column_id AND ic.[object_id] = c.[object_id] INNER JOIN sys.key_constraints AS pk ON pk.parent_object_id = c.object_id AND pk.type = N'PK' AND pk.unique_index_id = ic.index_id; SELECT * FROM #PrimaryKeys;
In our case, this is pretty simple, because the PK is almost always a column called Id
– no compound PKs or even data types outside the int
family. Output:
TableName ColumnName
-------------------------- ----------
dbo.Badges Id
dbo.PostFeedback Id
dbo.PostHistory Id
dbo.PostHistoryTypes Id
dbo.PostLinks Id
dbo.PostNotices Id
dbo.PostNoticeTypes Id
dbo.Posts Id
dbo.PostTypes Id
dbo.ReviewRejectionReasons Id
dbo.ReviewTaskResults Id
dbo.ReviewTaskResultTypes Id
dbo.ReviewTasks Id
dbo.ReviewTaskStates Id
dbo.ReviewTaskTypes Id
dbo.SuggestedEdits Id
dbo.Tags Id
dbo.TagSynonyms Id
dbo.Users Id
Getting the foreign key references
The last piece of the puzzle is linking the tables together via referential integrity so that the relationship diagram knows how to draw the arrows. While dbdiagram.io does support inline shorthand for single-column relationships, I'm just going to opt out of it and always use the longer form, so there is less conditional logic for anyone with compound keys.
This join is a little more intense (and could be much simpler with STRING_AGG()
, but I know not everyone is on 2017+ just yet):
SELECT ConstraintTable = ct.TableName, ConstraintColumns = STUFF((SELECT N',' + c.name FROM sys.columns AS c INNER JOIN sys.foreign_key_columns AS fkc ON fkc.parent_column_id = c.column_id AND fkc.parent_object_id = c.[object_id] WHERE fkc.constraint_object_id = fk.[object_id] ORDER BY fkc.constraint_column_id FOR XML PATH(N''), TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, N''), ReferencedTable = rt.TableName, ReferencedColumns = STUFF((SELECT N',' + c.name FROM sys.columns AS c INNER JOIN sys.foreign_key_columns AS fkc ON fkc.referenced_column_id = c.column_id AND fkc.referenced_object_id = c.[object_id] WHERE fkc.constraint_object_id = fk.[object_id] ORDER BY fkc.constraint_column_id FOR XML PATH(N''), TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, N'') INTO #ForeignKeys FROM sys.foreign_keys AS fk INNER JOIN #Tables AS rt -- referenced table ON fk.referenced_object_id = rt.ObjectID INNER JOIN #Tables AS ct -- constraint table ON fk.parent_object_id = ct.ObjectID; SELECT * FROM #ForeignKeys;
Output (truncated):
ConstraintTable ConstraintColumns ReferencedTable ReferencedColumns
---------------- ----------------- -------------------- -----------------
dbo.PostFeedback PostId dbo.Posts Id
dbo.PostFeedback UserId dbo.Users Id
dbo.PostHistory UserId dbo.Users Id
dbo.PostHistory PostId dbo.Posts Id
dbo.PostHistory PostHistoryTypeId dbo.PostHistoryTypes Id
dbo.PostNotices PostNoticeTypeId dbo.PostNoticeTypes Id
dbo.Posts PostTypeId dbo.PostTypes Id
...
Putting it all together
With all this data tucked away in #temp tables, much of the logic to generate the output for dbdiagram.io relatively simple. But first, in order to output data in SSMS should our strings get extremely large, we are going to need to create some objects to help with that – a numbers table and an ordered string splitting function, which I proposed in the tip, "Validate the contents of large dynamic SQL strings."
CREATE TABLE dbo.Numbers ( n int PRIMARY KEY ); ;WITH n(n) AS ( SELECT TOP (1000) number FROM master.dbo.spt_values WHERE [type] = N'P' AND number BETWEEN 1 AND 1000 ORDER BY number ) INSERT dbo.Numbers(n) SELECT TOP (1000000) n = ROW_NUMBER() OVER (ORDER BY n1.n) FROM n AS n1 CROSS JOIN n AS n2 -- CROSS JOIN n AS n3 ORDER BY n; GO CREATE FUNCTION dbo.SplitStrings_Ordered ( @List nvarchar(max), @Delimiter nvarchar(255) ) RETURNS TABLE AS RETURN ( SELECT n, s = SUBSTRING(@List,n, CHARINDEX(@Delimiter,@List+@Delimiter,n)-n) FROM ( SELECT n FROM dbo.Numbers WHERE n <= LEN(@List) AND SUBSTRING(@Delimiter+@List,n,LEN(@Delimiter)) = @Delimiter ) AS y ); GO
With those in place, and with results to grid, we can run this query:
DECLARE @output nvarchar(max) = N''; SELECT @output += char(13) + char(10) + N'Table ' + QUOTENAME(TableName, char(34)) + char(13) + char(10) + N'{' + char(13) + char(10) + STUFF((SELECT char(13) + char(10) + space(4) + c.ColumnName + N' ' + c.DataType + CASE WHEN pk.ColumnName IS NOT NULL THEN N' [pk]' ELSE N'' END FROM #Columns AS c LEFT OUTER JOIN #PrimaryKeys AS pk ON c.TableName = pk.TableName AND c.ColumnName = pk.ColumnName WHERE c.TableName = t.TableName ORDER BY c.Ordinal FOR XML PATH(N''), TYPE).value(N'./text()[1]', N'nvarchar(max)'),2,1,'') + char(13) + char(10) + N'}' + char(13) + char(10) FROM #Tables AS t; SELECT @output += char(13) + char(10) + N'Ref: ' + QUOTENAME(ConstraintTable, char(34)) + N'.(' + ConstraintColumns + N') > ' + QUOTENAME(ReferencedTable, char(34)) + N'.(' + ReferencedColumns + N')' FROM #ForeignKeys; SELECT s FROM dbo.SplitStrings_Ordered(@output, char(13) + char(10)) ORDER BY n;
The output will look something like this (and remember, order doesn't matter):
Now you can select that entire grid and copy and paste the contents (without the column header) into the sidebar on dbdiagram.io, and end up with at least a starting point.
For posterity, a screenshot of the diagram produced on the site, as well as the entire DBML output, are in the appendix. In the meantime, try it out! DBML is a bit gnarly but hopefully the scripts above will help you convert your existing schema to the required syntax with little difficulty.
Appendix
Here is a screenshot and the verbose final output (click to enlarge):
Table "dbo.Posts" { Id int [pk] PostTypeId tinyint CreationDate datetime Score int Body nvarchar(max) OwnerUserId int LastEditorUserId int LastEditDate datetime LastActivityDate datetime LastActivityUserId int ParentId int AcceptedAnswerId int Title nvarchar(250) Tags nvarchar(250) CommunityOwnedDate datetime HistorySummary nvarchar(150) AnswerScore int AnswerCount int CommentCount int FavoriteCount int DeletionDate datetime ClosedDate datetime LockedDate datetime OwnerDisplayName nvarchar(40) LastEditorDisplayName nvarchar(40) BountyAmount int BountyCloses datetime BountyClosed datetime LastOwnerEmailDate datetime ProtectedDate datetime MigrationDate datetime IsAcceptedAnswer bit LastChange timestamp IsAnswered bit HasNotices bit IsFrozen bit QualityScore smallint IsClosedAsDuplicate bit LastActivityTypeId tinyint OwnerIsNewPoster bit HasOneboxes bit LicenseVersion tinyint ReactionCount int Reaction1Count smallint Reaction2Count smallint Reaction3Count smallint Reaction4Count smallint Reaction5Count smallint Reaction6Count smallint Reaction7Count smallint Reaction8Count smallint PostState tinyint } Table "dbo.Badges" { Id int [pk] Class tinyint Name nvarchar(50) Description nvarchar(max) Single bit Secret bit TagBased bit AwardFrequency int BadgeReasonTypeId tinyint IsCode bit AwardedCount int } Table "dbo.PostFeedback" { Id int [pk] PostId int IPAddress varchar(15) UserId int VoteTypeId tinyint CreationDate datetime } Table "dbo.PostHistory" { Id int [pk] PostHistoryTypeId tinyint PostId int RevisionGUID uniqueidentifier CreationDate datetime IPAddress varchar(15) UserId int Comment nvarchar(400) Text nvarchar(max) UserDisplayName nvarchar(40) ApplicationId int } Table "dbo.PostHistoryTypes" { Id tinyint [pk] Name nvarchar(50) Description nvarchar(300) } Table "dbo.PostLinks" { Id int [pk] CreationDate datetime PostId int RelatedPostId int LinkTypeId tinyint Rank tinyint } Table "dbo.PostNotices" { Id int [pk] PostId int PostNoticeTypeId int CreationDate datetime DeletionDate datetime ExpiryDate datetime Body nvarchar(max) OwnerUserId int DeletionUserId int LastChange timestamp } Table "dbo.PostNoticeTypes" { Id int [pk] ClassId tinyint Name nvarchar(80) Body nvarchar(max) IsHidden bit Predefined bit PostNoticeDurationId int } Table "dbo.PostTags" { PostId int TagId int } Table "dbo.PostTypes" { Id tinyint [pk] Name nvarchar(50) Description nvarchar(300) } Table "dbo.ReviewRejectionReasons" { Id tinyint [pk] Name nvarchar(100) Description nvarchar(300) PostTypeId tinyint IsHidden bit } Table "dbo.ReviewTaskResults" { Id int [pk] ReviewTaskId int ReviewTaskResultTypeId tinyint UserId int CreationDate datetime DeletionDate datetime PostCommentId int RejectionReasonId tinyint Comment nvarchar(150) ResultSubtypeFlags int DurationMs int AuditPassed bit ReviewTaskTypeId tinyint } Table "dbo.ReviewTaskResultTypes" { Id tinyint [pk] Name nvarchar(100) Description nvarchar(300) } Table "dbo.ReviewTasks" { Id int [pk] ReviewTaskTypeId tinyint CreationDate datetime DeletionDate datetime ReviewTaskState tinyint IsAudit bit PostId int SuggestedEditId int AuditPositivity bit AuditTitle nvarchar(max) AuditBody nvarchar(max) AuditComment nvarchar(max) ValidResultCount tinyint CreationReason tinyint } Table "dbo.ReviewTaskStates" { Id tinyint [pk] Name nvarchar(50) Description nvarchar(300) } Table "dbo.ReviewTaskTypes" { Id tinyint [pk] Name nvarchar(50) Description nvarchar(300) } Table "dbo.SuggestedEdits" { Id int [pk] PostId int CreationDate datetime ApprovalDate datetime RejectionDate datetime OwnerUserId int OwnerUserIPAddress varchar(15) OwnerUserCookieGuid uniqueidentifier Comment nvarchar(800) Text nvarchar(max) Title nvarchar(250) Tags nvarchar(250) RevisionGUID uniqueidentifier RejectionReasonId tinyint ApplicationId int IsSignificant bit } Table "dbo.Tags" { Id int [pk] Name nvarchar(35) Count int UserId int CreationDate datetime IsModeratorOnly bit IsRequired bit WikiPostId int IsProtected bit ExcerptPostId int CodeLanguage nvarchar(50) LastActivityDate datetime AnswerScoreTop1Percent numeric(18,2) AnswerScoreTop5Percent numeric(18,2) AnswerScoreTop10Percent numeric(18,2) AnswerScoreTop20Percent numeric(18,2) AnswerScoreTop30Percent numeric(18,2) FriendlyName nvarchar(50) IsDiverged bit IsArticleRequired bit } Table "dbo.TagSynonyms" { Id int [pk] SourceTagName nvarchar(35) TargetTagName nvarchar(35) CreationDate datetime OwnerUserId int AutoRenameCount int LastAutoRename datetime Score int ApprovedByUserId int ApprovalDate datetime } Table "dbo.Users" { Id int [pk] UserTypeId tinyint Reputation int Views int CreationDate datetime LastAccessDate datetime HasReplies bit OptInEmail bit LastLoginDate datetime LastEmailDate datetime LastLoginIP nvarchar(15) Email nvarchar(100) DisplayName nvarchar(40) WebsiteUrl nvarchar(200) RealName nvarchar(100) Location nvarchar(100) PreferencesRaw nvarchar(max) TimedPenaltyDate datetime LastDailySiteAccessDate datetime DisplayNameLower nvarchar(40) VerifiedEmail bit DaysVisitedConsecutive int DaysVisitedTotal int LastModifiedDate datetime IsVeteran bit ReputationWeek int ReputationMonth int ReputationQuarter int ReputationYear int ReputationSinceLastCheck int AcceptRateAsked smallint AcceptRateAccepted smallint ReputationToday int AccountId int ProfileImageUrl nvarchar(200) HasAboutMeExcerpt bit SignupStarted bit Title nvarchar(225) TeamId int TeamName nvarchar(150) BronzeBadges int SilverBadges int GoldBadges int JobSearchStatus tinyint Flags bigint QuestionCount int AnswerCount int HasQuestionLists bit ChannelInlineOnboardingFlags bigint } Ref: "dbo.PostFeedback".(PostId) > "dbo.Posts".(Id) Ref: "dbo.Posts".(ParentId) > "dbo.Posts".(Id) Ref: "dbo.Posts".(AcceptedAnswerId) > "dbo.Posts".(Id) Ref: "dbo.PostHistory".(PostId) > "dbo.Posts".(Id) Ref: "dbo.ReviewTasks".(PostId) > "dbo.Posts".(Id) Ref: "dbo.PostHistory".(PostHistoryTypeId) > "dbo.PostHistoryTypes".(Id) Ref: "dbo.PostNotices".(PostNoticeTypeId) > "dbo.PostNoticeTypes".(Id) Ref: "dbo.Posts".(PostTypeId) > "dbo.PostTypes".(Id) Ref: "dbo.ReviewRejectionReasons".(PostTypeId) > "dbo.PostTypes".(Id) Ref: "dbo.ReviewTaskResults".(RejectionReasonId) > "dbo.ReviewRejectionReasons".(Id) Ref: "dbo.ReviewTaskResults".(ReviewTaskResultTypeId) > "dbo.ReviewTaskResultTypes".(Id) Ref: "dbo.ReviewTaskResults".(ReviewTaskId) > "dbo.ReviewTasks".(Id) Ref: "dbo.ReviewTasks".(ReviewTaskState) > "dbo.ReviewTaskStates".(Id) Ref: "dbo.ReviewTasks".(ReviewTaskTypeId) > "dbo.ReviewTaskTypes".(Id) Ref: "dbo.ReviewTasks".(SuggestedEditId) > "dbo.SuggestedEdits".(Id) Ref: "dbo.Posts".(OwnerUserId) > "dbo.Users".(Id) Ref: "dbo.Tags".(UserId) > "dbo.Users".(Id) Ref: "dbo.Posts".(LastEditorUserId) > "dbo.Users".(Id) Ref: "dbo.Posts".(LastActivityUserId) > "dbo.Users".(Id) Ref: "dbo.PostFeedback".(UserId) > "dbo.Users".(Id) Ref: "dbo.PostHistory".(UserId) > "dbo.Users".(Id) Ref: "dbo.ReviewTaskResults".(UserId) > "dbo.Users".(Id)