January 7, 2011 | SQL Server

The fallacy of preventing plagiarism

If you're not living in a cave, you are probably aware of the blog posts and Twitter discussions that resulted from an innocent post by Tom LaRock (blog | twitter) yesterday (original post).  This led to at least the following three posts, and maybe others I haven't noticed yet:

Jonathan Kehayias:
Has the SQL Community Lost its Focus?

Karen Lopez:
It Isn't Stealing, But I Will Respect Your Wishes. That's the Bad News.

And then Tom:
Protecting Blog Content

There seem to be some different opinions about what Tom is perceived to have done wrong.  At the highest level, they are:

  • that he spelled "plagiarized" wrong;
  • that his disclaimer was rude and accusatory; and,
  • that the disclaimer caused the code sample to be broken.

All three may be true, but none is worthy of an extremely sharp reprimand, IMHO – though I do admire Jonathan's courage to take Tom to task, and then stand up for his opinions when challenged by Tom and others.  I'll ignore the spelling mistake, because we all make those.  I agreed with Tom's point that, if your content is stolen, you have a right to not be nice about it; however, the disclaimer assumes that everyone who presses Ctrl+C is doing so for the purpose of plagiarizing ("guilty until proven innocent").  And for the last point, I feel that often we post code samples without testing them (I am guilty of this, also, as I'm sure many of us are).  If Tom had tested his sample with his original disclaimer in place, he would have realized that the disclaimer actually prevents the code from running correctly, and might have avoided Jonathan's wrath altogether.  And when he corrected the code in response to Adam's minor criticisms, it would not have contained a typo that I later corrected.

Before the discussion got heated, I did make some comments on Tom's original post – mostly trying to illustrate that there are better examples of accessing WMI from T-SQL than using OLE automation procedures to get performance counters – something you can do much more efficiently without touching T-SQL at all.  I noticed the disclaimer when I tried out the code, but found it more silly and ineffective than offensive.

Let me explain why.  The folks who are stealing your content know they are stealing your content.  If they are paying any attention at all to what they are doing, it is pretty trivial for them to delete your disclaimer before publishing your work and branding it as their own.  Such a tactic may fool the rare individual who copies and pastes your entire post and never looks at it, but it's nothing more than a minor annoyance to the rest (especially since there are very easy ways to permanently disable the features of services like Tynt).  And for this, the cost is that you may inadvertently annoy the thousands of other readers who liked your post, wanted to try your code sample, and had no desire to use it for any other purpose than the one you intended: to help them solve a problem.  In this case, the damage was deeper: it spurred a debate that probably seems much more vicious and harmful from the outside than it truly is.  I can assure you that, in spite of how you may interpret some of the comments, there is a lot of respect between all of the folks involved and that there are no hard feelings.

Look, everyone, plagiarism is a real concern, and I am just as annoyed by it as any of you.  But this approach of inserting a disclaimer into copied text is just not going to work, regardless of how nicely or rudely you word your disclaimer, or whether or not it messes up the code snippet.  How long has Brent Ozar (blog | twitter) been using this approach?  And yet he still finds plagiarists left and right.  It is just not enough of a deterrent for the 0.001% of content thieves reading your blog, but it is pretty annoying for the 99.999% who read your blog for the right reasons.  And in this one case, how much of your collective time and effort has been taken away from the community just to defend your thoughts and opinions about whether your disclaimer is too rude, poorly implemented, or even necessary in the first place?  Personally, I think that if you are concerned about content theft, it is probably better to use methods that catch the few actual thieves than to accuse everyone of stealing by default.  Copyscape was mentioned as one possible approach to doing this, and I'm sure there are others.