Six reasons you should be nervous about parallelism
Don't get me wrong; parallel processing is great. And with SQL Server in particular, there are many benefits to taking advantage of the power of multiple CPUs. However, we have all had cases where what looked like parallelism was actually hurting us (you remember turning off hyperthreading on SQL Server boxes, right?), and cases where long CXPACKET waits on individual threads completely defeated the purpose of speeding up the query with parallelism in the first place … often making the query take longer than it would with a MAXDOP 1 query hint, or in some cases run away completely.
These are loosely defined scenarios that are difficult to reproduce and may depend on other environmental factors at the time of the query. For example, with the runaway case, I have simply stopped the query and re-issued it, and this time it came back quickly.
However, there are other cases that are definitely potential problems, but not as easy to determine the cause. Some of these issues have been deferred until the next version of SQL Server (tentatively 2011), or have been closed as fixed, but not specific enough about where/when it was fixed in both SQL Server 2005 and SQL Server 2008.
STATUS UNKNOWN – CONNECT ITEM DISAPPEARED
User "dave_dave" reported a problem that I had seen before, where according to the Microsoft engineer, "Yes, it's a bug – whenever a parallel query plan is generated @@IDENTITY and SCOPE_IDENTITY() are not being updated consistently and can't be relied upon." This problem can result in incorrect data and even data loss. The workarounds include using OPTION (MAXDOP 1) on individual queries or attempting to force serial plans system-wide via sp_configure. According to the Connect item, at the present time, there is no plan to fix this issue for SQL Server 2008.
2. #382323, "dm_exec_requests does not show blocking SPID"
FIXED SOME TIME AFTER SQL 2008 (?)
In this case, fellow MVP Andrew Kelly demonstrates that parallelism can prevent blockers from showing up in the sys.dm_exec_requests DMV. This problem exists in both SQL Server 2005 and in SQL Server 2008, but is considered too risky to fix in a service pack or cumulative update, so we will not see a fix for this until SQL Server 2011 at the earliest. As Adam points out (and I missed this in the recent discussion), the workaround in the meantime is to stop trusting sys.dm_exec_requests for finding blocking/blocked processes, and instead use the sys.dm_os_waiting_tasks DMV. (Thanks Adam!)
FIXED IN 2008
This problem, reported by "Mike1D", shows that when using READ_COMMITTED_SNAPSHOT, a session using parallelism may see uncommitted rows from another session. Effectively this makes it possible for dirty reads to occur in snapshot isolation, which should only be possible in READ_UNCOMMITTED or when using the WITH (NOLOCK) table hint. The problem does not occur when the reading session uses OPTION (MAXDOP 1). The fix is checked into the next cumulative update for both SQL Server 2005 SP2 (CU#13) and SP3 (CU#3), however there is no mention of whether this issue affects SQL Server 2008 and, if so, when a fix will be released.
CLOSED – NOT REPRODUCIBLE
User "lasa1" reported this issue, where a blocking issue due to parallelism can lead to unlogged RING_BUFFER_EXCEPTION errors (severity 25) and undetected deadlocks for queries run under the context of sp_getapplock (more info and plenty of gory details here). The issue is completely resolved by using OPTION (MAXDOP 1) on the affected queries. The issue is currently in the "Resolved (Not Reproducible)" state.
FIXED IN 2008
Meiyalagan Balasubramanian reported this issue, where running a PIVOT query that uses parallelism against a very large table can lead to very long runtimes and extremely high tempdb growth. The bug is closed as fixed in SQL Server 2008, however the user reported it against SQL Server 2005 as well, and for this version there does not seem to be an answer, other than the now-all-too-familiar workaround of adding the OPTION (MAXDOP 1) query hint.
CLOSED – WON'T FIX (ISOLATED; EASY WORKAROUND)
User "hova" reported this problem, where a CLR string contatenation example from Microsoft fails with an EXCEPTION_ACCESS_VIOLATION error if the table is large enough to induce parallelism. As a side bit of humor, hova asked for a free upgrade to SQL Server 2008, since the problem is fixed there; the request was denied, of course. Currently the workaround of using OPTION (MAXDOP 1) in SQL Server 2005 has been deemed "good enough" for those shops that haven't completely migrated to SQL Server 2008 (a.k.a. "just about everybody").
Now hopefully, these issues don't affect you, but you should check each scenario just in case. If any does, you should address the issue and validate / vote / comment so that Microsoft will have more motivation to treat these bugs seriously. If you have others in this category, please point them out!And as a fellow MVP rightly pointed out, just because a parallelism bug isn't affecting you right now, who is to say what will happen tomorrow … you may suddenly get a parallel plan due to a hotfix, service pack, migrating to different hardware, altering table structure, or even simple data or statistics changes that hit some yet-unknown threshold.