Performance / Storage Comparisons : MONEY vs. DECIMAL
As you may already know, I am not a big fan of the MONEY data type, because of its inflexibility, accuracy problems, and the expectations the name of the type evokes in new users. If I had my way, MONEY would become a synonym for DECIMAL in SQL Server 2008 (allowing for specific precision and scale), and be removed in the following version. Of course there are people out there that either don't feel as strongly as I do, or feel the opposite — that MONEY should be here to stay.
After a recent discussion about the pros and cons of using MONEY vs DECIMAL for storing currency (and even non-currency) data, curiosity got the better of me. One of the arguments for the MONEY data type was performance. No supporting data was provided, of course. So I decided to conduct some tests myself. I wanted to measure how MONEY compared to DECIMAL data types both in their original implementations and using new technologies available in SQL Server 2005 (VARDECIMAL storage format) and SQL Server 2008 (page and row compression).
The person arguing for MONEY showed the space used by MONEY compared to the same information stored in a DECIMAL(20,4) column. Not all that surprisingly, the latter was slightly larger. But is that the whole story? No, for two reasons. One is that the *performance* of these choices was not compared, and the other is that DECIMAL(20,4) is not a very realistic requirement for storing currency data. (And if you are storing the pricing information for luxury yachts or aircraft carriers, you can probably drop the decimal places altogether and use INT or BIGINT.) For the rest of us, a better choice would be DECIMAL(8,2) or DECIMAL(10,2).
I created 11 databases, each with a single table containing a single column:
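The original scripts aren't reproduced here, but the setup looked roughly like the following sketch (database name and column choice are from the later queries; the eleven variations covered MONEY and several DECIMAL precisions, natively, with VARDECIMAL, or with row/page compression):

```sql
-- Illustrative sketch of one of the eleven databases.
CREATE DATABASE MoneyTest;
GO
USE MoneyTest;
GO
CREATE TABLE dbo.Test
(
    i MONEY   -- or DECIMAL(8,2), DECIMAL(10,2), DECIMAL(20,4), ...
);
GO
-- For the compressed variants (SQL Server 2008):
-- ALTER TABLE dbo.Test REBUILD WITH (DATA_COMPRESSION = ROW);  -- or PAGE
-- For the VARDECIMAL variant (SQL Server 2005 SP2, after enabling
-- vardecimal storage at the database level):
-- EXEC sp_tableoption 'dbo.Test', 'vardecimal storage format', 1;
```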
Keeping these tables in separate databases allowed for isolation of several factors and measurements, including database level settings, log growth, data file size and even backup time.
Next, I populated the table in each database with approximately 390,000 rows of varying length decimal data (based on calculations against object_id from a triple cross join of sys.objects on itself), and measured the insert times and storage requirements. Here is how they stacked up:
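The population script was along these lines (a sketch; the exact expression against object_id is not reproduced here):

```sql
-- Roughly 390,000 rows of varying-length decimal values, generated from a
-- triple cross join of sys.objects. The arithmetic is illustrative only.
INSERT dbo.Test (i)
SELECT CONVERT(DECIMAL(20,4), ABS(s1.[object_id] % 100000)) / 7.0
FROM sys.objects AS s1
CROSS JOIN sys.objects AS s2
CROSS JOIN sys.objects AS s3;
```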
Then I performed an update that affected all rows, making sure that roughly 20% of the rows would have a significant change in their number of significant digits (e.g. by adding 1,000,000). Here is the performance comparison, as well as how the data and log were affected:
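One way to reconstruct that update (hypothetical; the original script may have chosen the 20% differently):

```sql
-- Every row is touched; roughly one in five gets a change large enough
-- to alter the number of significant digits.
UPDATE dbo.Test
SET i = CASE
            WHEN ABS(CHECKSUM(NEWID())) % 5 = 0 THEN i + 1000000
            ELSE i + 1
        END;
```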
Next I compared the time and cost of performing a SELECT COUNT(*) with a WHERE clause against the column:
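The query itself was of this shape (the range shown matches the one used in the aggregate tests later in the comments):

```sql
-- A scan with a range predicate on the MONEY / DECIMAL column.
SELECT COUNT(*) FROM dbo.Test WHERE i BETWEEN 200000 AND 370000;
```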
And finally, I performed native and compressed backups of each database, comparing execution time and output size:
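The backup commands looked roughly like this (database name and disk paths are placeholders):

```sql
-- Native backup:
BACKUP DATABASE MoneyTest
    TO DISK = 'X:\backups\MoneyTest_native.bak' WITH INIT;
-- Compressed backup (SQL Server 2008):
BACKUP DATABASE MoneyTest
    TO DISK = 'X:\backups\MoneyTest_compressed.bak' WITH INIT, COMPRESSION;
```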
The following chart summarizes everything performance-wise. The orange with the dot means that database performed the best; the x on the red background means it performed the worst.
And this chart summarizes all things size-wise:
Of course, there is nothing overly definitive here. DECIMAL(10,2) with row compression enabled got the most "first place" metrics, while MONEY with no compression and the VARDECIMAL types never finished near the top of the class. But you can judge the results for yourself, and make decisions based on your own priorities.
Alex asked for some metrics on more complex operations like SUM(). I ran some tests using both SUM() and AVG(). The logical reads of course are the same as all the others, and the scan costs remained unchanged as well. But as for the observed performance of both calculations (compute scalar cost was identical for both operations), see the following chart:
Again, this was an average over 10 tests. Note that I did not append these results to the summary charts I delivered above. And sorry about the slightly different-looking screen shot.
Please take into account that these tests were performed on a dual-core laptop computer, and the database files were created on external storage. There are many other tests I could have run to glean more performance and storage data, against a much larger data set, and using production-class hardware, but for the scope of this post I just wanted to glance at the most basic operations. I repeated these tests 10 times from start to finish, so each metric taken is an average of 10 tests (in a lot of cases they were the same every time).
This was a very tedious exercise to perform. If you would like to perform your own tests, with your own sample data, and on your own hardware, I will more than gladly share my scripts. I'd post them here right now, but they are scattered and not distribution-friendly at this point.
Sorry, those are hosted at my old personal website, and I let the domain lapse by mistake, and some squatter scooped it up. I'll try to locate the images in my backups.
I can't see your charts. I got this message:
Error. Page cannot be displayed. Please contact your service provider for more details ;/
@Brent I hope you can extrapolate that I didn't literally mean that yachts and aircraft carriers specifically were the only two possible exceptions. I don't agree that you should just blindly use DECIMAL(18,6) everywhere without considering your actual business requirements and compatibility issues. YMMV.
alexeia is right. After talking to accountants and working for banks for a number of years, I find SQL Server's DECIMAL(18,6) to be compliant with almost all accounting requirements while taking up the lowest amount of storage possible. Aaron, you made an assumption when stating "Unless you are storing the pricing information for luxury yachts or aircraft carriers", which you probably shouldn't have 🙂 (e.g. yen currency trades in financial transactions can easily be in the billions of USD and require a precision of 6 decimal places!)
@alexeia my point was not to use money / smallmoney. Since you insist that at least 6 digits must be stored after the decimal, that is even more reason to not use money or smallmoney, since they only support 4 digits.
Why are technologists discussing storage and processing of monetary data? First, the input of professional accountants is needed (I am not one). Second, there exist "GAAP Compliance" documents for storage of monetary data types (see Wikipedia on GAAP compliance). They are very clear: in any system where currency conversion is done, there MUST BE AT LEAST 6 DIGITS STORED after the decimal; otherwise the batch will not reconcile with the third party after the currency conversion(s) are done.
Only AFTER the mandatory requirements are satisfied can we start to talk about technical/technological considerations, preferences, and speed of processing.
The accuracy problems are also due to 'operations on money result in money' and should be an easy fix.
Techvslife: not sure about the decimal stuff, I haven't done enough empirical tests. But certainly the variable length stuff can hurt. The optimizer will assume that the average row size is (fixed data) + (50% of variable data max size). This can cause sorts etc. to happen in tempdb when they could have fit in memory.
Very grateful for the tests; I second what Charles says above though,
the key result would be comparing precision of _9_ against precision 10, because 9 is in the first storage tier for decimal (5 bytes), and 10 is already in the second tier (9 bytes).
Other questions: does scale make a difference in performance if precision is held constant (I expect not)? Do the minimum and maximum precisions within each decimal storage-size tier differ in performance?
As an aside, we know that even when storage is unchanged (say NVARCHAR(4000) is defined for a column but all values in that column actually have a max of 5 bytes), the field definition can still make things slower because of the (pessimistic) expectations created by the optimizer when it sees (4000) in the definition.
Jeff, that is a trackback, and you'll see it on a lot of blogs. In the following blog post:
I referenced this blog post. Sqlblog.com is set up so that this trackback is referenced as a comment on the linked post. It's not obvious but if you click on my name on that comment it will take you to the referencing post.
A lot of blogs work this way; I'm surprised you haven't come across it before.
Aaron Bertrand said:
I see a lot of people trash-talking the EAV (entity-attribute-value) model. If you want the full story
The post above seems incomplete…. is there a link for the "full story"?
Very nice write-up and thread. I visited the thread that Aaron posted about the inaccuracy of money (http://tinyurl.com/59s2dn), and as one of the posters on that thread mentioned, it is somewhat nonsensical to divide a money value by another money value, as is done in the contrived example. The suggested solution is to use money / decimal. I wonder if the same kind of rounding error occurs with money divided by int?
That's a very interesting and challenging exercise. When designing a database, does anyone really go to these lengths to select data types?
The kind of report that you had produced is great!
Thanks — Vj
I know it's an older post, but it is quite interesting as I try to decide what types to use and how flexible they all are. Thanks for the write-up.
It would be interesting to see this test done with Decimal(9,2) rather than Decimal(10,2). According to the MSDN library http://msdn.microsoft.com/en-us/library/ms187746(SQL.90).aspx a precision of 9 uses only five bytes of storage whereas a precision of 10 uses 9 bytes of storage.
If you can drop that one digit of precision it seems reasonable to expect a speed boost…
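You can see that storage-tier difference directly: DATALENGTH reports the bytes used by a value, and the same number costs different amounts depending on declared precision.

```sql
-- DECIMAL precision 1-9 is stored in 5 bytes, precision 10-19 in 9 bytes,
-- so the same value takes different storage depending on the declaration:
SELECT DATALENGTH(CONVERT(DECIMAL(9,2),  12345.67)) AS bytes_p9,   -- 5
       DATALENGTH(CONVERT(DECIMAL(10,2), 12345.67)) AS bytes_p10;  -- 9
```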
The above results are accurate for in-memory operations, which is why I said decimal is not cheap. As I said in my execution plan blog, the plan cost is for determining the table access strategy, not the additional logic cost; consider the plan and true cost for SELECT SUM(col1) FROM xx against SELECT SUM(col1), SUM(col2) FROM xx, and so on. The only thing that is different at 30GB is the disk access patterns, which I will write about later.
Sure. I am working on a much larger example (~30GB databases) and would welcome any input you have on calculations that should be performed… I think the small data size (and therefore sub-second execution times) does not accurately reflect what would happen in a real-world scenario.
This is indeed surprising. Thank you again for writing it up!
Thanks Alex, I have added a chart showing the performance of the following queries:
SELECT SUM(i) FROM dbo.Test WHERE i BETWEEN 200000 AND 370000;
SELECT AVG(i) FROM dbo.Test WHERE i BETWEEN 200000 AND 370000;
The result is a little surprising to me, but I can't dispute what I observed. 🙂
This is very interesting, thanks! Would you mind comparing performance of more complex calculations involving these numbers, such as SUM()?
Joe, at least in these tests in both SELECT and WHERE, CPU performance according to actual plan was identical across all the datatypes used.
The performance overhead for decimal is quite substantial: 500-1000 additional CPU cycles per row and column for the conversion.
I believe money is just a BIGINT offset by 4 decimal places.
Sure, but some operations on money result in *inaccurate* results. See the following URL for an example:
I agree that the rules for operations on decimal values are not very well known and sometimes downright unintuitive, but the result is obviously not as desired because the difference is visible in the result (wrong precision / scale). With money these results can easily go unnoticed.
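A quick sketch of the kind of silent rounding being discussed (the linked thread has the original example; this one is illustrative). Each intermediate MONEY result is rounded to four decimal places, and because the final scale is still four digits, nothing looks wrong:

```sql
DECLARE @mny MONEY        = 1,
        @dec DECIMAL(8,4) = 1;
-- MONEY arithmetic rounds to 4 decimal places at each step, so the
-- error is invisible in the result's scale; DECIMAL arithmetic widens
-- the scale, which at least makes the loss visible.
SELECT (@mny / 3) * 3 AS money_result,    -- 0.9999
       (@dec / 3) * 3 AS decimal_result;  -- 0.999999999999999
```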
One of the reasons to prefer money is that operations on money result in money. Sometimes operations on decimal(20,4) result in decimal(38,4) or decimal(38,8) or something even more surprising.