Adam Cameron's Dev Blog: Floating point accuracy

G'day:
This is not going to be particularly earth-shattering, sorry.

Yesterday someone asked on my favourite source of militant pedantry (Stack Overflow) how to pass a BigDecimal to POI for a spreadsheet cell value. You'll need to take my word for it, as the original question has been deleted (not sure what to make of that). There's a Google-cached version of it here. I dunno how long google caches stuff for though. The gist of the question text is (this is an abridged version of the original:

How to use BigDecimal when I have no implementation that will take it

Here is my code:

[snipped]

The part I changed for it to error is:

thisCell.setCellValue(javacast("bigdecimal", invoicesDue.amount) );
The setCellValue() method has no overload which will take a BigDecimal.

The field is going to be currency which requires a BigDecimal, however XSSFCell's setCellValue() doesn't have an implementation that will take it. The closest approximation only accepts a Double, which is no good to me because in the documentation it says double shouldn't be used for currency. That being the case, what should I do?
[...]

I had a hunt around and came up with this:

To answer the specific question you have asked: the setCellValue() method has no overload which will take a BigDecimal. Which is what the error is telling you.

A follow-up question you might want to ask is "I need to do [x] which requires a BigDecimal, however XSSFCell's setCellValue() doesn't have an implementation that will take it. The closest approximation only accepts a Double, which is no good to me [because of this reason]. That being the case, what should I do?"

I guess you could update this question, or you could create a new question.

And that's about as far as the Q&A got before the question was deleted.

The problem the questioner was having is that they were - rightly - cautious about using a float for the currency amounts, and floats are not very accurate, so if one needs accuracy, using something like a BigDecimal is better as it (seems to ~... I am no expert) preserve accuracy to an arbitrary degree.

POI will only accept a Double for a numeric, and I also read Excel only works in Doubles too, so there are going to be inherent accuracy constraints here.

It then occurred to me that I had no idea how much accuracy a Double actually has, so set out to find out. I knocked this code together:

What this does is to build three strings that represent numbers:

one that's an integer only (ie: all digits are to the left of the decimal point)
one that's a decimal only (ie: all digits are to the right of the decimal point)
one that's both (ie: digits are on either side of the decimal point)

I then convert each of these to a numeric (val() it), and then check to see if the string and the numeric value (treated as a string again) compare favourable.

After that I just tabulate the results, as well as writing them to an XLS file so I can see what happens there as well.

The results are as follows:

Digits	Integer			Decimal			Both
Digits	String	Number	Equal	String	Number	Equal	String	Number	Equal
1	1	1	YES	0.1	0.1	YES	1.0	1	NO
2	12	12	YES	0.12	0.12	YES	1.2	1.2	YES
3	123	123	YES	0.123	0.123	YES	13.2	13.2	YES
4	1234	1234	YES	0.1234	0.1234	YES	13.24	13.24	YES
5	12345	12345	YES	0.12345	0.12345	YES	135.24	135.24	YES
6	123456	123456	YES	0.123456	0.123456	YES	135.246	135.246	YES
7	1234567	1234567	YES	0.1234567	0.1234567	YES	1357.246	1357.246	YES
8	12345678	12345678	YES	0.12345678	0.12345678	YES	1357.2468	1357.2468	YES
9	123456789	123456789	YES	0.123456789	0.123456789	YES	13579.2468	13579.2468	YES
11	1234567891	1234567891	YES	0.1234567891	0.1234567891	YES	135791.2468	135791.2468	YES
12	12345678912	12345678912	YES	0.12345678912	0.12345678912	YES	135791.24682	135791.24682	YES
13	123456789123	123456789123	YES	0.123456789123	0.123456789123	YES	1357913.24682	1357913.24682	YES
14	1234567891234	1.23456789123E+012	NO	0.1234567891234	0.123456789123	NO	1357913.246824	1357913.24682	NO
15	12345678912345	1.23456789123E+013	NO	0.12345678912345	0.123456789123	NO	13579135.246824	13579135.2468	NO
16	123456789123456	1.23456789123E+014	NO	0.123456789123456	0.123456789123	NO	13579135.2468246	13579135.2468	NO
17	1234567891234567	1.23456789123E+015	NO	0.1234567891234567	0.123456789123	NO	135791357.2468246	135791357.247	NO
18	12345678912345678	1.23456789123E+016	NO	0.12345678912345678	0.123456789123	NO	135791357.24682468	135791357.247	NO
19	123456789123456789	1.23456789123E+017	NO	0.123456789123456789	0.123456789123	NO	1357913579.24682468	1357913579.25	NO

(there's an error in this in that I'm implying the "digits" column represents how many digits are in the number, but that's wrong after 9, as I skip 10)

What we see is that a Double has 12 digits of precision, be they on the left, right, or across the decimal point. I "knew" (mostly) the number of digits accuracy it had would be fixed irrespective of where the decimal point was, but wasn't 100% certain, and I certainly didn't know how many digits it was.

The resultant spreadsheet was as follows (converted to a vanilla-looking HTML table):

1	1	1	TRUE	0.1	0.1	TRUE	1	1	TRUE
2	12	12	TRUE	0.12	0.12	TRUE	1.2	1.2	TRUE
3	123	123	TRUE	0.123	0.123	TRUE	13.2	13.2	TRUE
4	1234	1234	TRUE	0.1234	0.1234	TRUE	13.24	13.24	TRUE
5	12345	12345	TRUE	0.12345	0.12345	TRUE	135.24	135.24	TRUE
6	123456	123456	TRUE	0.123456	0.123456	TRUE	135.246	135.246	TRUE
7	1234567	1234567	TRUE	0.1234567	0.1234567	TRUE	1357.246	1357.246	TRUE
8	12345678	12345678	TRUE	0.12345678	0.12345678	TRUE	1357.2468	1357.2468	TRUE
9	123456789	123456789	TRUE	0.123456789	0.123456789	TRUE	13579.2468	13579.2468	TRUE

11	1234567891	1234567891	TRUE	0.123456789	0.123456789	TRUE	135791.2468	135791.2468	TRUE
12	12345678912	12345678912	TRUE	0.123456789	0.123456789	TRUE	135791.2468	135791.2468	TRUE
13	1.23457E+11	1.23457E+11	TRUE	0.123456789	0.123456789	TRUE	1357913.247	1357913.247	TRUE
14	1.23457E+12	1.23457E+12	FALSE	0.123456789	0.123456789	FALSE	1357913.247	1357913.247	FALSE
15	1.23457E+13	1.23457E+13	FALSE	0.123456789	0.123456789	FALSE	13579135.25	13579135.25	FALSE
16	1.23457E+14	1.23457E+14	FALSE	0.123456789	0.123456789	FALSE	13579135.25	13579135.25	FALSE
17	1.23457E+15	1.23457E+15	FALSE	0.123456789	0.123456789	FALSE	135791357.2	135791357.2	FALSE
18	1.23457E+16	1.23457E+16	FALSE	0.123456789	0.123456789	FALSE	135791357.2	135791357.2	FALSE
19	1.23457E+17	1.23457E+17	FALSE	0.123456789	0.123456789	FALSE	1357913579	1357913579	FALSE

This is much the same, except note Excel displays that "13th" row in scientific notation, whilst still seemingly preserving the accuracy (judging by the fact that row says "true" and the next one says "false".

From this, I could not be sure if the data "truncation" was coming from CF, or the limit in Excel is the same as in CF. So I just pasted in the "string" column into a new spreadsheet, and the results were as follows:

1	0.1000000000000000000000	1.0000000000000000000000
12	0.1200000000000000000000	1.2000000000000000000000
123	0.1230000000000000000000	13.2000000000000000000000
1234	0.1234000000000000000000	13.2400000000000000000000
12345	0.1234500000000000000000	135.2400000000000000000000
123456	0.1234560000000000000000	135.2460000000000000000000
1234567	0.1234567000000000000000	1357.2460000000000000000000
12345678	0.1234567800000000000000	1357.2468000000000000000000
123456789	0.1234567890000000000000	13579.2468000000000000000000
1234567891	0.1234567891000000000000	135791.2468000000000000000000
12345678912	0.1234567891200000000000	135791.2468200000000000000000
123456789123	0.1234567891230000000000	1357913.2468200000000000000000
1234567891234	0.1234567891234000000000	1357913.2468240000000000000000
12345678912345	0.1234567891234500000000	13579135.2468240000000000000000
123456789123456	0.1234567891234560000000	13579135.2468246000000000000000
1234567891234560	0.1234567891234560000000	135791357.2468240000000000000000
12345678912345600	0.1234567891234560000000	135791357.2468240000000000000000
123456789123456000	0.1234567891234560000000	1357913579.2468200000000000000000

So Excel runs to 15 digits of accuracy, which is better than CF will manage.

Despite all that, if we go back to the original situation in that the person asking the question was concerned about the accuracy levels for currency amounts: 12 digits will represent a currency value in the tens of billions, accurate to a penny. This will probably be fine for most people's requirements.

There are still potential issues with the inherent flaws with floating point numbers accurately representing decimals, but I think for the purposes of financial calculations, once the data is in Excel, it will be fine.

So my bottom line that I would have given the person had the question lasted long enough for me to update my answer would be "don't worry about it: just use a double".

And at least now I know how accurate CF floating point numbers are. This is one of the good things about helping people with their questions on forums... sometimes one needs to work out what the answer is before answering, so it's a learning exercise for the answerer too!

Time for work...

--
Adam

Tuesday 15 January 2013

Floating point accuracy