Bugzilla – Bug 15411

'format' adds a leading space depending on 'digits'

Last modified: 2013-08-27 04:15:05 UTC

Playing with the 'digits' argument in 'format' can lead to an additional leading space in some cases:
> format(9995, digits = 3)
[1] " 9995"
as compared to:
> format(9994, digits = 3)
[1] "9994"
First of all, it seems to affect only 'numeric', but not 'integer':
> seq <- 9990:10000
> class(seq)
[1] "integer"
> vapply(seq, function(x) format(x, digits = 3), "")
[1] "9990" "9991" "9992" "9993" "9994" "9995" "9996" "9997" "9998"
[10] "9999" "10000"
> class(as.numeric(seq))
[1] "numeric"
> vapply(as.numeric(seq), function(x) format(x, digits = 3), "")
[1] "9990" "9991" "9992" "9993" "9994" " 9995" " 9996" " 9997" " 9998"
[10] " 9999" "10000"
Note the leading space from 9995 to 9999. It also seems to happen only for numbers that would round up to the next power of 10 when printed to the requested number of significant digits. Consider the following example:
> seq <- as.numeric(99990:100000)
> vapply(seq, function(x) format(x, digits = 4), "")
[1] "99990" "99991" "99992" "99993" "99994" "1e+05" "1e+05" "1e+05" "1e+05"
[10] "1e+05" "1e+05"
> vapply(seq, function(x) format(x, scientific = FALSE, digits = 4), "")
[1] "99990" "99991" "99992" "99993" "99994" " 99995" " 99996" " 99997"
[9] " 99998" " 99999" "100000"
> print(seq, digits = 4)
[1] 99990 99991 99992 99993 99994 99995 99996 99997 99998 99999
[11] 100000
> print(seq[6:10], digits = 4)
[1] 1e+05 1e+05 1e+05 1e+05 1e+05
Finally, the function 'format.AsIs' does not show the same behavior:
> format.AsIs(9994, digits = 3)
[1] "9994"
> format.AsIs(9995, digits = 3)
[1] "9995"
> format.default(9994, digits = 3)
[1] "9994"
> format.default(9995, digits = 3)
[1] " 9995"
It seems to affect Windows and Linux alike (R 3.0.1); see the thread about this problem on the R-help list: https://stat.ethz.ch/pipermail/r-help/2013-July/357642.html
I don't know if it is a bug or the expected behavior -- in which case, I could not find the reason for it in the documentation.
The use of the argument 'trim = TRUE' does remove the leading space, but seems more of a workaround to me. Note also that I could not find a bug that seems related to that problem here.
Here is my R version:
> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          0.1
year           2013
month          05
day            16
svn rev        62743
language       R
version.string R version 3.0.1 (2013-05-16)
nickname       Good Sport

The reason for this behaviour is in format.c. The formatReal() function uses scientific() to estimate how much space a number would take if printed in scientific form. It then uses this estimate to calculate how much space the number would take in fixed form. The issue is that scientific() takes the "digits" parameter into account and rounds 9995 up. Later on, since " 9995" is still no worse than "1e+04", formatReal() decides to use fixed-point format. But it does not bother to recalculate how much space the number would take in fixed-point format. As a result, the estimate for "10000" is used.
You can check the bug with this function as well:
> format.info(9995, digits = 3)
[1] 5 0 0
> format.info(9994, digits = 3)
[1] 4 0 0
It is easier to check the value than to keep an eye out for a stray space.
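
The width mix-up described in this comment can be modelled outside R. What follows is a minimal Python sketch, not the code in format.c: the helper names `rounded_exponent`, `estimated_fixed_width` and `true_fixed_width` are hypothetical, the sketch only covers non-negative whole numbers, and it assumes ties round half up.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_exponent(x, digits):
    """Decimal exponent of x after rounding to `digits` significant digits."""
    d = Decimal(str(x))
    # Decimal position of the last significant digit that is kept.
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return d.quantize(quantum, rounding=ROUND_HALF_UP).adjusted()


def estimated_fixed_width(x, digits):
    """Width derived from the rounded value: integer digits = exponent + 1."""
    return rounded_exponent(x, digits) + 1


def true_fixed_width(x):
    """Width of the digits that fixed notation actually prints."""
    return len(str(int(x)))


for x in (9994, 9995):
    est = estimated_fixed_width(x, digits=3)
    # Padding to the estimated width reproduces the stray leading space.
    print(x, est, true_fixed_width(x), repr(str(x).rjust(est)))
```

For 9995 the estimate is 5 while only 4 characters are printed, which agrees with the format.info() values quoted in the comment.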

Dear Aleksey,
Thanks for digging into this. I appreciate your answer, and I have to admit that it somewhat makes sense to me (note: I'm definitely not an R guru). However, would you consider this a bug? I find this behavior very hard to debug. It took me a couple of hours to understand where the problem lay, and then a few messages and another couple of hours to extract a reproducible example, before I could finally fix the problem with the 'trim' argument of format. What I mean is that, in real-world scenarios, it can be almost impossible to debug this problem, which is either not documented or not easily found. Not to mention the fact that this behavior affects only numerics and not integers...
Mathieu

I am not a guru either, simply trying to contribute a bit. I am trying to create a patch to fix it. I have a patch which fixes your particular issue (I changed scientific() to return a flag if it rounded the number up, and used that flag in the width computation). However, I found several other cases which are not fixed by the patch I have. I will work on the patch a bit more tomorrow. Just to give you an idea of what I am trying to fix, consider this:
> format(c(1, 9995, 1119996), digits=3);
[1] " 1" " 9995" "1119996"
That is even worse, isn't it? I do think it is a bug. And trim=TRUE is only a partial solution, if only because format.info() does not have a trim argument. Also, similar code is used in formatComplex(), which I have not debugged in the same way yet...

Created attachment 1472: Patch v1
After re-reading the documentation, I now think the follow-up bug I was trying to fix is not a bug. This patch fixes the particular issue, reproducible with:
> format(9995, digits=3)

A way to reproduce the same bug with complex numbers:
> z = complex(real=9994,imaginary=9995)
> format(z, digits=3)
[1] "9994+ 9995i"
> format.info(z, digits=3)
[1] 4 0 0 5 0 0
while
> z = complex(real=9994,imaginary=9993)
> format(z, digits=3)
[1] "9994+9993i"
> format.info(z, digits=3)
[1] 4 0 0 4 0 0
I will try to update the patch tomorrow.

Thanks Aleksey for your contribution. Let me know if I can help further with this bug (notably by testing the patch); otherwise, I'll just keep an eye on it!
Mathieu.

Another related bug (in an unpatched R):
> format(complex(real=100,imaginary=2), digits=2)
[1] "100+0i"
Notice that the imaginary part is lost completely. Sorry I keep updating this bug, but I feel the need to record the issues I come across, so that the bug will be fixed in full. Eventually.
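
The lost imaginary part is consistent with both parts being rounded at one shared decimal position, which is how a later comment in this thread describes formatComplex(). A minimal Python sketch of that reading follows; `round_jointly` is a hypothetical helper rather than R's code, and half-up rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_jointly(real_part, imag_part, digits):
    """Round both parts at the position set by the larger part's magnitude."""
    parts = (Decimal(str(real_part)), Decimal(str(imag_part)))
    largest = max(abs(p) for p in parts)
    if largest == 0:
        return parts
    # Last decimal position kept for `digits` significant digits of `largest`.
    quantum = Decimal(1).scaleb(largest.adjusted() - digits + 1)
    return tuple(p.quantize(quantum, rounding=ROUND_HALF_UP) for p in parts)


real_part, imag_part = round_jointly(100, 2, digits=2)
print(int(real_part), int(imag_part))
```

With digits = 2 the shared position is the tens place, so an imaginary part of 2 rounds to 0 before any formatting happens.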

Created attachment 1473: Patch v2
I have fixed the patch to work properly with real numbers in all situations I could think of. I did not write any unit tests for these cases, because I cannot find unit tests for the formatReal function. Here are two additional important cases, which were not handled properly by the previous patch:
> format(c(94, 100, -95), digits=1);
[1] " 94" "100" "-95"
> format(c(94, 100, 95), digits=1);
[1] " 94" "100" " 95"
The spaces are expected in the last case, because the longest number is "100". The previous patch would have swallowed these spaces, because one of the numbers has been rounded up. The latest patch takes this into account and compensates for rounding up only if *the* longest number was rounded up ("the longest" means the one with the longest representation in the infinite-precision "F" format). Also, negative numbers were not compensated properly by the previous patch. I am happy with the fix for formatReal().
The formatComplex() bug, on the other hand, is different. The problem is that it rounds the complex number to the requested number of significant digits. But if the real and imaginary parts are more than `digits` orders of magnitude apart, one of them gets rounded to zero. All the formatting is applied to that rounded complex number. Because of that, the fix I have for formatReal() would not work for formatComplex(). I am going to open a separate bug about formatComplex().
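
The compensation rule for real numbers described in this comment can be sketched as follows. This is a Python model of the rule as worded here, not the attached patch: `estimate`, `common_width` and `format_all` are hypothetical names, only whole numbers are handled, and half-up rounding is assumed.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate(x, digits):
    """Return (estimated width, rolled_over) for a whole number x."""
    d = abs(Decimal(str(x)))
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    rounded = d.quantize(quantum, rounding=ROUND_HALF_UP)
    # True when rounding gained a digit, e.g. 95 -> 100 at one digit.
    rolled_over = rounded.adjusted() > d.adjusted()
    sign = 1 if x < 0 else 0
    return rounded.adjusted() + 1 + sign, rolled_over


def common_width(values, digits):
    """Shared width; compensate only if every widest estimate rolled over."""
    estimates = [estimate(v, digits) for v in values]
    widest = max(width for width, _ in estimates)
    if all(rolled for width, rolled in estimates if width == widest):
        widest -= 1  # the extra column exists only because of rounding
    return widest


def format_all(values, digits):
    """Right-justify every value to the shared width."""
    width = common_width(values, digits)
    return [str(v).rjust(width) for v in values]


print(format_all([94, 100, -95], digits=1))
print(format_all([94, 100, 95], digits=1))
```

In the second call "100" reaches width 3 without any rounding, so the padding in front of 94 and 95 is kept, as the comment expects.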

Fixed in R-devel (and soon in R-patched).

(In reply to comment #9)
> Fixed in R-devel (and soon in R-patched).
This is good news to read! Many thanks to Duncan and certainly Aleksey for digging into this very peculiar bug and taking the time to fix it.
Mathieu.