Bug 15411 - 'format' adds a leading space depending on 'digits'
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.0.1
Hardware: x86_64/x64/amd64 (64-bit) Linux-Debian
Importance: P5 normal
Assigned To: R-core
 
Reported: 2013-08-01 19:36 UTC by Mathieu Basille
Modified: 2013-08-27 04:15 UTC

Attachments
Patch v1 (2.90 KB, patch)
2013-08-15 06:13 UTC, Aleksey Vorona
Patch v2 (3.65 KB, patch)
2013-08-15 22:39 UTC, Aleksey Vorona

Description Mathieu Basille 2013-08-01 19:36:42 UTC
Playing with the 'digits' argument in 'format' can lead to an additional leading space in some cases:

> format(9995, digits = 3)
[1] " 9995"

as compared to:

> format(9994, digits = 3)
[1] "9994"

First of all, it seems to affect only 'numeric' values, not 'integer' values:

> seq <- 9990:10000
> class(seq)
[1] "integer"
> vapply(seq, function(x) format(x, digits = 3), "")
 [1] "9990"  "9991"  "9992"  "9993"  "9994"  "9995"  "9996"  "9997"  "9998" 
[10] "9999"  "10000"
> class(as.numeric(seq))
[1] "numeric"
> vapply(as.numeric(seq), function(x) format(x, digits = 3), "")
 [1] "9990"  "9991"  "9992"  "9993"  "9994"  " 9995" " 9996" " 9997" " 9998"
[10] " 9999" "10000"

Note the leading space from 9995 to 9999. It also seems to happen only for numbers that would round up to the next power of 10 when printed to the requested number of significant digits. Consider the following example:

> seq <- as.numeric(99990:100000)
> vapply(seq, function(x) format(x, digits = 4), "")
 [1] "99990" "99991" "99992" "99993" "99994" "1e+05" "1e+05" "1e+05" "1e+05"
[10] "1e+05" "1e+05"
> vapply(seq, function(x) format(x, scientific = FALSE, digits = 4), "")
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996" " 99997"
 [9] " 99998" " 99999" "100000"
> print(seq, digits = 4)
 [1]  99990  99991  99992  99993  99994  99995  99996  99997  99998  99999
[11] 100000
> print(seq[6:10], digits = 4)
[1] 1e+05 1e+05 1e+05 1e+05 1e+05

Finally, the function 'format.AsIs' does not show the same behavior:

> format.AsIs(9994, digits = 3)
[1] "9994"
> format.AsIs(9995, digits = 3)
[1] "9995"
> format.default(9994, digits = 3)
[1] "9994"
> format.default(9995, digits = 3)
[1] " 9995"

It seems to affect Windows and Linux alike (R 3.0.1); see the thread about this problem on the R-help list:

https://stat.ethz.ch/pipermail/r-help/2013-July/357642.html

I don't know if it is a bug or the expected behavior -- in which case, I could not find the reason for it in the documentation. Using the argument 'trim = TRUE' does remove the leading space, but that seems more of a workaround to me. Note also that I could not find an existing bug here that seems related to this problem.
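
A minimal sketch of the 'trim = TRUE' workaround mentioned above, for readers who need predictable output regardless of R version. The variable names are illustrative only, and the integer case reflects only what this report observed.

```r
x <- c(9994, 9995)

# trim = TRUE suppresses the padding that format() adds for alignment,
# so neither value carries a leading space.
trimmed <- vapply(x, function(v) format(v, digits = 3, trim = TRUE), "")
print(trimmed)

# Integer input was not affected in the report above.
as_int <- format(9995L, digits = 3)
print(as_int)
```

This only hides the symptom in the returned string; it does not change the width that format.info() reports.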

Here is my R version:

> version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          0.1                         
year           2013                        
month          05                          
day            16                          
svn rev        62743                       
language       R                           
version.string R version 3.0.1 (2013-05-16)
nickname       Good Sport
Comment 1 Aleksey Vorona 2013-08-15 00:58:33 UTC
The reason for this behaviour is in format.c

The formatReal() function uses scientific() to estimate how much space a number would take if printed in scientific form. It then uses this estimate to calculate how much space the number would take in fixed form.

The issue is that scientific() takes the "digits" parameter into account and rounds 9995 up. Later on, since " 9995" is still no wider than "1e+04", formatReal() decides to use fixed-point format. But it does not recalculate how much space the number would take in fixed-point format. As a result, the estimate for "10000" is used.

You can check the bug with this function as well:
> format.info(9995, digits = 3)
[1] 5 0 0
> format.info(9994, digits = 3)
[1] 4 0 0

It is easier to check the reported width than to keep an eye out for a stray space.
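
The check above can be sketched as a small scan over a range of values. The helpers reported_width() and needed_width() are hypothetical names introduced here, not part of R, and needed_width() assumes whole-number input.

```r
# Width that format() would reserve, as reported by format.info().
reported_width <- function(x, digits) {
  as.integer(format.info(x, digits = digits)[1])
}

# Width actually needed to print a whole number in fixed notation.
# (Illustrative helper; valid only for whole-number values.)
needed_width <- function(x) nchar(sprintf("%.0f", x))

x <- as.numeric(9990:10000)
check <- data.frame(
  x        = x,
  reported = vapply(x, reported_width, integer(1), digits = 3),
  needed   = needed_width(x)
)

# Rows where the two widths differ are the ones that get a stray space.
print(check[check$reported != check$needed, ])
```

On a version of R without this bug, the printed data frame should have no rows.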
Comment 2 Mathieu Basille 2013-08-15 02:19:39 UTC
Dear Aleksey, 

Thanks for digging into this. I appreciate your answer, and I have to admit that it somewhat makes sense to me (note: I'm definitely not an R guru). However, would you consider this a bug? I find this behavior very hard to debug. It took me a couple of hours to understand where the problem lay, and then a few messages and another couple of hours to extract a reproducible example, before I could finally fix the problem with the 'trim' argument of format.

What I mean is that, in real-world scenarios, it can be almost impossible to debug this problem, which is either not documented or not easily found. Not to mention the fact that this behavior affects only numeric values and not integers...

Mathieu
Comment 3 Aleksey Vorona 2013-08-15 06:07:18 UTC
I am not a guru either. Simply trying to contribute a bit. I am trying to create a patch to fix it. I have a patch, which fixes your particular issue (I changed scientific() to return a flag if it rounded up the number and used that flag in the width computation).

However, I found several other cases, which are not fixed by the patch I have. I will work on the patch a bit more tomorrow.

Just to give you an idea of what I am trying to fix, consider this:

> format(c(1, 9995, 1119996), digits=3);
[1] "      1" "   9995" "1119996"

That is even worse, isn't it? I do think it is a bug. And trim=TRUE is only a partial solution, because format.info() does not have a trim argument.

Also, similar code is used in formatComplex(), which I have not yet debugged in a similar way...
Comment 4 Aleksey Vorona 2013-08-15 06:13:17 UTC
Created attachment 1472
Patch v1

After re-reading the documentation, I am thinking the follow-up bug I was trying to fix is not a bug.

This patch fixes the particular issue, which is reproducible with
> format(9995, digits=3)
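
For reference, the padding in the vector example from comment 3 matches the documented behaviour of format(): with trim = FALSE (the default), numeric values are right-justified to a common width. A short sketch, with illustrative variable names:

```r
x <- c(1, 9995, 1119996)

padded  <- format(x, digits = 3)               # right-justified to a common width
trimmed <- format(x, digits = 3, trim = TRUE)  # no padding

print(nchar(padded))
print(trimmed)
```

Every element of `padded` has the width of the longest entry, which is why "1" and "9995" carry leading spaces there.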
Comment 5 Aleksey Vorona 2013-08-15 07:08:59 UTC
A way to reproduce the same bug with complex numbers:

> z = complex(real=9994,imaginary=9995)
> format(z, digits=3)
[1] "9994+ 9995i"
> format.info(z, digits=3)
[1] 4 0 0 5 0 0

while

> z = complex(real=9994,imaginary=9993)
> format(z, digits=3)
[1] "9994+9993i"
> format.info(z, digits=3)
[1] 4 0 0 4 0 0

I will try to update the patch tomorrow.
Comment 6 Mathieu Basille 2013-08-15 15:48:00 UTC
Thanks, Aleksey, for your contribution. Let me know if I can help further with this bug (notably by testing the patch); otherwise, I'll just keep an eye on it!

Mathieu.
Comment 7 Aleksey Vorona 2013-08-15 21:27:32 UTC
Another related bug (in an unpatched R):
> format(complex(real=100,imaginary=2), digits=2)
[1] "100+0i"

Notice that the imaginary part is lost completely.

Sorry I keep updating this bug, but I feel the need to record the issues I come across, so that the bug will be fixed in full. Eventually.
Comment 8 Aleksey Vorona 2013-08-15 22:39:49 UTC
Created attachment 1473
Patch v2

I have fixed the patch to work properly with real numbers in all situations I could think of. I did not write any unit tests for these cases, because I cannot find unit tests for the formatReal function.

Here are two additional important cases, which were not handled properly by the previous patch:

> format(c(94, 100, -95), digits=1);
[1] " 94" "100" "-95"
> format(c(94, 100, 95), digits=1);
[1] " 94" "100" " 95"

The spaces are expected in the last case, because the longest number is "100".

The previous patch would've swallowed these spaces, because one of the numbers has been rounded up.

The latest patch takes this into account and compensates for rounding up only if *the* longest number was rounded up. ("The longest" means the one with the longest representation in the infinite-precision "F" format.)

Also, negative numbers were not compensated properly by the previous patch.

I am happy with the fix for formatReal().
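
The rule described above can be illustrated with a simplified model in R. This is not the code in format.c or in the patch; all four helper names are hypothetical, and the model handles whole-number values only.

```r
# Simplified model, NOT the code in format.c. Whole-number inputs only.

# Width needed to print each value as-is in fixed notation.
width_printed <- function(x) nchar(sprintf("%.0f", x))

# Width needed after rounding each value to `digits` significant digits.
width_rounded <- function(x, digits) nchar(sprintf("%.0f", signif(x, digits)))

# Naive common width: take the widest *rounded* value.
naive_width <- function(x, digits) max(width_rounded(x, digits))

# Corrected common width: take the widest value as actually printed.
corrected_width <- function(x) max(width_printed(x))

print(c(naive = naive_width(9995, 3), corrected = corrected_width(9995)))
print(c(naive = naive_width(c(94, 100, 95), 1),
        corrected = corrected_width(c(94, 100, 95))))
print(c(naive = naive_width(c(94, 100, -95), 1),
        corrected = corrected_width(c(94, 100, -95))))
```

In this model the two widths agree for c(94, 100, 95), where the longest value is "100" and was not produced by rounding up, so nothing should be compensated. They differ for 9995 and for c(94, 100, -95), where the widest estimate comes from a value that was rounded up.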


The formatComplex() bug, on the other hand, is different. The problem is that it rounds the complex number to the requested number of significant digits. But if the real and imaginary parts are more than `digits` orders of magnitude apart, one of them gets rounded to zero. All the formatting is applied to that rounded complex number. Because of that, the fix I have for formatReal() would not work for formatComplex(). I am going to open a separate bug about formatComplex().
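
The rounding effect can be seen at the R level with signif(), assuming it applies the same kind of joint rounding to a complex value as described above. The helper format_parts() is a hypothetical workaround that formats the two parts independently; it is not part of R and handles a single complex value only.

```r
z <- complex(real = 100, imaginary = 2)

# signif() on a complex value keeps `digits` significant digits of the larger
# component, so the much smaller imaginary part is rounded away.
rounded <- signif(z, 2)
print(rounded)

# Hypothetical helper: format real and imaginary parts independently
# (scalar input only).
format_parts <- function(z, digits) {
  sep <- if (Im(z) < 0) "-" else "+"
  paste0(format(Re(z), digits = digits), sep,
         format(abs(Im(z)), digits = digits), "i")
}
print(format_parts(z, 2))
```

Formatting the parts separately keeps the imaginary part visible, at the cost of no longer sharing one precision between the two components.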
Comment 9 Duncan Murdoch 2013-08-27 02:47:27 UTC
Fixed in R-devel (and soon in R-patched).
Comment 10 Mathieu Basille 2013-08-27 04:15:05 UTC
(In reply to comment #9)
> Fixed in R-devel (and soon in R-patched).

This is good news to read! Many thanks to Duncan, and certainly to Aleksey, for digging into this very peculiar bug and taking the time to fix it.

Mathieu.