Bug 17281 - print.summaryDefault(): incorrect rounding on some Linux systems
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: 3.4.0
Hardware: x86_64/x64/amd64 (64-bit) Windows 32-bit
Importance: P5 normal
Assignee: R-core
Reported: 2017-05-30 08:04 UTC by Arne Henningsen
Modified: 2017-05-31 08:48 UTC
Description Arne Henningsen 2017-05-30 08:04:03 UTC
On some (not all) Linux systems, print.summaryDefault() incorrectly rounds the mean value and/or the median value (and perhaps also other values).

Example:
R> a <- 1234568.01 + c(0:1)


Incorrect output on my Ubuntu 16.04 LTS 64-bit computer (for details, see below):
R> summary(a)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1234568 1234568 1234568 1234568 1234569 1234569


Correct output on other computers (e.g. Windows, Dirk Eddelbuettel's Ubuntu 17.04 64-bit computer):
R> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1234568 1234568 1234569 1234569 1234569 1234569


The following commands give the correct output on all (?) computers:
R> print(summary(a), digits=9)
    Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
1234568.0 1234568.3 1234568.5 1234568.5 1234568.8 1234569.0
R> summary(a)["Mean"]
  Mean
1234569
R> mean(a)
[1] 1234569
R> print(mean(a), digits=9)
[1] 1234568.51


See also:
https://stat.ethz.ch/pipermail/r-devel/2017-May/074351.html


My computer:
R> Sys.info()
                                        sysname
                                        "Linux"
                                        release
                      "4.5.0-040500rc6-generic"
                                        version
"#201602281230 SMP Sun Feb 28 17:33:02 UTC 2016"
                                       nodename
                             "arne-HP-EB-8560w"
                                        machine
                                       "x86_64"
                                          login
                                         "arne"
                                           user
                                         "arne"
                                 effective_user
                                         "arne"
R> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
[3] LC_TIME=da_DK.UTF-8        LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=da_DK.UTF-8    LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=da_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0
Comment 1 Peter Dalgaard 2017-05-30 08:56:06 UTC
This is due to the use of zapsmall() in format.summaryDefault(); i.e., double rounding.  

> print(zapsmall(1234568.51),digits=10)
[1] 1234568.5
> print(round(zapsmall(1234568.51)),digits=10)
[1] 1234568
> print(round(1234568.51),digits=10)
[1] 1234569

The second result is due to round-half-to-even, which is apparently not used on some systems.

I suspect this is not easy to fix without problems popping up elsewhere, since the zapsmall() is likely there for a reason.
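
The double rounding described above can be reproduced without zapsmall(). This is a minimal sketch, assuming that zapsmall() acts like round(x, 1) for this particular value, as the output above suggests; the variable names are illustrative only.

```r
# Sketch: rounding twice can differ from rounding once.
x <- 1234568.51

# One-step rounding to the nearest integer: .51 lies above the midpoint.
once <- round(x)

# Two-step rounding: first to one decimal place (the step attributed to
# zapsmall() in this comment), then to an integer. The intermediate value
# 1234568.5 is an exact tie, and R's round() resolves ties to the even digit.
intermediate <- round(x, 1)
twice <- round(intermediate)

print(c(once = once, intermediate = intermediate, twice = twice), digits = 10)
```

The first step discards the information that the value was above the midpoint, so the second step has to resolve a tie that did not exist in the original number.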
Comment 2 Arne Henningsen 2017-05-30 11:44:49 UTC
Yes, Peter, you are right. zapsmall() likely causes the difference:

I get (Ubuntu 16.04.02 LTS 64 bit, for details see my previous message):
R> zapsmall(1234568.51)
[1] 1234568

While my colleague gets (Windows 7, 64 bit, for details see below):
R> zapsmall(1234568.51)
[1] 1234569

Why do we get different outputs?

Which one is correct/expected? (I guess that most people would expect 1234569.)

Is there anything that we can do to get the same outputs on our computers?



My colleague's computer:
> Sys.info()
       sysname        release        version       nodename 
     "Windows"        "7 x64"   "build 7600"      "USER-PC" 
       machine          login           user effective_user 
      "x86-64"         "user"         "user"         "user" 


> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)

Matrix products: default

locale:
[1] LC_COLLATE=C                          
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] foreign_0.8-67

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0
Comment 3 Tomas Kalibera 2017-05-30 18:11:50 UTC
The difference is caused by different rounding in snprintf on Linux and Windows. In recent versions of glibc, the default rounding mode is round-to-nearest (i.e., round-half-to-even), and printf/snprintf honor it; this is also the case on Ubuntu 17.04. Round-half-to-even is also the default in IEEE 754.

R uses snprintf to round when printing floating point numbers and the reported example can be narrowed down to

> x <- 1234568.5 ; x
[1] 1234568  <=== Ubuntu 17.04 (glibc 2.24, round-half-to-even)
[1] 1234569  <=== Windows 10

On Windows, too, one can select a rounding mode using fesetround, and round-half-to-even is also the default there, but this setting has no effect on printf (and a number of other functions), which always round half away from zero. One can use, e.g., rint to round values according to the selected rounding mode.
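
To make the distinction concrete, here is a small sketch at the R level. It assumes that sprintf() hands the tie to the platform's C library formatting (so its result may vary), while an explicit round() call settles the tie before anything is formatted. No expected output is given for the first call because it is the platform-dependent one.

```r
x <- 1234568.5  # exactly representable in binary, so this is a true tie

# Formatting the tie directly leaves the decision to the printf family
# of the C library, so the result may differ between platforms.
direct <- sprintf("%.0f", x)

# Rounding explicitly first removes the tie before formatting. R's round()
# is documented to follow IEC 60559 (ties go to the even digit).
explicit <- sprintf("%.0f", round(x))

cat("direct:  ", direct, "\n")
cat("explicit:", explicit, "\n")
```

With the explicit call, the string to be formatted is already an integer value, so the C library has nothing left to decide.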
Comment 4 Arne Henningsen 2017-05-30 19:10:01 UTC
Thanks for the explanation, Tomas!

I understand your example with 1234568.5, which has the same distance to 1234568 and to 1234569. However, my example used the number 1234568.51, which is closer to 1234569 than to 1234568 and thus should be rounded to 1234569. Or do the functions do repeated rounding?

> round(round(1234568.51,1))
[1] 1234568
Comment 5 Tomas Kalibera 2017-05-30 20:03:08 UTC
Yes, the first rounding is in zapsmall and the second when printing. On both Linux and Windows zapsmall(1234568.51) returns 1234568.5 and this number is rounded differently on Linux and Windows when printed.
Comment 6 Arne Henningsen 2017-05-31 08:48:29 UTC
Thanks for your additional explanations, Tomas!

So there are two "issues" (which both could perhaps be called "bugs"):

a) repeated rounding in print.summaryDefault()

b) different rounding of .5 on different computers

I think that it would be great if both of these "issues" could be fixed.
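
As an illustration of what avoiding issue (a) could look like, the sketch below rounds each statistic only once, directly from its unrounded value. summary_once() is a hypothetical helper written for this comment, not part of R and not a proposed patch; it does nothing about issue (b), which would still affect values that are genuine ties.

```r
a <- 1234568.01 + c(0:1)

# Hypothetical helper (not part of R): round each statistic exactly once,
# straight from its unrounded value, to the requested significant digits.
summary_once <- function(x, digits = getOption("digits")) {
  stats <- c(min(x), quantile(x, 0.25), median(x), mean(x),
             quantile(x, 0.75), max(x))
  names(stats) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  signif(stats, digits)
}

res <- summary_once(a, digits = 7)
print(res, digits = 7)
```

Because the values are already whole numbers after signif(), printing with 7 digits involves no further tie, and the median and mean come out as 1234569, matching the output reported as correct in the description.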