Summary: Display of summary of numeric vector containing Inf is incorrect
Attachments: Excludes infinite values in maximum calculation
Description 8666f00c 2015-12-02 14:58:20 UTC
The display of a summary of a numeric vector containing Inf does not round correctly.

> options(digits=22)
> summary(c(.1,.2,.3,.4,.5),digits=22)
                    Min.                  1st Qu.                   Median                     Mean                  3rd Qu. 
0.1000000000000000055511 0.2000000000000000111022 0.2999999999999999888978 0.2999999999999999888978 0.4000000000000000222045 
                    Max. 
0.5000000000000000000000 
> summary(c(.1,.2,.3,.4,Inf),digits=22)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0     Inf       0     Inf 

Note that the error is in the display, and not the calculation:

> x = summary(c(.1,.2,.3,.4,Inf))
> print(as.vector(x))
0.1000000000000000055511 0.2000000000000000111022 0.2999999999999999888978 Inf 0.4000000000000000222045 Inf
Comment 1 Duncan Murdoch 2015-12-02 17:44:55 UTC
The reason this happens is that the print method calls zapsmall() on the entries. Compared to Inf, every finite number is small. I agree the behaviour is ugly. Perhaps only the finite values should be zapped? On the other hand, summary.default prints ugly results by design (e.g. summary(123456) rounding the entries), so I'm not sure everyone would agree on changing this.
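The effect described above can be sketched in a few lines of R. This is a simplified illustration, not the base R source: `zap_all` and `zap_finite` are hypothetical helpers, and the rounding rule (round to `digits` minus the order of magnitude of the largest absolute value) is an assumption about how zapsmall() behaves. Neither helper handles NA values.

```r
# Hypothetical helper: round relative to the largest absolute value,
# including infinite entries. Assumes x contains no NA.
zap_all <- function(x, digits = 7) {
  mx <- max(abs(x))                     # Inf if any entry is infinite
  if (mx <= 0) return(x)
  # With mx = Inf, log10(mx) is Inf, so the number of decimals collapses to 0.
  round(x, digits = max(0, digits - ceiling(log10(mx))))
}

# Hypothetical helper: same rule, but the scale is taken from finite entries only.
zap_finite <- function(x, digits = 7) {
  fin <- is.finite(x)
  if (!any(fin)) return(x)
  mx <- max(abs(x[fin]))
  if (mx <= 0) return(x)
  round(x, digits = max(0, digits - ceiling(log10(mx))))
}

x <- c(0.1, 0.2, 0.3, 0.4, Inf)
zap_all(x)      # finite entries are rounded to whole numbers, i.e. 0
zap_finite(x)   # finite entries keep their decimals; Inf is left alone
```

Taking the scale from the finite entries only is one way to read "only the finite values should be zapped"; whether that belongs in zapsmall() itself or only in the summary print method is a separate design question.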
Comment 2 8666f00c 2015-12-03 00:34:44 UTC
Created attachment 1941: Excludes infinite values in maximum calculation
Comment 3 8666f00c 2015-12-03 00:36:42 UTC
Thanks for the response; knowing about zapsmall() really helps me understand this. A few comments, though.

Another issue with what I'm bringing up here is that it's not simply a matter of everything being small compared to infinity: it seems that it simply truncates decimal places but otherwise respects significant digits.

> summary(c(11.1,22.2,33.3,44.4,Inf),digits=1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     10      20      30     Inf      40     Inf 
> summary(c(11.1,22.2,33.3,44.4,Inf),digits=22)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     11      22      33     Inf      44     Inf 

As to your example about summary(123456), while we might be able to argue about the ideal behavior here, I would not have submitted it as a bug report because at least there is an option to get it to display correct results.

> summary(123456)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 123500  123500  123500  123500  123500  123500 
> summary(123456,digits=6)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 123456  123456  123456  123456  123456  123456 

As it stands, printing a summary with an Inf will provide mostly useless results if run on numeric data with an absolute value less than 1, no matter what options you provide. This is made more confusing by the fact that you get the expected results if it is run on numeric data without decimal places.

In an attempt to not be a total leech, I followed up on where you pointed me and attached a modification to /src/library/base/R/zapsmall.R that would fix this behavior (it only adds one line and modifies another). In case it's thought that modifying zapsmall would mess up something else, I'll also make a change to /src/library/base/R/summary.R that only affects this specific issue.
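The pattern in the two digits examples above is consistent with two steps applied in sequence: the values are first reduced to `digits` significant digits, and the display then rounds them to zero decimal places because of the Inf entry. A minimal sketch under that assumption, where `show_like_summary` is a hypothetical helper and not part of R:

```r
# Hypothetical helper: significant-digit rounding followed by rounding
# to 0 decimal places, mimicking what the display appears to do when
# the vector contains Inf.
show_like_summary <- function(x, digits) {
  round(signif(x, digits), 0)
}

# Finite quantiles of c(11.1, 22.2, 33.3, 44.4, Inf)
q <- c(11.1, 22.2, 33.3, 44.4)
show_like_summary(q, digits = 1)    # 10 20 30 40
show_like_summary(q, digits = 22)   # 11 22 33 44
```

The same two steps applied to 0.1, 0.2, 0.3, 0.4 give 0 for every entry, which matches the output in the original report.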
Comment 4 8666f00c 2015-12-03 01:05:30 UTC
I can't figure out if there is a way to edit my patch or change old comments, but I wanted to say that I'm having trouble testing/compiling R at the moment. I've done it plenty of times before, but having some trouble on the current machine. It seems like it should be good to me, but I'm too new to know better. Sorry about that.
Comment 5 Duncan Murdoch 2015-12-13 14:14:46 UTC
I've made a less extreme patch than yours, only changing the summary print and format methods, not zapsmall itself. Will commit after testing.