16620
2015-12-02 14:58:20 +0000
Display of summary of numeric vector containing Inf is incorrect
2015-12-13 14:14:46 +0000
1
1
1
Unclassified
R
Analyses
R 3.2.2
x86_64/x64/amd64 (64-bit)
Windows 64-bit
RESOLVED
FIXED
P5
normal
---
0
8666f00c
R-core
murdoch
oldest_to_newest
91274
0
8666f00c
2015-12-02 14:58:20 +0000
The display of a summary of a numeric vector containing Inf does not round correctly.
> options(digits=22)
> summary(c(.1,.2,.3,.4,.5),digits=22)
                    Min.                  1st Qu.                   Median                     Mean                  3rd Qu.
0.1000000000000000055511 0.2000000000000000111022 0.2999999999999999888978 0.2999999999999999888978 0.4000000000000000222045
                    Max.
0.5000000000000000000000
> summary(c(.1,.2,.3,.4,Inf),digits=22)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       0       0     Inf       0     Inf
Note that the error is in the display, not in the calculation:
> x = summary(c(.1,.2,.3,.4,Inf))
> print(as.vector(x))
[1] 0.1000000000000000055511 0.2000000000000000111022 0.2999999999999999888978 Inf 0.4000000000000000222045
[6] Inf
91275
1
murdoch
2015-12-02 17:44:55 +0000
The reason this happens is that the print method calls zapsmall() on the entries. Compared to Inf, every finite number is small.
I agree the behaviour is ugly. Perhaps only the finite values should be zapped?
On the other hand, summary.default prints ugly results by design (e.g. summary(123456) rounding the entries), so I'm not sure everyone would agree on changing this.
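The effect described above can be reproduced outside of summary(). The following is a minimal sketch, assuming zapsmall() picks its rounding precision from the largest absolute non-NA value; zap_old is a hypothetical stand-in written for illustration, not the function shipped with R.

```r
# Hypothetical stand-in for the zapsmall() logic discussed in this report.
# It is defined locally so the result does not depend on the installed R version.
zap_old <- function(x, digits = getOption("digits")) {
  ina <- is.na(x)
  if (all(ina)) return(x)
  mx <- max(abs(x[!ina]))   # Inf as soon as x contains Inf
  # digits - log10(Inf) is -Inf, so the number of decimals collapses to 0
  nd <- if (mx > 0) max(0L, digits - log10(mx)) else digits
  round(x, digits = nd)
}

x <- c(0.1, 0.2, 0.3, 0.4, Inf)
zapped <- zap_old(x, digits = 22)
print(zapped)
```

Under these assumptions every finite entry below 0.5 is rounded to 0, which matches the summary output in the original report.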
91284
2
1941
8666f00c
2015-12-03 00:34:44 +0000
Created attachment 1941
Excludes infinite values in maximum calculation
91285
3
8666f00c
2015-12-03 00:36:42 +0000
Thanks for the response; knowing about zapsmall() really helps me understand this. A few comments, though.
Another issue with what I'm bringing up here is that it's not simply a matter of everything being small compared to infinity: the output seems to drop the decimal places but otherwise respect the requested significant digits.
> summary(c(11.1,22.2,33.3,44.4,Inf),digits=1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     10      20      30     Inf      40     Inf
> summary(c(11.1,22.2,33.3,44.4,Inf),digits=22)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     11      22      33     Inf      44     Inf
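One possible reading of the two outputs above, sketched under the assumption that summary.default first applies signif() with the requested digits and that the print method then rounds to zero decimal places because the largest value is Inf. Neither step is copied from the R sources; the snippet only mimics them on the finite entries.

```r
x <- c(11.1, 22.2, 33.3, 44.4)

# Assumed step 1: keep the requested number of significant digits.
sig1  <- signif(x, 1)
sig22 <- signif(x, 22)

# Assumed step 2: with Inf as the maximum, zapping rounds to 0 decimals.
shown1  <- round(sig1, 0)
shown22 <- round(sig22, 0)

print(shown1)
print(shown22)
```

If the assumptions hold, the first vector corresponds to the digits=1 display and the second to the digits=22 display, which would explain why significant digits appear to be respected while decimals are lost.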
As to your example about summary(123456), while we might be able to argue about the ideal behavior here, I would not have submitted it as a bug report because at least there is an option to get it to display correct results.
> summary(123456)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 123500  123500  123500  123500  123500  123500
> summary(123456,digits=6)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 123456  123456  123456  123456  123456  123456
As it stands, printing a summary that contains Inf gives mostly useless results for numeric data with absolute values less than 1, no matter what options you provide. This is made more confusing by the fact that you get the expected results for numeric data without decimal places.
In an attempt to not be a total leech, I followed up on your pointer and attached a modification to /src/library/base/R/zapsmall.R that would fix this behavior (it adds one line and modifies another). In case modifying zapsmall is thought likely to break something else, I'll also make a change to /src/library/base/R/summary.R that only affects this specific issue.
91286
4
8666f00c
2015-12-03 01:05:30 +0000
I can't figure out whether there is a way to edit my patch or change old comments, but I wanted to say that I'm having trouble compiling and testing R at the moment. I've done it plenty of times before, but I'm having some trouble on the current machine. The patch looks correct to me, but I'm too new to know better.
Sorry about that.
91309
5
murdoch
2015-12-13 14:14:46 +0000
I've made a less extreme patch than yours, only changing the summary print and format methods, not zapsmall itself. Will commit after testing.
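A rough sketch of what zapping only the finite entries could look like. This is an illustration under assumptions, not the committed change: zap_finite is a hypothetical helper, whereas the actual fix is described as living in the summary print and format methods.

```r
# Hypothetical helper: apply zapsmall() to the finite entries only,
# leaving Inf, -Inf, NA and NaN untouched.
zap_finite <- function(x, digits = getOption("digits")) {
  fin <- is.finite(x)
  if (any(fin)) x[fin] <- zapsmall(x[fin], digits = digits)
  x
}

vals <- c(0.1, 0.2, 0.3, 0.4, Inf)
out <- zap_finite(vals, digits = 7)
print(out)
```

Because the maximum is taken over the finite values only, the small entries keep their decimals while Inf passes through unchanged.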
1941
2015-12-03 00:34:44 +0000
2015-12-03 00:34:44 +0000
Excludes infinite values in maximum calculation
zapsmall.R
text/plain
1045
8666f00c
#  File src/library/base/R/zapsmall.R
#  Part of the R package, https://www.R-project.org
#
#  Copyright (C) 1995-2012 The R Core Team
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  A copy of the GNU General Public License is available at
#  https://www.R-project.org/Licenses/

zapsmall <- function(x, digits = getOption("digits"))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    iinf <- is.infinite(x)
    mx <- max(abs(x[!ina & !iinf]))
    round(x, digits = if(mx > 0) max(0L, digits - log10(mx)) else digits)
}