Bug 15746 - quantile over repeated floating point bug
Summary: quantile over repeated floating point bug
Status: ASSIGNED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level (show other bugs)
Version: R 3.0.2
Hardware: x86_64/x64/amd64 (64-bit) All
: P5 major
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2014-04-08 19:37 UTC by brett.stime
Modified: 2016-04-12 10:13 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description brett.stime 2014-04-08 19:37:13 UTC
The quantile() function from the stats package seems to introduce a subtle rounding error. Test case:

#Take eight repetitions of a particular, fairly precise number:
eight.repetitions <- rep(0.000384478262997113, 8)

#Calculate the deciles
deciles <- quantile(eight.repetitions, prob=(0:10)/10)

#De-duplicate the decile values (they should all be the same since we started with eight repetitions of the same value)
unique.deciles <- unique(deciles)

stopifnot(length(unique.deciles) == 1)

> max(unique.deciles) - min(unique.deciles)
[1] 1.084202e-19


There are probably other values that will also trigger the issue. I could probably come up with some of them if needed.

One thing augmenting the importance of the bug is the use of this pattern by the depending gbm package to prevent attempts to model non-varying data (https://github.com/harrysouthworth/gbm/blob/master/R/gbm.fit.R#L90-L100) . I'll try to pass a note about this bug to the maintainers of that package as well.

Thanks!
Comment 1 Martin Maechler 2014-04-10 06:47:55 UTC
Hmm, yes, in one line

diff(range(dec <- quantile(rep(x0 <- 0.000384478262997113, 8), prob=(0:10)/10)))

[1] 1.084202e-19

I'm quite tempted to declare this as a special case of  R FAQ 7.31
(http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f)

but I can see the principled point here that 
one could  require that

        min(x) <= quantile(x, *) <= max(x)

be always fulfilled.


Note (every reader) that your example is *not* at all typical!

The following code replaces your x0  by a randum  U ~ U[0,1]
and for me did not find *one single* case,
(i.e., the loop did not stop) in  4 million trials :

> i <- 0;while(diff(range(dec <- quantile(rep(x0 <- runif(1), 8), prob=(0:10)/10))) == 0){ if((i <- i+1) %% 100 == 0) {cat('',i);if(i %% 1000 == 0) cat("\n")}}

 100 200 300 400 500 600 700 800 900 1000
 ...............
 ...............
Comment 2 Martin Maechler 2014-04-10 07:10:41 UTC
(In addition to comment #1)

What really happens is simply

> x0 <- 0.000384478262997113
> h <- (1:9)/10
> diff(range((1-h)*x0 + h*x0))
[1] 1.084202e-19
> 

and that of course *is* just  R FAQ 7.31

I'm not sure this is something we want to amend;  we *could* ensure

         min(x) <= quantile(x, *) <= max(x)

with a bit of extra code that makes *all* calls of quantile slower than
now... just for the sake of this one very very rare exception ?

Alternatively we just document it .. 
Maybe this needs public discussion on R-devel 

Martin