Bug 15811 - floating point arithmetic inaccuracy leads to incorrect results from quantile() type=2
Summary: floating point arithmetic inaccuracy leads to incorrect results from quantile...
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy (show other bugs)
Version: R 3.1.0
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2014-05-26 05:14 UTC by Chris Green
Modified: 2015-12-26 21:16 UTC (History)
1 user (show)

See Also:


Attachments
small script to replicate the bug in quantile (459 bytes, text/plain)
2014-05-26 05:14 UTC, Chris Green
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Green 2014-05-26 05:14:40 UTC
Created attachment 1606 [details]
small script to replicate the bug in quantile

Occasionally, the quantile() function with "type=2" will yield inaccurate results due to floating point arithmetic inaccuracy. Here is an example:

> options(digits=16)
> x <- 1:1390
> # what is 1390*0.7
> print(1390*0.7)
[1] 972.9999999999999
> # what is floor(1390*0.7)
> print(floor(1390*0.7))
[1] 972
> # what is the 70th percentile of the data
> print(quantile(x,0.7,type=2))
70% 
973 
> # in exact arithmetic, 0.7*1390 = 973
> # according to the documentation for quantile()
> # j should be 973, g will be 0, and hence
> # the quantile should be 0.5*x(973)+0.5*x(974)
> # where x(i) is the ith order statistic
> print(sum(c(0.5,0.5)*sort(x)[c(973,974)]))
[1] 973.5
> 

The flaw seems to occur in the code for quantile.default() when n*p (called 'nppm' in the code) and j are computed:

if (type <= 3) {
  nppm <- if (type == 3) 
    n * probs - 0.5
  else n * probs
  j <- floor(nppm)
  h <- switch(type, (nppm > j), ((nppm > j) + 1)/2, 
   (nppm != j) | ((j%%2L) == 1L))
}

Since in this case nppm is not exactly 973, h will be set to 1 rather than 1/2. Then later in the code this line will chose the wrong value for the quantile:

qs[h == 1] <- x[j + 3L][h == 1]

whereas if h had been set to 1/2, we would have (correctly) executed the lines

other <- (0 < h) & (h < 1)
if (any(other)) 
  qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 
     3L])[other]


I realize that FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) documents that this situation can happen in R; however, I was still surprised to see that the quantile() function was not written to take this possibility into account. Moreover, I found this error while trying to replicate some SAS results in R (the type=2 quantile matches SAS's default quantile definition). SAS gets the right answer here; R doesn't.

Another example can be made with n=1320 and p=0.7.

Verified in R 3.1.0 patched. sessionInfo:

R version 3.1.0 Patched (2014-05-23 r65732)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.1.0


I can also verify this behavior in R 3.0.3 and R 2.15.2.

I have attached a small script to replicate the bug for your convenience.
Comment 1 Jonathan Fry 2015-12-26 21:16:50 UTC
Inserting the following lines after nppm<-n*probs should fix most instances of this problem:

		rded <- round(nppm)
		nppm <- if (n > 0 && rded/n==probs) rded 
			else nppm