Created attachment 1606 [details] small script to replicate the bug in quantile Occasionally, the quantile() function with "type=2" will yield inaccurate results due to floating point arithmetic inaccuracy. Here is an example: > options(digits=16) > x <- 1:1390 > # what is 1390*0.7 > print(1390*0.7) [1] 972.9999999999999 > # what is floor(1390*0.7) > print(floor(1390*0.7)) [1] 972 > # what is the 70th percentile of the data > print(quantile(x,0.7,type=2)) 70% 973 > # in exact arithmetic, 0.7*1390 = 973 > # according to the documentation for quantile() > # j should be 973, g will be 0, and hence > # the quantile should be 0.5*x(973)+0.5*x(974) > # where x(i) is the ith order statistic > print(sum(c(0.5,0.5)*sort(x)[c(973,974)])) [1] 973.5 > The flaw seems to occur in the code for quantile.default() when n*p (called 'nppm' in the code) and j are computed: if (type <= 3) { nppm <- if (type == 3) n * probs - 0.5 else n * probs j <- floor(nppm) h <- switch(type, (nppm > j), ((nppm > j) + 1)/2, (nppm != j) | ((j%%2L) == 1L)) } Since in this case nppm is not exactly 973, h will be set to 1 rather than 1/2. Then later in the code this line will chose the wrong value for the quantile: qs[h == 1] <- x[j + 3L][h == 1] whereas if h had been set to 1/2, we would have (correctly) executed the lines other <- (0 < h) & (h < 1) if (any(other)) qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other] I realize that FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) documents that this situation can happen in R; however, I was still surprised to see that the quantile() function was not written to take this possibility into account. Moreover, I found this error while trying to replicate some SAS results in R (the type=2 quantile matches SAS's default quantile definition). SAS gets the right answer here; R doesn't. Another example can be made with n=1320 and p=0.7. Verified in R 3.1.0 patched. sessionInfo: R version 3.1.0 Patched (2014-05-23 r65732) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.1.0 I can also verify this behavior in R 3.0.3 and R 2.15.2. I have attached a small script to replicate the bug for your convenience.

Inserting the following lines after nppm<-n*probs should fix most instances of this problem: rded <- round(nppm) nppm <- if (n > 0 && rded/n==probs) rded else nppm