Created attachment 1125
data (list) that can be used to reproduce the problem

This bug refers to the approxfun function from the "stats" package.

When applied to certain data, approxfun ignores the yleft and yright arguments and returns values outside the yleft-yright range. In addition, the output varies between R sessions, even when applied to the exact same dataset.

The problem has been discussed on the R-help list (see https://stat.ethz.ch/pipermail/r-help/2010-August/250665.html or http://r.789695.n4.nabble.com/approxfun-problems-yleft-and-yright-ignored-td2338313.html).

I am still not quite sure what exactly the problem is, but applying round() to the input and x values gets around the problem, even when the number of digits remains unchanged from the initial values. Greg Snow (see the discussion thread) has pointed out that it might be a precision problem. However, I found it hard to simulate the particular data structure that causes the problem.

Best,
Sam

Here is some example code, using the data attached to this bug report (also downloadable from http://bioinf.gen.tcd.ie/approx.data.Rdata):

> ### load the data: a list called approx.data
> load(file="approx.data.Rdata")
> ### contains the slots "x", "y", "input"
> names(approx.data)
[1] "x" "y" "input"
> ### with y ranging between 0 and 1
> range(approx.data$y)
[1] 0 1
> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
> range(approx.data$x)
[1] 3.098444 7.268812
> range(approx.data$input)
[1] 3.329408 13.026700
> ### generate the interpolation function (warning message benign)
> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
Warning message:
In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0, :
  collapsing to unique 'x' values
> ### apply to input-values
> y.out <- interp(approx.data$input)
>
> ### still I find output values >1, even though yleft=1:
> range(y.out)
[1] 0.0000000 0.9816907
> hist(y.out)
> ### and the input-data points for which strange interpolation does occur have no unusual distribution (however, they lie close to max(x)):
> hist(approx.data$input[which(y.out>1)])
Error in hist.default(approx.data$input[which(y.out > 1)]) :
  invalid number of 'breaks'
>
>
>
> #### now the same process works fine, if input and x-values are rounded to, say, 10 digits:
> interp2 <- approxfun(round(approx.data$x,digits=10), approx.data$y, yleft=1, yright=0, rule=2)
Warning message:
In approxfun(round(approx.data$x, digits = 10), approx.data$y, yleft = 1, :
  collapsing to unique 'x' values
>
> y.out <- interp2(round(approx.data$input,digits=10))
> ### here, the range of y.out lies, as expected, between 0 and 1:
> range(y.out)
[1] 0.000000 0.976981
> ### and the histogram:
> hist(y.out)
>
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Your example code isn't consistent with the comments. You say

> ### still I find output values >1, even though yleft=1:
> range(y.out)
[1] 0.0000000 0.9816907

But the range is not bigger than 1. However, when I run the version from your mailing list posting, I get inconsistent results from run to run, so I'll take a look.

Greg Snow's analysis was correct.
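
For readers unfamiliar with this kind of precision effect, here is a minimal sketch of how two x values can be distinct as doubles yet indistinguishable at about 15 significant digits. The values are hypothetical and not taken from approx.data, and the sketch illustrates only the general effect, not the internals of approxfun. It is consistent with the observation in the report that round() on the x values avoids the problem.

```r
# Two hypothetical x values that differ only in the last bit of a double
# (assumed for illustration; not taken from approx.data).
x <- c(1, 1 + .Machine$double.eps)

# Compared as doubles, the two values are distinct.
n_numeric <- length(unique(x))

# Formatted to 15 significant digits, they can no longer be told apart.
n_text <- length(unique(format(x, digits = 15)))

# Rounding to 10 digits, as in the reported workaround, merges them
# before any comparison takes place.
n_rounded <- length(unique(round(x, digits = 10)))

print(c(numeric = n_numeric, text = n_text, rounded = n_rounded))
```

If the three counts differ, any step that identifies ties through one representation while another step uses a different one can disagree about how many unique x values there are.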

(In reply to comment #1)
> Your example code isn't consistent with the comments. You say
>
> > ### still I find output values >1, even though yleft=1:
> > range(y.out)
> [1] 0.0000000 0.9816907
>
> But the range is not bigger than 1. However, when I run the version from your
> mailing list posting, I get inconsistent results from run to run, so I'll take
> a look.

My mistake: I had found very different results when running the script previously (as mentioned, the results are inconsistent). Unfortunately I hadn't noticed that this time the range is indeed correct. Sorry for that. In most cases I get values above 1, though.

Thanks a lot,
Sam