Bug 14377 - approxfun: yleft/yright arguments ignored
Status: RESOLVED FIXED
Product: R
Classification: Unclassified
Component: Analyses
Version: R 2.11.1
Hardware/OS: Other, Mac OS X v10.6
Importance: P5 minor
Assigned To: R-core
 
Reported: 2010-09-10 10:37 UTC by Sam Wuest
Modified: 2010-09-11 16:06 UTC


Attachments
data (list) that can be used to reproduce the problem (23.23 KB, application/octet-stream)
2010-09-10 10:37 UTC, Sam Wuest

Description Sam Wuest 2010-09-10 10:37:18 UTC
Created attachment 1125
data (list) that can be used to reproduce the problem

This bug concerns the approxfun() function in the stats package.

When applied to certain data sets, approxfun() ignores the yleft and yright arguments, and the interpolating function it returns produces values outside the range between yleft and yright. In addition, the output varies between R sessions, even when it is applied to exactly the same data set.

The problem has been discussed on the R-help forum (see https://stat.ethz.ch/pipermail/r-help/2010-August/250665.html or http://r.789695.n4.nabble.com/approxfun-problems-yleft-and-yright-ignored-td2338313.html). 

I am still not sure exactly what the problem is, but applying round() to the input and x values works around it, even when rounding leaves the number of digits unchanged from the original values. Greg Snow (see the discussion thread) has pointed out that it might be a precision problem. However, I found it hard to simulate data with the particular structure that triggers the problem.
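The precision point can be illustrated with two doubles that print identically but are not exactly equal. The R sketch below uses hypothetical values (not taken from the attached data) to show how round() can turn such near-duplicates into exact ties, which approxfun() then collapses with its ties function. It only illustrates the general floating-point effect and is not a diagnosis of the bug itself.

```r
# Hypothetical values, not taken from approx.data: both print as 0.3
# at R's default 7 significant digits, but they differ in the low-order bits.
a <- 0.1 + 0.2
b <- 0.3
print(c(a, b))                     # [1] 0.3 0.3
print(a == b)                      # [1] FALSE

# unique() compares exact values, so these count as two distinct x values.
print(length(unique(c(a, b))))     # [1] 2

# After round(..., digits = 10) both become the same double, i.e. an exact tie.
ab.rounded <- round(c(a, b), digits = 10)
print(length(unique(ab.rounded)))  # [1] 1
```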

Best, Sam

Here is some example code that uses the data attached to this bug report (also downloadable from http://bioinf.gen.tcd.ie/approx.data.Rdata):


> ### load the data: a list called approx.data
>  load(file="approx.data.Rdata")
> ### contains the slots "x", "y", "input"
> names(approx.data)
[1] "x"     "y"     "input"
> ### with y ranging between 0 and 1
> range(approx.data$y)
[1] 0 1
> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
> range(approx.data$x)
[1] 3.098444 7.268812
> range(approx.data$input)
[1]  3.329408 13.026700
> ### generate the interpolation function (warning message benign)
> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
Warning message:
In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
  collapsing to unique 'x' values
> ### apply to input-values
> y.out <- interp(approx.data$input)
> 
> ### still I find output values >1, even though yleft=1:
> range(y.out)
[1] 0.0000000 0.9816907
> hist(y.out)
> ### and the input-data points for which strange interpolation does occur have no unusual distribution (however, they lie close to max(x)):
>  hist(approx.data$input[which(y.out>1)])
Error in hist.default(approx.data$input[which(y.out > 1)]) : 
  invalid number of 'breaks'
>  
>  
>  
> #### now the same process works fine, if input and x-values are rounded to, say, 10 digits: 
> interp2 <- approxfun(round(approx.data$x,digits=10), approx.data$y, yleft=1, yright=0, rule=2)
Warning message:
In approxfun(round(approx.data$x, digits = 10), approx.data$y, yleft = 1,  :
  collapsing to unique 'x' values
> 
> y.out <- interp2(round(approx.data$input,digits=10))
> ### here, the range of y.out lies, as expected, between 0 and 1: 
> range(y.out)
[1] 0.000000 0.976981
> ### and the histogram: 
> hist(y.out)
> 
> sessionInfo()
R version 2.11.1 (2010-05-31) 
x86_64-apple-darwin9.8.0 

locale:
[1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
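A simple guard against the symptom reported here is to check that every interpolated value lies between yright and yleft. The R sketch below uses synthetic data with exact ties in x as a hypothetical stand-in for approx.data; it does not reproduce the bug, it only shows the check. With yleft = 1, yright = 0 and y in [0, 1], linear interpolation should never leave [0, 1].

```r
set.seed(42)
# Synthetic stand-in for approx.data (hypothetical): x has exact ties after
# rounding to 2 digits, y lies in [0, 1], and input extends beyond max(x).
x <- round(runif(200, min = 3.1, max = 7.3), digits = 2)
y <- runif(200)
input <- runif(500, min = 3.3, max = 13)

# Same call as in the report; ties = mean (the default) averages y over tied x.
interp <- approxfun(x, y, yleft = 1, yright = 0, rule = 2, ties = mean)
y.out <- interp(input)

# Inputs outside [min(x), max(x)] get yleft = 1 or yright = 0; all others are
# convex combinations of y values in [0, 1], so the output must stay in [0, 1].
stopifnot(all(y.out >= 0 & y.out <= 1))
print(range(y.out))
```

If the same check fails on the real data, rounding x and input as in interp2 above is the workaround reported in this bug.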
Comment 1 Duncan Murdoch 2010-09-11 14:23:39 UTC
Your example code isn't consistent with the comments.  You say

> ### still I find output values >1, even though yleft=1:
> range(y.out)
[1] 0.0000000 0.9816907

But the range is not bigger than 1.  However, when I run the version from your mailing list posting, I get inconsistent results from run to run, so I'll take a look.
Comment 2 Duncan Murdoch 2010-09-11 15:36:36 UTC
Greg Snow's analysis was correct.
Comment 3 Sam Wuest 2010-09-11 16:06:21 UTC
(In reply to comment #1)
> Your example code isn't consistent with the comments.  You say
> 
> > ### still I find output values >1, even though yleft=1:
> > range(y.out)
> [1] 0.0000000 0.9816907
> 
> But the range is not bigger than 1.  However, when I run the version from your
> mailing list posting, I get inconsistent results from run to run, so I'll take
> a look.

My mistake: I had gotten very different results when running the script previously (as mentioned, the results are inconsistent), and unfortunately I didn't notice that this time the range is indeed correct. Sorry about that. In most runs, though, I do get values above 1.

Thanks a lot, 

Sam