Bug 17121 - Segmentation fault when using stats:::predict.loess with se=TRUE
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.3.*
Hardware: x86_64/x64/amd64 (64-bit) Linux-Debian
Importance: P5 normal
Assignee: R-core
Reported: 2016-07-19 19:53 UTC by Matt Shotwell
Modified: 2016-08-19 01:53 UTC

Attachments
data and code that demonstrate bug (180.99 KB, application/gzip)
2016-07-19 19:53 UTC, Matt Shotwell
patch to check loess workspace size (1.07 KB, patch)
2016-08-19 01:53 UTC, Benjamin Tyner

Description Matt Shotwell 2016-07-19 19:53:03 UTC
Created attachment 2132
data and code that demonstrate bug

The function 'predict.loess' can cause a segmentation fault when called with the argument 'se=TRUE'. It appears that an underlying Fortran routine attempts a 4-byte write to unmapped memory.

I have attached the data and code referenced in the transcript below. Valgrind level 2 instrumentation was used here:

matt@deb7box:~$ R -d valgrind -f bug-code.R
==19841== Memcheck, a memory error detector
==19841== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==19841== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==19841== Command: /home/matt/src/R-3.3.0/bin/exec/R -f bug-code.R
==19841== 

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> load('bug-data.RData')
> 
> ## compute or load cached loess object
> fit_file <- "bug-fit.RData"
> if(!file.exists(fit_file)) {
+   fit <- loess(y~x, data=dat)
+   save(fit, file=fit_file, compress='xz',
+        compression_level=9)
+ } else {
+   load(fit_file)
+ }
> 
> predict(fit, newdata=data.frame(x=0))
        1 
-10.64571 
> predict(fit, newdata=data.frame(x=0), se=TRUE)
==19841== Warning: set address range perms: large range [0x3a04c040, 0x65b1fe08) (defined)
==19841== Warning: set address range perms: large range [0x65b20040, 0x11421a660) (defined)
==19841== Invalid write of size 4
==19841==    at 0xAEF0F11: ehg139_ (loessf.f:1445)
==19841==    by 0xAEF23EE: ehg131_ (loessf.f:468)
==19841==    by 0xAEF2F99: lowesb_ (loessf.f:1531)
==19841==    by 0xAEA422C: loess_ise (loessc.c:226)
==19841==    by 0x493908: do_dotCode (dotcode.c:1760)
==19841==    by 0x4D7521: bcEval (eval.c:5648)
==19841==    by 0x4C8B18: Rf_eval (eval.c:616)
==19841==    by 0x4CA087: Rf_applyClosure (eval.c:1134)
==19841==    by 0x4D71C3: bcEval (eval.c:5620)
==19841==    by 0x4C8B18: Rf_eval (eval.c:616)
==19841==    by 0x4CA087: Rf_applyClosure (eval.c:1134)
==19841==    by 0x51BB41: applyMethod (objects.c:118)
==19841==  Address 0x11422354c is not stack'd, malloc'd or (recently) free'd
==19841== 

 *** caught segfault ***
address 0x11422354c, cause 'memory not mapped'

Traceback:
 1: predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),     newdata, na.action = na.action)) else as.matrix(newdata),     object$s, object$weights, object$robust, op$span, op$degree,     op$normalize, op$parametric, op$drop.square, op$surface,     op$cell, op$family, object$kd, object$divisor, se = se)
 2: predict.loess(fit, newdata = data.frame(x = 0), se = TRUE)
 3: predict(fit, newdata = data.frame(x = 0), se = TRUE)
An irrecoverable exception occurred. R is aborting now ...
==19841== 
==19841== HEAP SUMMARY:
==19841==     in use at exit: 3,709,989,494 bytes in 14,302 blocks
==19841==   total heap usage: 29,624 allocs, 15,322 frees, 4,021,602,525 bytes allocated
==19841== 
==19841== LEAK SUMMARY:
==19841==    definitely lost: 0 bytes in 0 blocks
==19841==    indirectly lost: 0 bytes in 0 blocks
==19841==      possibly lost: 0 bytes in 0 blocks
==19841==    still reachable: 3,709,989,494 bytes in 14,302 blocks
==19841==         suppressed: 0 bytes in 0 blocks
==19841== Rerun with --leak-check=full to see details of leaked memory
==19841== 
==19841== For counts of detected and suppressed errors, rerun with: -v
==19841== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
Comment 1 Matt Shotwell 2016-07-20 17:39:58 UTC
This also occurs on Windows (see sessionInfo below).

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.3
Comment 2 Benjamin Tyner 2016-07-23 16:38:52 UTC
I am looking into this. It has been a while since I last worked with this code; can someone confirm whether the Fortran is based on cloess or dloess? The help page says the former, but doc/NEWS.2 implies the latter:

    o   The compiled loess() code has been updated to the current
        version of dloess from Netlib.  This includes patches from Ben
        Tyner which correct some errors when degree = 0 and hence
        solve PR#13570.

Regards
Ben
Comment 3 Benjamin Tyner 2016-08-18 15:10:34 UTC
Assuming support for long vectors has not been added to the Fortran, I think this is essentially the same issue as was reported here:

   https://stat.ethz.ch/pipermail/r-devel/2013-March/066013.html

The "fix" for that,

   Index: src/library/stats/src/loessc.c                                        
   ===================================================================          
   --- src/library/stats/src/loessc.c      (revision 62144)                     
   +++ src/library/stats/src/loessc.c      (revision 62145)                     
   @@ -234,7 +234,9 @@                                                          
        tau0 = ((*degree) > 1) ? (int)((D + 2) * (D + 1) * 0.5) : (D + 1);      
        tau = tau0 - (*sum_drop_sqr);                                           
        lv = 50 + (3 * D + 3) * nvmax + N + (tau0 + 2) * nf;                    
   -    liv = 50 + ((int)pow((double)2, (double)D) + 4) * nvmax + 2 * N;        
   +    double dliv = 50 + (pow(2.0, (double)D) + 4.0) * nvmax + 2.0 * N;       
   +    if (dliv < INT_MAX) liv = dliv;
   +    else error("workspace required is too large");
        if(*setLf) {
           lv = lv + (D + 1) * nf * nvmax;
           liv = liv + nf * nvmax;

does not work as intended whenever *setLf is 1. Specifically, the check for an overly large workspace ought to be performed after the "if(*setLf)" clause, not before, because that clause adds a further "nf * nvmax" to liv after the check has already passed.
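
To make the ordering concrete, here is a minimal sketch, assuming only the liv formula quoted in the diff above. The helper name loess_liv_fits and its signature are hypothetical and do not exist in loessc.c. The sketch also swaps pow() for a small loop so that it compiles without libm. It accumulates the full size in double precision, including the setLf term, and only then compares against INT_MAX. The lv workspace, which grows by (D + 1) * nf * nvmax in the same clause, presumably needs the same treatment.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not part of loessc.c.
 * Computes the integer workspace length in double precision, adds the
 * extra nf * nvmax term when setLf is nonzero, and only then checks the
 * total against INT_MAX.
 * Returns 1 and stores the size in *liv_out if it fits, otherwise 0. */
int loess_liv_fits(int D, int nvmax, int N, int nf, int setLf, int *liv_out)
{
    assert(liv_out != NULL);

    /* 2^D computed in double; stands in for pow(2.0, (double)D). */
    double two_pow_D = 1.0;
    for (int i = 0; i < D; i++)
        two_pow_D *= 2.0;

    double dliv = 50.0 + (two_pow_D + 4.0) * nvmax + 2.0 * N;

    /* The setLf term is added BEFORE the range check, in double,
     * so nf * nvmax is never evaluated in int arithmetic. */
    if (setLf)
        dliv += (double)nf * (double)nvmax;

    if (dliv >= (double)INT_MAX)
        return 0;   /* caller would report "workspace required is too large" */

    *liv_out = (int)dliv;
    return 1;
}
```

With D = 1 and nvmax = N = nf = 100000 (illustrative values, not taken from the attached data), the base size is 800050 and passes the check on its own, but adding nf * nvmax pushes it past INT_MAX on a platform with 32-bit int, so the helper returns 0 only when setLf is nonzero. That is the case the earlier check misses.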
Comment 4 Benjamin Tyner 2016-08-19 01:53:06 UTC
Created attachment 2141
patch to check loess workspace size

The patch takes care to avoid integer overflow when computing "nf * nvmax".
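
The attached patch is not reproduced in this thread, so the following is only one possible integer-only way to guard a product such as "nf * nvmax", not necessarily what the patch does. The helper name product_fits_int is hypothetical, and it assumes both operands are non-negative, as sizes are here. It tests the product by division, so the multiplication that could overflow is never performed.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a * b is representable as an int,
 * 0 otherwise. Assumes a >= 0 and b >= 0.
 * Signed overflow is undefined behaviour in C, so the product itself is
 * never computed; the bound is checked with a division instead. */
int product_fits_int(int a, int b)
{
    if (a == 0 || b == 0)
        return 1;               /* product is 0, always representable */
    return b <= INT_MAX / a;    /* equivalent to a * b <= INT_MAX */
}
```

A caller would test product_fits_int(nf, nvmax) before adding the product to liv, and raise the "workspace required is too large" error when it returns 0.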