Bug 15345 - R 3.0.0 gc() mis-reports memory usage and actual memory usage grows undesirably
Status: RESOLVED FIXED
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.0.0
Hardware: x86_64/x64/amd64 (64-bit) Linux
Importance: P5 normal
Assigned To: R-core
Reported: 2013-06-13 10:14 UTC by Simon Wood
Modified: 2013-06-15 13:48 UTC (History)
Description Simon Wood 2013-06-13 10:14:08 UTC
I received a report from Damien Georges (damien.georges2@gmail.com) of mgcv using excessive memory, which appears to relate to changes in garbage collection in R 3.0.0. 

Fitting a simple model using mgcv::gam and then repeatedly performing the same predict call with the resulting model causes R to use progressively more memory in 3.0.0, but not in 2.15.1.

The predict call does not call any compiled code from the mgcv package. 

Setting R_GC_MEM_GROW to zero before starting R stops actual memory use from growing, but slows computation by a substantial factor. Even then, R profiling indicates that memory use is growing steadily and substantially, although it is not. Setting R_GC_MEM_GROW to one causes actual memory use to grow again.

Calling gc() after each predict call keeps actual memory usage constant (according to a system monitor and 'top'), but R profiling and the gc() output again indicate a steady and substantial increase.
 
gc() is not reporting the amount of memory used correctly. If I repeat the predict call enough times, gc() eventually reports memory usage in excess of the total RAM and swap partition on my machine, whereas actual usage has at worst grown to 15% of total RAM.
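The comparison described above (what gc() reports versus what the operating system sees) can be logged from inside R rather than read off 'top' by eye. The following is a minimal sketch, not part of the original report: gc_used_mb and rss_mb are hypothetical helper names, the rnorm call is a stand-in workload rather than the mgcv predict call, and the /proc/self/status lookup assumes a Linux system (it returns NA elsewhere).

```r
# Hypothetical helper: total memory (Mb) that gc() reports as in use,
# summing the "(Mb)" column of the "used" figures for Ncells and Vcells.
gc_used_mb <- function() {
  g <- gc()
  sum(g[, 2])
}

# Hypothetical helper: resident set size (Mb) of this R process as seen
# by the kernel. Assumes Linux; returns NA if /proc is not available.
rss_mb <- function() {
  f <- "/proc/self/status"
  if (!file.exists(f)) return(NA_real_)
  line <- grep("^VmRSS:", readLines(f), value = TRUE)
  if (length(line) == 0) return(NA_real_)
  as.numeric(gsub("[^0-9]", "", line)) / 1024  # value is given in kB
}

n_iter <- 20
reported <- numeric(n_iter)  # what gc() claims
actual <- numeric(n_iter)    # what the OS sees (NA off Linux)

for (i in seq_len(n_iter)) {
  tmp <- rnorm(1e5)          # stand-in workload, NOT the predict call
  reported[i] <- gc_used_mb()
  actual[i] <- rss_mb()
}

print(data.frame(iteration = seq_len(n_iter), reported, actual))
```

Replacing the stand-in workload with the predict call from the script below would show whether the two columns drift apart over the iterations.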

None of the issues reported occur when I repeat the experiments with R 2.15.1 (and the same mgcv version, 1.7-24).

Code and platform information below. I have also repeated everything on an otherwise similar 32-bit platform.

Simon W

library(mgcv)

# data simulation
data <- data.frame(x = sample(c(0,1), 100, replace=TRUE),
                   a=1:100, b=rnorm(100), c=runif(100))

# construct a model
mod <- gam(x~a+b+c, 
           data = data, 
           family = binomial(link='logit'))

# simulate newdata
newdata <- data.frame(a=1:1000,b=rnorm(1000),c=runif(1000))

Rprof(memory.profiling = TRUE,interval=0.05)
# repeat prediction many times...
for (i in 1:500){
  cat("\n*** projection run",i)
  ## note that there are no calls to mgcv compiled code 
  ## in the following...

  proj <- predict.gam(object=mod,newdata=newdata)

  # uncommenting the following keeps actual memory use constant
  # but R (gc and Rprof) still thinks that it is increasing.... 
  #gc()
}
Rprof(NULL)

## following reports far more memory in use than is actually used
## (c.f. 'top' or system monitor). If loop is long enough it
## reports a total greater than my total RAM+swap partition.... 
gc() 

# plot memory usage
xx <- summaryRprof(memory = "tseries", diff=FALSE)
plot(as.numeric(rownames(xx)),xx$vsize.large, type='l')

### end of code

> R.version
               _                           
platform       x86_64-unknown-linux-gnu    
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          0.0                         
year           2013                        
month          04                          
day            03                          
svn rev        62481                       
language       R                           
version.string R version 3.0.0 (2013-04-03)
nickname       Masked Marvel               
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mgcv_1.7-24

loaded via a namespace (and not attached):
[1] grid_3.0.0      lattice_0.20-15 Matrix_1.0-12   nlme_3.1-109   
>
Comment 1 Luke Tierney 2013-06-14 15:27:56 UTC
Thanks for the report and example. Fixed in R-devel in r62959 and R-patched in
r62960.

This had nothing to do with memory manager changes -- it was the result of changes to the parser that did things to confuse the memory manager. Why parse was being called each time through your loop is not clear and something you might want to look into on your side.
Comment 2 Simon Wood 2013-06-15 13:48:47 UTC
Thanks!

I think parse is being called on each loop iteration because predict.gam calls reformulate, which calls parse.
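To isolate the parser from mgcv, the pattern of building a formula from text on every iteration can be reproduced directly. This is only a sketch under stated assumptions: make_formula is a hypothetical function written for illustration (it is not the source of reformulate, whose internals differ between R versions), and the loop simply records what gc() reports before and after 500 calls.

```r
# Hypothetical stand-in for a parse-based formula builder.
# It pastes the terms into text and parses that text on every call.
make_formula <- function(terms, response) {
  txt <- paste(response, "~", paste(terms, collapse = " + "))
  eval(parse(text = txt, keep.source = FALSE)[[1L]])
}

f <- make_formula(c("a", "b", "c"), "x")

# Memory (Mb) that gc() reports as used, before and after many parses.
before <- sum(gc()[, 2])
for (i in 1:500) {
  f <- make_formula(c("a", "b", "c"), "x")
}
after <- sum(gc()[, 2])

cat("gc() reported Mb before:", before, "after:", after, "\n")
```

If the reported figure climbs steadily with the number of iterations while nothing is being retained, that points at the parse step rather than at the model code.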