Bug 15886 - summaryRprof not correct for chunksize < number of lines of log
Summary: summaryRprof not correct for chunksize < number of lines of log
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Language (show other bugs)
Version: R 3.1.0
Hardware: x86_64/x64/amd64 (64-bit) Mac OS X v10.6
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2014-07-16 20:08 UTC by epurdom
Modified: 2014-07-16 20:13 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description epurdom 2014-07-16 20:08:20 UTC
summaryRprof has a bug that makes it perform oddly or report an error (depending) when the number of lines of the log file is greater than the chunksize and the memory option is either 'tseries' or 'stats' and the function 'utils:::Rprof_memory_summary' gets called by summaryRprof. I can generate this problem in 3.1.1 in either a linux or Mac R environment.

There was a bug report (15288) that seems related and reported as fixed in the developer version. It reports that the function 'would only read the first
\code{chunksize} lines of an \code{Rprof} file produced with
\code{line.profiling=TRUE}', but I believe the problem I see is that the function Rprof_memory_summary called by summaryRprof only reads the *last* chunksize lines of a Rprof file. Also, for me, it is not related to line.profiling=TRUE option, but the memory option. For this reason, it seems at least possible that similar incorrect code was replicated in two places. However it may be already fixed.

I believe that the problem is that the object `memchunks' is overwritten at each iteration of the repeat{...} portion of the function utils:::Rprof_memory_summary. This means only the last part of the chunk is saved for later. So the error is only relevant if chunksize is < lines in the Rprof file. Depending on the relationship of chunksize to the number of lines of the code, this either just creates strange output with a warning or an error.

Here is example code, but it depends slightly on the specific size of the Rprof file as to whether the error or the warning version is created. So to exactly replicate and error versus a warning, you have to play around with the chunksize relative to the length of Rprof file. 

Rprof(tf <- "test.log", memory.profiling=TRUE,line.profiling=FALSE)
out<-lapply(1:10000,rnorm,n=100)
Rprof(NULL)
length(readLines("test.log")) #10 lines for me
summaryRprof(tf,memory="tseries")
summaryRprof(tf,memory="tseries",chunksize=7) #gives error
summaryRprof(tf,memory="tseries",chunksize=9) #gives warning
sessionInfo()

> length(readLines("test.log"))
[1] 10
> summaryRprof(tf,memory="tseries")
     vsize.small vsize.large    nodes duplications        stack:2
0.02      153814     2567909 15649424            1 "lapply":"FUN"
0.04        1065      219816   595952            0 "lapply":"FUN"
0.06           0           0        0            0 "lapply":"FUN"
0.08           0           0        0            0 "lapply":"FUN"
0.1         2234      461008  1250984            0 "lapply":"FUN"
0.12           0           0        0            0       "lapply"
0.14        2858      589864  1600424            0 "lapply":"FUN"
0.16           0           0        0            0 "lapply":"FUN"
0.18        3412      703952  1910664            0 "lapply":"FUN"
0.2            0           0        0            0 "lapply":"FUN"
> summaryRprof(tf,memory="tseries",chunksize=7)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 3, 10
> summaryRprof(tf,memory="tseries",chunksize=8)
  vsize.small vsize.large    nodes duplications        stack:2
1      153975     2409563 15623440            0 "lapply":"FUN"
2           0           0        0            0 "lapply":"FUN"
3           0           0        0            0 "lapply":"FUN"
4           0           0        0            0 "lapply":"FUN"
5           0           0        0            0 "lapply":"FUN"
6           0           0        0            0 "lapply":"FUN"
7           0           0        0            0 "lapply":"FUN"
8           0           0        0            0 "lapply":"FUN"
9           0           0        0            0 "lapply":"FUN"
Warning message:
In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
> > sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.1.1
Comment 1 epurdom 2014-07-16 20:13:38 UTC
I should have said, I don't have access to the devel version to be able to check whether the problem is solved.