Created attachment 1676
tool to reproduce the crash issue
I evaluate R code from Java through JRI. Most functions work well, but loading a model with many fields crashes the JVM outright.
1) Install R 3.1.1 on Windows 7, then install the "rJava" package from the R command line;
2) (Following the JRI instructions) Create the environment variables "R_HOME", "R_INCLUDE_DIR", "R_SHARE_DIR" and "R_LIBS", and add "$R_HOME/bin" and "$R_LIBS/rJava/jri" to the "Path" variable (on a 64-bit OS, use "$R_HOME/bin/x64" and "$R_LIBS/rJava/jri/x64" instead). You can run "library()" in the R console to print the exact location of "R_LIBS";
3) Download the attached zip file "JRI.zip", unzip it, and run "run.bat" from the command line;
4) Check R version
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)
 LC_COLLATE=English_United States.1252
 LC_CTYPE=English_United States.1252
 LC_MONETARY=English_United States.1252
 LC_TIME=English_United States.1252
attached base packages:
 stats graphics grDevices utils datasets methods base
5) Run the load() command; the JVM crashes:
An unrecoverable stack overflow has occurred.
# A fatal error has been detected by the Java Runtime Environment:
# EXCEPTION_STACK_OVERFLOW (0xc00000fd) at pc=0x6f8c794e, pid=421092, tid=371088
# JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13)
# Java VM: Java HotSpot(TM) Client VM (24.51-b03 mixed mode windows-x86 )
# Problematic frame:
# C [Rzlib.dll+0x794e]
6) Running the same code on a Mac works fine.
Running "load()" in a standalone R terminal on Windows also works.
Some notes: modelRF.rda is a saved random forest model. What is special about it is that the model input has 1776 fields. The crash happens inside R, not in JRI or Java.
1) Build a debug version of R on Windows;
2) Run the same JRI console, launch GDB, and attach it to the JVM process;
3) Load the symbol files with "set solib-search-path";
4) Set a breakpoint in saveload.c in the R source code:
(gdb) b "saveload.c":2333
5) Run "load()" in the JRI console; execution pauses at line 2333 of saveload.c;
6) Go back to GDB and type "c" to continue; you will see the segmentation fault in inflate.c. Use "info stack" to print the stack trace:
(gdb) info stack
#0 0x6f8c8cd7 in inflate (strm=0x19098640, flush=0) at inflate.c:1234
#1 0x6c7b7c10 in R_gzread (file=0x19098640, buf=0x19293158, len=4)
#2 0x6c7b87d9 in R_gzread (len=4, buf=0x19293158, file=0x19098640)
#3 gzfile_read (ptr=0x19293158, size=1, nitems=4, con=0x1900f028)
#4 0x6c791659 in InBytesConn (stream=0x192ddfac, buf=0x19293158, length=4)
#5 0x6c791ec6 in InInteger (stream=0x192ddfac) at serialize.c:361
#6 0x6c7932ee in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#7 0x6c792680 in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#8 0x6c7925ed in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#9 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#10 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#11 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1195 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1196 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1197 0x6c7925ed in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1198 0x6c792cdf in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1199 0x6c7924b7 in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1200 0x6c7925ed in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
#1201 0x6c796193 in R_Unserialize (stream=0x192ddfac) at serialize.c:1894
#1202 0x6c7cc6d1 in do_loadFromConn2 (call=0x19e02980, op=0x4115b4,
args=0x19e03798, env=0x19e025e4) at saveload.c:2378
#1203 0x6c775d1d in bcEval (body=<optimized out>, rho=<optimized out>,
useCache=<optimized out>) at eval.c:4753
#1204 0x6c77e512 in Rf_eval (e=0x19e01034, rho=0x19e025e4) at eval.c:560
#1205 0x6c7829c6 in Rf_eval (rho=0x19e025e4, e=0x19e01034) at eval.c:519
#1206 Rf_applyClosure (call=0x19e011d8, op=0x19e010c0, arglist=0x19e02654,
rho=0x41920c, suppliedenv=0x419228) at eval.c:1044
#1207 0x6c77e627 in Rf_eval (e=0x19e011d8, rho=0x41920c) at eval.c:676
#1208 0x6c7514c3 in Rf_ReplIteration (rho=0x41920c, savestack=0,
browselevel=0, state=0x192de7ec) at main.c:260
#1209 0x6c75179d in R_ReplConsole (rho=<optimized out>, savestack=0,
browselevel=0) at main.c:310
#1210 0x192df838 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I omitted some duplicated lines for serialize.c:1601.
ReadItem() in serialize.c calls itself recursively many times and eventually overflows the stack. The model has 1776 fields, and the stack trace reaches roughly 1190 levels of recursion before the process dies.
Analysis: When R is used through JRI, it runs single-threaded, embedded in the JVM process. R's runtime is therefore subject to JVM thread-level limits such as the thread stack size (128K in the JVM most of the time). Any deeply recursive function call in R's code risks a stack overflow. Would it be possible to rewrite ReadItem() in serialize.c with a for/while loop instead of recursive function calls?
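To illustrate the kind of rewrite being asked for, here is a minimal sketch in Java (8+) of reading a linked list recursively versus iteratively. It is a toy model only: the class PairlistSketch, the Cell type and both reader methods are hypothetical names made up for this example, and none of it is R's actual serialize.c code.

```java
/** Toy model of a pairlist reader; not R's SEXP or ReadItem(). */
final class PairlistSketch {
    /** A toy cons cell: a value plus a link to the rest of the list. */
    static final class Cell {
        final int car;
        Cell cdr;
        Cell(int car) { this.car = car; }
    }

    /** Recursive reader: one stack frame per list element. */
    static Cell readRecursive(int[] input, int pos) {
        if (pos == input.length) return null;
        Cell cell = new Cell(input[pos]);
        cell.cdr = readRecursive(input, pos + 1); // depth grows with list length
        return cell;
    }

    /** Iterative reader: builds the same list with constant stack depth. */
    static Cell readIterative(int[] input) {
        Cell head = null;
        Cell tail = null;
        for (int value : input) {
            Cell cell = new Cell(value);
            if (head == null) head = cell; else tail.cdr = cell;
            tail = cell;
        }
        return head;
    }

    /** Counts the cells in a list without recursion. */
    static int length(Cell list) {
        int n = 0;
        for (Cell c = list; c != null; c = c.cdr) n++;
        return n;
    }

    public static void main(String[] args) {
        int[] big = new int[1_000_000];
        System.out.println("iterative length: " + length(readIterative(big)));
        try {
            System.out.println("recursive length: " + length(readRecursive(big, 0)));
        } catch (StackOverflowError e) {
            // Expected with typical default stack sizes for a list this long.
            System.out.println("recursive reader overflowed the stack");
        }
    }
}
```

The iterative version keeps a tail pointer and appends to it, so the stack depth no longer depends on the number of elements.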
We found a workaround: increase the JVM thread stack size with "-Xss2M". But this is fragile and not suitable for a production environment, because thread stacks should stay small when a J2EE environment runs many threads.
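One possible way to limit the cost of a bigger stack is to request it for a single dedicated thread instead of for every thread via "-Xss". The sketch below (Java 8+ syntax) uses the standard Thread(ThreadGroup, Runnable, String, long stackSize) constructor. The class BigStackRunner and the depth() stand-in are hypothetical names, no JRI call is shown, and whether this helps in a given JRI setup is an assumption that would need testing. The JVM also treats the stackSize argument as a hint that some platforms may ignore.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: runs one task on a thread that requests a larger stack. */
final class BigStackRunner {
    /** Runs task on a fresh thread requesting stackBytes of stack and waits for it. */
    static <T> T callWithStack(Callable<T> task, long stackBytes) throws Exception {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(task.call());
            } catch (Throwable t) { // also catches StackOverflowError
                failure.set(t);
            }
        }, "big-stack-worker", stackBytes);
        worker.start();
        worker.join();
        if (failure.get() != null) {
            throw new Exception("task failed on big-stack thread", failure.get());
        }
        return result.get();
    }

    /** Stand-in for a deeply recursive call; an assumption, not JRI or R code. */
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    public static void main(String[] args) throws Exception {
        // 2 MB mirrors the "-Xss2M" value, but is requested for this one thread only.
        int reached = callWithStack(() -> depth(5_000), 2L * 1024 * 1024);
        System.out.println("recursion depth reached: " + reached);
    }
}
```

In a real application the task passed to callWithStack would be whatever code triggers the deep recursion, and the other threads in the server would keep their default stack size.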
This is just a very complicated way to write a wish item to use iterative unserialization of pairlists - it has actually nothing to do directly with JRI.
(Note that stack checking is disabled in JRI due to threads - hence a crash on stack overflow, which is expected).
Yes, it has nothing to do with JRI. Can you give it high priority? Since it crashes the process, it should not be treated as a simple wish. In our customer's case, a lot of data is stored across many columns (up to around 10,000). Right now a sample model with 1776 columns crashes immediately. I am not sure whether we should keep trying R or consider other technologies.
(In reply to Simon Urbanek from comment #1)
> This is just a very complicated way to write a wish item to use iterative
> unserialization of pairlists - it has actually nothing to do directly with
> JRI.
Note that this is implemented in the latest version of pqR (pqR-2014-09-30) available at pqR-project.org. This change involves only a few lines of code (though there are other changes to unserialization in pqR as well, to support its use of read-only constants).
That is good news. Thanks, Radford. Performance is very important for big data mining.
But I saw that the pqR website says "Windows system is not currently recommended". Is anyone using it on Windows in production?
What is the relationship between pqR and R? Is there a plan to merge the pqR enhancements back into R?