Bug 16668

Summary: parallel: Unnecessary loading of stats (and graphics, grDevices, ...) with patch
Product: R
Reporter: Henrik Bengtsson <henrikb>
Component: Misc
Assignee: R-core <R-core>
Status: CLOSED FIXED
Severity: enhancement
CC: morgan
Priority: P5
Version: R-devel (trunk)
Hardware: All
OS: All
Attachments: Avoid loading stats et al. when loading parallel

Description Henrik Bengtsson 2016-01-12 17:26:48 UTC
Created attachment 2002
Avoid loading stats et al. when loading parallel

When the parallel package is loaded, it uses `stats::runif()`, and loading stats in turn also loads graphics, grDevices, and utils. This prevents setting up a minimal R session with only base and parallel loaded. Memory usage with 'base' and 'parallel' alone is ~45 MiB, whereas with all of the above it is ~102 MiB.


$ R_DEFAULT_PACKAGES=base,parallel Rscript --vanilla --quiet -e "loadedNamespaces()"
[1] "graphics"  "parallel"  "utils"     "grDevices" "stats"     "base"
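
The dependency chain above can also be checked from within R. This is a minimal sketch and assumes a standard R installation. `getNamespaceImports()` (in base) lists the namespaces a package imports from, so inspecting it for stats should show why graphics, grDevices, and utils come along. Calling it on stats loads stats itself if it is not already loaded.

```r
# Namespaces that stats imports from; loading stats pulls these in too.
stats_imports <- unique(names(getNamespaceImports("stats")))
print(stats_imports)

# The three packages reported alongside stats in loadedNamespaces() above
expected <- c("graphics", "grDevices", "utils")
print(expected %in% stats_imports)
```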

As can be inferred from the patch, the call to stats::runif() can be avoided as follows:

$ R_DEFAULT_PACKAGES=base Rscript --vanilla --quiet -e "x <- sample.int(1L); Sys.setenv(R_PARALLEL_PORT=11321); x <- loadNamespace('parallel'); loadedNamespaces()"
[1] "parallel" "base"
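
The same check can be run from an existing R session by launching a child process. This is a sketch that assumes a Unix-alike, where `system2()` prepends its `env` strings to the command line. The variable names `rscript`, `expr` and `loaded` are illustrative. The exact set of loaded namespaces may differ between R versions.

```r
# Launch a fresh R process restricted to the base package (as in the
# command above) and report which namespaces end up loaded.
rscript <- file.path(R.home("bin"), "Rscript")
expr <- paste(
  "x <- sample.int(1L);",
  "Sys.setenv(R_PARALLEL_PORT = 11321);",
  "x <- loadNamespace(\"parallel\");",
  "writeLines(loadedNamespaces())"
)
# system2() prepends 'env' to the command line on Unix-alikes
loaded <- system2(rscript, c("--vanilla", "--quiet", "-e", shQuote(expr)),
                  env = "R_DEFAULT_PACKAGES=base", stdout = TRUE)
print(loaded)
print("stats" %in% loaded)
```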

Looking at the code, the fix is trivial: one can use `sample.int()` instead of `stats::runif()`. I've attached a patch.

For more details, see https://github.com/HenrikBengtsson/Wishlist-for-R/issues/8

Comment 1 Martin Morgan 2016-01-12 23:23:07 UTC
sample.int(1) always returns 1, unlike runif(1).
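
A quick illustration of the point above. `sample.int(1)` samples a single value from `1:1`, so it carries no randomness in its result. `runif(1)` is a continuous draw on (0, 1). The variable names are illustrative only.

```r
# sample.int(1) samples one value from 1:1, so every draw is 1L
draws <- replicate(1000, sample.int(1))
print(unique(draws))  # always 1

# runif(1) is a continuous draw on (0, 1), so draws essentially never repeat
u <- replicate(1000, stats::runif(1))
print(length(unique(u)))
```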

For the purpose of your patch (avoiding unnecessary dependencies on load), the change below seems sufficient. An approximately random choice of port seems a more tolerable change than truncating (??) random numbers.

Index: R/snow.R
--- R/snow.R	(revision 69936)
+++ R/snow.R	(working copy)
@@ -84,8 +84,10 @@
     rscript <- file.path(R.home("bin"), "Rscript")
     port <- Sys.getenv("R_PARALLEL_PORT")
     port <- if (identical(port, "random")) NA else as.integer(port)
-    if (is.na(port))
-        port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+    if (is.na(port)) {
+        runif1 <- sample.int(.Machine$integer.max, 1) / .Machine$integer.max
+        port <- 11000 + 1000 * ((runif1 + unclass(Sys.time()) / 300) %% 1)
+    }
     Sys.i <- Sys.info()
     options <- list(port = as.integer(port),
                     timeout = 60 * 60 * 24 * 30, # 30 days
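
The patched port selection can be sketched as a standalone function that uses only base R. `pick_parallel_port()` is a hypothetical helper and not part of parallel's API. It copies the logic of the diff and takes the environment value as an argument so that it can be exercised directly. `runif1` lies in (0, 1] rather than (0, 1), but the `%% 1` step makes that difference irrelevant. The resulting port therefore falls in 11000..11999.

```r
# Hypothetical helper (not part of parallel's API) that mirrors the patched
# port selection in R/snow.R, using only base R.
pick_parallel_port <- function(env_value = Sys.getenv("R_PARALLEL_PORT")) {
  # "" (unset) coerces to NA; "random" is mapped to NA explicitly
  port <- if (identical(env_value, "random")) NA else as.integer(env_value)
  if (is.na(port)) {
    # Emulate runif(1): an integer in 1..integer.max scaled to (0, 1]
    runif1 <- sample.int(.Machine$integer.max, 1) / .Machine$integer.max
    # Mix in the clock so ports differ across sessions; %% 1 keeps [0, 1)
    port <- 11000 + 1000 * ((runif1 + unclass(Sys.time()) / 300) %% 1)
  }
  as.integer(port)
}

pick_parallel_port("")        # unset: pseudo-random port in 11000..11999
pick_parallel_port("random")  # same range
pick_parallel_port("11321")   # 11321
```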

Comment 2 Henrik Bengtsson 2016-01-12 23:40:12 UTC
Thanks Martin. Yes, I was thinking about that truncation too. Your runif1 is certainly a better emulation of runif().

Just to make sure, though I think you've got it: sample.int(1) is only there to generate .Random.seed.
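
Here is a small illustration of why `sample.int(1)` in the command above is useful even though its value is always 1. Any use of R's random number generator creates `.Random.seed` in the global environment if it does not already exist. This sketch assumes a throwaway session, because it deletes any existing seed first.

```r
# Start from a state with no RNG seed in the global environment
# (note: this discards any existing seed in the current session).
if (exists(".Random.seed", envir = globalenv(), inherits = FALSE))
  rm(".Random.seed", envir = globalenv())
print(exists(".Random.seed", envir = globalenv(), inherits = FALSE))  # FALSE

# Any use of the RNG, even sample.int(1) whose value is always 1,
# initializes .Random.seed
invisible(sample.int(1L))
print(exists(".Random.seed", envir = globalenv(), inherits = FALSE))  # TRUE
```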

Comment 3 Martin Morgan 2016-01-20 01:09:28 UTC
Thanks Henrik, I made these changes in devel.

I did actually misread your code on first pass, thinking that the result of sample.int() was being used.