Bug 16723 - sink() text goes missing after mclapply
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.2.3
Hardware: All Linux-RHEL
Importance: P5 normal
Assignee: R-core
Reported: 2016-02-22 14:05 UTC by Robert McGehee
Modified: 2016-06-03 20:08 UTC
Description Robert McGehee 2016-02-22 14:05:14 UTC
Hello,
I generally sink() my script output to a text file that is then emailed to me. However, I've found a somewhat annoying bug: if the mclapply itself contains code that prints text, the first part of the text output following the mclapply is erased or never written to the file. Interestingly, if the mclapply writes out X characters of text (for some integer X), then the next X characters of text following the mclapply are not written to the sink file.

Below is a simple example where X is 4 ("ABCD") and I print the digits 1 through 9 after the mclapply. Instead of seeing "123456789" in the sink file, I only get "56789", as the first four characters are erased or not written to the file.

The bug appears irrespective of whether mc.silent is TRUE or FALSE. Also, the output from the mclapply is not written to the file, even when mc.silent=FALSE, which might be a separate or related bug.

require(parallel)
scon <- file(open="w+")
sink(scon)
x <- mclapply(1:2, function(x) cat("ABCD"))
cat("123456789\n")
sink()
txt <- readLines(scon)
close(scon)
cat(paste(txt, collapse="\n"), "\n")

Thanks, Robert
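
A possible workaround, sketched under the assumption that the lost text comes from the forked children writing to the sink connection they inherit from the parent: capture each child's printed output with capture.output() and replay it from the parent only. The helper name mclapply_captured is hypothetical and not part of the parallel package, and this has not been confirmed as a fix for the bug above.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): run FUN over X in forked
# workers and capture whatever each worker prints, so that the workers
# never write to the parent's sink connection themselves.
mclapply_captured <- function(X, FUN, ...) {
  mclapply(X, function(x) {
    out <- capture.output(value <- FUN(x))
    list(value = value, output = out)
  }, ...)
}

scon <- file(open = "w+")
sink(scon)
res <- mclapply_captured(1:2, function(x) cat("ABCD\n"))
# Replay the captured text from the parent process only.
for (r in res) cat(r$output, sep = "\n")
cat("123456789\n")
sink()
txt <- readLines(scon)
close(scon)
print(txt)
```

The child output travels back as part of the return value, so only the parent process ever touches scon. Like mclapply itself, this relies on forking and is meant for Unix-alikes.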
Comment 1 Henrik Bengtsson 2016-06-03 19:36:05 UTC
I can reproduce this on R 3.3.0 on x86_64-pc-linux-gnu (64-bit).  I don't think this is specific to sink() per se, but rather a problem due to connections being shared across different R processes.

library("parallel")
con <- file("foo.txt", open="wt")

writeLines(con=con, text=sprintf("Master PID: %d", Sys.getpid()))

res <- mclapply(1:2, function(x) {
  writeLines(con=con, text=sprintf("Child PID: %d", Sys.getpid()))
})

writeLines(con=con, text=sprintf("Master PID: %d", Sys.getpid()))
close(con)

readLines("foo.txt")
## [1] "Master PID: 421061" "Master PID: 421061"


One could argue that this should give an error, or at least a warning, but I'm not sure how easy that is to test for/implement.
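
As a rough user-level illustration of what such a check could look like (this is not how R implements connections, and open_owned() and write_owned() are hypothetical names), one can record the PID that opened the connection and refuse to write from any other process:

```r
library(parallel)

# Hypothetical wrapper: remember which process opened the connection.
open_owned <- function(description, open = "wt") {
  list(con = file(description, open = open), owner = Sys.getpid())
}

# Refuse to write when called from a process other than the owner.
write_owned <- function(oc, text) {
  if (!identical(Sys.getpid(), oc$owner)) {
    stop(sprintf("connection opened by PID %d, but used from PID %d",
                 oc$owner, Sys.getpid()))
  }
  writeLines(text, con = oc$con)
}

path <- tempfile(fileext = ".txt")
oc <- open_owned(path)
write_owned(oc, "Master line 1")
# Each child fails the PID check; mclapply() hands the failures back as
# "try-error" objects (and warns) instead of silently losing the text.
res <- suppressWarnings(
  mclapply(1:2, function(x) write_owned(oc, "Child line"))
)
write_owned(oc, "Master line 2")
close(oc$con)

vapply(res, inherits, logical(1), what = "try-error")
readLines(path)
```

A check inside R itself would presumably have to live in the connection code rather than in a wrapper like this; the sketch only shows that the owning PID is enough information to detect the misuse.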

Here's a related example illustrating the use of connections for _reading_ in forked processes:

library("parallel")
writeLines(con="letters.txt", text=letters)

con <- file("letters.txt", open="rt")
res <- mclapply(1:2, function(x) readLines(con=con, n=2L))
close(con)
print(res)

## [[1]]
## [1] "a" "b"
## 
## [[2]]
## character(0)
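
For comparison, a minimal sketch in which every child opens its own connection rather than reusing the parent's, so no file position is shared between processes. The helper read_chunk() and the temporary file are assumptions made for the illustration.

```r
library(parallel)

path <- tempfile(fileext = ".txt")
writeLines(letters, con = path)

# Hypothetical helper: each worker opens, reads from, and closes its own
# connection, then skips ahead to the chunk it is responsible for.
read_chunk <- function(i, n = 2L) {
  con <- file(path, open = "rt")
  on.exit(close(con))
  if (i > 1L) readLines(con, n = (i - 1L) * n)  # discard earlier chunks
  readLines(con, n = n)
}

res <- mclapply(1:2, read_chunk)
print(res)
```

With independent connections, the first task should return "a" "b" and the second "c" "d", regardless of the order in which the children run.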

My $.02
Comment 2 Henrik Bengtsson 2016-06-03 20:08:04 UTC
Interestingly, when using cat(), instead of writeLines(), it does indeed work:

library("parallel")
con <- file("output.txt", open="wt")
cat(file=con, "Master PID:", Sys.getpid())
res <- mclapply(1:2, function(x) cat(file=con, "Child PID:", Sys.getpid()))
cat(file=con, "Master PID:", Sys.getpid())
close(con)
readLines("output.txt")
## [1] "Master PID: 100269Child PID: 100313Child PID: 100314Master PID: 100269"


For the record, the issue is also present with writeChar():

library("parallel")
con <- file("output.txt", open="wt")
writeChar("0", con=con, eos=NULL)
res <- mclapply(c("1", "2"), function(x) writeChar(x, con=con, eos=NULL))
writeChar("0", con=con, eos=NULL)
close(con)
file.size("output.txt")
## [1] 4
readChar("output.txt", nchars=99)
## [1] "00"


and also with writeBin():

library("parallel")
con <- file("output.bin", open="wb")
writeBin(0L, size=1L, con=con)
res <- mclapply(1:2, function(x) writeBin(x, size=1L, con=con))
writeBin(0L, size=1L, con=con)
close(con)
file.size("output.bin")
## [1] 2
readBin(con="output.bin", what="integer", size=1L, n=10)
## [1] 0 0
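
Until this is sorted out, one way to sidestep the shared connection entirely is to let each child write to a file of its own and merge the files in the parent afterwards. This is only a sketch; the directory layout and the task-%d.txt naming scheme are made up for the example.

```r
library(parallel)

out_dir <- tempfile("worker-output-")
dir.create(out_dir)

res <- mclapply(1:2, function(x) {
  # Hypothetical naming scheme: one output file per task index.
  path <- file.path(out_dir, sprintf("task-%d.txt", x))
  con <- file(path, open = "wt")
  on.exit(close(con))
  writeLines(sprintf("Child PID: %d", Sys.getpid()), con = con)
  path
})

# Back in the parent: merge the per-task files into a single output file.
# The parent's connection is opened only after the children have finished.
out <- file.path(out_dir, "output.txt")
con <- file(out, open = "wt")
writeLines(sprintf("Master PID: %d", Sys.getpid()), con = con)
for (p in unlist(res)) writeLines(readLines(p), con = con)
writeLines(sprintf("Master PID: %d", Sys.getpid()), con = con)
close(con)
lines <- readLines(out)
print(lines)
```

No connection object crosses the fork here, so the sketch does not depend on how writeLines(), writeChar(), or writeBin() behave on an inherited connection.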