Bug 17470 - readLines buffering with pipes
Summary: readLines buffering with pipes
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.5.0
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 normal
Assignee: R-core
Reported: 2018-09-13 07:40 UTC by Chris Culnane
Modified: 2018-09-14 14:57 UTC
CC List: 1 user

Description Chris Culnane 2018-09-13 07:40:07 UTC
Bug 17432 is still a problem when using pipes for IPC. 

The bug shows up when R is called from another process that communicates with it via stdin. R buffers the input and does not return lines until the buffer fills or the sending process closes stdin. This prevents interactive communication between a calling process and a child R process.
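The pipe itself is not what holds the lines back. A minimal Python sketch (assuming a Unix-like system; the function name is hypothetical) shows that once the writer has sent one line, a single read on the other end returns it at once, even though the writer has not closed the pipe. A reader that instead keeps reading until its buffer is full would block at that point, which is consistent with the symptom described above.

```python
import os

def read_after_one_line():
    """Hypothetical helper: write one line into a pipe, then read it back."""
    r, w = os.pipe()
    try:
        os.write(w, b"first line\n")  # writer keeps its end open, like a live parent process
        # read() returns the 11 bytes already in the pipe rather than
        # waiting for 4096 bytes or for the writer to close.
        return os.read(r, 4096)
    finally:
        os.close(r)
        os.close(w)

print(read_after_one_line())  # b'first line\n'
```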

From a quick look at the source code (connections.c), the bug appears to come from disabling buffering only when isatty() returns true for the file descriptor. That fixes Bug 17432 when the script is run in a terminal, but not for pipes, for which isatty() returns false.
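To see why an isatty() test does not cover this case, the sketch below asks isatty() about the read end of a pipe and about a regular temporary file, using Python's os.isatty(), which wraps the C function of the same name. Both come back false, so a rule that disables buffering only for terminals treats a pipe exactly like a regular file. It assumes a Unix-like system and does not reproduce R's actual connections.c logic; isatty_report is a hypothetical name.

```python
import os
import tempfile

def isatty_report():
    """Return isatty() results for a pipe and for a regular file."""
    r, w = os.pipe()
    try:
        with tempfile.TemporaryFile() as f:
            return {"pipe": os.isatty(r), "regular file": os.isatty(f.fileno())}
    finally:
        os.close(r)
        os.close(w)

print(isatty_report())  # {'pipe': False, 'regular file': False}
```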

An example R script and Python script are provided below to demonstrate the problem:

R script (example.r):
=====================
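f <- file("stdin")
open(f)
# Read one line at a time until readLines() returns nothing (end of input)
while(length(line <- readLines(f,n=1)) > 0) {
  # Echo to stderr, which stays on the terminal while stdout is piped
  write(line, stderr())
}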

Python3 script:
===============
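import sys, subprocess
# Start R with a pipe (not a terminal) as its stdin
process = subprocess.Popen(['Rscript', 'example.r'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
for line in sys.stdin:
    # Lines from sys.stdin already end in '\n', so send each as-is and flush at once
    process.stdin.write(line.encode('utf-8'))
    process.stdin.flush()
process.stdin.close()  # EOF ends the readLines() loop in example.r
process.wait()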


Expected Behaviour:
Run the Python script; each line entered is echoed back immediately by the R script, which is what happens on R 3.4.4.

Observed Behaviour on >= 3.5.0 (including R-devel):
The R script does not process lines as they are sent; it only receives them when stdin is closed.
Comment 1 Michael Lawrence 2018-09-13 12:14:30 UTC
I guess the fix is just to require that the connection is not stdin before buffering it. I'll keep the isatty() check, since in principle there could be terminal input that is not stdin. Will commit the fix soon.
Comment 2 Chris Culnane 2018-09-13 12:35:25 UTC
C isn't my primary programming language, but I think that might be too narrow. I assume, but haven't tested, that the same problem would occur with something like a keep-alive socket. 

It might be more complete to check the file type with something like stat() and only buffer appropriate types.
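The concern about sockets can be illustrated with a hypothetical predicate modelled on the rule from Comment 1 (buffer unless the descriptor is stdin or a terminal). The name narrow_rule_buffers, and the use of descriptor 0 to stand for stdin, are assumptions for illustration; R's real check operates on its connection objects. A connected socket from socket.socketpair() is neither stdin nor a terminal, so under this rule it would still be buffered. A Unix-like system is assumed.

```python
import os
import socket

STDIN_FD = 0  # assumption: stdin identified by its usual descriptor number

def narrow_rule_buffers(fd):
    """Hypothetical rule: buffer unless fd is stdin or a terminal."""
    return fd != STDIN_FD and not os.isatty(fd)

def socket_would_be_buffered():
    a, b = socket.socketpair()  # stands in for a keep-alive socket
    try:
        return narrow_rule_buffers(a.fileno())
    finally:
        a.close()
        b.close()

print(socket_would_be_buffered())  # True
```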
Comment 3 Michael Lawrence 2018-09-13 13:36:48 UTC
Ok, I have read up on UNIX file types. Sounds like we only want buffering for a regular file, but that means resolving symbolic links. I guess we should also disable buffering for fifo() and pipe() connections.
Comment 4 Chris Culnane 2018-09-14 05:30:40 UTC
Yes, I think buffering should be disabled by default for everything except regular files. There might be an argument for making it a configurable parameter, but that might be a lot more work. For example, if you knew that the socket/pipe was going to close at the end of the transmission, there could be an advantage to buffering, but that will be application specific.
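The "regular files only" rule can be sketched with fstat() and S_ISREG, as exposed by Python's os and stat modules. should_buffer and classify are hypothetical names, not functions in R's sources, and a Unix-like system is assumed. Pipes, FIFOs and sockets report non-regular file types; a terminal is a character device, so in this sketch it also stays unbuffered without a separate isatty() check.

```python
import os
import socket
import stat
import tempfile

def should_buffer(fd):
    """Hypothetical rule: buffer only if fd refers to a regular file."""
    return stat.S_ISREG(os.fstat(fd).st_mode)

def classify():
    r, w = os.pipe()
    a, b = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            return {
                "regular file": should_buffer(f.fileno()),
                "pipe": should_buffer(r),             # S_ISFIFO, not regular
                "socket": should_buffer(a.fileno()),  # S_ISSOCK, not regular
            }
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()

print(classify())  # {'regular file': True, 'pipe': False, 'socket': False}
```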

In terms of resolving symbolic links, I think stat does that anyway (https://www.gnu.org/software/libc/manual/html_node/Symbolic-Links.html), so it should return the type of the destination file, not the symbolic link itself.
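The point about symbolic links can be checked directly: stat() follows a link to its target, while lstat() describes the link itself. The sketch below, assuming a system where os.symlink() is permitted, creates a regular file and a link to it in a temporary directory; stat_vs_lstat is a hypothetical name. For a connection that is already open, fstat() on the descriptor likewise describes whatever open() resolved, so the link itself is never seen.

```python
import os
import stat
import tempfile

def stat_vs_lstat():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        link = os.path.join(d, "link.txt")
        with open(target, "w") as f:
            f.write("x\n")
        os.symlink(target, link)
        return {
            "stat says regular": stat.S_ISREG(os.stat(link).st_mode),   # follows the link
            "lstat says symlink": stat.S_ISLNK(os.lstat(link).st_mode),  # the link itself
        }

print(stat_vs_lstat())  # {'stat says regular': True, 'lstat says symlink': True}
```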
Comment 5 Michael Lawrence 2018-09-14 14:57:20 UTC
Yeah, I saw that about stat() and breathed a sigh of relief.