Bug 17264 - R_checkActivityEx() deadlock on OSX when running as a library
Alias: None
Product: R
Classification: Unclassified
Component: Mac GUI / Mac specific
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) OS X Mavericks
Importance: P5 enhancement
Assignee: Simon Urbanek
Depends on:
Reported: 2017-05-01 09:46 UTC by Hannes Mühleisen
Modified: 2017-05-03 07:22 UTC
Description Hannes Mühleisen 2017-05-01 09:46:57 UTC
On OSX, when running R as a library, R_checkActivityEx() sometimes hangs on fileno(stdin) due to a race condition when the "parent" (embedding) program also accesses stdin. I am not sure what happens on other Unixes, but OSX at least takes a lock in fileno(). Digging into sys-std.c, we found:

    if (ignore_stdin)
	FD_CLR(fileno(stdin), &readMask);

which removes stdin from the set of file descriptors that select() should monitor. That set is built in a call to setSelectMask(), which contains the following lines:

    if(handlers == &BasicInputHandler)
	handlers->fileDescriptor = fileno(stdin);

A solution could be to pass ignore_stdin, which is a parameter of R_checkActivityEx(), on to the static function setSelectMask() and simply not add stdin to readMask in the first place. However, we are not certain whether the list of InputHandlers could also contain another stdin reference, which would have to be removed as well when ignore_stdin is set. We are also happy to use a workaround, but we can't see any externally controllable behaviour in the code that could help.
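The proposal above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in sys-std.c: the `handler` struct and the function `build_read_mask()` are hypothetical stand-ins for R's input handler list and setSelectMask(). It assumes that comparing each descriptor against STDIN_FILENO is an acceptable way to recognise stdin, which would also cover the case where more than one handler refers to it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for R's input handler list. */
typedef struct handler {
    int fd;                 /* descriptor this handler wants monitored */
    struct handler *next;   /* next handler, or NULL at the end */
} handler;

/* Build the read mask for select(). When ignore_stdin is non-zero,
 * stdin is never added, so no later FD_CLR(fileno(stdin), ...) is needed.
 * Returns the highest descriptor added, or -1 if the mask is empty. */
static int build_read_mask(const handler *h, int ignore_stdin, fd_set *mask)
{
    int maxfd = -1;

    assert(mask != NULL);
    FD_ZERO(mask);
    for (; h != NULL; h = h->next) {
        if (h->fd < 0 || h->fd >= FD_SETSIZE)
            continue;       /* descriptor cannot be represented in an fd_set */
        if (ignore_stdin && h->fd == STDIN_FILENO)
            continue;       /* skip stdin without calling fileno() */
        FD_SET(h->fd, mask);
        if (h->fd > maxfd)
            maxfd = h->fd;
    }
    return maxfd;
}
```

Whether R's handler list can be filtered this way without side effects is exactly the open question raised above, so this should be read as an illustration of the idea only.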
Comment 1 Simon Urbanek 2017-05-01 12:56:18 UTC
Thanks, however I'm not sure I understand the problem on the R side. macOS only holds the lock for the duration of the structure access in fileno(), so fileno() alone can never deadlock: the application cannot get in between the lock and the unlock. If two calls try to take the lock, they won't deadlock, since the first one resolves immediately after the FD is fetched, at which point the second is able to lock. So if there is a problem, it's more likely in the application that is embedding R. If you lock the FD and then call the R API with the lock in place, you're asking for trouble, because R can do all kinds of things. So if you use explicit flockfile(), just make sure not to put R API calls between flockfile() and funlockfile().

That said, if you replace all occurrences of fileno(stdin) with STDIN_FILENO in the relevant R code - does that solve your problem? In general it's hard for us to debug unless you have a reproducible example.
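For illustration, the two suggestions in this comment could look roughly like this on the embedding side. This is a sketch under assumptions, not code from R or from the reporter's application: `call_into_r()` is a hypothetical placeholder for any R API call, and both function names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical placeholder for any call into the R API. */
static int call_into_r(int fd)
{
    return fd;
}

/* Hold the stdin lock only while touching the FILE structure,
 * and release it before control passes to R. */
static int poll_without_holding_lock(void)
{
    int fd;

    flockfile(stdin);
    fd = fileno(stdin);     /* safe: the stream lock is recursive for its owner */
    funlockfile(stdin);

    return call_into_r(fd); /* no stream lock is held during the R call */
}

/* The STDIN_FILENO variant: a compile-time constant, so no lock is taken. */
static int stdin_fd_no_lock(void)
{
    return STDIN_FILENO;
}
```

Note that the two variants are only equivalent as long as stdin still refers to descriptor 0, which is an assumption about the embedding program.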
Comment 2 Hannes Mühleisen 2017-05-03 07:22:46 UTC
Thanks for looking into this. I would argue that R as a library should probably be robust to this kind of thing, as it easily happens in the enclosing software. I will try the STDIN_FILENO approach and report back.