Bug 17203 - socketSelect(..., timeout): timeout fails on Unix for certain fractional timeouts (due to 1 microsecond precision)
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: System-specific
Version: R-devel (trunk)
Hardware: Other Linux-Ubuntu
Importance: P5 major
Assignee: R-core
 
Reported: 2016-12-30 05:31 UTC by Henrik Bengtsson
Modified: 2016-12-30 05:31 UTC (History)

Attachments
Timeout precision bug fix (635 bytes, patch)
2016-12-30 05:31 UTC, Henrik Bengtsson

Description Henrik Bengtsson 2016-12-30 05:31:34 UTC
Created attachment 2206
Timeout precision bug fix

Summary
-------

Certain values of argument 'timeout' to base::socketSelect() result in an infinite timeout on Unix.



Reproducible example
--------------------

To illustrate the problem, set up a server-client connection between the current R session and a background R session. In a fresh R session, do:

setupConnection <- function(host = "localhost", port = 11001L) {
  Rscript <- file.path(R.home("bin"), "Rscript")
  cmd <- sprintf("Sys.sleep(1); socketConnection('%s', port = %d, server = FALSE, blocking = TRUE, open = 'a+b'); repeat { Sys.sleep(10) }", host, port)
  system2(Rscript, args = c("-e", shQuote(cmd)), wait = FALSE)
  socketConnection(host, port = port, server = TRUE, blocking = TRUE, open = 'a+b')
}
con <- setupConnection()

## This will time out after 2 seconds
socketSelect(list(con), write = FALSE, timeout = 2.0)

## But this will never time out (on Unix)
socketSelect(list(con), write = FALSE, timeout = 1.9)


I've verified this on R 3.3.2 and R-devel on Linux.  I previously reported this bug in the R-devel thread 'socketSelect(..., timeout): non-integer timeouts in (0, 2) (?) equal infinite timeout on Linux - weird' on 2016-10-01 (https://stat.ethz.ch/pipermail/r-devel/2016-October/073218.html).



Troubleshooting
---------------

The timeout test in R_SocketWaitMultiple() is the comparison 'used >= mytimeout' between two doubles.  Here 'mytimeout' is not truncated, but 'used' is truncated to a precision of 1e-6 seconds (= 1 microsecond).  Due to the limitations of floating-point representation, we sometimes end up with 'used - mytimeout == -1e-6', whereas the current implementation assumes 'used - mytimeout == 0'.  For full details and tests, see https://github.com/HenrikBengtsson/Wishlist-for-R/issues/35.
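
The mechanism can be illustrated with a small C model.  This is only a sketch under assumptions, not the R source: the helper names remaining_to_timeval() and add_elapsed() are hypothetical, and the sketch assumes that the remaining wait is converted to a 'struct timeval' by truncating to whole microseconds.  Under that assumption, a remainder shorter than one microsecond becomes a zero-length wait, so 'used' can stop advancing just short of 'mytimeout'.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper (not from the R sources): split a remaining wait,
 * given in seconds, into a struct timeval.  The fractional part is
 * truncated to whole microseconds; this truncation is an assumption
 * made for illustration. */
struct timeval remaining_to_timeval(double remaining)
{
    struct timeval tv;
    assert(remaining >= 0.0);
    tv.tv_sec = (long) remaining;
    tv.tv_usec = (long) (1e6 * (remaining - (double) tv.tv_sec));
    return tv;
}

/* Hypothetical helper: accumulate the time waited so far, in the spirit
 * of the 'used += ...' line mentioned in this report. */
double add_elapsed(double used, struct timeval tv)
{
    return used + (double) tv.tv_sec + 1e-6 * (double) tv.tv_usec;
}
```

For example, a remainder of 5e-7 seconds yields tv_sec == 0 and tv_usec == 0, and add_elapsed() then returns 'used' unchanged, so a strict 'used >= mytimeout' test would never succeed.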


Patch
-----
Updating the test in R_SocketWaitMultiple() to be 'used + 1e-6 >= mytimeout' solves the problem.  See attached patch.  (Alternatively, instead of changing the test, one can change the line 'used += ...' to use '(tv.tv_usec + 1)'.)
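
The effect of the changed comparison can be sketched in isolation.  The function below is hypothetical and not part of the attached patch; the name timed_out() and the 'tol' parameter are introduced here only to contrast the strict test (tol = 0) with the tolerant one (tol = 1e-6).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the R sources or the patch): report a
 * timeout once 'used' is within 'tol' seconds of 'mytimeout'.
 * tol = 0.0 corresponds to the strict test 'used >= mytimeout';
 * tol = 1e-6 corresponds to the proposed 'used + 1e-6 >= mytimeout'. */
bool timed_out(double used, double mytimeout, double tol)
{
    assert(tol >= 0.0);
    return used + tol >= mytimeout;
}
```

With used = 1.8999995 and mytimeout = 1.9, timed_out(used, mytimeout, 0.0) is false while timed_out(used, mytimeout, 1e-6) is true; with used = 1.5 both are false, so the tolerance only affects the final microsecond.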

The patch also includes the same update for R_SocketWait(), which contains a very similar timeout comparison.  I have not identified where R_SocketWait() is used, and therefore have not found a use case where the bug appears there, but I'd be surprised if it didn't occur there too.