Bug 17478 - install.packages() on Windows can upgrade R code but leave old dll in place
Summary: install.packages() on Windows can upgrade R code but leave old dll in place
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Installation (show other bugs)
Version: R 3.5.0
Hardware: Other Windows 64-bit
: P5 normal
Assignee: R-core
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-25 00:43 UTC by Matt Dowle
Modified: 2019-06-20 07:30 UTC (History)
4 users (show)

See Also:


Attachments
minimal reproducible example terminal output (6.26 KB, text/plain)
2019-06-05 23:40 UTC, Toby Hocking
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Dowle 2018-09-25 00:43:21 UTC
install.packages() on Windows can leave the package in an apparently functional state where the new R code is installed but the old dll is left in place. This mismatch can cause errors if the user is lucky, silent wrong results if they are not lucky.

The problem does not happen when the user installs the Windows binary from CRAN. It only happens when the user installs from source. Which install.packages() suggests to them when the source version on CRAN is later than CRAN's Windows binary. This typically happens just after the package is updated on CRAN.

Reproducible example is discussed and confirmed here : https://github.com/Rdatatable/data.table/issues/3056
Comment 1 Gabor Csardi 2018-10-01 09:13:24 UTC
This is a known issue, I believe. The old dll file is locked if the package is  loaded in _some_ R session, either the current one, or another one.

install.packages() refuses to reinstall the package if it is attached in the current session, but it tries to reinstall it if it is only loaded, but not attached. If the DLL file is locked, then you get a "warning": 

Warning: cannot remove prior installation of package ‘ps’

One "workaround" is to always run install.packages() with options(warn = 2), then at least you don't get a broken package. The problem with this is that it also stops for other warnings, e.g. for missing metadata files in the CRANextras repository. (See https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17467)

The checkMD5sums() tool is also useful:

> tools::checkMD5sums("data.table")
file ‘libs/i386/datatable.dll’ has the wrong MD5 checksum

Btw. of course this is also happening when installing binary packages. It happens whenever the package is loaded in _some_ R session.
Comment 2 Matt Dowle 2018-10-01 22:23:44 UTC
Thanks Gabor, this was really helpful.

> One "workaround" is to always run install.packages() with options(warn = 2),
> then at least you don't get a broken package.

Did you try this and confirm it avoids the broken package state? If so then perhaps it is an easy fix to change that warning to an error in R then! I'll be surprised because the R code has already been upgraded by that point but I hope it's true. I have likely misunderstood the source code of install.packages in this area.

R does a great job in detecting the in-use dll when installing from binary. It halts with error and leaves the package in an unusable state. This is the desired behaviour when installing from source too, as users are prompted to when CRAN source version > CRAN binary version. Rather than leaving the package in a broken state; i.e. new R code calling old C code in the old dll.

On checkMD5sums(), that's great, thanks. Should every package that has a dll add this check to their .onLoad(), and/or could it be added into R itself so that every package doesn't have to? I'll add it to data.table's .onLoad() so that users with older versions of R can benefit from the extra check without needing to upgrade to latest R if and when it gets an improvement in this area.
Comment 3 Gabor Csardi 2018-10-01 23:02:23 UTC
> Did you try this and confirm it avoids the broken package state?

I don't remember, but you are probably right. Nevertheless, even if it is broken, at least you get an error, so you can act on it. Not sure what else you  can do, really.

> R does a great job in detecting the in-use dll when installing from binary.
> It halts with error and leaves the package in an unusable state. 

Maybe sometimes it does, but AFAIR it did not do that for me. I think binary installation is broken as well, the warning I copied above was from a binary  installation.

The only case the installation was refused was when the package was _attached_ in the current session. A harder problem is when the DLL is locked by another R process, that can't be detected without trying the installation.

> Should every package that has a dll add this check to their .onLoad(), and/or
> could it be added into R itself so that every package doesn't have to?

That would make a lot of sense, but I would say fixing the installer is also important.

Btw. sessioninfo::session_info() now also marks the packages with DLL mismatch.
Comment 4 Matt Dowle 2018-10-02 01:06:20 UTC
> Nevertheless, even if it is broken, at least you get an error, so you can act
> on it. 

However, the consequences for the user in not acting on the error can be disastrous (wrong results silently) and persists until they next upgrade.

> Not sure what else you can do, really.

As I've already suggested, what else can be done is leave the package in a non-working state so it can't be loaded or used until it is fixed by reinstalling it. It is possible to do because install.packages() already does that when installing from binary, at least sometimes. This is better than leaving it in a mismatch R/dll faulty state. We know this faulty state persists because users keep reporting problems caused by it and the nature of the reports are because of R code calling old version C code.

> A harder problem is when the DLL is locked by another R process, that can't
> be detected without trying the installation.

Yes it's hard. But R's install.packages() already has code that is good at solving this problem. It already does that very well. The problem is that piece of very good logic is not being run in all cases, it seems.

Maybe it's not that hard. If the first thing install.packages() did was try to remove the DLL, wouldn't that be quite easy?  If it couldn't remove the DLL it wouldn't proceed further, leaving the package not upgraded but working correctly. The root problem perhaps is that install.packages() does the DLL last after it already upgraded the R code in the package. Although I am rather guessing here based on looking at the source a few days ago.

> Btw. sessioninfo::session_info() now also marks the packages with DLL
> mismatch.

I see that `sessioninfo` is your package. It looks excellent and too important to be a separate package. Could it be promoted into utils::sessionInfo()?
Comment 5 Toby Hocking 2019-06-05 23:40:59 UTC
Created attachment 2432 [details]
minimal reproducible example terminal output
Comment 6 Toby Hocking 2019-06-05 23:41:38 UTC
Hi I am having the same issue. Several others on R-devel http://r.789695.n4.nabble.com/R-pkg-install-should-fail-for-unsuccessful-DLL-copy-on-windows-td4757281.html have expressed a desire for an error in this case, instead of a warning.

At the very least I would expect that if I set options(warn=2) then R should convert that to an error, but it does not. Attached is a minimal reproducible example command line.
Comment 7 Gabor Csardi 2019-06-06 08:22:19 UTC
(In reply to Toby Hocking from comment #6)
[...]
> At the very least I would expect that if I set options(warn=2) then R should
> convert that to an error, but it does not. Attached is a minimal
> reproducible example command line.

`options(warn=2)` is not ideal, because there are other warnings, possibly. E.g. if  a PACKAGES file is not found, if tar has unknown pax headers, etc. There are a bunch of them, and they tend to be (mostly) harmless. FWIw the remotes package does this, i.e. it injects `options(warn = 2)` into the install process, but we had to create an escape hatch to work around the other warnings. 

I think `install.packages()` should just fail in this case. 

There are actually two different implementations to patch, because the binary package installation is different, that's just az unzip call essentially. But that is problematic as well, because it does not error, either, only warns. The end result in this case is that the package's directory is wiped out in the library, except the locked dll file.

This is the related issue in remotes, it also includes an investigation of the various cases (binary, sources, etc.): https://github.com/r-lib/remotes/issues/368
Comment 8 Matt Dowle 2019-06-11 20:31:31 UTC
In data.table I added a version check between the .dll and the R code by placing the version number as a compiled constant C string with a wrapper to check it matches the R level package version number, https://github.com/Rdatatable/data.table/pull/3237. This ensures that the working-but-silently-wrong state cannot occur with data.table 1.12.0+ (Jan 2019), even with old versions R if and when it is fixed in R.

When the working-but-silently-wrong state occurs, data.table prints the following message :

  "The datatable.dll version (1.12.0) does not match the package (1.12.2). Please close all R sessions to release the old DLL and reinstall data.table in a fresh R session. The root cause is that R's package installer can in some unconfirmed circumstances leave a package in a state that is apparently functional but where new R code is calling old C code silently: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478. Once a package is in this mismatch state it may produce wrong results silently until you next upgrade the package. Please help by adding precise circumstances to 17478 to move the status to confirmed. This mismatch between R and C code can happen with any package not just data.table. It is just that data.table has added this check."
Comment 9 Tomas Kalibera 2019-06-14 07:53:46 UTC
Addressed in 76692 and 76693. Now R on Windows refuses to install a source package using install.packages() if that package is in use by the R session invoking install.packages(). Previously this check only applied to binary packages. In addition, the installation from a source package will fail with error when any file of the existing installation is locked (delete operation fails). This may unfortunately impact some users who used to update loaded packages even on Windows without running into any problems.
Comment 10 Gabor Csardi 2019-06-14 08:13:45 UTC
Based on your description the binary install case is still not fixed. I.e. if the package is loaded in another R session, then install.packages() will still wipe out the previous installation, and leave only the locked DLL behind.
Comment 11 Tomas Kalibera 2019-06-14 08:33:08 UTC
(In reply to Gabor Csardi from comment #10)
> Based on your description the binary install case is still not fixed. I.e.
> if the package is loaded in another R session, then install.packages() will
> still wipe out the previous installation, and leave only the locked DLL
> behind.

Right, so far this has been addressed only in the source installs.
Comment 12 Tomas Kalibera 2019-06-19 10:45:58 UTC
The problem with installing a binary package when an existing installation has a locked file can be solved via a per-directory lock. In released versions of R, one can pass "lock=TRUE" to install.packages() or change option "install.lock" to TRUE. See ?install.packages() for more. I've changed the default in R-devel to TRUE.
Comment 13 Matt Dowle 2019-06-19 19:52:08 UTC
Many thanks, Tomas! It's fantastic news that this is resolved. I'll update data.table's message in the next release, and I'll update the twitter thread.
Comment 14 Tomas Kalibera 2019-06-20 07:30:41 UTC
Thank you for the report. The patches have been ported to R-patched. It indeed remains important that packages being installed are not in use by any R session (even on a different machine when the library is shared), there are a number of ways this could go wrong.