Bug 17126 - S4 class hierarchy incorrectly built in rare cases (unexported S4 classes + 'load'ed S4 objects)
Summary: S4 class hierarchy incorrectly built in rare cases (unexported S4 classes + '...
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: S4methods (show other bugs)
Version: R-devel (trunk)
Hardware: Other Other
: P5 minor
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-08-02 18:41 UTC by Kevin Ushey
Modified: 2016-08-02 18:41 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Ushey 2016-08-02 18:41:07 UTC
Attempting to call 'inherits' on an S4 object that was created by 'load'ing an R workspace can lead to an incorrect class hierarchy being cached for that object.

This bug requires a number of pre-conditions:

1) The class in question must be an unexported class,
2) The S4 object must be generated by 'load'ing an R workspace (ie, someone must have 'save'd that S4 object previously),
3) The namespace of the package defining that class must not yet be loaded.

Suppose we have an unexported S4 class 'SubMatrix', defined in a class 's4inherits', which inherits from the S4 'dsyMatrix' class defined in the Matrix package. Given this + the previous preconditions, the following code produces an unexpected output for 'inherits':

    load("workspace.RData")     ## provides 'data', an instance of SubMatrix
    inherits(data, "dsyMatrix") ## returns FALSE, uenxpectedly
    is(data, "dsyMatrix")       ## still returns TRUE

Note that, because 'load' produces un-evaluated promises, the package 's4inherits' is not loaded until the 'data' promise is actually evaluated. And, 'inherits' is a primitive that does not eagerly evaluate its arguments, so it does not attempt to evaluate 'data' until it's "too late".

When 'inherits' is called, R will call the '.extendsForS3' function to determine the class hierarchy -- this occurs before the 'data' object is evaluated and so before the 's4inherits' class is loaded. This eventually leads to the code here:

https://github.com/wch/r-source/blob/dddb0eeef7d0f2e7b2c265c8daaf24c5558de4f0/src/library/methods/R/SClasses.R#L253-L256

	if(isTRUE(nzchar(package))) {
	    whereP <- .requirePackage(package)
	    value <- get0(cname, whereP, inherits = inherits) # NULL if not existing
	}

Now, because the 'data' object has not yet been evaluated, the 's4inherits' namespace is not yet loaded. Right now, this implies that:

    1) .requirePackage(package) attaches the package and returns the package environment, not namespace;

    2) cname (.__C__SubMatrix) is not resolved, as it lives in the package namespace, not package environment.

Note that .requirePackage has the (unexpected) behavior that, for packages which are currently not loaded / attached, it attaches the package and returns the attached package environment, rather than the namespace. Martin has indicates that this behavior is likely unintended in this particular scenario, but is an artifact of methods needing to bootstrap itself (before the 'methods' package really exists) as well as other packages. To outline the issue a bit more (copying from the mailing list post):

---

If the package namespace is not loaded, the package is loaded and
attached, and the package environment is returned:

    > methods:::.requirePackage("digest")
    Loading required package: digest
    <environment: package:digest>
    attr(,"name")
    [1] "package:digest"
    attr(,"path")
    [1] "/Users/kevin/Library/R/3.3/library/digest"
    > "digest" %in% loadedNamespaces()
    [1] TRUE
    > "package:digest" %in% search()
    [1] TRUE

On the other hand, if the package namespace has already been loaded,
the package namespace is returned without attaching the package:

    > requireNamespace("digest")
    Loading required namespace: digest
    > methods:::.requirePackage("digest")
    <environment: namespace:digest>
    > "digest" %in% loadedNamespaces()
    [1] TRUE
    > "package:digest" %in% search()
    [1] FALSE

---

Martin outlined a couple solutions; copying from his mailing list post:

---

Still, it is very desirable to improve / fix the issue:
After all this musings, I'd currently guess that it would be a
good idea if  .requirePackage()  would always return the
*namespace* of the corresponding package unless that does not
yet exist, as in the the 'bootstrap' situations above.
... or we'd add a new argument to .requirePackage() dealing with
that, or we use two functions:  .requirePackage() needed for
boostrapping, returning a package envionment, and
.requireNamespace() used to access class (and generic function!)
environments.

---

I have a minimal-as-possible example here: https://github.com/kevinushey/s4inherits; you can try running it (assuming you have git installed) with something like:

    cd $TMPDIR
    git clone https://github.com/kevinushey/s4inherits
    R CMD INSTALL s4inherits
    R -f s4inherits/test-save.R
    R -f s4inherits/test-load.R
    rm -rf s4inherits
    R -e "remove.packages('s4inherits')"

Upon running 'test-load.R', you should see something like:

> load("test.RData")
> inherits(data, "dsyMatrix")
Loading required package: s4inherits
[1] FALSE
> is(data, "dsyMatrix")
[1] TRUE

Thanks,
Kevin

---

> sessionInfo()
R version 3.3.1 Patched (2016-07-30 r71015)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan (10.11.6)