Bug 17228 - predict.gnls Gives Incorrect Results When New Factor Levels Are Given
Summary: predict.gnls Gives Incorrect Results When New Factor Levels Are Given
Alias: None
Product: R
Classification: Unclassified
Component: Add-ons (show other bugs)
Version: R-devel (trunk)
Hardware: All All
: P5 critical
Assignee: R-core
Depends on: 17226
  Show dependency treegraph
Reported: 2017-02-22 14:51 UTC by Bill Denney
Modified: 2017-02-22 19:08 UTC (History)
0 users

See Also:


Note You need to log in before you can comment on or make changes to this bug.
Description Bill Denney 2017-02-22 14:51:32 UTC
With only the correct number of factors but not the correct levels, predict.gnls will predict as though the original factor levels were given.  I discovered this while working on code similar to that in https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17226

See the last line in the example below where new factor levels are given for a the gear data element and no warning nor error is raised and the result is reported as though the original factor levels are given:

d.mod <- mtcars
d.mod$gear <- factor(d.mod$gear)

mymod <-
       start=rep(0.1, nlevels(d.mod$gear)))

# Fails "contrasts can be applied only to factors with 2 or more levels"
predict(mymod, newdata=data.frame(disp=100, gear=factor("3")))
# Fails "Error in p %*% beta[pmap[[nm]]] : non-conformable arguments"
predict(mymod, newdata=data.frame(disp=100, gear=factor(c("3", "4"))))
# Succeeds
predict(mymod, newdata=data.frame(disp=100, gear=factor(c("3", "4", "5"))))
# Should fail but does not!
predict(mymod, newdata=data.frame(disp=100, gear=factor(c("6", "7", "8"))))
Comment 1 Bill Denney 2017-02-22 16:08:13 UTC
The solution appears to be that when making the model.matrix in predict.gnls, if a parameter is not a continuous variable (e.g. is a factor), then it needs to be set to NA if the value is not present in the original data.

This new version of getParsGnls ensures that only existing parameters are used:

getParsGnls <- function (plist, pmap, beta, N) {
  pars <- array(0, c(N, length(plist)), list(NULL, names(plist)))
  for (nm in names(plist)) {
    p <- plist[[nm]]
    pars[, nm] <-
      if (is.logical(p)) {
      } else {
        # Find matching contrasts
        pnames <- paste(nm, colnames(p), sep=".")
        colnames(p) <- pnames
        current.beta <- beta[pmap[[nm]]]
        matching.names <- intersect(pnames, names(current.beta))
        # Apply the contrasts
        p[,matching.names] %*% current.beta[matching.names]

This is the line in predict.gnls that should be fixed.

plist[[nm]] <-

The fix needs to ensure that it will only apply existing levels to a data frame.  The below is not at all a general solution, but it works for all of the models in the example by providing specifically the factor levels in the original data.

tmpMf <- model.frame(asOneSidedFormula(params[[nm]][[3]]), 
tmpMf$gear <- factor(tmpMf$gear, levels=c("3", "4", "5"))
plist[[nm]] <- model.matrix(asOneSidedFormula(params[[nm]][[3]]), 

What I don't see is how to determine what levels are present in the original data in a definitive way.  The main issue that I have in trying to find the original data is that it may have changed in the original environment (object$call$data).  Specifically in the example, if I set "d.mod <- iris" before the predict statements then everything would break by searching object$call$data.
Comment 2 Bill Denney 2017-02-22 19:08:46 UTC
As I looked even more, it appears that this code in predict.gnls is no longer working due to changes in how contrasts are stored:

  for(i in names(dataMod)) {
    if (inherits(dataMod[,i], "factor") &&
        !is.logical(contr[[i]]) && is.matrix(contr[[i]]) ) {
      levs <- levels(dataMod[,i])
      levsC <- dimnames(contr[[i]])[[1]]
      if (any(wch <- is.na(match(levs, levsC)))) {
                                "level %s not allowed for %s",
                                "levels %s not allowed for %s"),
                       paste(levs[wch], collapse = ",")), domain = NA)
      attr(dataMod[,i], "contrasts") <- contr[[i]][levs, , drop = FALSE]
#      if (length(levs) < length(levsC)) {
#        if (inherits(dataMod[,i], "ordered")) {
#          dataMod[,i] <- ordered(as.character(dataMod[,i]), levels = levsC)
#        } else {
#          dataMod[,i] <- factor(as.character(dataMod[,i]), levels = levsC)
#        }
#      }