Bug 17414 - splines package sometimes gives incorrect predictions
Summary: splines package sometimes gives incorrect predictions
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Models (show other bugs)
Version: R-devel (trunk)
Hardware: Other Other
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2018-04-27 15:01 UTC by Duncan Murdoch
Modified: 2018-04-30 07:52 UTC (History)
1 user (show)

See Also:


Attachments
Patch to R-devel (848 bytes, patch)
2018-04-27 15:01 UTC, Duncan Murdoch
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Duncan Murdoch 2018-04-27 15:01:27 UTC
Created attachment 2341 [details]
Patch to R-devel

This bug was reported by Hadley Wickham on R-devel today:

"Very surprising (to me!) and mystifying result from predict.glm(): the
predictions vary depending on whether or not I use ns() or
splines::ns(). Reprex follows below."

The problem is in the splines package makepredictcall.ns function. It has this test:


     if(as.character(call)[1L] != "ns") return(call)

but in the m2 case below, the spline is specified as "splines::ns", which fails the string match.

It may be that this test isn't needed, as suggested by Joris Meys on R-devel, I didn't check that.  I put together a more reliable test as

    if(!identical(eval(call[[1L]]), ns)) return(call)

and this appears to work and passes "make check".  

The same error happens in makepredictcall.bs.  I've fixed both (and one minor typo in makepredictcall.bs) and have attached a patch against R-devel.

Here is Hadley's example:


library(splines)

set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd =
sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]

m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)

# The model coefficients are the same
unname(coef(m1))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
unname(coef(m2))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128

# But the predictions are not!
newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
unname(predict(m1, newdata = newdf))
#> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
unname(predict(m2, newdata = newdf))
#> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
Comment 1 Martin Maechler 2018-04-30 07:52:26 UTC
In the mean time, I had committed a version of the patch -- extend to  poly() <-> stats::poly()--  but then it was found that the change (svn 74663) broke 15 CRAN packages' checks ((including recommended "boot")) in all cases `call` was a symbol, and symbols are not subsettable, i.e.  call[[1L]]  gave an error.

So, I've ammended the changes to use the previous as.character(call)[1L]  and in case that did not match, for  "non-symbol" before using call[[1L]] so check gets a bit longer

    if(as.character(call)[1L] == "ns" || 
       (is.call(call) && identical(eval(call[[1L]]), ns)))  { modify-call }