Bug 16877 (mustart, stats::glm) - glm fitting with offset -- non-finite deviance
Summary: glm fitting with offset -- non-finite deviance
Status: UNCONFIRMED
Alias: mustart, stats::glm
Product: R
Classification: Unclassified
Component: Models (show other bugs)
Version: R 3.2.3
Hardware: All All
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-05-05 04:30 UTC by Jonathan Carroll
Modified: 2016-05-05 04:35 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Carroll 2016-05-05 04:30:15 UTC
Apologies if this is an implementation or model issue rather than a code issue...

There appears to be a bug (?) in stats::glm where start/mustart/etastart values are not passed to the fitting procedure if an offset is provided. 

    if (length(offset) && attr(mt, "intercept") > 0L) {
        fit2 <- eval(call(if (is.function(method)) "method" else method, 
            x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights, 
            offset = offset, family = family, control = control,
            intercept = TRUE))
        if (!fit2$converged) 
            warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?")
        fit$null.deviance <- fit2$deviance
    }

In the deviance recalculation of glm the potentially provided start/mustart/etastart values are not passed into the call to glm.fit. This leads to glm.fit failing with 

    if (!is.finite(dev)) {
        if (is.null(coefold)) 
          stop("no valid set of coefficients has been found: please supply starting values", 
            call. = FALSE)

In my test case, dev is calculated as NaN (many values of dev.resids(y, mu, weights) are NaN. This is due to 

    (log(ifelse(y == 0, 1, y)/mu)

since some calculated values of mu are negative (hence the log fails). I can't seem to trace back where etastart and mustart flow through to eta and mu and how they interact with offset, but given the next bit, it seems this is the cause...

Ben Bolker (http://stackoverflow.com/a/8221108/4168169) provided a patch that seems to remedy this by adding 

    start = start[1], etastart = etastart[1], mustart = mustart[1],

into the call to method (default: glm.fit) and in testing this seems to work well; the deviance remains finite and the solution is reached. Is this a correct addition to the code? i.e. is the calculation of mu for the purposes of re-calculating the deviance robust?

Kind regards,

- Jonathan Carroll.