The function simulate.lm() uses wrong variance is some case (e.g. gaussian(link = "log"), but also perhaps when one uses custom prior weights in all lm and glm), which lead to the incorrect simulation. The bug applies to all versions of R since 2009 (so since I think R 2.8)!! Minimal reproducible example: set.seed(1L) var(y <- rnorm(n = 1000, mean = 10, sd = sqrt(10))) ## 10.71051 --> OK mod_glm <- glm(y ~ 1, family = gaussian(link = "log")) var(simulate(mod_glm)[, 1]) ## 0.1166989 --> Wrong: it is way too far from variance in the original data var(simulate.lm_patched(mod_glm)[, 1]) ## see below ## 11.38721 --> Seems correct Bug origin: It has been introduced in commit "update to similate.lm" on the 18/02/2009 by Brian Ripley git-svn-id: https://svn.r-project.org/R/trunk@47956 00db46b3-68df-0310-9c12-caf00c1e9a41 In the definition of simulate.lm() in src/library/stats/R/lm.R The line if (!is.null(object$weights)) vars <- vars/object$weights should instead, very likely be: if (!is.null(object$prior.weights)) vars <- vars/object$prior.weights (which is how I create the function simulate.lm_patched used above) The confusion between "prior.weights" and "weights" is clear in the history of the development of this function: In commit by Martin Maechler "first cut at simulate(<glm>)" of 13/02/2009 git-svn-id: https://svn.r-project.org/R/trunk@47919 00db46b3-68df-0310-9c12-caf00c1e9a41 , we can see that he specifically used prior weights for some families before Ripley generalized the code: "binomial" = { wts <- object$prior.weights if (any(wts %% 1 != 0)) stop("cannot simulate from non-integer prior.weights") rbinom(ntot, size = wts, prob = ftd)/wts } The confusion is not surprising because prior weights are called weights in the arguments of the lm call but prior.weights internally, while what is called weights internally is something else (working weights). Note: The bug has remained unnoticed because when gaussian(link = "identity") and when the prior weights are all set to 1 (most common situation) then there is no issue. It would be good if someone that knowns glm better than I do could make sure that the patch is correct. Best, Alex

Thank you, Alex. I agree this is a bug, and your correction seems fine. I'm running checks now with a patch (including regression check).

(In reply to Martin Maechler from comment #1) > ... and your correction seems fine. actually not: Need to consider *both* $weights and $prior.weights .. and of course it is obvious how

Required change is only obvious in hindsight: An lm() and a glm(<gaussian>) must be treated separately here, the first using `weights`, the latter `prior.weights`. Fixed in R-devel (svn 74683). Planned to port to R 3.5.0 patched after a waiting time.

Thanks a lot Martin for fixing this bug!