Bugzilla – Full Text Bug Listing

Summary: | plot.lm cooks distrance contour | ||
---|---|---|---|

Product: | R | Reporter: | Randy Lai <randy.cs.lai> |

Component: | Analyses | Assignee: | R-core <R-core> |

Status: | CLOSED FIXED | ||

Severity: | normal | CC: | k_bugzilla, maechler, pd.mes |

Priority: | P5 | ||

Version: | R 3.3.* | ||

Hardware: | Other | ||

OS: | Other | ||

Attachments: | patch plot.lm |

Description
Randy Lai
2016-10-07 20:16:03 UTC
To be specific, the cause is that the divisor p in Cook's distance has been omitted. (The proportionality factor is rstd/p rather than rstd). The patch seems sensible. Side question: cooks.distance.lm has p <- model$rank plot.lm has p <- length(coef(x)) which won't normally differ, but can they differ? If they can, which one is correct? I think they ought to be consistent. (In reply to Peter Dalgaard from comment #1) > To be specific, the cause is that the divisor p in Cook's distance has been > omitted. (The proportionality factor is rstd/p rather than rstd). The patch > seems sensible. Thank you Peter, for checking. > > Side question: > > cooks.distance.lm has p <- model$rank > plot.lm has p <- length(coef(x)) > > which won't normally differ, but can they differ? If they can, which one is > correct? I think they ought to be consistent. Yes, they can differ quite a bit as soon as you have singularities. I'm pretty sure model$rank is correct not the least because that's also used by summary.lm(). That, BTW, returns a 'df' component which is (from help page): df: degrees of freedom, a 3-vector (p, n-p, p*), the first being the number of non-aliased coefficients, the last being the total number of coefficients. and p is rank whereas p* is length(coef(.)). As I am listed as one "perpetrator" co-authoring this monster-long R function, I will commit the change (mixed with another small influence() related potential speedup). Martin While at it, I am just chipping in here with a polite request to check. In summary.lm objects we have for the 'df' component two assignments: case p = 0: ans$df <- c(0L, n, length(ans$aliased)) any other p: ans$df <- c(p, rdf, NCOL(Qr$qr)) I believe, for p!=0, p* = NCOL(Qr$qr) gives the right answer as any variables belonging to aliased coefficients are kept in Qr$qr (same goes for p=0 with length(ans$aliased). Maybe we want to be a little bit more consistent here and have length(ans$aliased) for p!=0 here as well? It seems a little more expressive to me. Mhh, also in summary.lm we have for p!=0: n <- NROW(Qr$qr) and later again a call to ans$df <- c(p, rdf, NCOL(Qr$qr)) that 'n' seems to be used in the adjusted R-squared calculation: ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf) ...but should it be the the number of coefficients without the aliased ones? Ok, forget that last comment. Ver sorry, I got all mixed up by NCOL and NROW... |