17161
2016-10-07 20:16:03 +0000
plot.lm cooks distrance contour
2016-10-12 11:47:03 +0000
1
1
1
Unclassified
R
Analyses
R 3.3.*
Other
Other
CLOSED
FIXED
P5
normal
---
1
randy.cs.lai
R-core
k_bugzilla
maechler
pd.mes
oldest_to_newest
92717
0
2170
randy.cs.lai
2016-10-07 20:16:03 +0000
Created attachment 2170
patch plot.lm
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
plot(lm.SR, which = 6)
The plot shows that Japan has a standard residual between 0.5 and 1.
But the actual value should be
> rstandard(lm.SR)['Japan']
Japan
1.575955
It is caused by the incorrect formula used in computing the standard residuals. Check the attached patch.
92724
1
pd.mes
2016-10-11 08:46:12 +0000
To be specific, the cause is that the divisor p in Cook's distance has been omitted. (The proportionality factor is rstd/p rather than rstd). The patch seems sensible.
Side question:
cooks.distance.lm has p <- model$rank
plot.lm has p <- length(coef(x))
which won't normally differ, but can they differ? If they can, which one is correct? I think they ought to be consistent.
92725
2
maechler
2016-10-11 16:05:01 +0000
(In reply to Peter Dalgaard from comment #1)
> To be specific, the cause is that the divisor p in Cook's distance has been
> omitted. (The proportionality factor is rstd/p rather than rstd). The patch
> seems sensible.
Thank you Peter, for checking.
>
> Side question:
>
> cooks.distance.lm has p <- model$rank
> plot.lm has p <- length(coef(x))
>
> which won't normally differ, but can they differ? If they can, which one is
> correct? I think they ought to be consistent.
Yes, they can differ quite a bit as soon as you have singularities.
I'm pretty sure model$rank is correct not the least because that's also used by
summary.lm().
That, BTW, returns a 'df' component which is (from help page):
df: degrees of freedom, a 3-vector (p, n-p, p*), the first being
the number of non-aliased coefficients, the last being the
total number of coefficients.
and p is rank whereas p* is length(coef(.)).
As I am listed as one "perpetrator" co-authoring this monster-long R function,
I will commit the change (mixed with another small influence() related potential speedup).
Martin
92733
3
k_bugzilla
2016-10-12 11:37:04 +0000
While at it, I am just chipping in here with a polite request to check. In summary.lm objects we have for the 'df' component two assignments:
case p = 0: ans$df <- c(0L, n, length(ans$aliased))
any other p: ans$df <- c(p, rdf, NCOL(Qr$qr))
I believe, for p!=0, p* = NCOL(Qr$qr) gives the right answer as any variables belonging to aliased coefficients are kept in Qr$qr (same goes for p=0 with length(ans$aliased). Maybe we want to be a little bit more consistent here and have length(ans$aliased) for p!=0 here as well? It seems a little more expressive to me.
92734
4
k_bugzilla
2016-10-12 11:43:59 +0000
Mhh, also in summary.lm we have for p!=0:
n <- NROW(Qr$qr)
and later again a call to
ans$df <- c(p, rdf, NCOL(Qr$qr))
that 'n' seems to be used in the adjusted R-squared calculation:
ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
...but should it be the the number of coefficients without the aliased ones?
92735
5
k_bugzilla
2016-10-12 11:47:03 +0000
Ok, forget that last comment. Ver sorry, I got all mixed up by NCOL and NROW...
2170
2016-10-07 20:16:03 +0000
2016-10-07 20:16:03 +0000
patch plot.lm
patch.diff
text/plain
736
randy.cs.lai
ZGlmZiAtLWdpdCBhL3NyYy9saWJyYXJ5L3N0YXRzL1IvcGxvdC5sbS5SIGIvc3JjL2xpYnJhcnkv
c3RhdHMvUi9wbG90LmxtLlIKaW5kZXggYThhNTdiYy4uYWY4YzhkNyAxMDA2NDQKLS0tIGEvc3Jj
L2xpYnJhcnkvc3RhdHMvUi9wbG90LmxtLlIKKysrIGIvc3JjL2xpYnJhcnkvc3RhdHMvUi9wbG90
LmxtLlIKQEAgLTMwMiwxNCArMzAyLDE0IEBAIGZ1bmN0aW9uICh4LCB3aGljaCA9IGMoMUw6M0ws
NUwpLCAjIyB3YXMgd2hpY2ggPSAxTDo0TCwKIAl5bWF4IDwtIHVzcls0TF0KIAlmb3IoaSBpbiBz
ZXFfYWxvbmcoYnZhbCkpIHsKIAkgICAgYmkyIDwtIGJ2YWxbaV1eMgotCSAgICBpZih5bWF4ID4g
YmkyKnhtYXgpIHsKKwkgICAgaWYocCp5bWF4ID4gYmkyKnhtYXgpIHsKIAkJeGkgPC0geG1heCAr
IHN0cndpZHRoKCIgIikvMwotCQl5aSA8LSBiaTIqeGkKKwkJeWkgPC0gYmkyKnhpL3AKIAkJYWJs
aW5lKDAsIGJpMiwgbHR5ID0gMikKIAkJdGV4dCh4aSwgeWksIHBhc3RlKGJ2YWxbaV0pLCBhZGog
PSAwLCB4cGQgPSBUUlVFKQogCSAgICB9IGVsc2UgewogCQl5aSA8LSB5bWF4IC0gMS41KnN0cmhl
aWdodCgiICIpCi0JCXhpIDwtIHlpL2JpMgorCQl4aSA8LSBwKnlpL2JpMgogCQlsaW5lcyhjKDAs
IHhpKSwgYygwLCB5aSksIGx0eSA9IDIpCiAJCXRleHQoeGksIHltYXgtMC44KnN0cmhlaWdodCgi
ICIpLCBwYXN0ZShidmFsW2ldKSwKIAkJICAgICBhZGogPSAwLjUsIHhwZCA9IFRSVUUpCg==