Bug 16056 - Cook's distance inconsistent for weighted linear models
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Analyses
Version: R 3.1.2
Hardware: x86_64/x64/amd64 (64-bit) Other
Importance: P5 enhancement
Assignee: R-core
Reported: 2014-11-02 15:42 UTC by Alex Reinhart
Modified: 2014-11-02 15:42 UTC

Description Alex Reinhart 2014-11-02 15:42:01 UTC
The Cook's distances given in plot.lm are inconsistent for weighted least squares.

Randomly generated test case:

x <- c(0.254, 0.638, 0.957, 0.553, 0.983, 0.511, 0.933, 0.428, 0.486, 0.382)
y <- c(7.010, 8.400, 11.769, 7.491, 11.925, 7.197, 10.829, 7.673, 7.106, 7.160)
weights <- c(0.224, 1.188, 1.543, 0.075, 1.070, 0.648, 0.185, 0.059, 0.136, 0.271)

out <- lm(y ~ x, weights=weights)

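# Show two diagnostic plots side by side:
# which = 4 is the Cook's distance plot, which = 5 is Residuals vs Leverage
par(mfrow=c(1,2))
plot(out, c(4,5))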

The Cook's Distance plot shows that point 1 has a Cook's distance between 3 and 3.5, but the Residuals vs Leverage contours show that it is between 0.5 and 1.

cooks.distance(out) and influence.measures(out) both give output matching the Residuals vs Leverage plot. The difference results from the Residuals vs Leverage plot using weighted residuals and the Cook's Distance plot using the unweighted residuals. This can be seen in plot.lm, where the Cook's distances are calculated with

        if (any(show[4L:6L])) {
            cook <- if (isGlm) 
                cooks.distance(x)
            else cooks.distance(x, sd = s, res = r)
        }

where r is residuals(x), rather than weighted.residuals(x).
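
The discrepancy can also be checked numerically without the plots. The following is a minimal sketch, assuming plot.lm passes sd = s and res = residuals(x) exactly as in the excerpt above; the names cook_weighted and cook_unweighted are introduced here only for illustration. Because the weighted residuals are the raw residuals multiplied by sqrt(weights), the two versions should differ by a factor equal to each observation's weight.

```r
x <- c(0.254, 0.638, 0.957, 0.553, 0.983, 0.511, 0.933, 0.428, 0.486, 0.382)
y <- c(7.010, 8.400, 11.769, 7.491, 11.925, 7.197, 10.829, 7.673, 7.106, 7.160)
weights <- c(0.224, 1.188, 1.543, 0.075, 1.070, 0.648, 0.185, 0.059, 0.136, 0.271)

out <- lm(y ~ x, weights = weights)

# Residual standard error, computed the same way as `s` in plot.lm
s <- sqrt(deviance(out) / df.residual(out))

# Default call: uses weighted.residuals(out)
cook_weighted <- cooks.distance(out)

# Call as made for the Cook's distance plot: raw residuals passed explicitly
cook_unweighted <- cooks.distance(out, sd = s, res = residuals(out))

# Side-by-side comparison, one row per observation
print(round(cbind(weights, cook_weighted, cook_unweighted), 3))

# The unweighted version times the weight recovers the weighted version
print(all.equal(unname(cook_unweighted * weights), unname(cook_weighted)))
```

If the last line prints TRUE, the gap between the two plots is fully explained by the choice of residuals, and it is largest for observations with small weights.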

I am not sure which definition of Cook's distance is correct for weighted least squares, but at the least the plots should be consistent with each other.
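
In the meantime, a possible workaround (a sketch, not a proposed patch to plot.lm) is to draw the Cook's distance plot directly from cooks.distance(), so that it uses the same weighted residuals as the Residuals vs Leverage contours. The variable cd and the plotting choices below are illustrative assumptions, not the layout plot.lm itself produces.

```r
x <- c(0.254, 0.638, 0.957, 0.553, 0.983, 0.511, 0.933, 0.428, 0.486, 0.382)
y <- c(7.010, 8.400, 11.769, 7.491, 11.925, 7.197, 10.829, 7.673, 7.106, 7.160)
weights <- c(0.224, 1.188, 1.543, 0.075, 1.070, 0.648, 0.185, 0.059, 0.136, 0.271)

out <- lm(y ~ x, weights = weights)

# Cook's distances based on weighted residuals (the cooks.distance default)
cd <- cooks.distance(out)

# Needle plot similar in spirit to plot(out, which = 4)
plot(seq_along(cd), cd, type = "h",
     xlab = "Obs. number", ylab = "Cook's distance",
     main = "Cook's distance (weighted residuals)")

# Reference lines at 0.5 and 1, the default contour levels
# drawn in the Residuals vs Leverage plot
abline(h = c(0.5, 1), lty = 2)
```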

> version
               _                           
platform       x86_64-apple-darwin10.8.0   
arch           x86_64                      
os             darwin10.8.0                
system         x86_64, darwin10.8.0        
status                                     
major          3                           
minor          1.2                         
year           2014                        
month          10                          
day            31                          
svn rev        66913                       
language       R                           
version.string R version 3.1.2 (2014-10-31)
nickname       Pumpkin Helmet