Bug 14837 - plot.lm(which=5) residuals are in the wrong groups (Possibly the same as Bug 14545, which is marked as closed)
Alias: None
Product: R
Classification: Unclassified
Component: Graphics
Version: R 2.14.1
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 normal
Assignee: R-core
Reported: 2012-03-05 16:31 UTC by alavals
Modified: 2012-06-18 12:42 UTC

csv file with the data set (104 bytes, application/vnd.ms-excel)
2012-03-05 16:31 UTC, alavals
pdf with the resulting plot (4.63 KB, application/pdf)
2012-03-05 16:35 UTC, alavals

Description alavals 2012-03-05 16:31:43 UTC
Created attachment 1276
csv file with the data set

In the attached example, most of the residuals are in the wrong groups, or the groups are incorrectly labeled.

My commands:

> expData<-read.csv("example.csv", header=T, sep=";")
> expData
   treatment block result
1          a     1     30
2          a     2     31
3          b     1     22
4          b     2     20
5          c     1     42
6          c     2     41
7          d     1     61
8          d     2     59
9          e     1     11
10         e     2     12
> aov.expData<-aov(result~treatment+block,data=expData)
> plot(aov.expData,which=5)

The last command produces the attached plot. Only residuals 3 and 4 are in the right group. (Note: I modified id.n in plot.lm so that it labels 10 residuals instead of the default 3.)
Or is there something wrong with my reasoning?
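
For anyone without access to the attachment, the following is a minimal sketch that rebuilds the data set from the listing above and repeats the two commands. It assumes the printed values are the complete data; the explicit factor() call and the small lookup table of residuals are additions for illustration, not part of the original session. The table shows which treatment each residual belongs to, so it can be compared against the labels in the plot.

```r
# Rebuild the data set from the printed listing (no CSV file needed).
expData <- data.frame(
  treatment = factor(rep(c("a", "b", "c", "d", "e"), each = 2)),
  block     = rep(1:2, times = 5),
  result    = c(30, 31, 22, 20, 42, 41, 61, 59, 11, 12)
)

aov.expData <- aov(result ~ treatment + block, data = expData)

# Reference table: each residual next to the treatment of its own row.
data.frame(
  obs       = seq_len(nrow(expData)),
  treatment = expData$treatment,
  residual  = round(residuals(aov.expData), 3)
)

# id.n is an argument of plot.lm, so it can be passed through plot()
# to label all 10 points.
plot(aov.expData, which = 5, id.n = 10)
```

On a version affected by this bug, the group under which a labelled point is drawn can be checked against the treatment column of the table.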
Comment 1 alavals 2012-03-05 16:35:17 UTC
Created attachment 1277
pdf with the resulting plot
Comment 2 Duncan Murdoch 2012-04-18 13:20:10 UTC
I can confirm the bug.  The problem is that an attempt is made to sort factor levels according to the mean for that level, but the sorting is done inconsistently, so the labels don't match the data.
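
To illustrate the kind of mismatch described here, the sketch below sorts the treatment levels of the posted data by their mean and then labels the positions in two ways. This is not the plot.lm source; the names x_pos, labels_ok and labels_bad are made up for the example, and the "inconsistent" variant is only one assumed way the ordering could go wrong.

```r
# Same data as in the report.
treatment <- factor(rep(c("a", "b", "c", "d", "e"), each = 2))
result    <- c(30, 31, 22, 20, 42, 41, 61, 59, 11, 12)

level_means <- tapply(result, treatment, mean)  # mean result per level
ord <- order(level_means)                       # level indices, smallest mean first

# Consistent: reorder the levels once, then take BOTH the plotting
# positions and the axis labels from the reordered factor.
sorted    <- factor(treatment, levels = levels(treatment)[ord])
x_pos     <- as.integer(sorted)   # position of each observation on the axis
labels_ok <- levels(sorted)       # "e" "b" "a" "c" "d"

# Inconsistent (illustrative assumption): positions follow the sorted
# order, but the labels are left in the original order.
labels_bad <- levels(treatment)   # "a" "b" "c" "d" "e"

# Observations whose label still matches their own treatment.
which(labels_bad[x_pos] == as.character(treatment))  # 3 4
```

In this sketch only observations 3 and 4 keep a correct label, because level b happens to occupy the second position in both orderings. That matches what the reporter saw, although it does not establish that plot.lm failed in exactly this way.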

I can understand why this might be a good idea (variances depending on the mean are a pretty common problem), but I don't think the plot is successful at displaying this; I think it's just confusing.  For example, if there are two factors in the model (e.g. by making block into a factor in the posted example), only the levels for one of them will be shown in the labels.  Plot types 1 and 3 display heteroscedasticity well, so I'm going to remove the sorting.
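
As a rough sketch of the alternative mentioned above, the code below refits the posted example with block converted to a factor and draws plot types 1 and 3 side by side. The data are retyped from the report and the object name fit is arbitrary.

```r
# Posted example, with block turned into a factor (two-factor model).
expData <- data.frame(
  treatment = factor(rep(c("a", "b", "c", "d", "e"), each = 2)),
  block     = factor(rep(1:2, times = 5)),
  result    = c(30, 31, 22, 20, 42, 41, 61, 59, 11, 12)
)

fit <- aov(result ~ treatment + block, data = expData)

op <- par(mfrow = c(1, 2))   # two panels side by side
plot(fit, which = 1)         # Residuals vs Fitted
plot(fit, which = 3)         # Scale-Location
par(op)                      # restore previous graphics settings
```

Both panels place the points by fitted value rather than by factor level, so they do not depend on how the levels are ordered or labelled.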
Comment 3 Martin Maechler 2012-06-18 12:40:04 UTC
Fixed for R 2.15.1 and later