Bugzilla – Bug 14837
plot.lm(which=5) residuals are in the wrong groups (Possibly the same as Bug 14545, which is marked as closed)
Last modified: 2012-06-18 12:42:20 UTC
Created attachment 1276
csv file with the data set
In the attached example, most of the residuals are in the wrong groups, or the groups are incorrectly labeled.
> expData<-read.csv("example.csv", header=T, sep=";")
   treatment block result
1          a     1     30
2          a     2     31
3          b     1     22
4          b     2     20
5          c     1     42
6          c     2     41
7          d     1     61
8          d     2     59
9          e     1     11
10         e     2     12
With the last command I get the attached plot. Only residuals 3 and 4 are in the right group. (Note: I changed id.n in plot.lm so that it labels all 10 residuals instead of the default 3.)
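For anyone without the attachment, the data can be rebuilt from the listing above and the residuals tabulated next to their true treatment, which gives a reference to compare the plot labels against. This is only a sketch: the report does not show the model call, so the formula `result ~ treatment + block` and the names `fit` and `check` are assumptions.

```r
# Data reconstructed from the listing in the report
expData <- data.frame(
  treatment = factor(rep(c("a", "b", "c", "d", "e"), each = 2)),
  block     = rep(1:2, times = 5),
  result    = c(30, 31, 22, 20, 42, 41, 61, 59, 11, 12)
)

# ASSUMPTION: the model formula is not given in the report
fit <- lm(result ~ treatment + block, data = expData)

# Standardized residuals next to the group each observation belongs to
check <- data.frame(
  obs       = seq_len(nrow(expData)),
  treatment = expData$treatment,
  std_resid = rstandard(fit)
)
print(check)

# The plot in question (labels all 10 points); uncomment to draw it
# plot(fit, which = 5, id.n = 10)
```

Each labelled point in the plot should sit in the column named in `check$treatment` for that observation number.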
Or is there something wrong with my reasoning?
Created attachment 1277
pdf with the resulting plot
I can confirm the bug. The problem is that an attempt is made to sort factor levels according to the mean for that level, but the sorting is done inconsistently, so the labels don't match the data.
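The kind of mismatch described here can be reproduced outside plot.lm. The sketch below is not the plot.lm source; it assumes, for illustration only, that the axis labels are reordered by group mean while the x positions stay in the original level order. All variable names are hypothetical.

```r
treatment <- factor(rep(c("a", "b", "c", "d", "e"), each = 2))
result    <- c(30, 31, 22, 20, 42, 41, 61, 59, 11, 12)

# Mean response per level, and the level order sorted by that mean
group_means   <- tapply(result, treatment, mean)
labels_sorted <- levels(treatment)[order(group_means)]

# Inconsistent: positions follow the ORIGINAL level order,
# but the axis labels follow the SORTED order
x_pos       <- as.integer(treatment)
label_shown <- labels_sorted[x_pos]

mismatch <- data.frame(
  obs     = seq_along(result),
  true    = as.character(treatment),
  shown   = label_shown,
  correct = as.character(treatment) == label_shown
)
print(mismatch)

# Consistent alternative: derive the positions from the sorted labels too
x_consistent <- match(as.character(treatment), labels_sorted)
```

With the posted data, only observations 3 and 4 end up under their own label in this sketch, which agrees with what the reporter saw.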
I can understand why this might be a good idea (variances that depend on the mean are a pretty common problem), but I don't think the plot is successful at displaying this; I think it's just confusing. For example, if there are two factors in the model (e.g. by making block into a factor in the posted example), only the levels for one of them will be shown in the labels. Plot types 1 and 3 display heteroscedasticity well, so I'm going to remove the sorting.
Fixed for R 2.15.1 and later
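To look at the two-factor case and at plot types 1 and 3 mentioned above, something like the following could be used. It is a sketch under assumptions: the data are rebuilt from the listing in the report, block is converted to a factor as suggested in the comment, and the model formula and the name `fit2` are made up for illustration.

```r
expData <- data.frame(
  treatment = factor(rep(c("a", "b", "c", "d", "e"), each = 2)),
  block     = factor(rep(1:2, times = 5)),  # block as a factor
  result    = c(30, 31, 22, 20, 42, 41, 61, 59, 11, 12)
)

# ASSUMPTION: additive model in both factors
fit2 <- lm(result ~ treatment + block, data = expData)

# Both factors are part of the fit, so a single row of axis labels
# cannot identify every treatment/block combination
print(names(fit2$xlevels))

# Residuals vs fitted (1) and scale-location (3)
pdf(NULL)  # null device; drop this line and dev.off() to see the plots
plot(fit2, which = c(1, 3))
dev.off()
```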