Bug 16119 - Wishlist: In lm/glm, don't ignore user-specified contrasts silently
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Wishlist
Version: R 3.1.2
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 enhancement
Assignee: R-core
Reported: 2014-12-26 02:41 UTC by Suharto Anggono
Modified: 2014-12-27 20:26 UTC

Description Suharto Anggono 2014-12-26 02:41:12 UTC
This is an example.

> set.seed(100)
> y <- rnorm(9)
> x <- factor(c(
+   "a",   "a",   "a",   "b",   "b",   "b",   "b",   "c",   "c"
+ ), levels = c("a", "b", "c"))
> sel <- c(
+  TRUE,  TRUE, FALSE,  TRUE,  TRUE,  TRUE, FALSE, FALSE, FALSE
+ )
> table(x, sel)
   sel
x   FALSE TRUE
  a     1    2
  b     1    3
  c     2    0

So every observation with x equal to "c" has sel equal to FALSE.

I want "b" as the base category for x, so I set the contrasts accordingly.

> contrasts(x) <-
+ contr.treatment(levels(x), contrasts=FALSE)[, -2, drop=FALSE]
> contrasts(x)
  a c
a 1 0
b 0 0
c 0 1

> options("contrasts")
$contrasts
        unordered           ordered
"contr.treatment"      "contr.poly"

> lm(y ~ x, subset=sel)

Call:
lm(formula = y ~ x, subset = sel)

Coefficients:
(Intercept)           xb
    -0.1853       0.6261

> lm(y ~ x, subset=sel, singular.ok=FALSE)

Call:
lm(formula = y ~ x, subset = sel, singular.ok = FALSE)

Coefficients:
(Intercept)           xb
    -0.1853       0.6261


The result of 'lm' above has a coefficient for category "b" of x, so "b" is not the base category. The first time I encountered this, I was surprised.
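The contrasts that 'lm' actually used can be read from the `contrasts` component of the fitted object. The sketch below re-creates the data from the example so it runs on its own, then compares a fit on all observations with the fit using `subset = sel`. The object names `fit_all` and `fit_sub` are illustrative only. In R versions that include the warning mentioned in Comment 1, the second `lm` call may also emit a warning.

```r
set.seed(100)
y <- rnorm(9)
x <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
            levels = c("a", "b", "c"))
sel <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# User-specified contrasts: "b" is meant to be the base category
contrasts(x) <- contr.treatment(levels(x), contrasts = FALSE)[, -2, drop = FALSE]

fit_all <- lm(y ~ x)                # all levels present: the user matrix is used
fit_sub <- lm(y ~ x, subset = sel)  # "c" absent: the user matrix is dropped

fit_all$contrasts$x   # the user's 3 x 2 contrasts matrix
fit_sub$contrasts$x   # "contr.treatment", taken from options("contrasts")
names(coef(fit_all))  # "(Intercept)" "xa" "xc"
names(coef(fit_sub))  # "(Intercept)" "xb"
```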

I see that it happens because 'lm' calls 'model.frame' with drop.unused.levels = TRUE. In function 'model.frame.default', if drop.unused.levels is TRUE and not all levels are present in a factor, the factor is subset with [, drop = TRUE]. In this example, after applying subset=sel, category "c" is not present. In function '[.factor', if drop is TRUE, the result doesn't have the "contrasts" attribute. When no contrasts are specified, options("contrasts") is used, in this case "contr.treatment". So "a", the first remaining category, becomes the base category.
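The effect of '[.factor' can be checked directly on the factor, without 'lm'. This sketch combines the subsetting and the level dropping into a single `x[sel, drop = TRUE]` call; 'model.frame.default' applies them as two steps, but the outcome for the "contrasts" attribute is the same. The names `x_keep` and `x_drop` are illustrative.

```r
x <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
            levels = c("a", "b", "c"))
sel <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
contrasts(x) <- contr.treatment(levels(x), contrasts = FALSE)[, -2, drop = FALSE]

x_keep <- x[sel]               # drop = FALSE (default): levels and contrasts kept
x_drop <- x[sel, drop = TRUE]  # unused level "c" removed

levels(x_keep)                      # "a" "b" "c"
is.null(attr(x_keep, "contrasts"))  # FALSE
levels(x_drop)                      # "a" "b"
is.null(attr(x_drop, "contrasts"))  # TRUE, so options("contrasts") applies later
```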

I can understand why '[.factor' with drop = TRUE drops the "contrasts" attribute: if the number of levels is reduced, the original contrasts matrix is no longer valid.
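A short check shows why the old matrix cannot simply be carried over. It has one row per original level, so `contrasts<-` rejects it for the reduced factor. Deleting the row for "c" would not help either, because the remaining column "c" would then be all zeros. The names `cm`, `x_drop` and `res` are illustrative.

```r
x <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
            levels = c("a", "b", "c"))
sel <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
cm <- contr.treatment(levels(x), contrasts = FALSE)[, -2, drop = FALSE]

x_drop <- x[sel, drop = TRUE]
nrow(cm)         # 3 rows, one per original level
nlevels(x_drop)  # 2 levels remain

# Re-attaching the old matrix fails because the row count no longer matches
res <- try(contrasts(x_drop) <- cm, silent = TRUE)
inherits(res, "try-error")  # TRUE

# Keeping only the rows for the remaining levels leaves column "c" all zero
cm[c("a", "b"), "c"]        # 0 0
```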

What I complain about is the behavior of 'lm'. I specify contrasts with a purpose, but 'lm' doesn't respect my specification, and it does so silently. In this example, one way to achieve what I want is to use 'relevel'. Still, I wish 'lm' would not ignore user-specified contrasts silently. Function 'glm' behaves the same way.
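The 'relevel' workaround can be sketched as follows. Instead of attaching a contrasts matrix, "b" is moved to the front of the level order, so the default "contr.treatment" picks it as the base, and dropping the unused level "c" does not change that. The name `xr` is illustrative.

```r
set.seed(100)
y <- rnorm(9)
x <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
            levels = c("a", "b", "c"))
sel <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Make "b" the first level; no contrasts attribute is needed
xr <- relevel(x, ref = "b")
levels(xr)  # "b" "a" "c"

fit <- lm(y ~ xr, subset = sel)
names(coef(fit))  # "(Intercept)" "xra"

# With treatment contrasts the intercept is the mean response at the base level
all.equal(unname(coef(fit)[1]), mean(y[sel & x == "b"]))  # TRUE
```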

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Comment 1 Duncan Murdoch 2014-12-27 20:26:06 UTC
I'll add a warning to model.frame() in R-devel and R-patched.
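A user-side check in the spirit of that warning can be sketched as below. `lost_contrasts()` is a hypothetical helper, not the code that was added to 'model.frame'. It compares the "contrasts" attribute of each variable in the data with the same variable in the result of `model.frame(..., drop.unused.levels = TRUE)`. The data frame `d` is assumed to hold the example data. In R versions that include the new warning, the second call may also warn.

```r
# Hypothetical helper (not part of R): names of variables in `data` whose
# user-set "contrasts" attribute is lost when model.frame() drops unused
# factor levels, as lm() and glm() do by default.
lost_contrasts <- function(formula, data) {
  mf <- model.frame(formula, data = data, drop.unused.levels = TRUE)
  vars <- intersect(names(data), names(mf))
  had  <- vapply(vars, function(v) !is.null(attr(data[[v]], "contrasts")), logical(1))
  kept <- vapply(vars, function(v) !is.null(attr(mf[[v]], "contrasts")), logical(1))
  vars[had & !kept]
}

set.seed(100)
d <- data.frame(
  y = rnorm(9),
  x = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
             levels = c("a", "b", "c"))
)
contrasts(d$x) <- contr.treatment(levels(d$x), contrasts = FALSE)[, -2, drop = FALSE]
sel <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

lost_contrasts(y ~ x, d)         # character(0): all levels present
lost_contrasts(y ~ x, d[sel, ])  # "x": level "c" is unused after subsetting
```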