Bug 17235 - named arguments in terms.formula (termsform in stats/model.c)
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Models
Version: R-devel (trunk)
Hardware: All Linux
Importance: P5 normal
Assignee: R-core
Reported: 2017-03-13 15:24 UTC by Achim Zeileis
Modified: 2017-03-29 10:56 UTC

Description Achim Zeileis 2017-03-13 15:24:35 UTC
Currently, in both R-release and R-devel, terms.formula incorrectly treats some terms as duplicates. This seems to be due to a problem in termsform from stats/src/model.c, but so far I couldn't track down the precise cause.

Specifically, only the order of the arguments in function calls seems to be checked, not their names. Therefore, the terms f(x, a = z) and f(x, b = z) are deemed duplicates and one of them is dropped.

R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
[1] "f(x, a = z)"

However, changing an argument's value or the order of the arguments keeps both terms:

R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
[1] "f(x, a = z)"  "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
[1] "f(x, a = z)" "f(b = z, x)"
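The pattern above is consistent with a comparison that looks at the arguments by position and ignores their names. A minimal R-level sketch of that hypothesis follows. The helpers same_ignoring_names and same_with_names are hypothetical and for illustration only; the real comparison happens in C inside termsform and may differ in detail.

```r
# Hypothetical helpers, for illustration only; the actual comparison
# lives in C (termsform in stats/src/model.c) and may differ in detail.

# Compare two calls by position only, discarding argument names.
same_ignoring_names <- function(e1, e2) {
  identical(unname(as.list(e1)), unname(as.list(e2)))
}

# Compare two calls including argument names.
same_with_names <- function(e1, e2) {
  identical(e1, e2)
}

t1 <- quote(f(x, a = z))
t2 <- quote(f(x, b = z))
t3 <- quote(f(b = z, x))

same_ignoring_names(t1, t2)  # names dropped: the two calls look the same
same_with_names(t1, t2)      # names kept: the two calls differ
same_ignoring_names(t1, t3)  # positions differ: the two calls differ
```

Under this assumption, a name-insensitive comparison would merge f(x, a = z) with f(x, b = z) but keep f(b = z, x) separate, which matches the term labels shown above.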

We (= Nikolaus Umlauf and myself) came across this problem when setting up certain smooth regressors with different kinds of patterns. As a trivial, simplified example, we can reproduce the same kind of problem with rep(). Consider the two dummy variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the response y = 1:8 I get:

R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Coefficients:
            (Intercept)  rep(x = 0:1, each = 4)
                    2.5                     4.0

So while the model is identified because the two regressors are not the
same, terms.formula does not recognize this and drops the second regressor.
What I would have wanted can be obtained by switching the arguments:

R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Coefficients:
             (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
                       2                        4                        1

Of course, here I could avoid the problem by setting up proper factors
etc. But I hope this makes the general problem clear.
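One way to sidestep the issue in this toy example is to compute the regressors up front, so that the formula contains only plain variable names and no function calls need to be compared. This is a minimal sketch; the column names x_each and x_times are hypothetical.

```r
# Hypothetical column names (x_each, x_times); any distinct names would do.
d <- data.frame(
  y       = 1:8,
  x_each  = rep(x = 0:1, each = 4),   # 0 0 0 0 1 1 1 1
  x_times = rep(x = 0:1, times = 4)   # 0 1 0 1 0 1 0 1
)

# Both regressors are ordinary variables, so neither is dropped as a duplicate.
fit <- lm(y ~ x_each + x_times, data = d)
names(coef(fit))
```

This only works around the symptom; it does not help when the calls have to stay inside the formula, as with the smooth terms mentioned above.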
Comment 1 Duncan Murdoch 2017-03-29 10:56:53 UTC
Fixed in R-devel and R 3.4.0.
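
A quick way to check whether a given R installation includes the fix is to repeat the first example from the description and count the term labels. This is a sketch only; the variable name labels is arbitrary, and f, x, z are the same placeholder symbols used in the report (terms() does not evaluate them).

```r
# Repeat the first example from the report and inspect the term labels.
labels <- attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")

labels
length(labels)  # one label on affected versions; two are expected once fixed
```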