Hi, R documentation describes the factors attribute of the terms.object as follows: A matrix of variables by terms showing which variables appear in which terms. The entries are 0 if the variable does not occur in the term, 1 if it does occur and should be coded by contrasts, and 2 if it occurs and should be coded via dummy variables for all levels (as when an intercept or lower-order term is missing). If there are no terms other than an intercept and offsets, this is numeric(0). (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/terms.object.html) In the example below, I would expect Species to have a value of 2 since the intercept is omitted. Indeed, when using model.matrix it is clear that Species has been coded with dummy variables for all three levels. f <- ~ -1 + Species attr(terms(f, data=iris), "factors") # Species #Species 1 levels(iris$Species) #[1] "setosa" "versicolor" "virginica" colnames(model.matrix(f, iris)) #[1] "Speciessetosa" "Speciesversicolor" "Speciesvirginica" I think this is a bug in the terms function. Many thanks in advance, Pat O'Reilly

Do you know of any model fits where the wrong terms are used because of this? Maybe it's just a documentation error.

As far as I can tell, the documentation has the right intention. However, I don't think the attribute is ever used except in interaction terms --- for main effects, one just looks at whether the model contains an intercept. I haven't actually checked the code, though. Assuming that I am right, matching the documentation to the code is probably possible ("2 if it occurs in an interaction term and...") but it might be clearer to fix the code.

I've not seen this affect the fitting of models but I intend to use the terms object as a source of metadata for a model matrix. If the factors attribute is correct, one can easily work from left to right through the factors matrix to determine what each column of the associated model matrix means. For example column 1 being Age interacted with Gender = "Male". If the factor matrix is incorrect then this becomes challenging.

The calculation for this is in function TermCode in model.c in the stats package. It says that the calculation follows the heuristic on p. 38 of the white book, but on checking, I see that it doesn't -- the case of no intercept is described there and our code doesn't do what it says. I'll fix it.

Now fixed in R-devel, soon in R-patched.

This change caused several packages to fail their tests, and has been backed out. We're discussing what to do next.

Duncan, I tried your fix and I think it is incorrect. f <- ~ -1 + Petal.Length*Species attr(terms(f, data=iris), "factors") # Petal.Length Species Petal.Length:Species #Petal.Length 2 0 1 #Species 0 1 1 Since Species is the first factor, this needs to be coded with dummy variables. The modelmatrix function works because although it uses a terms object to build the design matrix, it fixes the factors pattern matrix first. Line 451 in model.c: /* If there is no intercept we look through the factor pattern */ /* matrix and adjust the code for the first factor found so that */ /* it will be coded by dummy variables rather than contrasts. */ if (!intrcept) { for (j = 0; j < nterms; j++) { for (i = risponse; i < nVar; i++) { if (INTEGER(nlevs)[i] > 1 && INTEGER(factors)[i + j * nVar] > 0) { INTEGER(factors)[i + j * nVar] = 2; goto alldone; } } } }

Pat: What change did you try? Currently there are no changes to the code in R-devel or R-patched, as noted in comment 6. R-devel currently has updated documentation to explain the current behaviour, but the behaviour hasn't changed from earlier code.

Duncan, I found your fix on github: https://github.com/wch/r-source/blob/5722f120a941741a710ed965d90ddb41d03db9d2/src/library/stats/src/model.c Would it be a correct assessment to say that the no-intercept adjustment currently in the modelmatrix function should really be in either termcode or termsform?