Bug 14992 - row names of model.matrix
Summary: row names of model.matrix
Status: REOPENED
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: R 2.15.x
Hardware: All All
Importance: P5 enhancement
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2012-07-16 10:28 UTC by Sebastian Meyer
Modified: 2017-06-20 14:22 UTC
CC List: 1 user

See Also:


Attachments

Description Sebastian Meyer 2012-07-16 10:28:28 UTC
Dear R core,

I just noticed that model.matrix is inconsistent in the row names it returns.
Here is an example (taken from the help page):

ff <- log(Volume) ~ log(Height) + log(Girth)
utils::str(m <- model.frame(ff, trees))
mat <- model.matrix(ff, m)

Here, model.matrix derives the row names from m. The same holds for, e.g.,

model.matrix(~ log(Height), m)

but rownames() returns NULL in the cases

model.matrix(~ 1, m)
or
model.matrix(~ 0, m)

I think it is currently neither specified nor documented which row names model.matrix should return. I would suggest always deriving the row names from the corresponding model.frame.
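Until model.matrix itself behaves this way, a caller can enforce the suggested behaviour after the fact. The following is a minimal sketch of a user-side workaround, not part of R; the helper name `model_matrix_rn` is hypothetical. It assumes `mf` is already a model frame, so rows of `mf` and of the resulting matrix correspond one to one, and it copies `row.names(mf)` onto the result for every formula, including `~ 1` and `~ 0`.

```r
# Hypothetical helper (not part of R): build a model matrix whose row names
# always match those of the model frame 'mf', whatever the formula.
model_matrix_rn <- function(formula, mf, ...) {
  X <- model.matrix(formula, mf, ...)
  # 'mf' is assumed to be a model frame already, so rows correspond 1:1
  stopifnot(nrow(X) == nrow(mf))
  rownames(X) <- row.names(mf)
  X
}

# Usage with non-automatic row names
tr <- trees
row.names(tr) <- 42 + seq_len(nrow(tr))
m <- model.frame(log(Volume) ~ log(Height) + log(Girth), tr)
head(rownames(model_matrix_rn(~ log(Height), m)))
head(rownames(model_matrix_rn(~ 1, m)))
head(rownames(model_matrix_rn(~ 0, m)))
```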

Best regards,
   Sebastian Meyer



R Version:
platform = x86_64-unknown-linux-gnu
arch = x86_64
os = linux-gnu
system = x86_64, linux-gnu
status = Under development (unstable)
major = 2
minor = 16.0
year = 2012
month = 07
day = 11
svn rev = 59772
language = R
version.string = R Under development (unstable) (2012-07-11 r59772)
nickname = Unsuffered Consequences
Comment 1 Martin Maechler 2012-07-20 14:56:32 UTC
Thank you for the clear report.

Though I don't see how the current behavior can become a problem,
I found it pretty easy to fix, and I do agree that consistency is desirable.
Comment 2 Brian Ripley 2012-07-21 06:41:48 UTC
Actually, this change is a problem.  It changes the output in several packages, including betareg and gstat.
Comment 3 Sebastian Meyer 2017-06-20 14:22:04 UTC
Now, 5 years later, I stumbled upon this issue again and found that the fix in r59911 does not actually do what the NEWS for R 2.15.2 promises:

"model.matrix(~1, ...) now also contains the same rownames that less trivial formulae produce. (Wish of PR#14992, changes the output of several packages.)"

In fact, model.matrix.default(~ 1, ...) simply gets automatic row names ("1", "2", ...) instead of the ones from the underlying model.frame.
Here is an example:

# set some row.names to see what happens
row.names(trees) <- 42 + seq_len(nrow(trees))
ff <- log(Volume) ~ log(Height) + log(Girth)
m <- model.frame(ff, trees)
model.matrix(~ log(Height), m)
model.matrix(~ 1, m)

To fix this, the line

> data <- data.frame(x=rep(0, nrow(data)))

in model.matrix.default would have to be modified to retain the original row names, for example by adding the argument row.names = row.names(data), or by switching to something like

> data[["x"]] <- rep.int(0, nrow(data))

A change might again affect the output of some packages...
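The two candidate fixes proposed above can be compared in isolation, outside model.matrix.default. The sketch below builds a zero-column data frame with non-automatic row names, similar to what model.matrix.default works with for `~ 1` (the object names `d`, `current`, `fix1`, `fix2` are illustrative only). It shows that `data.frame(x = ...)` without a `row.names` argument falls back to automatic row names, while both proposed variants keep the original ones.

```r
# Zero-column data frame with non-automatic row names "43", ..., "73"
tr <- trees
row.names(tr) <- 42 + seq_len(nrow(tr))
d <- tr[, integer(0), drop = FALSE]

# Current approach: a fresh data frame gets automatic row names "1", "2", ...
current <- data.frame(x = rep(0, nrow(d)))

# Proposal 1: pass the original row names explicitly
fix1 <- data.frame(x = rep(0, nrow(d)), row.names = row.names(d))

# Proposal 2: add the dummy column to the existing data frame
fix2 <- d
fix2[["x"]] <- rep.int(0, nrow(fix2))

rownames(current)[1:3]
rownames(fix1)[1:3]
rownames(fix2)[1:3]
```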