Bug 14905 - get_all_vars mislabels output when matrices/vectors are involved
Summary: get_all_vars mislabels output when matrices/vectors are involved
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Models
Version: R 2.14.2
Hardware: ix86 (32-bit) Linux
Importance: P5 minor
Assignee: R-core
 
Reported: 2012-05-03 20:19 UTC by Patrick Breheny
Modified: 2012-05-03 20:19 UTC

Description Patrick Breheny 2012-05-03 20:19:22 UTC
The code:

Y <- data.frame(A=rnorm(10),B=rnorm(10))
x <- rnorm(10)
fit <- lm(Y[,1]~x)
get_all_vars(fit)

should return the same value as cbind(Y,x):

            A          B           x
1   2.8279453  0.3753366  0.16995954
2   2.4686978 -0.6721246 -0.20518980
3   1.4333861  0.9594093 -0.80190060
4   1.1557163 -0.9704202 -1.55890193
5   1.3446549 -0.8448964  0.50147239
6  -0.7310629 -0.8650427 -1.05356585
7  -2.2478913 -1.0978145 -0.96344266
8   0.5074897  0.9409493  0.43515657
9   1.2325071 -1.0028124  1.18398620
10  0.5769455  0.1782655 -0.05159594

Instead we get:

           Y          x          NA
A  2.8279453  0.3753366  0.16995954
B  2.4686978 -0.6721246 -0.20518980
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

Two problems here:

1) Column names should be "A", "B", and "x".  The values of x are actually in the third column above.

2) A, B, and x have been truncated (there should be 10 rows, not 2).

Both problems come from the labeling of rows and columns, not from the construction of the data frame itself.  In the source of get_all_vars, the line

names(x) <- c(varnames, extranames)

is causing problem 1).  The names of the data frame need to be varnames *unless one of the variables is itself a data frame*, in which case that variable's column names need to be inserted in place of its single name.  To make this explicit, we need something like:

  ## Build the names vector, expanding any data frame into its column names.
  vn <- NULL
  for (i in seq_along(variables))
    {
      if (is.data.frame(variables[[i]])) vn <- c(vn, names(variables[[i]]))
      else vn <- c(vn, varnames[i])
    }
  names(x) <- c(vn, extranames)

Certainly, there may be a more efficient way of doing this.
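
One possible loop-free version of the same naming logic is sketched below as a standalone function, so it can be tried outside get_all_vars.  The name expand_var_names is hypothetical and not part of R; its arguments stand in for the internal objects variables and varnames, and, like the loop above, it only expands data frames (matrices are not handled here).

```r
# Hypothetical helper, not part of R: returns the column names for a list of
# variables, replacing each data frame's single name by its own column names.
expand_var_names <- function(variables, varnames) {
  per_var <- Map(function(v, nm) if (is.data.frame(v)) names(v) else nm,
                 variables, varnames)
  unlist(per_var, use.names = FALSE)
}

# Same objects as in the example at the top of this report.
Y <- data.frame(A = rnorm(10), B = rnorm(10))
x <- rnorm(10)
vn <- expand_var_names(list(Y, x), c("Y", "x"))
print(vn)
```

Inside get_all_vars the result would then be used as names(x) <- c(vn, extranames), exactly as in the loop version.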

This may be related to (PR#13624).  As for 2), I am not sure whether the fix for (PR#14847) takes care of it.

-- 
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky