Bug 16238 - Variable names with spaces or () fail with get_all_vars due to lack of quotes
Summary: Variable names with spaces or () fail with get_all_vars due to lack of quotes
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Models
Version: R 3.1.2
Hardware: Other Other
Importance: P5 trivial
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2015-03-05 10:11 UTC by Max Gordon
Modified: 2015-03-05 10:11 UTC
CC List: 0 users

See Also:


Description Max Gordon 2015-03-05 10:11:41 UTC
There seems to have been a change to stats::get_all_vars after 2014-11-28 that causes atypical variable names to fail. By atypical I mean non-syntactic names that have to be backquoted, such as `my variable` or `my(variable)`. The failure appears to come from the variable names not being quoted when get_all_vars builds its list() call as text, and it can be fixed by changing the line:

inp <- parse(text = paste("list(", paste(varnames, collapse = ","), ")"))

to

inp <- parse(text = paste0("list(`", paste(varnames, collapse = "`,`"),  "`)"))
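A quick way to see what this one-line change does is to compare the two strings directly. The sketch below takes the variable names from a formula containing the two non-syntactic names used in this report and builds the text both ways; it only uses base R (all.vars, paste, paste0, parse).

```r
# Variable names exactly as all.vars() extracts them from a formula that
# contains the two non-syntactic names used in this report
varnames <- all.vars(A + B ~ C + `Named var` + `Named(type=B)`)

# Text built the current way: names are pasted without quotes
unquoted <- paste("list(", paste(varnames, collapse = ","), ")")
# Text built the proposed way: every name is wrapped in backquotes
quoted <- paste0("list(`", paste(varnames, collapse = "`,`"), "`)")

print(unquoted)
print(quoted)

# Parsing the unquoted text fails; try() just captures the error
res <- try(parse(text = unquoted), silent = TRUE)
print(inherits(res, "try-error"))

# The quoted text parses to list(...) whose arguments are all plain symbols
expr <- parse(text = quoted)[[1]]
print(vapply(as.list(expr)[-1], is.name, logical(1)))
```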

Here is an example that illustrates the bug:

set.seed(1)
data <- data.frame(A = rnorm(10), 
                   B = rnorm(10), 
                   C = rnorm(10))
data$`Named var` <- rnorm(10)
data$`Named(type=B)` <- rnorm(10)

# This works as expected
fit <- lm(A + B ~ C, data = data)
get_all_vars(fit, data = data)
#             A           B           C
# 1  -0.6264538  1.51178117  0.91897737
# 2   0.1836433  0.38984324  0.78213630
# 3  -0.8356286 -0.62124058  0.07456498
# 4   1.5952808 -2.21469989 -1.98935170
# 5   0.3295078  1.12493092  0.61982575
# 6  -0.8204684 -0.04493361 -0.05612874
# 7   0.4874291 -0.01619026 -0.15579551
# 8   0.7383247  0.94383621 -1.47075238
# 9   0.5757814  0.82122120 -0.47815006
# 10 -0.3053884  0.59390132  0.41794156

# The following fails
fit_space <- update(fit, .~.+`Named var`)
get_all_vars(fit_space, data = data)
# Error in parse(text = paste("list(", paste(varnames, collapse = ","),  : 
#                               <text>:1:19: unexpected symbol
#                             1: list( A,B,C,Named var
#                                      ^

# As does this one
fit_fn <- update(fit, .~.+`Named(type=B)`)
get_all_vars(fit_fn, data = data)
# Error in eval(expr, envir, enclos) : could not find function "Named"
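The two failures give different errors because the unquoted text is parsed differently in each case: "Named var" is a syntax error, whereas Named(type=B) is valid R that parses as a call to a function named Named. The sketch below illustrates this for the fit_fn case; the small data frame d is made up for illustration.

```r
# The unquoted text produced for the fit_fn model in this report
txt <- "list( A,B,C,Named(type=B) )"
expr <- parse(text = txt)[[1]]

# Element 1 of the call is `list`; elements 2-5 are its arguments.
# The last argument is parsed as a call to a function named Named
# with the argument type = B, not as the symbol `Named(type=B)`
last_arg <- expr[[5]]
print(is.call(last_arg))
print(deparse(last_arg[[1]]))

# Evaluating the call against a data frame that does contain a column
# called `Named(type=B)` therefore looks for a function Named() instead
d <- data.frame(A = 1, B = 2, C = 3)
d$`Named(type=B)` <- 4
msg <- tryCatch(eval(expr, d), error = conditionMessage)
print(msg)
```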

Here is the suggested alternative with the fix:

alt_get_all_vars <- function (formula, data = NULL, ...) 
{
  if (missing(formula)) {
    if (!missing(data) && inherits(data, "data.frame") && 
        length(attr(data, "terms"))) 
      return(data)
    formula <- as.formula(data)
  }
  else if (missing(data) && inherits(formula, "data.frame")) {
    if (length(attr(formula, "terms"))) 
      return(formula)
    data <- formula
    formula <- as.formula(data)
  }
  formula <- as.formula(formula)
  if (missing(data)) 
    data <- environment(formula)
  else if (!is.data.frame(data) && !is.environment(data) && 
           !is.null(attr(data, "class"))) 
    data <- as.data.frame(data)
  else if (is.array(data)) 
    stop("'data' must be a data.frame, not a matrix or an array")
  if (!inherits(formula, "terms")) 
    formula <- terms(formula, data = data)
  env <- environment(formula)
  rownames <- .row_names_info(data, 0L)
  varnames <- all.vars(formula)
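  # Wrap every name in backquotes so that non-syntactic names such as
  # `Named var` or `Named(type=B)` are parsed as symbols
  inp <- parse(text = paste0("list(`", paste(varnames, collapse = "`,`"), 
                            "`)"))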
  variables <- eval(inp, data, env)
  if (is.null(rownames) && (resp <- attr(formula, "response")) > 
      0) {
    lhs <- variables[[resp]]
    rownames <- if (is.matrix(lhs)) 
      rownames(lhs)
    else names(lhs)
  }
  extras <- substitute(list(...))
  extranames <- names(extras[-1L])
  extras <- eval(extras, data, env)
  x <- setNames(as.data.frame(c(variables, extras), optional = TRUE), 
                c(varnames, extranames))
  if (!is.null(rownames)) 
    attr(x, "row.names") <- rownames
  x
}
alt_get_all_vars(fit_space, data = data)
alt_get_all_vars(fit_fn, data = data)
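Backquoting fixes both examples but still builds R code as text. An alternative worth considering (a different technique from the one proposed here, and not necessarily what R-core would adopt) is to build the call from symbols with as.name() and as.call(), so that no quoting is needed at all. This also avoids two edge cases where the backquoted text would still fail to parse: an empty set of variable names, which yields list(``), and a name that itself contains a backquote. The helper eval_varnames() and the data frame d below are hypothetical, a minimal sketch only.

```r
# Hypothetical helper, not part of stats: evaluate the variables named in
# 'varnames' by building the list(...) call from symbols rather than text
eval_varnames <- function(varnames, data, env = parent.frame()) {
  # as.name() turns any string, including "Named var" or "Named(type=B)",
  # into a symbol, so no quoting or escaping is needed
  inp <- as.call(c(list(as.name("list")), lapply(varnames, as.name)))
  eval(inp, data, env)
}

d <- data.frame(A = 1:3)
d$`Named var` <- 4:6
d$`Named(type=B)` <- 7:9

vars <- eval_varnames(c("A", "Named var", "Named(type=B)"), d)
str(vars)

# With no variables the call is simply list(), so an empty list comes back
print(length(eval_varnames(character(0), d)))

# For comparison, the backquoting approach turns zero names into "list(``)",
# which the parser rejects as a zero-length variable name
print(inherits(try(parse(text = "list(``)"), silent = TRUE), "try-error"))
```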

I've looked at bugs #14905 and #13624, which also report problems with stats::get_all_vars, but they seem to address different issues. 

As a minor side note, I find the paste0("list(", ..., ")") construction hard to read; sprintf() seems a more intuitive alternative, e.g. sprintf("list(`%s`)", paste(varnames, collapse = "`,`")). That said, I appreciate that many R users aren't familiar with sprintf().
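For reference, the two forms produce exactly the same string, so the choice is purely about readability. The check below uses the names from the examples in this report.

```r
varnames <- c("A", "B", "C", "Named var", "Named(type=B)")

# The construction used in the proposed fix
via_paste0 <- paste0("list(`", paste(varnames, collapse = "`,`"), "`)")
# The sprintf() form: %s is replaced by the backquote-joined names
via_sprintf <- sprintf("list(`%s`)", paste(varnames, collapse = "`,`"))

print(identical(via_paste0, via_sprintf))
print(via_sprintf)
# [1] "list(`A`,`B`,`C`,`Named var`,`Named(type=B)`)"
```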