Bug 12572 - unlist on nested lists of factors
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: old
Hardware: All All
Importance: P5 normal
Assignee: Jitterbug compatibility account
Reported: 2008-08-20 18:20 UTC by Jitterbug compatibility account
Modified: 2008-08-20 18:20 UTC
Description Jitterbug compatibility account 2008-08-20 18:20:43 UTC
From: Dan Davison <davison@stats.ox.ac.uk>
Here is a description and a proposed solution for a bug in unlist().

I used R version 2.7.2 RC (2008-08-18 r46382) under Linux to
investigate this.

unlist(recursive=TRUE) incorrectly returns a factor with zero levels
when passed either a nested list of factors, or a list containing a
data frame whose columns are all factors. The result cannot be
print()ed.

x <- list(list(v=factor("a")))
str(unlist(x))
## Factor w/ 0 levels: NA
## - attr(*, "names")= chr "v"
## Warning message:
## In str.default(unlist(x)) : 'object' does not have valid levels() 
y <- list(data.frame(v=factor("a")))
str(unlist(y))
## Factor w/ 0 levels: NA
## - attr(*, "names")= chr "v"
## Warning message:
## In str.default(unlist(y)) : 'object' does not have valid levels()
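
To check whether a given R build is affected, it may help to test the
result for structural validity rather than reading the str() output. The
following is only a sketch: is_well_formed_factor is a hypothetical
helper name, and the "broken" object is built by hand to mimic the
result shown above (integer NA, a names attribute, class "factor", no
levels attribute).

```r
## Hypothetical helper: TRUE if f is a factor with a character levels
## attribute and integer codes that are NA or within 1..nlevels.
is_well_formed_factor <- function(f) {
    lv <- levels(f)
    codes <- unclass(f)
    is.factor(f) && is.character(lv) && is.integer(codes) &&
        all(is.na(codes) | (codes >= 1L & codes <= length(lv)))
}

## Hand-built stand-in for the malformed result reported above.
broken <- structure(NA_integer_, names = "v", class = "factor")

x <- list(list(v = factor("a")))
y <- list(data.frame(v = factor("a")))

is_well_formed_factor(broken)      # the simulated bad result
is_well_formed_factor(unlist(x))   # depends on the R build in use
is_well_formed_factor(unlist(y))   # depends on the R build in use
```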

unlist is defined as

unlist <- function(x, recursive=TRUE, use.names=TRUE)
{
    if(.Internal(islistfactor(x, recursive))) {
        lv <- unique(.Internal(unlist(lapply(x, levels), recursive, FALSE)))
        nm <- if(use.names) names(.Internal(unlist(x, recursive, use.names)))
        res <- .Internal(unlist(lapply(x, as.character), recursive, FALSE))
        res <- match(res, lv)
        ## we cannot make this ordered as level set may have been changed
        structure(res, levels=lv, names=nm, class="factor")
    } else .Internal(unlist(x, recursive, use.names))
}

The error occurs because, in both cases, islistfactor recurses at the
C level, finds that every leaf element is a factor, and so the if
condition is TRUE. However, the two lapply calls do not recurse: they
apply levels and as.character to the top-level elements (here a list
and a data frame) rather than to the factors inside them, so lv ends
up empty and match() returns NA. A possible solution is to replace
both instances of lapply with rapply. With that change, unlist returns
appropriate factors for the examples above:

str(unlist(x))
## Factor w/ 1 level "a": 1
## - attr(*, "names")= chr "v"
str(unlist(y))
## Factor w/ 1 level "a": 1
## - attr(*, "names")= chr "v"
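
The difference between lapply and rapply on nested input, and the shape
of the proposed change, can be sketched at user level. This is an
illustration under assumptions, not the patch itself: unlist_factors is
a hypothetical name, it avoids .Internal, it always recurses, and it
uses how="list", which the report does not specify.

```r
x <- list(list(v = factor("a")))
y <- list(data.frame(v = factor("a")))

## lapply() visits only the top level, so levels() is applied to the
## inner list (which has no levels attribute) instead of the factor.
top_only <- lapply(x, levels)

## rapply() descends to the leaves and applies levels() to each factor.
leaves <- rapply(x, levels, how = "list")

## Hypothetical user-level analogue of the factor branch of unlist(),
## with both lapply() calls replaced by rapply().
unlist_factors <- function(x, use.names = TRUE) {
    ## Assumption: every leaf must be a factor, mirroring islistfactor.
    stopifnot(all(rapply(x, is.factor, how = "unlist")))
    lv  <- unique(unlist(rapply(x, levels, how = "list"),
                         use.names = FALSE))
    chr <- unlist(rapply(x, as.character, how = "list"),
                  use.names = use.names)
    ## As in the original, the result is never ordered.
    structure(match(chr, lv), levels = lv, names = names(chr),
              class = "factor")
}

str(unlist_factors(x))
str(unlist_factors(y))
```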

An alternative is not to return a factor at all in these cases, by
altering the if condition so that it is FALSE for nested lists of
factors and for lists of factor-only data frames.
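
This alternative might look roughly like the following at user level.
Both function names are hypothetical, and the fallback of converting
factor leaves to character is an assumption made for the sketch; the
report does not say what the non-factor result should be.

```r
## Hypothetical stricter test: TRUE only when every top-level element
## is itself a factor, so nested lists and lists of data frames fail.
is_flat_factor_list <- function(x) {
    is.list(x) && length(x) > 0L &&
        all(vapply(x, is.factor, logical(1)))
}

## Hypothetical wrapper: keep the factor result for flat lists of
## factors, otherwise flatten to a non-factor vector.
unlist_conservative <- function(x, use.names = TRUE) {
    if (is_flat_factor_list(x))
        return(unlist(x, use.names = use.names))
    ## Assumption: represent factor leaves by their labels.
    as_label <- function(leaf)
        if (is.factor(leaf)) as.character(leaf) else leaf
    unlist(rapply(x, as_label, how = "list"), use.names = use.names)
}

x <- list(list(v = factor("a")))
y <- list(data.frame(v = factor("a")))

str(unlist_conservative(x))
str(unlist_conservative(y))
str(unlist_conservative(list(factor("a"), factor("b"))))
```

The trade-off is that callers lose the factor class for nested input,
but never receive a factor without valid levels.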


Dan

-- 
www.stats.ox.ac.uk/~davison