Bug 17419 - unlist does not yield expected results on nested list of factors
Summary: unlist does not yield expected results on nested list of factors
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Language (show other bugs)
Version: R 3.5.0
Hardware: x86_64/x64/amd64 (64-bit) Other
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2018-05-10 13:20 UTC by swnydick
Modified: 2018-06-11 13:55 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description swnydick 2018-05-10 13:20:24 UTC
Partial duplicate of 12572, which was reported 10 years ago, although example 1) did not result in an error at that time but something similar to 2). This is different from 17418, and will have to have a different solution, so is reported as a different bug.

Examples:

1)

Steps to reproduce:

> x <- list(list(factor("A")))
> unlist(x)

What happens:

Error in as.character.factor(x) : malformed factor

What should happen:

It should return

[1] A
Levels: A

2)

Steps to reproduce:

> x <- list(list(factor("A")), factor("B"))
> unlist(x)

What happens:

[1] <NA> B   
Levels: B

What should happen:

It should return

[1] A B
Levels: A B
Comment 1 Martin Maechler 2018-05-23 13:06:33 UTC
This is messy:

I see a design-bug here:
unlist() calls .Internal(islistfactor(x, recursive))  which for the default `recursive = TRUE` is true  iff  the list/expression `x` corresponds to a tree where all leaves are factors.

However, in the case this is true, the current code in unlist() assumes that 
x is balanced tree of depth 1, i.e.,  list(<leaf>, <leaf>, ..., <leaf>) where a leaf is any non-recursive R object,
and so that code can work with simple lapply(x, ...)  which of course does not work when the assumption is wrong.
I think thatcode is correct for  recursive=FALSE  but that's the easy case and 
 not the default.

---------
However: I bet that 99% of even somewhat experienced R users are not aware of this extra hoop jumping that unlist() performs in all cases [though, .Internal(islistfactor()) is typically very fast notably when it returns FALSE],
and I think we should consider providing the simple version

unlist0 <- function(x, recursive=TRUE, use.names=TRUE)
    .Internal(unlist(x, recursive, use.names))

which does not do any hoop jumping.

To solve the problem, I think we can use a version of the  dendrapply() internal function Napply(), basically replacing  '!is.leaf(.)'  by  'is.recursive(.)' ..

Thoughts ?
Comment 2 Suharto Anggono 2018-06-01 14:39:26 UTC
(In reply to Martin Maechler from comment #1)
> This is messy:
> 
> I see a design-bug here:
> unlist() calls .Internal(islistfactor(x, recursive))  which for the default
> `recursive = TRUE` is true  iff  the list/expression `x` corresponds to a
> tree where all leaves are factors.
> 
> However, in the case this is true, the current code in unlist() assumes that 
> x is balanced tree of depth 1, i.e.,  list(<leaf>, <leaf>, ..., <leaf>)
> where a leaf is any non-recursive R object,
> and so that code can work with simple lapply(x, ...)  which of course does
> not work when the assumption is wrong.
> I think thatcode is correct for  recursive=FALSE  but that's the easy case
> and 
>  not the default.
> 
[snip]
> 
> To solve the problem, I think we can use a version of the  dendrapply()
> internal function Napply(), basically replacing  '!is.leaf(.)'  by 
> 'is.recursive(.)' ..
> 
> Thoughts ?

For recursive=TRUE, because operation would be only on leaves, I think 'rapply' would work (mentioned in Bug 12572 Description).
Comment 3 Martin Maechler 2018-06-02 06:30:36 UTC
(In reply to Suharto Anggono from comment #2)

> For recursive=TRUE, because operation would be only on leaves, I think
> 'rapply' would work (mentioned in Bug 12572 Description).

You are right -- thank you very much, Suharto! --  indeed already mentioned in PR#12572.

==> planning a change that uses 'rapply' when 'recursive=TRUE'
Comment 4 Suharto Anggono 2018-06-10 08:02:53 UTC
The code of function 'unlist' in R devel r74876 has this.
     URapply <-
            if(recursive) # use rapply()
                 function(x, Fn) .Internal(unlist(rapply(x, Fn), recursive, FALSE))
            else function(x, Fn) .Internal(unlist(lapply(x, Fn), recursive, FALSE))

By default, 'rapply' uses how="unlist" that does unlist-ing by itself.
For 'URapply' above, 'rapply' with how="list", that doesn't do unlist-ing, can be used.


Another thing: Function 'rapply' in R devel r74876 still doesn't handle component that is a list of length >= 2^31 because of a use of 'length' instead of 'xlength' in function 'do_one' in apply.c :
	R_xlen_t n = length(X);
Comment 5 Martin Maechler 2018-06-11 13:55:22 UTC
(In reply to Suharto Anggono from comment #4)

You are right on spot with both small issues, and I've now committed the corresponding changes to R-devel svn c74882 .
Thanks a lot!