Bug 17283 - Wishlist: no names in list column of 'aggregate.data.frame' result
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: R 3.3.*
Hardware: All All
Importance: P5 enhancement
Assignee: R-core
Reported: 2017-06-05 15:50 UTC by Suharto Anggono
Modified: 2017-06-25 20:51 UTC
Description Suharto Anggono 2017-06-05 15:50:56 UTC
Example, modified from "Compute the averages according to region and the occurrence of more than 130 days of frost" in "Examples" in R help on 'aggregate':
print.default(
aggregate(state.x77[,1,drop=FALSE],
          list(Region = state.region,
               Cold = state.x77[,"Frost"] > 130),
          mean, simplify = FALSE)
)

Output:
$Region
[1] Northeast     South         North Central West          Northeast    
[6] North Central West         
Levels: Northeast South North Central West

$Cold
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

$Population
$Population$`1.1`
[1] 8802.8

$Population$`1.2`
[1] 4208.125

$Population$`1.3`
[1] 7233.833

$Population$`1.4`
[1] 4582.571

$Population$`2.1`
[1] 1360.5

$Population$`2.3`
[1] 2372.167

$Population$`2.4`
[1] 970.1667


attr(,"class")
[1] "data.frame"


This is with R 3.3.2. I believe that R 3.4.0 gives the same output.

I think that it would be better if the names "1.1","1.2","1.3","1.4","2.1","2.3","2.4" were not there. They are not very meaningful and come from the inner workings of the function 'aggregate.data.frame'.

The following call, which is the more usual form (default simplify = TRUE), does not produce such names.
print.default(
aggregate(state.x77[,1,drop=FALSE],
          list(Region = state.region,
               Cold = state.x77[,"Frost"] > 130),
          mean)
)

Output:
$Region
[1] Northeast     South         North Central West          Northeast    
[6] North Central West         
Levels: Northeast South North Central West

$Cold
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

$Population
[1] 8802.8000 4208.1250 7233.8333 4582.5714 1360.5000 2372.1667  970.1667

attr(,"class")
[1] "data.frame"
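
For readers on an R version that still shows these names, one possible workaround is to strip them after the call with 'unname'. This is only a sketch: the variable name 'res' is illustrative, and whether names(res$Population) is NULL before the fix-up depends on the R version in use.

```r
# Sketch: drop the names of a list column produced with simplify = FALSE.
res <- aggregate(state.x77[, 1, drop = FALSE],
                 list(Region = state.region,
                      Cold = state.x77[, "Frost"] > 130),
                 mean, simplify = FALSE)

# Depending on the R version, this is either a character vector
# ("1.1", "1.2", ...) or NULL.
names(res$Population)

# unname() removes the names; it changes nothing if there are none.
res$Population <- unname(res$Population)
names(res$Population)
```

The column remains a list with one element per row, so code that indexes it with [[ keeps working.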
Comment 1 Suharto Anggono 2017-06-05 16:08:21 UTC
To change 'aggregate.data.frame': in the anonymous function passed to 'lapply' whose result is assigned to 'z', the call to 'unlist' in the len == 1L case can use use.names = FALSE.
The call to 'unlist' in the len > 1L case can also use use.names = FALSE, because the names from 'unlist' are subsequently unused.
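
The effect of use.names = FALSE can be seen in isolation. The list 'ans' below is a hypothetical stand-in for the per-group results, not the actual internal object of 'aggregate.data.frame'.

```r
# Hypothetical per-group results carrying names like those in the report.
ans <- list(`1.1` = 8802.8, `1.2` = 4208.125, `2.1` = 1360.5)

with_names    <- unlist(ans)                     # names are carried over
without_names <- unlist(ans, use.names = FALSE)  # plain numeric vector

names(with_names)     # "1.1" "1.2" "2.1"
names(without_names)  # NULL
```

The values are the same in both cases; only the names attribute differs.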
Comment 2 Martin Maechler 2017-06-06 16:44:50 UTC
(In reply to Suharto Anggono from comment #1)
> To change 'aggregate.data.frame': in the anonymous function passed to
> 'lapply' whose result is assigned to 'z', the call to 'unlist' in the
> len == 1L case can use use.names = FALSE.
> The call to 'unlist' in the len > 1L case can also use use.names = FALSE,
> because the names from 'unlist' are subsequently unused.

I think you are 100% correct and agree a change would be an improvement.
I plan to apply it -- to R-devel only though.
Comment 3 Suharto Anggono 2017-06-07 03:39:16 UTC
(In reply to Suharto Anggono from comment #1)
> To change 'aggregate.data.frame': in the anonymous function passed to
> 'lapply' whose result is assigned to 'z', the call to 'unlist' in the
> len == 1L case can use use.names = FALSE.
> The call to 'unlist' in the len > 1L case can also use use.names = FALSE,
> because the names from 'unlist' are subsequently unused.

Sorry, the change above applies only to the simplify = TRUE case, so it does not address the issue.
Maybe add
names(ans) <- NULL
after
ans <- lapply(X = split(e, grp), FUN = FUN, ...)
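
Where the names come from can be reproduced on a small scale. In this sketch 'e' and 'grp' are made-up stand-ins for the internal variables of the same name, and 'named_ans' exists only to keep a copy for comparison.

```r
e   <- c(10, 20, 30, 40)
grp <- c(1, 2, 1, 2)            # hypothetical grouping codes

# lapply() keeps the names that split() derives from the group levels.
ans <- lapply(X = split(e, grp), FUN = mean)
named_ans <- ans
names(named_ans)                # "1" "2"

# The suggested change: drop the names right after the lapply() call.
names(ans) <- NULL
names(ans)                      # NULL
```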
Comment 4 Suharto Anggono 2017-06-07 04:05:38 UTC
(In reply to Suharto Anggono from comment #3)
> (In reply to Suharto Anggono from comment #1)
> > To change 'aggregate.data.frame': in the anonymous function passed to
> > 'lapply' whose result is assigned to 'z', the call to 'unlist' in the
> > len == 1L case can use use.names = FALSE.
> > The call to 'unlist' in the len > 1L case can also use use.names = FALSE,
> > because the names from 'unlist' are subsequently unused.
> 
> Sorry, the change above applies only to the simplify = TRUE case, so it
> does not address the issue.
> Maybe add
> names(ans) <- NULL
> after
> ans <- lapply(X = split(e, grp), FUN = FUN, ...)

Anyway, when simplification happens, in the len > 1L case and in the len == 1L case where the result of 'unlist' is atomic (the usual case), the names from 'unlist' (if any) are subsequently unused. So I think it is still good to put use.names = FALSE in the calls to 'unlist'. With it, there are no (row) names in any column of the 'aggregate.data.frame' result in all cases (including the case where len == 1L and the result of 'unlist' is a list).
Comment 5 Martin Maechler 2017-06-21 09:31:37 UTC
(In reply to Suharto Anggono from comment #4)
....
> Anyway, when simplification happens, in the len > 1L case and in the
> len == 1L case where the result of 'unlist' is atomic (the usual case), the
> names from 'unlist' (if any) are subsequently unused. So I think it is still
> good to put use.names = FALSE in the calls to 'unlist'. With it, there are no
> (row) names in any column of the 'aggregate.data.frame' result in all cases
> (including the case where len == 1L and the result of 'unlist' is a list).

I tend to agree... conceptually it should also be more efficient in the default 'simplify = TRUE' case.
Comment 6 Martin Maechler 2017-06-23 10:09:59 UTC
Committed to R-devel  (svn r72826 | maechler | 2017-06-22 15:36:19).
Comment 7 Suharto Anggono 2017-06-24 10:23:39 UTC
In function 'aggregate.data.frame' in R devel r72852,
ans <- lapply(X = split(e, grp), FUN = FUN, ...)
has changed to
ans <- lapply(unname(split(e, grp)), FUN = FUN, ...) .

In the above 'lapply', don't delete the argument name 'X'. It guards against the possibility that the user supplies an argument named 'X'.
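
The difference can be illustrated with two small wrappers. Everything here is hypothetical ('positional', 'named', 'pieces', and 'f' are invented for the sketch); the wrappers only mimic how '...' is forwarded to 'lapply' and are not the actual code of 'aggregate.data.frame'.

```r
# Hypothetical wrappers that forward '...' to lapply() in the two styles.
positional <- function(pieces, FUN, ...) lapply(pieces, FUN = FUN, ...)
named      <- function(pieces, FUN, ...) lapply(X = pieces, FUN = FUN, ...)

pieces <- list(1:3, 4:6)
# A user function that happens to have an argument called 'X'
# (ignored here to keep the sketch short).
f <- function(v, X = NULL) length(v)

# Positional style: the user's X = 100 is matched to lapply's own 'X',
# and 'pieces' is silently passed on to f as an extra argument.
wrong <- positional(pieces, f, X = 100)

# Named style: the duplicate 'X' is detected and an error is signalled.
clash <- tryCatch(named(pieces, f, X = 100), error = function(e) e)
inherits(clash, "error")
```

In the positional style the call iterates over the single value 100 rather than over 'pieces', so the result is wrong without any warning; keeping 'X =' turns that silent mistake into an error.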
Comment 8 Martin Maechler 2017-06-25 20:51:54 UTC
(In reply to Suharto Anggono from comment #7)
... 
> In the above 'lapply', don't delete the argument name 'X'. It guards against
> the possibility that the user supplies an argument named 'X'.

You are right, thank you.