Bug 16918 - With near-equal numbers in 'by', aggregate.data.frame(drop=FALSE) gives extra row
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses
Version: R 3.3.0
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 minor
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-05-20 17:56 UTC by Suharto Anggono
Modified: 2016-05-20 23:07 UTC

Description Suharto Anggono 2016-05-20 17:56:42 UTC
The following example reproduces the problem.

R> group <- c(sqrt(2)^2, 2)
R> print(aggregate(data.frame(n = seq(group)), list(group = group), length,
R+ drop = FALSE), digits = 17)
               group n
1 2.0000000000000000 2
2 2.0000000000000004 0
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

If sqrt(2)^2 and 2 are considered equal, there is only one group, with two members. So row 2 of the result, with 0 in 'n', should not be there.
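
A minimal sketch of what "considered equal" means here, assuming (this is not confirmed in the report) that the grouping goes through a character or factor conversion of 'by'. The variable names are only for illustration. The two values differ as doubles but get the same label:

```r
group <- c(sqrt(2)^2, 2)

# As doubles the two values are not identical.
exactly_equal <- group[1] == group[2]

# Converted to character (15 significant digits), both become "2".
labels <- as.character(group)

# Distinct values before and after the conversion.
n_distinct_numeric <- length(unique(group))
n_distinct_labels <- length(unique(labels))

# as.factor() also ends up with a single level.
n_levels <- nlevels(as.factor(group))

print(exactly_equal)
print(labels)
print(c(n_distinct_numeric, n_distinct_labels, n_levels))
```

If drop=FALSE builds its grid of combinations from the distinct numeric values while the counting uses the labels, that would be consistent with the extra row and the "duplicated levels" warning above, but this is only a guess.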

Compare with the following, which uses the default drop=TRUE of 'aggregate.data.frame'.

R> group <- c(sqrt(2)^2, 2)
R> print(aggregate(data.frame(n = seq(group)), list(group = group), length),
R+ digits = 17)
               group n
1 2.0000000000000004 2

R> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Suharto Anggono 2016-05-20 23:07:54 UTC
(In reply to Suharto Anggono from comment #0)
> If sqrt(2)^2 and 2 are considered equal, there is only one group, with two
> members. So row 2 of the result, with 0 in 'n', should not be there.

Also compare with the following, where 'group' is converted with as.factor before being passed in 'by'.

R> group <- c(sqrt(2)^2, 2)
R> print(aggregate(data.frame(n = seq(group)), list(group = as.factor(group)),
R+ length, drop = FALSE), digits = 17)
  group n
1     2 2