Bug 17369 - unique.data.frame with numeric columns
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy
Version: R 3.4.1
Hardware: All All
Importance: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2017-12-28 14:01 UTC by Patrick Perry
Modified: 2018-01-28 21:46 UTC
CC List: 1 user

See Also:


Attachments

Description Patrick Perry 2017-12-28 14:01:36 UTC
This one is hard to fix. It might be best to just issue a warning if the user calls `unique.data.frame` on a data frame with a non-integer column.

We see correct behavior when the data frame has only one column (cf. R FAQ 7.31):

unique(data.frame(x = c(.3 + .6, .9))) 
#>     x
#> 1 0.9
#> 2 0.9

We see incorrect behavior with multiple columns:

unique(data.frame(x = c(.3 + .6, .9), y = 1))
#>     x y
#> 1 0.9 1

The expected output is:

#>     x y
#> 1 0.9 1
#> 2 0.9 1

The root problem is in `duplicated.data.frame`.
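To see why the multi-column path goes wrong, here is a minimal R sketch. It assumes, as we understand the pre-fix source, that `duplicated.data.frame` built one string key per row with `paste(..., sep = "\r")` whenever the data frame had more than one column; `row_keys` is a hypothetical helper, not an R function. Because `paste` converts doubles with `as.character`, which keeps 15 significant digits, `.3 + .6` and `.9` end up with the same key.

```r
# Hypothetical helper mimicking an (assumed) paste()-based row key
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

df <- data.frame(x = c(.3 + .6, .9), y = 1)

df$x[1] == df$x[2]        # FALSE: the two doubles differ in their last bits
row_keys(df)              # both rows become "0.9\r1"
duplicated(row_keys(df))  # FALSE TRUE, so row 2 is wrongly treated as a duplicate
```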

There's another issue with the current implementation, exhibited here:

unique(data.frame(x = c("\r", "\r\r"), y = c("\r\r", "\r")))
#>    x    y
#> 1 \r \r\r

Users are unlikely to run into the latter issue in practice, though.
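The `"\r"` example is consistent with rows being keyed by joining their values with a `"\r"` separator (an assumption about the pre-fix code, not confirmed in this report): values that themselves contain `"\r"` can then make two distinct rows produce identical keys. A minimal sketch with a hypothetical `row_keys` helper:

```r
# Hypothetical helper: join each row's values with a "\r" separator (assumed old keying)
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

df <- data.frame(x = c("\r", "\r\r"), y = c("\r\r", "\r"))

# "\r" + sep + "\r\r" and "\r\r" + sep + "\r" both give four carriage returns
row_keys(df)              # "\r\r\r\r" "\r\r\r\r"
duplicated(row_keys(df))  # FALSE TRUE, although the rows differ
```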


> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.2

Comment 1 Martin Maechler 2018-01-28 21:46:40 UTC
Actually, that was fixed with a smart 1- (or 2-) line change in svn rev 74133,
which bases duplicated.data.frame() on the default method of duplicated() for lists.
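A minimal sketch of the list-based idea described in the comment: turn the data frame into a list with one element per row and call `duplicated()` on that list, so values are compared directly rather than through their string forms. `dup_rows` is a hypothetical helper written for illustration; it is not the code committed in r74133.

```r
# Hypothetical helper illustrating list-based row comparison
dup_rows <- function(df) {
  # Map(list, col1, col2, ...) gives one unnamed list per row
  rows <- do.call(Map, c(list(list), unname(as.list(df))))
  # duplicated() on a list compares the row lists themselves, with no paste() step
  duplicated(rows)
}

dup_rows(data.frame(x = c(.3 + .6, .9), y = 1))                 # FALSE FALSE
dup_rows(data.frame(x = c("\r", "\r\r"), y = c("\r\r", "\r")))  # FALSE FALSE
dup_rows(data.frame(x = c(1, 1), y = c("a", "a")))              # FALSE TRUE
```

Both examples from the report are now kept as distinct rows, while a genuinely repeated row is still flagged.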