Bug 17381 - Incorrect POSIXct duplicates from anyDuplicated(), duplicated() and unique() for data.frame
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy
Version: 3.4.0
Hardware: x86_64/x64/amd64 (64-bit) Mac OS X v10.6
Importance: P5 normal
Assignee: R-core
Reported: 2018-01-24 00:34 UTC by Earo Wang
Modified: 2018-01-28 21:51 UTC

Description Earo Wang 2018-01-24 00:34:06 UTC
The issue occurs when the date-times span the switch from daylight saving time to standard time. For a plain vector (`x_melb`), the output is as expected (no duplicates). Once the same vector is put into a data frame, duplicates are reported although there should be none.

x <- as.POSIXct(c(
  "2013-04-06 13:00:00", "2013-04-06 14:00:00",
  "2013-04-06 15:00:00", "2013-04-06 16:00:00",
  "2013-04-06 17:00:00"), tz = "UTC")
attr(x, "tzone") <- ""

x_melb <- as.POSIXct(x, tz = "Australia/Melbourne")
df <- data.frame(x = x_melb, y = 1)
anyDuplicated(x_melb)
#> [1] 0
anyDuplicated(df)
#> [1] 4
duplicated(x_melb)
#> [1] FALSE FALSE FALSE FALSE FALSE
duplicated(df)
#> [1] FALSE FALSE FALSE  TRUE FALSE
unique(x_melb)
#> [1] "2013-04-07 00:00:00 AEDT" "2013-04-07 01:00:00 AEDT"
#> [3] "2013-04-07 02:00:00 AEDT" "2013-04-07 02:00:00 AEST"
#> [5] "2013-04-07 03:00:00 AEST"
unique(df)
#>                     x y
#> 1 2013-04-07 00:00:00 1
#> 2 2013-04-07 01:00:00 1
#> 3 2013-04-07 02:00:00 1
#> 5 2013-04-07 03:00:00 1

R.version
#>                _                           
#> platform       x86_64-apple-darwin15.6.0   
#> arch           x86_64                      
#> os             darwin15.6.0                
#> system         x86_64, darwin15.6.0        
#> status                                     
#> major          3                           
#> minor          4.3                         
#> year           2017                        
#> month          11                          
#> day            30                          
#> svn rev        73796                       
#> language       R                           
#> version.string R version 3.4.3 (2017-11-30)
#> nickname       Kite-Eating Tree 

Regards,
Earo
Comment 1 Martin Maechler 2018-01-28 21:51:29 UTC
Effectively, this is very much related to bug 17369, since both expose the limitation of the previous duplicated.data.frame method, which was built on pasting string versions of the data frame row entries.

Both have been fixed by svn rev 74133, which instead builds on the duplicated() method for lists.
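
To illustrate the difference between the two approaches, here is a minimal sketch. It is not the code from svn rev 74133. The names keys_string, dup_string, rows and dup_value are made up for the illustration, and the display time zone is set explicitly (an assumption, so that the result does not depend on the local zone of the machine).

```r
# The five instants from the report, shown in Melbourne time explicitly
# (assumption: the system time zone database knows "Australia/Melbourne").
x <- as.POSIXct(c(
  "2013-04-06 13:00:00", "2013-04-06 14:00:00",
  "2013-04-06 15:00:00", "2013-04-06 16:00:00",
  "2013-04-06 17:00:00"), tz = "UTC")
attr(x, "tzone") <- "Australia/Melbourne"
df <- data.frame(x = x, y = 1)

# String-based row keys, roughly in the spirit of a paste()-based method:
# the formatted text carries no AEDT/AEST marker, so two different
# instants on either side of the DST change can yield the same key.
keys_string <- paste(format(df$x, "%Y-%m-%d %H:%M:%S"), df$y, sep = "\r")
dup_string <- duplicated(keys_string)

# Value-based rows: one list element per row, compared via the
# duplicated() method for lists, i.e. on the underlying values.
rows <- do.call(Map, c(list, unname(as.list(df))))
dup_value <- duplicated(rows)

print(dup_string)
print(dup_value)
```

Under these assumptions, dup_string should flag the fourth row, as in the report, because 02:00 AEDT and 02:00 AEST format to the same text, while dup_value should flag no row.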