Bug 17119 - merge(all=TRUE, x, y) fails if x and y contain columns called "method", "decreasing", or other arguments to order()
Summary: merge(all=TRUE, x, y) fails if x and y contain columns called "method", "decr...
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy (show other bugs)
Version: R-devel (trunk)
Hardware: Other Other
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-07-14 16:02 UTC by Bill Dunlap
Modified: 2016-07-23 21:37 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bill Dunlap 2016-07-14 16:02:12 UTC
If the 'by' columns in the data.frames given to merge() have
names matching the names of the arguments to the order()
function, then merge(..., all=TRUE) will either give an error
or give an incorrect result.

> merge(data.frame(decreasing=c(1,3,2)), data.frame(decreasing=c(4,2,3)), all=TRUE)
[1] decreasing
<0 rows> (or 0-length row.names)
> merge(data.frame(method=c(1,3,2)), data.frame(method=c(4,2,3)), all=TRUE)
Error in match.arg(method) : 'arg' must be NULL or a character vector

Rename the columns to avoid matching order's argument names and those examples give good results

> merge(data.frame(Decreasing=c(1,3,2)), data.frame(Decreasing=c(4,2,3)), all=TRUE)
  Decreasing
1          1
2          2
3          3
4          4
> merge(data.frame(Method=c(1,3,2)), data.frame(Method=c(4,2,3)), all=TRUE)
  Method
1      1
2      2
3      3
4      4

This probably arises from using do.call("order", dataFrame) instead of
do.call("order", unname(dataFrame)), or unname(as.list(dataFrame)) if
dataFrames without column names are illegal.

> sessionInfo()
R Under development (unstable) (2016-07-12 r70903)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Martin Maechler 2016-07-23 17:37:11 UTC
Yes, exactly (both the bug and the diagnosis !) --- thank you, Bill!

Indeed --- the help page for  order() says

Warning:

     [......]
     A sometimes-encountered unsafe practice is to call
     ‘do.call('order', df_obj)’ where ‘df_obj’ might be a data frame:
     copy ‘df_obj’ and remove any names, for example using ‘unname’.


and it is "interesting" that we have not been obeying our own advice  ;-)

----------
A quite different approach to solve this problem more cleanly would be to provide two versions of the order() function, like this:

 
 order <- function(..., na.last = TRUE, decreasing = FALSE,
                   method = c("auto", "shell", "radix"))
+    orderL(list(...), na.last=na.last, decreasing=decreasing, method=method)
+
+orderL <- function(lst, na.last = TRUE, decreasing = FALSE,
+                   method = c("auto", "shell", "radix"))
 {
-    z <- list(...)
-

this would allow  to  use  do.call(orderL, dataFrame)   directly and also help 
in similar situations.
Comment 2 Martin Maechler 2016-07-23 18:28:11 UTC
(In reply to Martin Maechler from comment #1)
> ----------

> A quite different approach to solve this problem more cleanly would be to
> provide two versions of the order() function, like this:
> 
>  
>  order <- function(..., na.last = TRUE, decreasing = FALSE,
>                    method = c("auto", "shell", "radix"))
> +    orderL(list(...), na.last=na.last, decreasing=decreasing, method=method)
> +
> +orderL <- function(lst, na.last = TRUE, decreasing = FALSE,
> +                   method = c("auto", "shell", "radix"))
>  {
> -    z <- list(...)
> -
> 
> this would allow  to  use  do.call(orderL, dataFrame)   directly and also
> help in similar situations.

yes, that would be nice, would need changes to much more than the R code...
So I will implement a solution along the  "unname(.)" proposal of Bill.
Comment 3 Martin Maechler 2016-07-23 21:37:45 UTC
Fixed in R-devel and R-patched (R 3.3.1 patched) with svn rev 70961 & 70962, 
respectively.