Bug 14786 - merge.data.frame fails when suffix addition creates duplicate column name
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Language
Version: R 2.14.1 patched
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 normal
Assigned To: R-core
Reported: 2012-01-17 21:01 UTC by Aman Verma
Modified: 2014-02-16 11:42 UTC (History)
Description Aman Verma 2012-01-17 21:01:19 UTC
merge.data.frame fails when adding suffixes to columns with common names leaves one of the two data.frames with two columns of the same name. This was initially reported at:

http://stackoverflow.com/questions/8898905/merge-data-frames-cause-match-names-error/8900743#8900743

Reproducible example:

# Create data.
df1 <- data.frame(rbind(c(1, 10, 12, NA)))
df2 <- data.frame(rbind(c(11, 11)))

# Works fine.
merge(df1, df2, by = 1, all = TRUE)

#   X1 X2.x X3 X4 X2.y
# 1  1   10 12 NA   NA
# 2 11   NA NA NA   11

# Change the names of the columns.
names(df1) <- c('v', 'v2.x', 'v2.y', 'v2')
names(df2) <- c('x', 'v2')

# Same data fails!
merge(df1, df2, by = 1, all = TRUE)

# Error in match.names(clabs, names(xi)) : 
#   names do not match previous names

My own short analysis:

The error occurs in the "merge.data.frame" method, on this line:

x <- rbind(x, ya)

The problem is that "x" and "ya" don't share the same column names. That problem occurs on this line, just two lines before the previous one:

ya <- cbind(ya, x[rep.int(NA_integer_, nyy), nm.x, drop = FALSE])

"nm.x" is the vector of names c("v2.x","v2.y","v2.x"), and "x" is a data.frame with two columns named 'v2.x'. Interestingly, when you select these columns from the data.frame by name, one of the resulting columns is renamed!

names(x)
[1] "v"    "v2.x" "v2.y" "v2.x"
nm.x
[1] "v2.x" "v2.y" "v2.x"
x[,nm.x]
  v2.x v2.y v2.x.1
1   10   12     10
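
The renaming above looks like the result of make.unique() being applied to the selected column names. A minimal sketch to check this, assuming current R behaves like 2.14.1 here; the data frame below is rebuilt by hand to resemble the intermediate "x" and is not taken from inside merge.data.frame:

```r
# Hand-built stand-in for the intermediate 'x' (an assumption, for illustration).
x <- data.frame(1, 10, 12, NA)
names(x) <- c("v", "v2.x", "v2.y", "v2.x")  # duplicate name set on purpose
nm.x <- c("v2.x", "v2.y", "v2.x")

# Selecting by name matches the FIRST 'v2.x' both times,
# and the duplicated result names are made unique.
sel <- x[, nm.x]
names(sel)

# The same names produced directly by make.unique().
make.unique(nm.x)
```

Because name matching always finds the first 'v2.x', the third selected column repeats the value 10 instead of picking up the NA stored in the fourth column of "x".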

I tried to solve this by using the position of the column, instead of the name, but the resulting name is still changed (but the values are now what you want)!

x[,c(2,3,4)]
  v v2.x v2.y v2.x.1
1 1   10   12   NA
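
A possible workaround, sketched under assumptions: predict the column names the merge would produce and pick suffixes that do not collide with existing names. The helper suffix_collisions() is hypothetical (it is not part of R) and only approximates the renaming step in merge.data.frame for the simple case of one key column per data frame.

```r
# Hypothetical helper: returns the result column names that would be duplicated.
# Assumes a single key column in each data frame, given by position.
suffix_collisions <- function(x, y, by.x = 1L, by.y = 1L,
                              suffixes = c(".x", ".y")) {
  nm.x <- names(x)[-by.x]
  nm.y <- names(y)[-by.y]
  common <- intersect(nm.x, nm.y)
  # Append suffixes only to the non-key names shared by both data frames.
  new.x <- ifelse(nm.x %in% common, paste0(nm.x, suffixes[1L]), nm.x)
  new.y <- ifelse(nm.y %in% common, paste0(nm.y, suffixes[2L]), nm.y)
  all.nm <- c(names(x)[by.x], new.x, new.y)
  unique(all.nm[duplicated(all.nm)])
}

df1 <- data.frame(rbind(c(1, 10, 12, NA)))
df2 <- data.frame(rbind(c(11, 11)))
names(df1) <- c("v", "v2.x", "v2.y", "v2")
names(df2) <- c("x", "v2")

# Default suffixes collide with the existing 'v2.x' and 'v2.y' columns.
clash <- suffix_collisions(df1, df2)

# Suffixes that do not clash with any existing name.
ok <- suffix_collisions(df1, df2, suffixes = c(".a", ".b"))
res <- merge(df1, df2, by = 1, all = TRUE, suffixes = c(".a", ".b"))
names(res)
```

With the alternative suffixes the merge should complete, since no result column name is repeated.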

My version of R:
               _                 
platform       x86_64-pc-mingw32            
arch           x86_64                       
os             mingw32                      
system         x86_64, mingw32              
status                                      
major          2                            
minor          14.1                         
year           2011                         
month          12                           
day            22                           
svn rev        57956                        
language       R                            
version.string R version 2.14.1 (2011-12-22)
Comment 1 Brian Ripley 2012-01-21 12:09:22 UTC
You are asking it to create a data frame with duplicated column names: this is *not* the same merge.

Altered to give an explicit error message.