Bug 14974 - "merge" function gives unexpected result
Status: RESOLVED FIXED
Product: R
Classification: Unclassified
Component: Accuracy
Version: R 2.15.0
Hardware: x86_64/x64/amd64 (64-bit) Mac OS X v10.6
Importance: P5 critical
Assigned To: R-core
Reported: 2012-07-03 22:15 UTC by Jon Gelfond
Modified: 2012-07-27 14:16 UTC (History)

Attachments
Produces unexpected merging result. (882 bytes, text/plain)
2012-07-03 22:18 UTC, Jon Gelfond
Description Jon Gelfond 2012-07-03 22:15:30 UTC
The "merge" function gives an unexpected result when I use the all.x=TRUE option and one of the data frames contains a column that is itself a matrix (here with 2 columns).
If I have one data.frame with all possible combinations of 2 factors and merge it with another data.frame that has 1 combination missing, then the values that should be missing in the merged data are filled in with erroneous values.
See the code below:
set.seed(2012)

a.factor <- as.factor(rep(letters[1:2],2))
b.factor <- as.factor(rep(c(1:2),each=2))

y <- as.matrix(cbind(as.character(a.factor),b.factor))

data1 <- data.frame(a.factor,b.factor,y=NA)

data1$y <- y

data1 <- subset(data1,!((a.factor=="b")&(b.factor==2))) # Delete row 

factorial.data <- data.frame(a.factor,b.factor,row=1:length(b.factor))

print("Merged Data Frames")
print(data1)
print(factorial.data)

merged.data <- merge(factorial.data,data1,by=c("a.factor","b.factor"),all.x=TRUE)

print("Strange Result with incorrectly filled in data in row 4")
print(merged.data)

data2 <- data.frame(a.factor,b.factor,y=y)
data2 <- subset(data2,!((a.factor=="b")&(b.factor==2))) # Delete row 


merged.data2 <- merge(factorial.data,data2,by=c("a.factor","b.factor"),all.x=TRUE)

print("Expected Result with properly missing data in row 4")
print(merged.data2)

# Maybe it's just me, but this surprising result led to some errors later on.
Comment 1 Jon Gelfond 2012-07-03 22:18:08 UTC
Created attachment 1331
Produces unexpected merging result.
Comment 2 Brian Ripley 2012-07-27 14:16:26 UTC
The description of merge() is all about 'columns'.  If you put a matrix in a data frame, then you are doing something undocumented.

We will improve this case, but do not expect everything to work.
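
For readers who hit the same problem, one possible way to sidestep it is to expand any matrix column into ordinary columns before calling merge(), which is roughly what the data2 construction in the description achieves. The following is only a sketch: flatten_matrix_columns is a hypothetical helper (not part of R), and the small keys/partial data frames are made up for illustration rather than taken from the attachment.

```r
# Hypothetical helper (not part of base R): replace every matrix column of a
# data frame with one ordinary column per matrix column, named <name>.<index>.
flatten_matrix_columns <- function(df) {
  pieces <- lapply(names(df), function(nm) {
    col <- df[[nm]]
    if (is.matrix(col)) {
      # One plain column per matrix column.
      out <- as.data.frame(col, stringsAsFactors = FALSE)
      names(out) <- paste(nm, seq_len(ncol(col)), sep = ".")
    } else {
      # Ordinary columns are passed through under their own name.
      out <- data.frame(col, stringsAsFactors = FALSE)
      names(out) <- nm
    }
    out
  })
  do.call(cbind, pieces)
}

# Illustrative data (assumed, not from the report): all four key combinations...
keys <- data.frame(a = c("a", "b", "a", "b"), b = c(1, 1, 2, 2), row = 1:4)

# ...and a second data frame that lacks the (b, 2) combination and carries a
# two-column matrix as its single column y.
partial <- data.frame(a = c("a", "b", "a"), b = c(1, 1, 2))
partial$y <- cbind(c("a", "b", "a"), c("1", "1", "2"))

# After flattening, merge() only sees ordinary columns.
flat <- flatten_matrix_columns(partial)
merged <- merge(keys, flat, by = c("a", "b"), all.x = TRUE)
print(merged)
```

With ordinary columns, the unmatched (b, 2) combination should show NA in both y.1 and y.2 instead of values carried over from other rows.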