Bug 14974 - "merge" function gives unexpected result
Summary: "merge" function gives unexpected result
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy
Version: R 2.15.0
Hardware: x86_64/x64/amd64 (64-bit) Mac OS X v10.6
Importance: P5 critical
Assignee: R-core
Depends on:
Reported: 2012-07-03 22:15 UTC by Jon Gelfond
Modified: 2015-12-14 13:44 UTC

Produces unexpected merging result. (882 bytes, text/plain)
2012-07-03 22:18 UTC, Jon Gelfond

Description Jon Gelfond 2012-07-03 22:15:30 UTC
The "merge" function gives an unexpected result with the all.x=TRUE option when a column of the data frame is itself a matrix (here, a matrix with 2 columns).
If I have one data.frame with all possible combinations of 2 factors and merge it with another data.frame that is missing 1 combination, then the values that should be missing in the merged data are filled in with erroneous values.
See the code below:

a.factor <- as.factor(rep(letters[1:2],2))
b.factor <- as.factor(rep(c(1:2),each=2))

y <- as.matrix(cbind(as.character(a.factor),b.factor))

data1 <- data.frame(a.factor,b.factor,y=NA)

data1$y <- y

data1 <- subset(data1,!((a.factor=="b")&(b.factor==2))) # Delete row 

factorial.data <- data.frame(a.factor,b.factor,row=1:length(b.factor))

print("Merged Data Frames")

merged.data <- merge(factorial.data,data1,by=c("a.factor","b.factor"),all.x=TRUE)

print("Strange Result with incorrectly filled in data in row 4")
print(merged.data)

data2 <- data.frame(a.factor,b.factor,y=y)
data2 <- subset(data2,!((a.factor=="b")&(b.factor==2))) # Delete row 

merged.data2 <- merge(factorial.data,data2,by=c("a.factor","b.factor"),all.x=TRUE)

print("Expected Result with properly missing data in row 4")
print(merged.data2)

# Maybe it's just me, but this surprising result led to some errors later on.
Comment 1 Jon Gelfond 2012-07-03 22:18:08 UTC
Created attachment 1331
Produces unexpected merging result.
Comment 2 Brian Ripley 2012-07-27 14:16:26 UTC
The description of merge() is all about 'columns'. If you put a matrix in a data frame, then you are doing something undocumented.

We will improve this case, but do not expect everything to work.
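
Until that case is improved, one possible workaround is to expand the matrix column into ordinary columns before calling merge(). The sketch below reuses the setup from the description. The names flat, y1, y2 and merged.flat are illustrative only, and this is an assumption about how a user might sidestep the problem, not the change R-core has in mind.

```r
# Same setup as in the report.
a.factor <- as.factor(rep(letters[1:2], 2))
b.factor <- as.factor(rep(1:2, each = 2))
y <- cbind(as.character(a.factor), b.factor)   # 4 x 2 character matrix

data1 <- data.frame(a.factor, b.factor)
data1$y <- y                                   # matrix stored as a single column
data1 <- subset(data1, !(a.factor == "b" & b.factor == 2))  # drop the (b, 2) row

factorial.data <- data.frame(a.factor, b.factor, row = seq_along(b.factor))

# Workaround (illustrative, not the R-core fix): replace the matrix column
# with one ordinary column per matrix column before merging.
flat <- data.frame(data1[c("a.factor", "b.factor")],
                   y1 = data1$y[, 1],
                   y2 = data1$y[, 2])

merged.flat <- merge(factorial.data, flat,
                     by = c("a.factor", "b.factor"), all.x = TRUE)
print(merged.flat)
```

With only ordinary columns in both inputs, merge() is used as documented, so the unmatched (b, 2) row should come back with NA in y1 and y2.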