Bug 16666 - Wrong or inconsistent rownames in rbind.data.frame
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.2.3
Hardware: All Windows 64-bit
Importance: P5 critical
Assignee: R-core
Reported: 2016-01-11 12:51 UTC by Eugen Massini
Modified: 2016-01-21 11:15 UTC

Description Eugen Massini 2016-01-11 12:51:33 UTC
Running the following code gives invalid rownames:

df <- data.frame(x=seq(1,5),y=letters[seq(1,5)])
a <- df[c(3,5),]
b <- df[c(1,2),]

res <- rbind.data.frame(a, b)
# rownames(res) are c(3, 5, 31, 4)
# expected is: c(3, 5, 1, 2)

The result is either a bug or an inconsistency: it appears only with that order of the arguments to rbind, and only if the data frames 'a' and 'b' are copied(?) from df.

Code I have used for testing:

rm(list=ls(all.names=TRUE))

make.row.names <- TRUE

printValue <- function(var) {
  varname <- deparse(substitute(var))
  varRownames <- paste(rownames(var), collapse=" ")
  cat("rownames of", varname, ":", varRownames, "\n")
}

printTest <- function(a, b, res, resRev)
{
  printValue(a)
  printValue(b)
  printValue(res)
  printValue(resRev)
}

#### Bug ####
print("-----bug?----")
df <- data.frame(x=seq(1,5),y=letters[seq(1,5)])

# Is here reference or copy? (1)
a <- df[c(3,5),]
b <- df[c(1,2),]

res <- rbind.data.frame(a, b, make.row.names=make.row.names)
resRev <- rbind.data.frame(b, a, make.row.names=make.row.names)

printTest(a, b, res, resRev)

#######
print("---create separately---")
a <- data.frame(x=c(3,5), y=c('c','e'))
rownames(a) <- c(3,5)

b  <- data.frame(x=c(1,2), y=c('a', 'b'))
rownames(b) <- c(1,2)

res <- rbind.data.frame(a, b, make.row.names=make.row.names)
resRev <- rbind.data.frame(b, a, make.row.names=make.row.names)

printTest(a, b, res, resRev)

####### 
#### (1) But it does not matter ###
print("-----copies----")
df <- data.frame(x=seq(1,5),y=letters[seq(1,5)])
a <- df[c(3,5),]
b <- df[c(1,2),]


a[1,1] <- -33
b[1,1] <- -55

res <- rbind.data.frame(a, b, make.row.names=make.row.names)
resRev <- rbind.data.frame(b, a, make.row.names=make.row.names)

printTest(a, b, res, resRev)
printValue(df)

##########
# Results

                        -----bug?----     | ---separately---  | -----copies----
rownames of a         : 3 5               | 3 5               | 3 5
rownames of b         : 1 2               | 1 2               | 1 2
rownames of res       : 3 5 31 4          | 3 5 1 2           | 3 5 31 4
rownames of resRev    : 1 2 3 5           | 1 2 3 5           | 1 2 3 5
rownames of df        :                   |                   | 1 2 3 4 5
Comment 1 Martin Maechler 2016-01-20 09:52:01 UTC
It is a bit more subtle than you suggested (no "reference" vs "copies") but
I agree that this is bogus.
I already have a fix ready (but it breaks our R checks ... though maybe only an "unofficial" case).

A simpler MRE is

> rbind(data.frame(y=1:4)[-(1:2),,drop=FALSE], data.frame(y=1:2))
   y
3  3
4  4
31 1
41 2
>
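
On an affected version, one way to sidestep the generated names is to discard the inputs' row names altogether. This is only a minimal sketch, assuming that automatic row names 1:n are acceptable in the result. The variable names are illustrative; make.row.names is the rbind.data.frame argument already used in the test script above.

```r
# Inputs from the MRE: 'a' carries row names 3 and 4, 'b' has automatic ones.
a <- data.frame(y = 1:4)[-(1:2), , drop = FALSE]
b <- data.frame(y = 1:2)

# Option 1: ask rbind not to construct row names from its arguments.
res1 <- rbind(a, b, make.row.names = FALSE)

# Option 2: bind first, then reset the row names to the automatic 1:n.
res2 <- rbind(a, b)
rownames(res2) <- NULL

print(rownames(res1))
print(rownames(res2))
```

Neither option keeps the original labels 3 and 4, so this only helps when the row names carry no information that is needed later.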
Comment 2 Martin Maechler 2016-01-21 11:15:45 UTC
Committed a patch  svn rev 69968  -- to R-devel only (-> R 3.3.0 in April).

I also plan a 2nd change to rbind.data.frame() which is for speedup of the case where the final data frame has row names 1:n (which are internally stored in short form c(NA, n)).
This is also the situation in our case, but the topic there is speedup, hence later.
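
Whether a data frame has automatic (compactly stored) row names or explicit ones can be checked from R code. A small sketch using base R's .row_names_info(): with its default type it returns the number of rows, negated when the row names are automatic. The object names are illustrative only.

```r
# A fresh data frame has automatic row names 1:n.
full <- data.frame(y = 1:4)

# Dropping rows keeps the surviving labels (3 and 4) as explicit row names.
part <- full[-(1:2), , drop = FALSE]

n_full <- .row_names_info(full)  # negative: automatic row names
n_part <- .row_names_info(part)  # positive: explicit row names

print(c(full = n_full, part = n_part))
```

This matches the MRE above: the first argument to rbind has explicit row names, the second has automatic ones.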