Bug 14530 - data.frame(<...>, check.rows=T) not functioning properly
data.frame(<...>, check.rows=T) not functioning properly
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Misc
R 2.12.2 patched
All All
: P5 normal
Assigned To: R-core
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-03-12 17:46 UTC by Ben Hansen
Modified: 2014-02-23 14:47 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Hansen 2011-03-12 17:46:31 UTC
Dear R core developers,

There appears to be a small oversight in base::data.frame() that prevents its check.rows option from functioning properly.   I am submitting an example, an explanation, a patch and a description of the environment in which I confirmed the bug and tested the patch.

***Example***  
Consider

> dfA <- data.frame(A=1:2, B=3:4, row.names=letters[1:2])
> dfB <- dfA[2:1,]
> all.equal(row.names(dfA), row.names(dfB))
[1] "2 string mismatches"

Because dfA and dfB differ in their row names, attempts to merge them using data.frame() with check.rows=T should cause an exception.  However, as data.frame() is coded in current distributions of R and in the Subversion trunk, it does not: 

> data.frame(dfA, dfA[2:1,], check.rows=T)
  A B A.1 B.1
a 1 3   2   4
b 2 4   1   3

***Explanation***
A little debugging reveals that the problem is due to changes in the value of "missing(row.names)" over the course of a loop embedded in data.frame.  (A patch follows.) In my example, the present data.frame() evaluates "missing(row.names)" twice, once for each of the two data frames it's combining, and while the first time it evaluates correctly to TRUE by the second time it's evaluating to FALSE.  The result is that the nicely designed embedded function data.row.names never gets the opportunity to do what it seems intended to do, namely to compare and contrast the row names associated with multiple data frame arguments to data.frame().  

***Patch***
Index: dataframe.R
===================================================================
--- dataframe.R	(revision 54750)
+++ dataframe.R	(working copy)
@@ -418,7 +418,7 @@
                 vnames[[i]] <- tmpname
             }
         } # end of ncols[i] <= 1
-	if(missing(row.names) && nrows[i] > 0L) {
+	if(mrn && nrows[i] > 0L) {
             rowsi <- attr(xi, "row.names")
             ## Avoid all-blank names
             nc <- nchar(rowsi, allowNA = FALSE)

*** My setup and checks***
I checked that with this modification to the code I am still able to configure and make R from the code in the Subversion trunk (Rev 54750), and that the modification doesn't change the results of running make check-devel.  In particular the tests of examples for packages "base" and "tools" are successful.  (My check-devel stopped there, with or without the modification, probably because I hadn't also set up the necessary external packages.)

My setup: I am running Mac OS X 10.5.  Further:
> sessionInfo()
R version 2.13.0 Under development (unstable) (--)
Platform: powerpc-apple-darwin9.8.0 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     


--Ben
Comment 1 Brian Ripley 2011-03-15 11:51:59 UTC
fixed in pre-2.13.0.

And please not, 'merge' is not involved and T should not be used in place of TRUE in portable code.
Comment 2 Brian Ripley 2011-03-15 16:21:32 UTC
On Tue, 15 Mar 2011, r-bugs@r-project.org wrote:

> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14530
>
> Brian Ripley <ripley@stats.ox.ac.uk> changed:
>
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Status|NEW                         |CLOSED
>         Resolution|                            |FIXED
>
> --- Comment #1 from Brian Ripley <ripley@stats.ox.ac.uk> 2011-03-15 06:51:59 EDT ---
> fixed in pre-2.13.0.
>
> And please not, 'merge' is not involved and T should not be used in place of
> TRUE in portable code.


And for the record, the supplied patch was incorrect and 'make check' 
failed.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Comment 3 Jackie Rosen 2014-02-16 11:42:37 UTC
(spam comment removed)