Bug 8192 - [ subscripting sometimes loses names
Summary: [ subscripting sometimes loses names
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: old
Hardware: All All
Importance: P5 normal
Assignee: Jitterbug compatibility account
URL:
Depends on:
Blocks:
 
Reported: 2005-10-10 00:04 UTC by Jitterbug compatibility account
Modified: 2005-10-19 17:33 UTC

See Also:


Attachments
(14.22 KB, text/plain)
2005-10-10 00:04 UTC, Jitterbug compatibility account

Description Jitterbug compatibility account 2005-10-10 00:04:22 UTC
From: Andrew Piskorski <atp@piskorski.com>
PARTS: 2
R, like recent versions of S-Plus, sometimes - but not always - loses
names when subscripting objects with "[".  (Earlier versions of S and
S-Plus had the correct, name-preserving behavior.)  This seems bad; it
would be better to remove names only by explicit request, not as an
accidental side effect of some (but not all) subscripting operations.
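A minimal cut-and-paste illustration (written for this report, not taken from fix-names.s): subscripting a named vector with a character name that does not exist returns NA for that element, and stock R also sets that element's name to NA instead of keeping the requested name. This appears to correspond to the vec.1 case in the test output further down, but that correspondence is an assumption, since the test code itself is only in the attachment.

```r
# A named numeric vector
x <- c(a = 1, b = 2, c = 3)

# "no" is not a name of x, so the third element's value is NA
y <- x[c("a", "c", "no")]
print(y)

# Stock R labels the unmatched element with an NA name, not "no"
print(names(y))
# [1] "a" "c" NA
```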

This issue was also discussed back in 2001 on the S-News list:

  http://www.biostat.wustl.edu/archives/html/s-news/2001-09/msg00020.html

The attached file, "fix-names.s", is also available here:

  http://www.piskorski.com/R/patches/fix-names.s

It includes:

1. The function dtk.test.brace.names(), which demonstrates the
name-losing problem and can automatically report which test cases
pass or fail, etc.

2. Wrappers for the "[" and "[.data.frame" functions which fix the
name-losing problem for all the cases I've tried.
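The attached wrappers redefine "[" itself and are not reproduced here. As a much narrower sketch of the same idea, a hypothetical helper subset_keep_names() (not part of base R and not from fix-names.s) can restore the requested names after a character subscript of an atomic vector, leaving every other kind of subscript untouched. It does not cover data frames, matrices, or S4 objects.

```r
# Hypothetical helper, not part of base R or of fix-names.s.
# For a character subscript of an atomic vector, use the requested
# names as the result's names, so an unmatched name such as "no"
# is kept instead of being replaced by NA.
subset_keep_names <- function(x, i) {
  y <- x[i]
  if (is.atomic(x) && is.character(i)) {
    names(y) <- i
  }
  y
}

x <- c(a = 1, b = 2, c = 3)
subset_keep_names(x, c("a", "c", "no"))
```

Defining a separate function rather than overriding "[" keeps the change opt-in, at the cost of having to call it explicitly.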

Note that dtk.test.brace.names(T) will always run all its test cases
and return their output for human inspection.  However, its checks to
see whether each test passes or fails only work correctly with the
patched all.equal() in PR#8191.

My coworkers and I have been using these wrapper functions for ALL
the code we run for many months now, with no problems so far.  However,
there are probably some cases we don't use, such as objects with S4
classes, that don't work correctly with these wrappers.

I assume the R core team would NOT want to use these wrapper
functions, but would instead prefer to change the underlying code
directly.  However, I offer them as an example of one way to achieve
what we believe to be the correct name-preserving behavior in R.

I would appreciate any suggestions on how to better implement this
name-preserving behavior for all R subscripting operations.

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/

(Attached 'fix-names.s' of type 'text/plain')

Comment 1 Jitterbug compatibility account 2005-10-10 00:04:22 UTC
Created attachment 1070
Comment 2 Jitterbug compatibility account 2005-10-13 01:09:01 UTC
Audit (from Jitterbug):
Wed Oct 12 20:09:01 2005	ripley	moved from incoming to wishlist
Comment 3 Jitterbug compatibility account 2005-10-19 17:33:50 UTC
From: Martin Maechler <maechler@stat.math.ethz.ch>
Andy,

that's interesting, but honestly your posting only *talked*
about your perceptions of bogus behavior of R and gave a link to
a quite extensive S source file, which redefines basic
functions, so it's not a file I'd just want to source into my R
session.

Proper R bug reports provide short "cut & paste" executable
example code {i.e. no prompt, no output} or at least the 
transcript of such code {transcript : input (+ prompt) + output}.

Also your script is for R and S-plus and at least in some places 
it seems you think R has a bug because it behaves differently
than S or S-plus.   
Now I'm sure you know from the R-FAQ that there are quite a few
intentional differences between the two dialects of S,
and dealing with data frames is definitely one situation where
we have tried to do better than "the prototype", so we would say
the bug is with S(-plus).

In spite of all the above, I'd well expect that you still know
about problematic or even bogus behavior of "[" subscripting,
but we'd much prefer small reproducible code snippets to
scripts that redefine "[" and "[.data.frame" and further assume
a patched all.equal().

Best regards,
Martin Maechler

Comment 4 Jitterbug compatibility account 2005-10-20 00:46:05 UTC
From: Andrew Piskorski <atp@piskorski.com>
On Wed, Oct 19, 2005 at 02:33:50PM +0200, Martin Maechler wrote:

> Proper R bug reports provide short "cut & paste" executable
> example code {i.e. no prompt, no output} or at least the 
> transcript of such code {transcript : input (+ prompt) + output}.

My patch includes the function dtk.test.brace.names() which
demonstrates the problem.  If you source just that function into a
completely stock R, you can see the losing names problem by running:

  dtk.test.brace.names(return.results.p=T ,only="all")

To make it easier to see just what the problem is, I'll send example
output in my next email.

> Also your script is for R and S-plus and at least in some places 
> it seems you think R has a bug because it behaves differently
> than S or S-plus.   

No, I don't think that.  If comments in my code give that impression,
then that's a bug in my comments; it was not my intention.

My coworkers and I originally fixed the name-losing problem in S-Plus,
then later did so in R, so in some places I might have sloppily said,
"R is different than S-Plus" when what I REALLY meant was, "Stock R is
different than our fixed/patched S-Plus where we've already solved
these name-losing problems."

Stock S-Plus and R both suffer from losing names when they shouldn't.
Since I use both dialects, I've included (ugly) fixes for both.  Of
course you probably only care about the R part, but I didn't think it
would hurt to include both.

> Now I'm sure you know from the R-FAQ that there are quite a few
> intentional differences between the two dialects of S,

Yes, I'm aware of that FAQ.  I also just finished porting a large body
of code from S-Plus to R a few months ago, so I have a very concrete
appreciation of the MANY little S-Plus vs. R differences, many more
than are mentioned in that FAQ.

Some of those differences are simply arbitrary or accidental, but
others are places where S-Plus was basically doing something dumb and
the R behavior is better.  I have no complaints about this.  :)

(The converse, where R's behavior is definitely inferior to that of
S-Plus, seems to be much less common, and such cases are usually minor.)

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/

Comment 5 Jitterbug compatibility account 2005-10-20 00:50:25 UTC
From: Andrew Piskorski <atp@piskorski.com>

Here is an example of the name-losing problem in stock R 2.2.0.  Note
that below, only stock R packages are loaded, and then I manually
source in just my dtk.test.brace.names() testing function, nothing
else.

Since the list-of-lists output of dtk.test.brace.names() is very
lengthy, I've manually cut-and-pasted it into a tabular format to save
space and make inspection easier.  As you can see, out of its 15 test
cases, stock R 2.2.0 fails 4 of them while the other 11 are Ok.

To see what these simple subscripting tests actually DO, please refer
to the body of dtk.test.brace.names() from my previous emails above.


R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.2.0  (2005-10-06 r35749)
> search()
[1] ".GlobalEnv"        "package:methods"   "package:graphics" 
[4] "package:grDevices" "package:datasets"  "package:utils"    
[7] "package:stats"     "Autoloads"         "package:base"     

> dtk.test.brace.names(return.results.p=T ,only="all")

Ok?  Actual Result         Desired Result
---  ------------------    ------------------
     $vec.1
BAD  $vec.1[[1]]           $vec.1[[2]]
        a    c <NA>         a  c no
        1    3   NA         1  3 NA

     $diag.1
Ok   $diag.1[[1]]          $diag.1[[2]]
     [1]  1  7 13 19 25    [1]  1  7 13 19 25

     $diag.2
Ok   $diag.2[[1]]          $diag.2[[2]]
     [1]  1  7 13 19 25    [1]  1  7 13 19 25

     $df.a.1
Ok   $df.a.1[[1]]          $df.a.1[[2]]
     a b                   a b
     4 5                   4 5

     $df.b.1
BAD  $df.b.1[[1]]          $df.b.1[[2]]
     [1] 4 5               a b
                           4 5

     $df.a.2
Ok   $df.a.2[[1]]          $df.a.2[[2]]
     c b a                 c b a
     6 5 4                 6 5 4

     $df.b.2
BAD  $df.b.2[[1]]          $df.b.2[[2]]
     [1] 6 5 4             c b a
                           6 5 4

     $df.a.3
Ok   $df.a.3[[1]]          $df.a.3[[2]]
     a b                   a b
     3 4                   3 4

     $df.b.3
BAD  $df.b.3[[1]]          $df.b.3[[2]]
     [1] 3 4               a b
                           3 4

     $df.a.4
Ok   $df.a.4[[1]]          $df.a.4[[2]]
     col1 col2             col1 col2
        2    4                2    4

     $df.b.4
Ok   $df.b.4[[1]]          $df.b.4[[2]]
       col1 col2             col1 col2
     b    2    4           b    2    4

     $df.a.5
Ok   $df.a.5[[1]]          $df.a.5[[2]]
     col1 col2             col1 col2
        2    4                2    4

     $df.b.5
Ok   $df.b.5[[1]]          $df.b.5[[2]]
     $df.b.5[[1]]$col1     $df.b.5[[2]]$col1
     [1] 2                 [1] 2
     $df.b.5[[1]]$col2     $df.b.5[[2]]$col2
     [1] 4                 [1] 4

     $df.a.6
Ok   $df.a.6[[1]]          $df.a.6[[2]]
       col1 col2             col1 col2
     b    2    4           b    2    4

     $df.b.6
Ok   $df.b.6[[1]]          $df.b.6[[2]]
       col1 col2             col1 col2
     b    2    4           b    2    4
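The test code behind the df.a/df.b rows is not shown in this thread. One plausible cut-and-paste reproduction consistent with the pattern above, assuming the df.a cases index a matrix and the df.b cases index the equivalent data frame, is selecting several rows of a single column: the matrix keeps the row names as the names of the result, while "[.data.frame" drops them when it simplifies to a vector. The object names m and df are illustrative only.

```r
# Matrix with row names a, b, c; column y holds 4, 5, 6
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("x", "y")))
df <- as.data.frame(m)

# Matrix: row names become the names of the result (a = 4, b = 5)
m[c("a", "b"), "y"]

# Data frame: same selection, but the result is an unnamed vector
df[c("a", "b"), "y"]

# With drop = FALSE the result stays a data frame and keeps the row names
df[c("a", "b"), "y", drop = FALSE]
```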

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/