Bug 14354 - wish to provide a means to mark non-standard evaluation to be skipped by codetools
Summary: wish to provide a means to mark non-standard evaluation to be skipped by code...
Status: ASSIGNED
Alias: None
Product: R
Classification: Unclassified
Component: Language (show other bugs)
Version: R 2.11.1
Hardware: ix86 (32-bit) Linux-Ubuntu
: P5 enhancement
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2010-08-12 09:00 UTC by Mario Frasca
Modified: 2011-10-12 11:25 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mario Frasca 2010-08-12 09:00:30 UTC
the package where this happens is on r-forge, the name is delftfews.  it's not the only package receiving false positive NOTEs like this.

for example:
actualdata <- subset(actualdata, !is.na(column))

actualdata[[column]] exists.

but R CMD check pkg gives 
write.PI.zoo : CreateSeriesNode: no visible binding for global variable
  'column'

is there a way for disabling this NOTE?

as far as I understand, R CMD check is static while the information needed is dynamic, so I am not surprised about the NOTE.  nevertheless, it would be nice if I could add some sort of directive to R CMD check informing it about extra available bindings.  something like:

## R CMD check - (actualdata column) (object field1 field2 ... fieldn)
actualdata <- subset(actualdata, !is.na(column))

thanks
Comment 1 Brian Ripley 2010-08-13 11:49:09 UTC
The problem is the non-standard scoping of subset(), which is a 
convenience wrapper not intended to be use in programming.

Using the conventional extraction operators (e.g. [) is safer and 
avoids the NOTE.

On Thu, 12 Aug 2010, r-bugs@r-project.org wrote:

> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14354
>
>           Summary: false positive NOTE using subset - no visible binding
>                    for global variable
>           Product: R
>           Version: R 2.11.1
>          Platform: PC/x86
>        OS/Version: Linux-Ubuntu
>            Status: NEW
>          Severity: enhancement
>          Priority: P5
>         Component: Language
>        AssignedTo: R-core@R-project.org
>        ReportedBy: mfrasca@zonnet.nl
>   Estimated Hours: 0.0
>
>
> the package where this happens is on r-forge, the name is delftfews.  it's not
> the only package receiving false positive NOTEs like this.
>
> for example:
> actualdata <- subset(actualdata, !is.na(column))
>
> actualdata[[column]] exists.
>
> but R CMD check pkg gives
> write.PI.zoo : CreateSeriesNode: no visible binding for global variable
>  'column'
>
> is there a way for disabling this NOTE?
>
> as far as I understand, R CMD check is static while the information needed is
> dynamic, so I am not surprised about the NOTE.  nevertheless, it would be nice
> if I could add some sort of directive to R CMD check informing it about extra
> available bindings.  something like:
>
> ## R CMD check - (actualdata column) (object field1 field2 ... fieldn)
> actualdata <- subset(actualdata, !is.na(column))
>
> thanks
>
> -- 
> Configure bugmail: https://bugs.r-project.org/bugzilla3/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are the assignee for the bug.
>
> _______________________________________________
> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
>


-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Comment 2 Martin Maechler 2010-08-13 18:16:19 UTC
>>>>> Prof Brian Ripley <ripley@stats.ox.ac.uk>
>>>>>     on Fri, 13 Aug 2010 07:49:09 +0100 (BST) writes:


    > The problem is the non-standard scoping of subset(), which is a 
    > convenience wrapper not intended to be use in programming.

    > Using the conventional extraction operators (e.g. [) is safer and 
    > avoids the NOTE.

This is correct in the current case.

However, there are many such NOTEs,
and I'd claim that many are FP indeed,
as they happen often when programmerRs use  expressions,
and I'd argue I would not want to say 
     "not intended to be used in programming".

So, indeed, it would be convenient to be able to specify to the
codetools, that
"yes, I use expressions here, and yes, of course, they will
 contain some variables that are unbound in the current environment".

Martin

    > On Thu, 12 Aug 2010, r-bugs@r-project.org wrote:

    >> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14354
    >> 
    >> Summary: false positive NOTE using subset - no visible binding
    >> for global variable
    >> Product: R
    >> Version: R 2.11.1
    >> Platform: PC/x86
    >> OS/Version: Linux-Ubuntu
    >> Status: NEW
    >> Severity: enhancement
    >> Priority: P5
    >> Component: Language
    >> AssignedTo: R-core@R-project.org
    >> ReportedBy: mfrasca@zonnet.nl
    >> Estimated Hours: 0.0
    >> 
    >> 
    >> the package where this happens is on r-forge, the name is delftfews.  it's not
    >> the only package receiving false positive NOTEs like this.
    >> 
    >> for example:
    >> actualdata <- subset(actualdata, !is.na(column))
    >> 
    >> actualdata[[column]] exists.
    >> 
    >> but R CMD check pkg gives
    >> write.PI.zoo : CreateSeriesNode: no visible binding for global variable
    >> 'column'
    >> 
    >> is there a way for disabling this NOTE?
    >> 
    >> as far as I understand, R CMD check is static while the information needed is
    >> dynamic, so I am not surprised about the NOTE.  nevertheless, it would be nice
    >> if I could add some sort of directive to R CMD check informing it about extra
    >> available bindings.  something like:
    >> 
    >> ## R CMD check - (actualdata column) (object field1 field2 ... fieldn)
    >> actualdata <- subset(actualdata, !is.na(column))
    >> 
    >> thanks
    >> 
    >> -- 
    >> Configure bugmail: https://bugs.r-project.org/bugzilla3/userprefs.cgi?tab=email
    >> ------- You are receiving this mail because: -------
    >> You are the assignee for the bug.
    >> 
    >> _______________________________________________
    >> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
    >> 

    > -- 
    > Brian D. Ripley,                  ripley@stats.ox.ac.uk
    > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
    > University of Oxford,             Tel:  +44 1865 272861 (self)
    > 1 South Parks Road,                     +44 1865 272866 (PA)
    > Oxford OX1 3TG, UK                Fax:  +44 1865 272595

    > _______________________________________________
    > R-core list: https://stat.ethz.ch/mailman/listinfo/r-core


Comment 3 Duncan Murdoch 2010-08-13 21:57:17 UTC
On 13/08/2010 12:26 PM, Thomas Lumley wrote:
> On Fri, 13 Aug 2010, Martin Maechler wrote:
>
> >>>>>> Prof Brian Ripley <ripley@stats.ox.ac.uk>
> >>>>>>     on Fri, 13 Aug 2010 07:49:09 +0100 (BST) writes:
> >
> >    > The problem is the non-standard scoping of subset(), which is a
> >    > convenience wrapper not intended to be use in programming.
> >
> >    > Using the conventional extraction operators (e.g. [) is safer and
> >    > avoids the NOTE.
> >
> > This is correct in the current case.
> >
> > However, there are many such NOTEs,
> > and I'd claim that many are FP indeed,
> > as they happen often when programmerRs use  expressions,
> > and I'd argue I would not want to say
> >     "not intended to be used in programming".
> >
> > So, indeed, it would be convenient to be able to specify to the
> > codetools, that
> > "yes, I use expressions here, and yes, of course, they will
> > contain some variables that are unbound in the current environment".
>
> Martin,
>
> That's why these are notes, not even warnings.   I still expect that for most people, most of the notes will indicate errors.  That was certainly true for my 'survey' package, which relies quite heavily on rewriting expressions.
>   
> My experience of false positives for this check is more often code guarded by an if(), where CMD check doesn't know that the if() test is adequate.  For example, I have separate code to handle different versions of the hexbin package, and the code for the version that isn't present gives a false positive test.
>   


All of that is true, but I think Martin's point is still valid.  It 
would be nice to be able to suppress warnings, without changing the 
actual code.  (I.e. you could suppress the subset warning by creating a 
variable named "column", but that would be a bad idea.)  If codetools 
could accept some directives telling it Martin's message, it would mean 
you could suppress the false-positive note in your code, and then if a 
note ever did show up, you'd know it was worth investigating.

I don't know a good way to specify those directives, but I think it 
would be reasonable to add a general mechanism to the parser to insert 
non-executable information into the parsed result, and codetools could 
make use of it.

Duncan Murdoch

>
> >
> >    > On Thu, 12 Aug 2010, r-bugs@r-project.org wrote:
> >
> >    >> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14354
> >    >>
> >    >> Summary: false positive NOTE using subset - no visible binding
> >    >> for global variable
> >    >> Product: R
> >    >> Version: R 2.11.1
> >    >> Platform: PC/x86
> >    >> OS/Version: Linux-Ubuntu
> >    >> Status: NEW
> >    >> Severity: enhancement
> >    >> Priority: P5
> >    >> Component: Language
> >    >> AssignedTo: R-core@R-project.org
> >    >> ReportedBy: mfrasca@zonnet.nl
> >    >> Estimated Hours: 0.0
> >    >>
> >    >>
> >    >> the package where this happens is on r-forge, the name is delftfews.  it's not
> >    >> the only package receiving false positive NOTEs like this.
> >    >>
> >    >> for example:
> >    >> actualdata <- subset(actualdata, !is.na(column))
> >    >>
> >    >> actualdata[[column]] exists.
> >    >>
> >    >> but R CMD check pkg gives
> >    >> write.PI.zoo : CreateSeriesNode: no visible binding for global variable
> >    >> 'column'
> >    >>
> >    >> is there a way for disabling this NOTE?
> >    >>
> >    >> as far as I understand, R CMD check is static while the information needed is
> >    >> dynamic, so I am not surprised about the NOTE.  nevertheless, it would be nice
> >    >> if I could add some sort of directive to R CMD check informing it about extra
> >    >> available bindings.  something like:
> >    >>
> >    >> ## R CMD check - (actualdata column) (object field1 field2 ... fieldn)
> >    >> actualdata <- subset(actualdata, !is.na(column))
> >    >>
> >    >> thanks
> >    >>
> >    >> --
> >    >> Configure bugmail: https://bugs.r-project.org/bugzilla3/userprefs.cgi?tab=email
> >    >> ------- You are receiving this mail because: -------
> >    >> You are the assignee for the bug.
> >    >>
> >    >> _______________________________________________
> >    >> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
> >    >>
> >
> >    > --
> >    > Brian D. Ripley,                  ripley@stats.ox.ac.uk
> >    > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >    > University of Oxford,             Tel:  +44 1865 272861 (self)
> >    > 1 South Parks Road,                     +44 1865 272866 (PA)
> >    > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
> >    > _______________________________________________
> >    > R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
> >
> > _______________________________________________
> > R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
> >
>
> Thomas Lumley
> Professor of Biostatistics
> University of Washington, Seattle
>
> _______________________________________________
> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
>



Comment 4 Martin Maechler 2010-08-14 01:42:14 UTC
On Fri, Aug 13, 2010 at 18:57, Duncan Murdoch <murdoch.duncan@gmail.com> wrote:
> On 13/08/2010 12:26 PM, Thomas Lumley wrote:
>>
>> On Fri, 13 Aug 2010, Martin Maechler wrote:
>>
>> >>>>>> Prof Brian Ripley <ripley@stats.ox.ac.uk>
>> >>>>>>     on Fri, 13 Aug 2010 07:49:09 +0100 (BST) writes:
>> >
>> >    > The problem is the non-standard scoping of subset(), which is a
>> >    > convenience wrapper not intended to be use in programming.
>> >
>> >    > Using the conventional extraction operators (e.g. [) is safer and
>> >    > avoids the NOTE.
>> >
>> > This is correct in the current case.
>> >
>> > However, there are many such NOTEs,
>> > and I'd claim that many are FP indeed,
>> > as they happen often when programmerRs use  expressions,
>> > and I'd argue I would not want to say
>> >     "not intended to be used in programming".
>> >
>> > So, indeed, it would be convenient to be able to specify to the
>> > codetools, that
>> > "yes, I use expressions here, and yes, of course, they will
>> > contain some variables that are unbound in the current environment".
>>
>> Martin,
>>
>> That's why these are notes, not even warnings.   I still expect that for
>> most people, most of the notes will indicate errors.  That was certainly
>> true for my 'survey' package, which relies quite heavily on rewriting
>> expressions.
>>  My experience of false positives for this check is more often code
>> guarded by an if(), where CMD check doesn't know that the if() test is
>> adequate.  For example, I have separate code to handle different versions of
>> the hexbin package, and the code for the version that isn't present gives a
>> false positive test.
>>
>
> All of that is true, but I think Martin's point is still valid.  It would be
> nice to be able to suppress warnings, without changing the actual code.
>  (I.e. you could suppress the subset warning by creating a variable named
> "column", but that would be a bad idea.)  If codetools could accept some
> directives telling it Martin's message, it would mean you could suppress the
> false-positive note in your code, and then if a note ever did show up, you'd
> know it was worth investigating.

Exactly!  I now have several packages that produce a dozen (or so)
lines of such notes.
The consequence now has been that I skim over the whole part, and have
more than once overlooked
"notes" that I should have taken note of ..

> I don't know a good way to specify those directives, but I think it would be
> reasonable to add a general mechanism to the parser to insert non-executable
> information into the parsed result, and codetools could make use of it.

that would be useful and progress, indeed..

Thank you, Duncan.


Comment 5 Brian Ripley 2010-08-17 13:33:25 UTC
Wish to allow an exception mechanism for other cases: this one
is simply bad practice.
Comment 6 Luke Tierney 2010-08-20 02:54:01 UTC
A couple of comments on this thread.

With functions like subset() it may make sense for codetools not to
follow the arguments that are known to use non-standard evaluation
(and it does this already in some cases).  There is always a trade-off
though, as reducing false positives also increases false negatives.
In principle subset() is probably a case where this is worth doing,
but I'm not in a rush to do so as it is fairly painlul to do with the
way codetools is currently set up and as I agree with Brian that it
really isn't a good idea to use subset() in package code since it
leads to assumptions on variable names being hard wired in and making
the code brittle.

In terms of being able to add some sort of declarations, that would be
useful but I don't think we want to rush into a syntax for this until
we have a better sense for the required semantics.

This subset() situation is a case where you probably wouldn't want
non-executable declarations only -- if someone really insists on using
subset() and assuming a data frame contains variables by certain
specific names it would probably be best to test for that (at which
point variables by those names can be defined and avoid the false
positive issue).

luke

On Fri, 13 Aug 2010, Martin Maechler wrote:

> On Fri, Aug 13, 2010 at 18:57, Duncan Murdoch <murdoch.duncan@gmail.com> wrote:
>> On 13/08/2010 12:26 PM, Thomas Lumley wrote:
>>>
>>> On Fri, 13 Aug 2010, Martin Maechler wrote:
>>>
>>>>>>>>> Prof Brian Ripley <ripley@stats.ox.ac.uk>
>>>>>>>>>     on Fri, 13 Aug 2010 07:49:09 +0100 (BST) writes:
>>>>
>>>>    > The problem is the non-standard scoping of subset(), which is a
>>>>    > convenience wrapper not intended to be use in programming.
>>>>
>>>>    > Using the conventional extraction operators (e.g. [) is safer and
>>>>    > avoids the NOTE.
>>>>
>>>> This is correct in the current case.
>>>>
>>>> However, there are many such NOTEs,
>>>> and I'd claim that many are FP indeed,
>>>> as they happen often when programmerRs use  expressions,
>>>> and I'd argue I would not want to say
>>>>     "not intended to be used in programming".
>>>>
>>>> So, indeed, it would be convenient to be able to specify to the
>>>> codetools, that
>>>> "yes, I use expressions here, and yes, of course, they will
>>>> contain some variables that are unbound in the current environment".
>>>
>>> Martin,
>>>
>>> That's why these are notes, not even warnings.   I still expect that for
>>> most people, most of the notes will indicate errors.  That was certainly
>>> true for my 'survey' package, which relies quite heavily on rewriting
>>> expressions.
>>>  My experience of false positives for this check is more often code
>>> guarded by an if(), where CMD check doesn't know that the if() test is
>>> adequate.  For example, I have separate code to handle different versions of
>>> the hexbin package, and the code for the version that isn't present gives a
>>> false positive test.
>>>
>>
>> All of that is true, but I think Martin's point is still valid.  It would be
>> nice to be able to suppress warnings, without changing the actual code.
>>  (I.e. you could suppress the subset warning by creating a variable named
>> "column", but that would be a bad idea.)  If codetools could accept some
>> directives telling it Martin's message, it would mean you could suppress the
>> false-positive note in your code, and then if a note ever did show up, you'd
>> know it was worth investigating.
>
> Exactly!  I now have several packages that produce a dozen (or so)
> lines of such notes.
> The consequence now has been that I skim over the whole part, and have
> more than once overlooked
> "notes" that I should have taken note of ..
>
>> I don't know a good way to specify those directives, but I think it would be
>> reasonable to add a general mechanism to the parser to insert non-executable
>> information into the parsed result, and codetools could make use of it.
>
> that would be useful and progress, indeed..
>
> Thank you, Duncan.
>
> _______________________________________________
> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core
>


-- 
Luke Tierney
Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke@stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu