Bug 16484 - regexpr() capture is nondeterministic when text contains NA
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) All
Importance: P5 minor
Assignee: R-core
Reported: 2015-07-23 16:26 UTC by Mikko Korpela
Modified: 2015-12-14 13:48 UTC

Attachments
Proposed patch (454 bytes, patch)
2015-08-07 12:09 UTC, Mikko Korpela

Description Mikko Korpela 2015-07-23 16:26:38 UTC
The result returned by regexpr() changes between repeated identical calls when the input 'text' contains NA_character_.

For example, repeated calls to

> regexpr("(.)", NA_character_, perl = TRUE)

return numeric values that vary from one call to the next for the attributes "capture.start" and "capture.length". The other parts of the result don't change.
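
One way to check this is to repeat the call and compare the capture attributes across runs. The following is only a minimal sketch: the variable names (results, cap_start, cap_length, stable) and the choice of five repetitions are illustrative assumptions, not part of the original report.

```r
# Repeat the identical call several times on a missing string.
results <- lapply(1:5, function(i) regexpr("(.)", NA_character_, perl = TRUE))

# Each attribute is a 1 x 1 integer matrix here; flatten it to a single integer.
cap_start  <- vapply(results, function(r) as.integer(attr(r, "capture.start")),  integer(1))
cap_length <- vapply(results, function(r) as.integer(attr(r, "capture.length")), integer(1))

print(cap_start)
print(cap_length)

# TRUE when every repetition produced the same attribute values
# (unique() treats repeated NA values as one value).
stable <- length(unique(cap_start)) == 1L && length(unique(cap_length)) == 1L
print(stable)
```

On an affected build, stable may come out FALSE on some runs; on a build where the attributes are deterministic it should be TRUE.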

I would expect the values of those attributes to stay the same (possibly NA_real_ or NA_integer_) in repeated calls. Of course, looking for patterns in a missing string is not supposed to return anything useful, but I think the result should be predictable.
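
Until the attributes are predictable, a user-level workaround could overwrite them for missing inputs. The sketch below assumes a hypothetical wrapper named regexpr_na_safe (not an R function, and not the fix proposed in this report) that sets the capture attributes to NA_integer_ wherever 'text' is NA.

```r
# Hypothetical helper: call regexpr() and force the capture attributes
# to NA_integer_ for every element of 'text' that is NA.
regexpr_na_safe <- function(pattern, text, ...) {
  m <- regexpr(pattern, text, ...)
  na_idx <- is.na(text)
  for (a in c("capture.start", "capture.length")) {
    v <- attr(m, a)            # matrix: one row per element of 'text', or NULL
    if (!is.null(v) && any(na_idx)) {
      v[na_idx, ] <- NA_integer_
      attr(m, a) <- v
    }
  }
  m
}

m <- regexpr_na_safe("(.)", c("a", NA), perl = TRUE)
print(attr(m, "capture.start"))
print(attr(m, "capture.length"))
```

The wrapper leaves the rows for non-missing strings untouched, so only the entries that were unpredictable are changed.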

This was tested on a computer with Ubuntu 14.04, in both UTF-8 and Latin-1 locales, running the following R versions:

> R.version
               _                                                 
platform       x86_64-unknown-linux-gnu                          
arch           x86_64                                            
os             linux-gnu                                         
system         x86_64, linux-gnu                                 
status         Under development (unstable)                      
major          3                                                 
minor          3.0                                               
year           2015                                              
month          07                                                
day            23                                                
svn rev        68728                                             
language       R                                                 
version.string R Under development (unstable) (2015-07-23 r68728)
nickname       Unsuffered Consequences                           

> R.version
               _                            
platform       x86_64-unknown-linux-gnu     
arch           x86_64                       
os             linux-gnu                    
system         x86_64, linux-gnu            
status                                      
major          2                            
minor          15.0                         
year           2012                         
month          03                           
day            30                           
svn rev        58871                        
language       R                            
version.string R version 2.15.0 (2012-03-30)
nickname                                    

and a computer with OS X 10.7.5, UTF-8 and Latin-1 locales, running the following R version:

> R.version
               _                           
platform       x86_64-apple-darwin10.8.0   
arch           x86_64                      
os             darwin10.8.0                
system         x86_64, darwin10.8.0        
status                                     
major          3                           
minor          2.1                         
year           2015                        
month          06                          
day            18                          
svn rev        68531                       
language       R                           
version.string R version 3.2.1 (2015-06-18)
nickname       World-Famous Astronaut
Comment 1 Mikko Korpela 2015-08-07 12:09:11 UTC
Created attachment 1879
Proposed patch

Patch for "R Under development (unstable) (2015-08-07 r68892)". Initialize previously uninitialized variables to NA_integer_.
Comment 2 Brian Ripley 2015-08-11 09:00:55 UTC
Patch applied in R-devel; it will also go into 3.2.2 patched (pre-3.2.2 is in code freeze).