Bug 14431 - Invalid back-reference in sub(pcre=TRUE) may cause R to terminate
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Misc
Version: R 2.12.0 patched
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 normal
Assigned To: R-core
Depends on:
Blocks:
 
Reported: 2010-11-05 18:28 UTC by Brian Diggs
Modified: 2014-02-16 11:43 UTC (History)

Description Brian Diggs 2010-11-05 18:28:49 UTC
A back-reference greater than the number of referred sub-expressions in a regular expression may cause R to exit with the error "Process R trace trap at " and the date/time stamp.  This behavior is present in both R 2.12.0 and R 2.12.0 Patched (2010-11-04 r53526).

Although no specific behavior is defined if there are more back-references than referred sub-expressions in the regular expression, termination of the R process does not seem appropriate. In some cases, the back-reference is substituted with something that results in a blank string; this would be reasonable behavior to expect.  An error would also be reasonable behavior.

I have not attempted to test this on a platform other than Windows, so I do not know if it is Windows specific.
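
As an illustration of the "error" behavior suggested above, here is a minimal sketch of a user-level guard that checks the replacement against the number of capture groups before calling sub(). The name safe_sub is hypothetical and not part of base R. The sketch assumes an R version in which regexpr(perl = TRUE) returns the "capture.start" attribute and in which regmatches() is available (both are newer than 2.12.0), and it only scans for \1 to \9 in a simplified way.

```r
# Hypothetical helper (not part of base R): raise an error instead of
# calling sub() when the replacement refers to an undefined group.
safe_sub <- function(pattern, replacement, x, ...) {
  # With perl = TRUE, regexpr() reports capture groups through the
  # "capture.start" attribute (one column per group); the attribute is
  # assumed to be absent when the pattern has no capture groups.
  m <- regexpr(pattern, x[1L], perl = TRUE)
  cs <- attr(m, "capture.start")
  n_groups <- if (is.null(cs)) 0L else ncol(cs)

  # Collect the back-references \1 .. \9 used in the replacement.
  # Simplification: an escaped backslash followed by a digit is also counted.
  refs <- regmatches(replacement, gregexpr("\\\\[1-9]", replacement))[[1L]]
  ref_nums <- as.integer(substring(refs, 2L))

  if (length(ref_nums) > 0L && max(ref_nums) > n_groups) {
    stop(sprintf("replacement refers to group %d but pattern defines %d",
                 max(ref_nums), n_groups))
  }
  sub(pattern, replacement, x, perl = TRUE, ...)
}
```

With the pattern from this report, safe_sub(regex, "\\1", x) would go through to sub() as usual, while safe_sub(regex, "\\2", x) would stop with an error before sub() is reached.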

Code to reproduce:

x <- paste(letters, collapse="")
regex <- "([[:alpha:]]).*"
sub(regex, "\\1", x, perl=TRUE)
sub(regex, "\\3", x, perl=TRUE)
sub(regex, "\\2", x, perl=TRUE)


Result of code (with sessionInfo at beginning) on R 2.12.0:

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
> x <- paste(letters, collapse="")
> regex <- "([[:alpha:]]).*"
> sub(regex, "\\1", x, perl=TRUE)
[1] "a"
> sub(regex, "\\3", x, perl=TRUE)
[1] ""
> sub(regex, "\\2", x, perl=TRUE)

Process R trace trap at Fri Nov 05 10:08:40 2010


Result of code on R 2.12.0 Patched (r53526):

> sessionInfo()
R version 2.12.0 Patched (2010-11-04 r53526)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
> x <- paste(letters, collapse="")
> regex <- "([[:alpha:]]).*"
> sub(regex, "\\1", x, perl=TRUE)
[1] "a"
> sub(regex, "\\3", x, perl=TRUE)
[1] ""
> sub(regex, "\\2", x, perl=TRUE)

Process R trace trap at Fri Nov 05 09:55:49 2010
Comment 1 Brian Ripley 2010-11-08 09:01:27 UTC
The subject line failed to mention that this was
- for sub()
- for pcre = TRUE
both of which are crucial.

It used uninitialized memory, so the behavior was platform-dependent.

Changed in 2.12.0 patched.