Bug 15012 - gsub() with regex "\\\\)" makes R eat memory and CPU
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: R 2.15.1
Hardware: x86_64/x64/amd64 (64-bit) All
Importance: P5 minor
Assignee: R-core
Reported: 2012-08-08 16:33 UTC by Daniel Wollschlaeger
Modified: 2017-08-18 10:33 UTC
Description Daniel Wollschlaeger 2012-08-08 16:33:08 UTC
The following command makes R eat up all available resources (memory, CPU) until the system freezes completely:

gsub("\\\\)", "", "A")

Tested with R 2.15.1 under Windows 7 and R 2.13.1 under Linux.

Windows 7 64bit build 7601, Service Pack 1:
platform       x86_64-pc-mingw32            
arch           x86_64                       
os             mingw32                      
system         x86_64, mingw32              
status                                      
major          2                            
minor          15.1                         
year           2012                         
month          06                           
day            22                           
svn rev        59607                        
language       R                            
version.string R version 2.15.1 (2012-06-22)
nickname       Roasted Marshmallows

Linux 3.0.0-23:
platform       i686-pc-linux-gnu            
arch           i686                         
os             linux-gnu                    
system         i686, linux-gnu              
status                                      
major          2                            
minor          13.1                         
year           2011                         
month          07                           
day            08                           
svn rev        56322                        
language       R                            
version.string R version 2.13.1 (2011-07-08)
Comment 1 Duncan Murdoch 2012-08-10 15:39:11 UTC
In R-patched on Win32, I get an "out of memory" error, not a freeze, but it's still something that should be fixed.  

I think the problem is in src/extra/tre/tre-parse.c, around line 1344, in this code:

	    case CHAR_RPAREN:  /* end of current subexpression */
	      if ((ctx->cflags & REG_EXTENDED && depth > 0)
		  || (ctx->re > ctx->re_start
		      && *(ctx->re - 1) == CHAR_BACKSLASH))


The "depth > 0" check only applies to the first part of the disjunction; I think it counts parenthesis depth.  With the bad expression, depth is 0, and goes more and more negative as we keep returning to this line.  (I'm not sure why we keep returning, so the bug may be somewhere else.)

I don't understand the code well enough to fix this, but hopefully this is a bit of help...
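
A minimal sketch of how the quoted condition evaluates for this pattern may help. It is a simplified model, not the TRE source: the function name rparen_ends_subexpression, its parameters, and the macro values are hypothetical stand-ins for the parser context. The R string "\\\\)" is the three-character regex backslash, backslash, right parenthesis, so the ")" at index 2 is directly preceded by a backslash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TRE definitions; values are illustrative. */
#define REG_EXTENDED 1
#define CHAR_BACKSLASH '\\'

/* Simplified model of the check quoted above.
 * re    : the pattern being parsed
 * pos   : index of the ')' currently examined
 * depth : parenthesis nesting depth at that point
 * Returns true when the ')' is treated as the end of a subexpression. */
static bool rparen_ends_subexpression(int cflags, int depth,
                                      const char *re, size_t pos)
{
    assert(re != NULL && re[pos] == ')');
    return ((cflags & REG_EXTENDED) && depth > 0)
        || (pos > 0 && re[pos - 1] == CHAR_BACKSLASH);
}
```

With REG_EXTENDED set and depth equal to 0, the first disjunct is false for the pattern above, but the second is true because of the preceding backslash, so the unbalanced ")" is still taken as closing a group. For a pattern such as "a)" at depth 0, both disjuncts are false.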
Comment 2 Duncan Murdoch 2012-08-10 15:42:49 UTC
(In reply to comment #1)
> In R-patched on Win32, I get an "out of memory" error, not a freeze, but it's
> still something that should be fixed.  
> 
> I think the problem is in src/extra/tre/tre-parse.c, around line 1344, in this
> code:
> 
>         case CHAR_RPAREN:  /* end of current subexpression */
>           if ((ctx->cflags & REG_EXTENDED && depth > 0)
>           || (ctx->re > ctx->re_start
>               && *(ctx->re - 1) == CHAR_BACKSLASH))
> 
> 
> The "depth > 0" check only applies to the first part of the disjunction; I
> think it counts parenthesis depth.  With the bad expression, depth is 0, and
> goes more and more negative as we keep returning to this line.  (I'm not sure
> why we keep returning, so the bug may be somewhere else.)

Oops, sorry, depth stays at zero in an "infinite" loop.


> 
> I don't understand the code well enough to fix this, but hopefully this is a
> bit of help...
Comment 3 Mikko Korpela 2017-03-09 07:40:51 UTC
For the record, I just made a pull request to the upstream TRE repository: https://github.com/laurikari/tre/pull/48

It seems to fix the issue (tested with R-devel r72320 on Linux, also 'make check-devel'). The patch is based on an educated guess, so I hope the pull request will receive some comments.
Comment 4 Tomas Kalibera 2017-08-18 10:33:43 UTC
I spent some time debugging the parser to convince myself that I understood it well enough to fix it, and then independently came up with the very same fix. I confirm that the error is in the condition that handles a subexpression ending prematurely. The first part, "(ctx->cflags & REG_EXTENDED && depth > 0)", handles extended regular expressions, which we use and in which groups are marked by unquoted parentheses. The second part, "(ctx->re > ctx->re_start && *(ctx->re - 1) == CHAR_BACKSLASH)", handles non-extended regular expressions, in which groups are marked by quoted parentheses. The second part was accidentally applied to extended regular expressions as well, so the parser got confused by the backslash before the parenthesis. The same code pattern appears where the normal group close is handled, and there the part for non-extended regular expressions is indeed prefixed by "!(ctx->cflags & REG_EXTENDED)" to avoid similar issues. Fixed in r73107.
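
The guard described in this comment can be sketched as follows. This is a simplified model under stated assumptions, not the committed patch: the function name rparen_closes_group_guarded, its parameters, and the macro values are hypothetical, and the only point illustrated is restricting the backslash test to non-extended mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TRE definitions; values are illustrative. */
#define REG_EXTENDED 1
#define CHAR_BACKSLASH '\\'

/* Model of the condition with the extra guard.
 * re    : the pattern being parsed
 * pos   : index of the ')' currently examined
 * depth : parenthesis nesting depth at that point */
static bool rparen_closes_group_guarded(int cflags, int depth,
                                        const char *re, size_t pos)
{
    assert(re != NULL && re[pos] == ')');
    bool extended = (cflags & REG_EXTENDED) != 0;
    /* Extended syntax: a bare ')' closes a group only inside one. */
    return (extended && depth > 0)
        /* Non-extended syntax only: a quoted "\)" closes a group. */
        || (!extended && pos > 0 && re[pos - 1] == CHAR_BACKSLASH);
}
```

In this model the ")" of the regex from the report (backslash, backslash, right parenthesis) no longer counts as a group close in extended mode at depth 0, while a quoted "\)" in non-extended mode and a bare ")" inside an extended-mode group still do.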