Bug 17231 - R crashes when a source file contains escaped string at 4096 position
Summary: R crashes when a source file contains escaped string at 4096 position
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.3.*
Hardware: Other Other
Importance: P5 enhancement
Assignee: R-core
Reported: 2017-03-03 09:14 UTC by Vaidotas Zemlys
Modified: 2017-03-07 17:42 UTC

Attachments
the source code which is mentioned in the bug report (4.01 KB, text/plain)
2017-03-03 09:14 UTC, Vaidotas Zemlys
Suggested patch (4.38 KB, patch)
2017-03-07 17:36 UTC, Mikko Korpela
Test program (3.30 KB, text/plain)
2017-03-07 17:42 UTC, Mikko Korpela

Description Vaidotas Zemlys 2017-03-03 09:14:20 UTC
Created attachment 2232
the source code which is mentioned in the bug report

R crashes when running

R --file=bu4.txt 

The file contains valid R code that includes a very long string. The string has an escaped quote at exactly position 4096 in the line. If a character is added to or removed from the string, R executes the file successfully.
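For anyone without the attachment, a file of the same shape can be generated with a few lines of R. This is only a sketch: the function name `make_repro_line`, the file name, and the filler text are made up for illustration, and it assumes that "position 4096" means the 1-based column of the backslash that starts the escaped quote. The attachment may be laid out differently, and this generated file has not been confirmed to trigger the crash.

```r
# Sketch of a generator for a one-line file shaped like the attachment.
# ASSUMPTION: "position 4096" is the 1-based column of the backslash that
# starts the escaped quote; the real attachment may differ.
make_repro_line <- function(escape_col = 4096L) {
  prefix <- 'x <- "'                                   # start of the string literal
  filler <- strrep("a", escape_col - 1L - nchar(prefix))  # pad up to the escape
  paste0(prefix, filler, '\\"', 'tail"')               # \" at escape_col, then close
}

line <- make_repro_line()
path <- file.path(tempdir(), "repro-17231.R")          # hypothetical file name
writeLines(line, path)

# Check where the escape landed before running the file with R --file=...
escape_pos <- as.integer(regexpr('\\"', line, fixed = TRUE))
print(escape_pos)
```

Passing a different `escape_col` (for example 4095 or 4097) gives a control file; according to the description above, shifting the escape by one character should let the file run normally.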

Note that the code is nonsensical, but it should nevertheless run. I hit this bug with R 3.3.0 on Mac OS X:

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 

and with R 3.3.2 on Debian:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Mikko Korpela 2017-03-07 17:36:08 UTC
Created attachment 2234
Suggested patch

The attachment is an attempt to fix this issue and related problems (spurious errors) in src/main/gram.y. After patching, src/main/gram.c also needs to be regenerated. Testing with 'make check-devel' succeeds with the patch applied, but I'm not familiar enough with the R parser to guarantee that the patch is flawless. A small test program follows.
Comment 2 Mikko Korpela 2017-03-07 17:42:14 UTC
Created attachment 2235
Test program

This test program detects some false errors in the R parser. I tested it on the platform below, and also in latin1 and C locales. It will probably not work on Windows because it uses the "\U" Unicode notation. With the patched version, all 128 tests run OK. Without the patch, there are 58 failures (not unique).

> sessionInfo()
R Under development (unstable) (2017-03-06 r72315)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/mvkorpel/root_R-devel-r72315-gram/lib/R/lib/libRblas.so
LAPACK: /home/mvkorpel/root_R-devel-r72315-gram/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=fi_FI.UTF-8    LC_NUMERIC=C            LC_TIME=en_GB          
 [4] LC_COLLATE=en_GB        LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB      
 [7] LC_PAPER=en_GB          LC_NAME=C               LC_ADDRESS=C           
[10] LC_TELEPHONE=C          LC_MEASUREMENT=en_GB    LC_IDENTIFICATION=C    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0