Bug 17254 - utils::getParseData() returns wrong column numbers in case of multibyte symbols
Summary: utils::getParseData() returns wrong column numbers in case of multibyte symbols
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Language
Version: R 3.3.*
Hardware: Other Other
Importance: P5 enhancement
Assignee: R-core
Reported: 2017-04-07 20:21 UTC by Yihui Xie
Modified: 2017-04-18 09:46 UTC

Description Yihui Xie 2017-04-07 20:21:21 UTC
Here is a minimal example:

> getParseData(parse(text = 'γ <- α + β ', keep.source = TRUE))
   line1 col1 line2 col2 id parent       token terminal text
11     1    0     1    7 11      0        expr    FALSE     
1      1    0     1    0  1      3      SYMBOL     TRUE    γ
3      1    0     1    0  3     11        expr    FALSE     
2      1    2     1    3  2     11 LEFT_ASSIGN     TRUE   <-
10     1    4     1    7 10     11        expr    FALSE     
4      1    4     1    4  4      6      SYMBOL     TRUE    α
6      1    4     1    4  6     10        expr    FALSE     
5      1    6     1    6  5     10         '+'     TRUE    +
7      1    7     1    7  7      9      SYMBOL     TRUE    β
9      1    7     1    7  9     10        expr    FALSE     

You can see that col1 and col2 are both 0 for γ, even though it is the first character on the line. The <- token should span columns 3 to 4 instead of 2 to 3, α should span 6 to 6 instead of 4 to 4, etc.
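
The expected positions above can be reproduced with a short sketch. It assumes that columns are meant to be 1-based character positions, which is what the expectations in this report imply; first_match() is a hypothetical helper written for this illustration and is not part of utils. Byte offsets are shown next to the character columns only to make clear where the two diverge once a multibyte symbol appears.

```r
# Sketch: where each token of the example should start and end.
# Assumption: columns are 1-based character positions (as the report expects).
txt    <- "\u03b3 <- \u03b1 + \u03b2 "   # same text as 'γ <- α + β ', written with escapes
tokens <- c("\u03b3", "<-", "\u03b1", "+", "\u03b2")

# first_match() is a hypothetical helper: position of the first literal match
# of tok in txt, counted in characters (use_bytes = FALSE) or bytes (TRUE).
first_match <- function(tok, use_bytes) {
  as.integer(regexpr(tok, txt, fixed = TRUE, useBytes = use_bytes))
}

char_col1 <- vapply(tokens, first_match, integer(1),
                    use_bytes = FALSE, USE.NAMES = FALSE)
char_col2 <- char_col1 + nchar(tokens, type = "chars") - 1L
byte_col1 <- vapply(tokens, first_match, integer(1),
                    use_bytes = TRUE, USE.NAMES = FALSE)

# char_col1 is 1 3 6 8 10 and char_col2 is 1 4 6 8 10;
# byte_col1 is 1 4 7 10 12 because each Greek letter takes two bytes in UTF-8.
print(data.frame(text = tokens, char_col1, char_col2, byte_col1))
```

These values can be compared with col1 and col2 of the terminal rows in the getParseData() output above.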

My session info:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Duncan Murdoch 2017-04-18 09:46:42 UTC
Confirmed and fixed.  I'll commit to R-devel soon, and backport to R-patched after 3.4.0 is released.