Bug 14408 - Incorrect results from the TRE engine
Status: ASSIGNED
Product: R
Classification: Unclassified
Component: Low-level
Version: R 2.12.0
Hardware: All All
Importance: P5 minor
Assigned To: R-core
Duplicates: 14984
Depends on:
Blocks:
Reported: 2010-10-15 05:58 UTC by Brian Ripley
Modified: 2014-03-02 21:13 UTC

Description Brian Ripley 2010-10-15 05:58:04 UTC
> grepl("^[[:alpha:]]{2,}", "123")
[1] TRUE
> grepl("^[[:alpha:]]{2,}", "123", perl = TRUE)
[1] FALSE

The PCRE result is clearly the correct one.

Unfortunately the TRE support is currently down (and has been since July),
so we are unable to ask the developer for help.  So I'm filing this for
the record.
Comment 1 Henric Winell 2011-09-07 06:41:39 UTC
More recent versions of R don't seem to suffer from this anymore:

> grepl("^[[:alpha:]]{2,}", "123")
[1] FALSE
> grepl("^[[:alpha:]]{2,}", "123", perl = TRUE)
[1] FALSE
>
> sessionInfo()
R version 2.13.1 Patched (2011-08-13 r56726)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252
[3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
[5] LC_TIME=Swedish_Sweden.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 2 Brian Ripley 2011-09-07 18:05:50 UTC
Yes, it does, in a UTF-8 locale.
Nothing has changed in the TRE implementation.
Comment 3 Brian Ripley 2011-11-05 12:35:57 UTC
In an 8-bit locale this computes ranges as a mask, and the grammar is correct.
In a larger (multibyte) locale, the grammar it computes for [[:alpha:]]{1,}
checks the type of the character, but that computed for [[:alpha:]]{2,} never does.
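
A minimal sketch for checking the {1,} versus {2,} difference described above, by running both patterns through the default engine (TRE) and through PCRE in the current session. The helper name compare_engines is hypothetical and not part of R. What the TRE column shows is expected to depend on the locale and on the TRE version bundled with the R build, so no output is given here.

```r
# Hypothetical helper: match one pattern with both regex engines.
compare_engines <- function(pattern, x) {
  c(TRE  = grepl(pattern, x),               # default engine (TRE)
    PCRE = grepl(pattern, x, perl = TRUE))  # PCRE, used as the reference
}

# The two quantifier forms discussed in this comment.
patterns <- c("^[[:alpha:]]{1,}", "^[[:alpha:]]{2,}")

# One row per pattern, one column per engine.
results <- t(sapply(patterns, compare_engines, x = "123"))

# Report the locale type first, since the behaviour differs between
# 8-bit and multibyte locales.
print(l10n_info()[c("MBCS", "UTF-8")])
print(results)
```

Any row where the TRE and PCRE columns disagree would point at the affected pattern.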
Comment 4 Brian Ripley 2012-02-15 21:23:34 UTC
It now gets this right for ASCII text even in an MBCS locale.  New UTF-8 example:
> grepl("^[[:alpha:]]{2,}", "123")
[1] FALSE
> grepl("^[[:alpha:]]{2,}", "12£")
[1] TRUE
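
A sketch of the same two inputs written so that the script does not depend on the encoding of the source file: the pound sign is given as the escape \u00a3, and the locale type is printed alongside the results. The variable names are illustrative only, and the TRE column may differ between R builds and locales.

```r
# Inputs from the example above; "\u00a3" is the pound sign.
x <- c(ascii = "123", pound = "12\u00a3")

# TRUE only when the session runs in a UTF-8 locale.
utf8 <- isTRUE(l10n_info()[["UTF-8"]])

pattern <- "^[[:alpha:]]{2,}"
tre  <- grepl(pattern, x)               # default engine (TRE)
pcre <- grepl(pattern, x, perl = TRUE)  # PCRE for comparison

cat("UTF-8 locale:", utf8, "\n")
print(data.frame(input = x, TRE = tre, PCRE = pcre))
```

Both inputs start with a digit, so PCRE reports no match for either of them.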
Comment 5 Brian Ripley 2012-07-11 17:31:52 UTC
*** Bug 14984 has been marked as a duplicate of this bug. ***
Comment 6 Brian Ripley 2014-03-02 21:13:19 UTC
Confirmed as still present in the git sources for TRE on 2014-03-02.