Bug 14408 - Incorrect results from the TRE engine
Summary: Incorrect results from the TRE engine
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 2.12.0
Hardware: All All
Importance: P5 minor
Assignee: R-core
Duplicates: 14984, 15943
Depends on:
Reported: 2010-10-15 05:58 UTC by Brian Ripley
Modified: 2017-03-09 14:54 UTC
Description Brian Ripley 2010-10-15 05:58:04 UTC
> grepl("^[[:alpha:]]{2,}", "123")
[1] TRUE
> grepl("^[[:alpha:]]{2,}", "123", perl = TRUE)
[1] FALSE

The PCRE result is clearly the correct one.

Unfortunately the TRE support is currently down (and has been since July),
so we are unable to ask the developer for help.  So I'm filing this for
the record.
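
For anyone wanting to reproduce this across more inputs, a minimal sketch follows. The helper name compare_engines and the extra test strings are assumptions made for illustration; they are not part of R or of the original report. It runs the same pattern through the default engine (TRE) and through PCRE and lists the inputs on which they disagree.

```r
# Hypothetical helper (not part of R): run one pattern through both engines.
compare_engines <- function(pattern, inputs) {
  data.frame(
    input = inputs,
    tre   = grepl(pattern, inputs),               # default engine (TRE)
    pcre  = grepl(pattern, inputs, perl = TRUE),  # PCRE
    stringsAsFactors = FALSE
  )
}

# "123" is the reported case; the other strings are illustrative additions.
inputs <- c("123", "ab", "a1", "abc")
res <- compare_engines("^[[:alpha:]]{2,}", inputs)
print(res)

# Rows where the two engines disagree. On an unaffected build this is empty.
print(res[res$tre != res$pcre, ])
```

On an affected build in a UTF-8 locale, the "123" row would be expected to show up in the second table.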
Comment 1 Henric Winell 2011-09-07 06:41:39 UTC
More recent versions of R don't seem to suffer from this anymore:

> grepl("^[[:alpha:]]{2,}", "123")
[1] FALSE
> grepl("^[[:alpha:]]{2,}", "123", perl = TRUE)
[1] FALSE
> sessionInfo()
R version 2.13.1 Patched (2011-08-13 r56726)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252
[3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
[5] LC_TIME=Swedish_Sweden.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 2 Brian Ripley 2011-09-07 18:05:50 UTC
Yes it does, in a UTF-8 locale.
Nothing has changed in the TRE implementation.
Comment 3 Brian Ripley 2011-11-05 12:35:57 UTC
In an 8-bit locale this computes ranges as a mask, and the grammar is correct.
In a larger locale, the grammar it computes for [[:alpha:]]{1,} checks
the type of the character, but that computed for [[:alpha:]]{2,} never does.
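
One way to probe the difference between the two bounds from the R side is to compare the bounded pattern with a hand-expanded spelling of the same language. This is only a sketch: the expanded pattern and the input strings are assumptions chosen for illustration, and whether the expanded form avoids the fault on an affected build has not been verified here.

```r
# Two spellings of "at least two alphabetic characters at the start".
bounded  <- "^[[:alpha:]]{2,}"
expanded <- "^[[:alpha:]][[:alpha:]]+"   # hand-expanded, same language

# "12\u00a3" is "12" followed by a pound sign, escaped to keep the source ASCII.
inputs <- c("123", "12\u00a3", "ab", "a1")

# Both calls use the default engine (TRE), since perl = TRUE is not set.
results <- data.frame(
  input    = inputs,
  bounded  = grepl(bounded, inputs),
  expanded = grepl(expanded, inputs),
  stringsAsFactors = FALSE
)
print(results)
```

A correct engine gives identical columns, with TRUE only for "ab". Any row where the columns differ points at how the bounded repeat is compiled rather than at the character class itself.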
Comment 4 Brian Ripley 2012-02-15 21:23:34 UTC
It now does this right for ASCII text even in an MBCS locale.  New UTF-8 example:
> grepl("^[[:alpha:]]{2,}", "123")
[1] FALSE
> grepl("^[[:alpha:]]{2,}", "12£")
[1] TRUE
Comment 5 Brian Ripley 2012-07-11 17:31:52 UTC
*** Bug 14984 has been marked as a duplicate of this bug. ***
Comment 6 Brian Ripley 2014-03-02 21:13:19 UTC
Confirmed as still present in the git sources for TRE on 2014-03-02.
Comment 7 Brian Ripley 2014-08-24 20:42:23 UTC
*** Bug 15943 has been marked as a duplicate of this bug. ***
Comment 8 Mikko Korpela 2017-03-09 14:54:33 UTC
It appears this was fixed in r66731 (2014-10-08) and R 3.1.2, related to Bug 16009.
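
A small regression check along these lines could confirm the fix on a given installation. It is a sketch only: the version threshold is taken from the statement above, the variable names are illustrative, and the behaviour on older builds is reported rather than asserted.

```r
# Version threshold taken from the comment above (assumed, not re-verified).
fixed_in <- "3.1.2"

# "12" followed by a pound sign, escaped so the source stays ASCII.
x <- "12\u00a3"

tre_result  <- grepl("^[[:alpha:]]{2,}", x)               # default engine (TRE)
pcre_result <- grepl("^[[:alpha:]]{2,}", x, perl = TRUE)  # PCRE

if (getRversion() >= fixed_in) {
  # On a fixed build neither engine should match: the string starts with a digit.
  stopifnot(!tre_result, !pcre_result)
} else {
  message("R ", as.character(getRversion()), " predates ", fixed_in,
          "; TRE result for the UTF-8 example: ", tre_result)
}
```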