Bug 14622

Summary: grepl does not work for some characters in Japanese Windows
Product: R
Reporter: George Yoshida <dynkin>
Component: Low-level
Assignee: R-core <R-core>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P5    
Version: R 2.13.0   
Hardware: x86_64/x64/amd64 (64-bit)   
OS: Windows 64-bit   

Description George Yoshida 2011-07-02 04:40:12 UTC
The following regex pattern does not work.

> grepl("ー", c("a", "b"))
Error in grepl("ー", c("a", "b")) : 
  invalid regular expression 'ー', reason 'Missing ']''


# OS/locale info
OS Windows 7 64-bit
Japanese Environment
Charset : CP932

---

In the CP932 charset, "ー" (a single double-byte character) is encoded as '\x81\x5b',
and '\x5b' in ASCII is '[', so the trail byte appears to be read as the start of a bracket expression.
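
The byte values quoted above can be checked from R itself. This is only a minimal sketch, not part of the original report: it assumes that iconv() on the platform recognises the "CP932" encoding name, and the variable names are purely illustrative.

```r
# Sketch only: assumes iconv() on this platform supports the "CP932" encoding name.
# The character is written as a Unicode escape so the script itself stays ASCII.
ch <- "\u30fc"  # KATAKANA-HIRAGANA PROLONGED SOUND MARK

# Convert from UTF-8 to CP932 and inspect the raw bytes.
bytes <- charToRaw(iconv(ch, from = "UTF-8", to = "CP932"))
print(bytes)    # 81 5b

# Taken on its own, the second (trail) byte is the ASCII character "[".
trail <- rawToChar(bytes[2])
print(trail)
```

A regex engine that walks the pattern byte by byte, rather than character by character, would see that trail byte as an unmatched '[', which is consistent with the "Missing ']'" message.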

Comment 1 Brian Ripley 2011-11-04 12:46:42 UTC
Seems specific to DBCS character sets, and is using third-party code (TRE).
Would need a Japanese-language-enabled Windows to reproduce.

Comment 2 Brian Ripley 2011-11-05 06:48:27 UTC
So this can be reproduced in European Windows by

env LC_CTYPE=ja Rterm

> grepl("\x81\x5b", c("a", "b"))

Fixed for 2.14.0 patched.

Comment 3 George Yoshida 2012-01-01 12:46:53 UTC
(In reply to comment #2)
> So this can be reproduced in European Windows by
> 
> env LC_CTYPE=ja Rterm
> 
> > grepl("\x81\x5b", c("a", "b"))
> 
> Fixed for 2.14.0 patched.

Thank you for your fix.
Tested with 2.14.1 and it works just fine!