I'm trying to read a string into a character vector with scan(). The string consists of Chinese characters separated by spaces, but the newest version of R seems to have a bug.
For instance, the input is:
> scan(text="R语言 是 一门 统计 专用 语言",what="character",encoding="UTF-8")
This should be split into 6 words at the 5 spaces, but the output is:
Read 4 items
 "R语言" "是 一门 统计" "专用" "语言"
I see this bug in R 3.1.1 and R 3.1.0 (both the 32-bit and 64-bit builds) on 64-bit Windows 7.
With R 3.0.3 (32-bit and 64-bit) on Windows 7, or R 3.1.0 (64-bit) on Ubuntu 14.04, the function works normally and returns 6 words.
For strings entirely in English, the function also works normally.
I worked around it with the strsplit() function, but can anyone look into this issue and fix it? Thanks!
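For anyone hitting the same problem, here is a minimal sketch of the strsplit() workaround. It assumes the words are separated by exactly one ASCII space; the variable names x and words are only for illustration.

```r
# The same string that scan() mis-splits on the affected Windows builds
x <- "R语言 是 一门 统计 专用 语言"

# fixed = TRUE treats " " as a literal separator rather than a regular expression;
# strsplit() returns a list, so [[1]] extracts the character vector
words <- strsplit(x, " ", fixed = TRUE)[[1]]

length(words)  # 6
```

If the input may contain repeated spaces or tabs, a pattern such as "[[:space:]]+" (without fixed = TRUE) might be a better fit, though I have not checked that on the affected builds.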