I'm trying to read a string into a vector with scan(). The string consists of Chinese characters separated by spaces, but the newest version of R seems to have a bug. For instance, the input is:

> scan(text="R语言 是 一门 统计 专用 语言",what="character",encoding="UTF-8")

This should be split into 6 words by the 5 spaces, but the output is:

Read 4 items
[1] "R语言"        "是 一门 统计" "专用"         "语言"

I found this bug in R 3.1.1 and R 3.1.0 (both the 32-bit and 64-bit builds) on Windows 7 64-bit. With R 3.0.3 (32-bit and 64-bit) on Windows 7, or R 3.1.0 (64-bit) on Ubuntu 14.04, the function works normally and returns 6 words. For strings entirely in English, the function also works normally.

I worked around it with the strsplit() function, but can anyone look into this issue and fix it? Thanks!
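A minimal sketch of the strsplit() workaround mentioned above, assuming the words are separated by single ASCII spaces. The variable names `x` and `words` are illustrative only; this sidesteps scan() rather than fixing it.

```r
# Illustrative input string (same text as in the scan() call above)
x <- "R语言 是 一门 统计 专用 语言"

# Split on a literal single space; fixed = TRUE disables regex matching.
# strsplit() returns a list with one element per input string, so take [[1]].
words <- strsplit(x, " ", fixed = TRUE)[[1]]

length(words)  # 6
```

If the words may be separated by runs of whitespace, `strsplit(x, "[[:space:]]+")[[1]]` (without `fixed = TRUE`) treats any run of spaces or tabs as one separator.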