Bug 16520 - Chinese Locale breaks File Read with UTF-8
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.2.1
Hardware: Other Windows 32-bit
Importance: P5 enhancement
Assignee: R-core
Reported: 2015-08-24 06:08 UTC by zjunksend
Modified: 2015-08-24 06:08 UTC

Description zjunksend 2015-08-24 06:08:55 UTC
I have a file, file.txt, encoded as UTF-8 without a BOM and created in Notepad++:
日期,推广计划名称,浏览来源,点击转化率
2015-08-11,传播,计算机站内,0.01
2015-08-11,传播,多动站内,0.05
2015-08-12,传播,计算机站内,0.03
2015-08-12,传播,多动站内,0.09
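
To rule out a byte order mark, the first bytes of the file can be checked from R. The sketch below writes a reduced two-column stand-in for file.txt rather than the full data (the CJK header is given as \u escapes so the script itself is plain ASCII), and the names sample_path and has_bom are illustrative only. It then tests for the UTF-8 BOM bytes EF BB BF.

```r
# Reduced stand-in for file.txt, written as UTF-8 bytes without a BOM.
sample_path <- tempfile(fileext = ".txt")
sample_lines <- c("\u65e5\u671f,rate", "2015-08-11,0.01", "2015-08-12,0.03")
writeLines(enc2utf8(sample_lines), sample_path, useBytes = TRUE)

# TRUE if the file starts with the UTF-8 byte order mark EF BB BF.
# has_bom is a hypothetical helper, not a base R function.
has_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

has_bom(sample_path)
```

For the real file, has_bom("file.txt") would apply the same check.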

I alternate locales while reading it in:
Sys.setlocale(category = "LC_CTYPE", locale = "eng")
myData1 <- read.csv("file.txt", encoding="UTF-8", check.names = FALSE, stringsAsFactors = FALSE)
Sys.setlocale(category = "LC_CTYPE", locale = "chs")
myData2 <- read.csv("file.txt", encoding="UTF-8", check.names = FALSE, stringsAsFactors = FALSE)
Sys.setlocale(category = "LC_CTYPE", locale = "eng")
myData3 <- read.csv("file.txt", encoding="UTF-8", check.names = FALSE, stringsAsFactors = FALSE)

myData1 and myData3 are correct 4x4 data frames, but myData2 is a mangled 2x6 data frame.
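
The comparison can also be scripted so that the dimensions under each locale are recorded side by side. This is a minimal sketch, assuming a reduced two-column stand-in for file.txt (so the expected good result here is 2x2, not 4x4). The names repro_path and read_under are illustrative, and the locale names "eng" and "chs" are only expected to be accepted on Windows.

```r
# Reduced stand-in for file.txt (UTF-8, no BOM); the header uses \u escapes.
repro_path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(c("\u65e5\u671f,rate", "2015-08-11,0.01", "2015-08-12,0.03")),
           repro_path, useBytes = TRUE)

# Request a locale, then read the file with the same arguments as in the report.
# read_under is a hypothetical helper.
read_under <- function(loc) {
  applied <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
  dat <- read.csv(repro_path, encoding = "UTF-8",
                  check.names = FALSE, stringsAsFactors = FALSE)
  data.frame(requested = loc,
             applied = applied,  # "" means the OS rejected the locale name
             rows = nrow(dat), cols = ncol(dat),
             stringsAsFactors = FALSE)
}

old_ctype <- Sys.getlocale("LC_CTYPE")
results <- do.call(rbind, lapply(c("eng", "chs", "eng"), read_under))
invisible(Sys.setlocale("LC_CTYPE", old_ctype))  # restore the session locale
print(results)
```

Rows where the applied column is empty were read under the unchanged session locale, so they say nothing about the requested one.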

Is this a bug or am I missing something?
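
One distinction that may be relevant, going by the documentation for read.table: the encoding argument only marks the strings that are read as UTF-8 and does not re-encode the input, whereas fileEncoding declares the encoding of the file so that the data is re-encoded while it is read. Whether this explains the 2x6 result under the chs locale is an assumption and has not been confirmed on the affected setup. The sketch below uses a reduced stand-in for file.txt, with illustrative names enc_path, marked and converted, and reads it both ways for comparison.

```r
# Reduced stand-in for file.txt (UTF-8, no BOM); the header uses \u escapes.
enc_path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(c("\u65e5\u671f,rate", "2015-08-11,0.01", "2015-08-12,0.03")),
           enc_path, useBytes = TRUE)

# encoding=: bytes are read as-is and the resulting strings are marked as UTF-8.
marked <- read.csv(enc_path, encoding = "UTF-8",
                   check.names = FALSE, stringsAsFactors = FALSE)

# fileEncoding=: input is converted from UTF-8 to the native encoding while reading.
# The conversion can fail when the native encoding cannot represent the text,
# so an error is turned into NULL here instead of stopping the script.
converted <- tryCatch(
  suppressWarnings(read.csv(enc_path, fileEncoding = "UTF-8",
                            check.names = FALSE, stringsAsFactors = FALSE)),
  error = function(e) NULL
)

dim(marked)
if (!is.null(converted)) dim(converted)
```

Running both reads under each locale would show whether the two arguments behave differently there.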

Initially discovered with R version 3.2.1 on Windows 7 32-bit.
Reproduced with R version 3.2.2 on Windows 7 32-bit.