Bug 16232 - R cannot display foreign characters (reading UTF-8 text)
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific
Version: R 3.1.2
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 major
Assignee: R-core
Reported: 2015-03-03 11:59 UTC by mlinchits
Modified: 2015-03-03 11:59 UTC

Description mlinchits 2015-03-03 11:59:44 UTC
This problem has existed for years. I know this is not a trivial fix, but it's really a pain to work with foreign text in R under Windows. ***This is a Windows-only problem.*** R will typically read foreign characters correctly, but it will not display them in the output. This problem exists regardless of file format, language, structure, etc. If a character is foreign (that is, not representable in the current Windows code page), the best R can do is show its Unicode code point as a <U+XXXX> escape. The problem has been widely observed and documented over the years.

Standard code to reproduce the problem:

raw_table1 <- read.csv("UTF8_nobom_cyrillic.csv", header = FALSE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", encoding = "UTF-8")

In the best-case scenario you'll get something like this:

<U+041C><U+0438><U+043D><U+0435><U+043C><U+0443><U+043C>(...)
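
To separate the reading step from the display step, the following is a minimal sketch, not a fix. It assumes a throwaway file (the name utf8_demo.csv and the variable names are made up for illustration) containing the first three characters from the output above. It writes the file as UTF-8 without a BOM, reads it back the same way as in the report, and inspects what R actually stored. If the stored code points match, the data was read correctly and only the printing is affected.

```r
# Build the text from code points so this script itself stays pure ASCII.
# These are the first three code points shown in the escaped output above.
code_points <- c(0x041CL, 0x0438L, 0x043DL)
word <- intToUtf8(code_points)

# Write a tiny UTF-8 CSV without a BOM (hypothetical file name).
path <- file.path(tempdir(), "utf8_demo.csv")
con <- file(path, open = "wb")
writeBin(c(charToRaw(word), charToRaw(",1\n")), con)
close(con)

# Read it back as in the report: encoding = "UTF-8" marks the strings
# as UTF-8 rather than converting them to the native code page.
tab <- read.csv(path, header = FALSE, encoding = "UTF-8",
                stringsAsFactors = FALSE)

Encoding(tab$V1)        # declared encoding of each string
utf8ToInt(tab$V1[1])    # code points actually held in memory

# TRUE means the characters survived the read intact,
# whatever print(tab) shows on the console.
read_ok <- identical(utf8ToInt(tab$V1[1]), code_points)
read_ok
```

If read_ok is TRUE while print(tab) still shows <U+041C>-style escapes, the file was read correctly and the issue is confined to console output, which matches the behaviour described above.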

More info about the problem on Stack Overflow:

http://stackoverflow.com/questions/18789330/r-on-windows-character-encoding-hell

Is there any hope that R on Windows will ever fully support foreign text?