Bug 16064 - Writing UTF8 on Windows
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.1.2
Hardware: Other Windows 32-bit
Importance: P5 enhancement
Assignee: R-core
Reported: 2014-11-07 21:51 UTC by Jeroen Ooms
Modified: 2014-11-07 21:51 UTC (History)
Description Jeroen Ooms 2014-11-07 21:51:21 UTC
See this post [1] on r-devel for more details. I think the following two examples indicate bugs.

1) When writing a UTF-8 string to a binary connection opened with encoding = "UTF-8" on Windows, the output is Latin-1:

> string <- enc2utf8("Zürich")
> Encoding(string)
[1] "UTF-8"

> con <- file("test1.txt", open="wb", encoding = "UTF-8")
> writeLines(string, con)
> close(con)
> system("file test1.txt")
test1.txt: ISO-8859 text
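
The `file` utility is not normally available on Windows, so the same check can be done from within R by looking at the raw bytes. The following is only a sketch: `read_bytes` and `starts_with` are hypothetical helpers introduced here, and the string is built from a Unicode escape so that the encoding of the script itself does not matter. In UTF-8 the "ü" is the two bytes c3 bc, in Latin-1 it is the single byte fc.

```r
# Test string built from a Unicode escape, independent of the script's encoding.
string <- enc2utf8("Z\u00fcrich")

# Reference byte sequences for the two encodings in question.
utf8_bytes   <- charToRaw(string)
latin1_bytes <- charToRaw(iconv(string, from = "UTF-8", to = "latin1"))

# Hypothetical helper: read a whole file as raw bytes.
read_bytes <- function(path) {
  readBin(path, what = "raw", n = file.info(path)$size)
}

# Hypothetical helper: does `bytes` begin with the byte sequence `prefix`?
starts_with <- function(bytes, prefix) {
  length(bytes) >= length(prefix) &&
    identical(bytes[seq_along(prefix)], prefix)
}

# Same write as in example 1, but to a temporary file.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb", encoding = "UTF-8")
writeLines(string, con)
close(con)

written <- read_bytes(path)
print(written)
print(starts_with(written, utf8_bytes))    # TRUE if the file holds UTF-8
print(starts_with(written, latin1_bytes))  # TRUE if the file holds Latin-1
```

Going by the `file` output above, the second comparison would be the one expected to succeed on the affected Windows setup. The result may differ on other platforms and locales.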

2) Another, probably related, problem: when writing a UTF-8 string to a UTF-8 text connection with useBytes = TRUE, the string seems to be re-encoded one time too many, resulting in invalid characters:

> con <- file("test3.txt", open="w", encoding = "UTF-8")
> writeLines(string, con, useBytes = TRUE)
> close(con)
> system("file test3.txt")
test3.txt: UTF-8 Unicode text, with CRLF line terminators
> readLines("test3.txt", encoding="UTF-8")
[1] "Zürich"
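
To illustrate what "re-coded one too many times" could look like at the byte level, the sketch below treats the UTF-8 bytes as if they were Latin-1 and converts them to UTF-8 a second time. This is an assumption about the mechanism, not a confirmed description of what R does internally. The second half shows a possible way around both problems, under the assumption that a connection opened without an `encoding` argument does no re-encoding of its own: write the UTF-8 bytes as they are with useBytes = TRUE.

```r
string <- enc2utf8("Z\u00fcrich")

# Double encoding: read the UTF-8 bytes as Latin-1, convert to UTF-8 again.
# The two bytes of the u-umlaut (c3 bc) each become a two-byte sequence.
twice <- iconv(string, from = "latin1", to = "UTF-8")
print(charToRaw(string))
print(charToRaw(twice))

# Possible way around it: no `encoding` on the connection, and useBytes = TRUE
# so that writeLines does not translate the string to the native encoding.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeLines(string, con, useBytes = TRUE)
close(con)

# Check the result: the file should hold the UTF-8 bytes plus a newline.
written   <- readBin(path, what = "raw", n = file.info(path)$size)
roundtrip <- readLines(path, encoding = "UTF-8")
print(written)
print(identical(roundtrip, string))
```

Opening the connection with "wb" rather than "w" also avoids the CRLF line terminators seen in the `file` output above.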

[1] https://stat.ethz.ch/pipermail/r-devel/2014-October/069944.html