Bug 16737 - File connections write UTF-16 incorrectly in Windows
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.2.3
Hardware: Other Other
Importance: P5 enhancement
Assignee: R-core
Depends on:
Reported: 2016-02-29 18:46 UTC by Duncan Murdoch
Modified: 2017-05-03 17:21 UTC
Description Duncan Murdoch 2016-02-29 18:46:05 UTC
This bug report follows discussion in the R-devel mailing list starting with this post:


In summary:  writing a file using 

x <- data.frame(a = I("a \" quote"), b = pi)
write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")

produced an invalid file.  Prior to R-devel revision 70247 there was also an issue
with strings being truncated at null bytes, but that has been fixed.  However, there
are still problems on Windows: the file is opened in text mode, so Windows inserts a
single-byte \r before each \n byte as the file is written.  UTF-16LE uses two-byte
code units, so the extra byte shifts all following text out of alignment and the
file is invalid.
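
To make the failure concrete, here is a minimal sketch that reproduces the byte pattern on any platform.  It does not call Windows at all; simulate_text_mode() is a hypothetical helper that only mimics byte-oriented text-mode translation (insert 0x0D before every 0x0A byte), which is an assumption about the mechanism described above.

```r
# Encode a two-line string as UTF-16LE bytes (no BOM).
utf16 <- iconv("a\nb", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(utf16)
# 61 00 0a 00 62 00

# Hypothetical helper: mimic byte-oriented text mode by inserting a single
# 0x0D byte before every 0x0A byte, regardless of the encoding.
simulate_text_mode <- function(bytes) {
  out <- lapply(bytes, function(b) {
    if (b == as.raw(0x0a)) as.raw(c(0x0d, 0x0a)) else b
  })
  unlist(out)
}

broken <- simulate_text_mode(utf16)
print(broken)
# 61 00 0d 0a 00 62 00

# What a valid UTF-16LE file with CRLF line endings would contain instead:
# each of \r and \n is a full two-byte code unit.
wanted <- as.raw(c(0x61, 0x00, 0x0d, 0x00, 0x0a, 0x00, 0x62, 0x00))

# The simulated output has an odd number of bytes, so it cannot be a
# sequence of two-byte code units.
length(broken) %% 2
```

The stray 0x0D pairs up with the following 0x0A, and every later code unit is then read one byte out of step.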

There appear to be two possible solutions on Windows.  First, we could tell Windows the encoding as part of the mode argument when the output file is opened; it would then insert the correct two-byte \r character.

An alternative requires a bigger change, but I think it would be better:  we could handle the \r insertions ourselves, rather than telling Windows to do it.  This would have the advantage that we would not be restricted to the limited set of encodings that Windows text mode can handle (UNICODE, UTF-8, and UTF-16LE).  If all text file handling were done in R, it would be easier on both Unix and Windows to write text files with either style of line ending.  This would require adding an "eol" argument to a number of functions, e.g. to file().
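
As a rough user-level illustration of the second idea (not the proposed change to R itself), the line endings can be added to the text before it is encoded, and the bytes written through a binary-mode connection so that the operating system performs no translation.  The function write_utf16le() and its "eol" argument are hypothetical names used only for this sketch.

```r
# Hypothetical helper, not part of R: write lines as UTF-16LE with an
# explicit line ending, bypassing OS text-mode translation.
write_utf16le <- function(lines, path, eol = "\r\n") {
  # Append the chosen line ending to every line, then join into one string.
  text <- paste0(lines, eol, collapse = "")
  # Encode the whole string, line endings included, as UTF-16LE bytes.
  bytes <- iconv(enc2utf8(text), from = "UTF-8", to = "UTF-16LE",
                 toRaw = TRUE)[[1]]
  con <- file(path, open = "wb")  # binary mode: bytes are written unchanged
  on.exit(close(con))
  writeBin(bytes, con)
  invisible(length(bytes))
}

tmp <- tempfile(fileext = ".txt")
write_utf16le(c("a", "b"), tmp)
readBin(tmp, what = "raw", n = 100)
# 61 00 0d 00 0a 00 62 00 0d 00 0a 00
```

Because the line ending is encoded along with the rest of the text, the same code works for any encoding that iconv() supports and for either "\n" or "\r\n" on any platform.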
Comment 1 Duncan Murdoch 2017-05-03 17:21:01 UTC
I've now put the simpler fix into R-devel rev 72650.