Bug 16543 - Writing UTF-8 strings to stderr under Windows causes incorrect display
Summary: Writing UTF-8 strings to stderr under Windows causes incorrect display
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R-devel (trunk)
Hardware: Other Windows 64-bit
Importance: P5 minor
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2015-09-22 18:34 UTC by Richard Cotton
Modified: 2016-05-24 13:45 UTC
CC List: 1 user

See Also:


Description Richard Cotton 2015-09-22 18:34:04 UTC
Under Windows (tested with Windows 7), printing UTF-8 strings to stderr causes many characters to display incorrectly.

To reproduce:

# a euro symbol, followed by Cyrillic, Hebrew, Arabic,
# Chinese, Japanese, and Korean characters
x <- "\u20ac \u0434 \u05E9 \u0645 \u60A8 \u306F \ub124"
cat(x, file = stderr())
## € <U+0434> <U+05E9> <U+0645> <U+60A8> <U+306F> <U+B124>

That is, only the euro symbol prints correctly.

Writing to stdout works as expected.

cat(x, file = stdout())
## € д ש م 您 は 네

This is problematic for error messages (which are written to stderr) that have been translated into non-European languages.
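To illustrate why error messages are affected: `message()` sends its text to the `stderr()` connection by default, so a translated message containing the characters above would be expected to take the same path as `cat(x, file = stderr())`. A minimal sketch, assuming the same test string:

```r
# Same test string as in the reproduction above
x <- "\u20ac \u0434 \u05E9 \u0645 \u60A8 \u306F \ub124"

# message() writes to stderr() by default and appends a newline,
# so on an affected Windows setup it would be expected to show
# the same <U+xxxx> output as the cat() call above
message(x)

# Roughly equivalent direct write to the stderr connection
cat(x, "\n", file = stderr(), sep = "")
```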
Comment 1 Richard Cotton 2015-09-28 10:14:43 UTC
The problem disappears for some characters if the LC_CTYPE locale is set. For example, under a Korean locale the Korean character displays correctly.

Sys.setlocale("LC_CTYPE", "Korean_Korea")
cat(x, file = stderr())
## € д <U+05E9> <U+0645> <U+60A8> は

However, since printing to stdout is always correct, this suggests that there is still a character-encoding problem when printing to stderr.
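One way to probe this hypothesis is to compare the string's declared encoding with the result of converting it to the native (locale) encoding. The sketch assumes, without confirming it in R's sources, that text written to stderr is translated to the native code page. Under a Windows-1252 locale such as German_Germany.1252, € is representable but the other characters are not, which would be consistent with the stderr output above.

```r
x <- "\u20ac \u0434 \u05E9 \u0645 \u60A8 \u306F \ub124"

# Declared encoding of the string (\u escapes normally mark it as UTF-8)
Encoding(x)

# Locale details; on Windows this includes the active code page
l10n_info()

# Translate to the native encoding. Characters the code page cannot
# represent may be replaced by <U+xxxx> escapes, as seen on stderr
enc2native(x)
```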
Comment 2 Peter Meissner 2016-05-24 13:44:34 UTC
I get the same results on a Windows 10 machine:

cat(x, file = stderr())
## € <U+0434> <U+05E9> <U+0645> <U+60A8> <U+306F> <U+B124>

cat(x, file = stdout())
## € д ש م 您 は 네


However, I cannot switch the locale:

Sys.setlocale("LC_TYPE", "Korean_Korea")
## Error in Sys.setlocale("LC_TYPE", "Korean_Korea") : invalid 'category' argument
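The `invalid 'category' argument` error comes from the category name rather than the locale: `Sys.setlocale()` accepts `"LC_CTYPE"`, not `"LC_TYPE"`. A corrected call might look like the sketch below. Whether the Windows locale name `"Korean_Korea"` is accepted depends on the system; if it is not, `Sys.setlocale()` returns an empty string with a warning.

```r
x <- "\u20ac \u0434 \u05E9 \u0645 \u60A8 \u306F \ub124"

# The misspelled category is rejected before any locale change is attempted
res <- try(Sys.setlocale("LC_TYPE", "Korean_Korea"), silent = TRUE)
inherits(res, "try-error")

# Remember the current character-type locale so it can be restored
old <- Sys.getlocale("LC_CTYPE")

# Correct category name; "Korean_Korea" is a Windows locale name
Sys.setlocale("LC_CTYPE", "Korean_Korea")
cat(x, file = stderr())

# Restore the previous setting
invisible(Sys.setlocale("LC_CTYPE", old))
```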


System: 

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252