Bug 16732 - Unicode characters are garbled when printing function
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific
Version: R 3.2.3
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 minor
Assignee: R-core
Depends on:
Reported: 2016-02-27 10:48 UTC by Michio Oguro
Modified: 2017-06-25 08:22 UTC
Description Michio Oguro 2016-02-27 10:48:15 UTC
In the Windows version of Rgui, Unicode characters in a function (I tested with Japanese) are garbled when the function is printed in the console.


# This works
> print("日本語")
[1] "日本語"

# This produces garbled characters
> function(){"日本語"}
function() {'譌・譛ャ隱
Comment 1 Duncan Murdoch 2017-05-22 14:03:13 UTC
This is a response to your bug posting in Feb, 2016.

Would you be able to test your examples in a current version of R-devel?  When I run them on macOS things are fine, and on Windows in a Latin1 locale, I get

> function(){"日本語"}

which I think is an acceptable result, because those escapes match the Japanese characters.  If I run that function it prints the result as 

[1] "日本語"

which is what you'd want.  However, you are probably running Windows in a different locale, so it would be helpful to see what results you get.
Comment 2 Michio Oguro 2017-05-25 12:00:50 UTC
Thank you for your reply.

I tried the following code on R Under development (unstable) (2017-05-22 r72718) -- "Unsuffered Consequences" on Windows 10.
It still prints garbled characters.

> test = function(){"日本語"}
> test()
[1] "日本語"
> test

My locale information is as follows:

> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"

If you need other information on my system, please feel free to ask me.
Comment 3 Duncan Murdoch 2017-06-20 18:13:50 UTC
I think there are likely two problems here.

First, in your locale, Japanese characters aren't stored in UTF-8; a different encoding, code page 932, is used.  This is a one- or two-byte encoding based on Shift JIS, according to Wikipedia.

The first problem is that R's parser is not counting bytes properly.  It is counting each character as a single byte, when the Japanese chars each take two bytes.
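
The difference between character counts and byte counts can be illustrated with a small sketch. It is not taken from R's parser; the variable names are illustrative, and it assumes the iconv build behind R recognizes the encoding name "CP932".

```r
# "日本語" written with \u escapes, so the example does not depend on the
# encoding of the script file. R marks such a string as UTF-8.
x <- "\u65e5\u672c\u8a9e"

n_chars <- nchar(x, type = "chars")   # number of characters
n_utf8  <- nchar(x, type = "bytes")   # number of bytes in the UTF-8 form

# Convert to code page 932 and keep the raw bytes, so the result does not
# depend on the session's native encoding.
# Assumption: this iconv build knows the name "CP932".
cp932_bytes <- iconv(x, from = "UTF-8", to = "CP932", toRaw = TRUE)[[1]]
n_cp932 <- length(cp932_bytes)

cat("chars:", n_chars, " UTF-8 bytes:", n_utf8, " CP932 bytes:", n_cp932, "\n")
```

For this string the sketch reports 3 characters but 6 bytes in code page 932, so any position arithmetic that treats one character as one byte ends up in the wrong place.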

The second problem is with the source references.  At some point the string is converted to UTF-8, but not marked as UTF-8, so the print routines interpret it as code page 932, and you get the garbage output.
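
A rough sketch of that second mechanism, using only base R functions. It does not reproduce the internal code path; it only shows that a string's bytes and its encoding mark are separate things, and what a code page 932 reader makes of UTF-8 bytes. The variable names are illustrative, and the iconv call assumes the name "CP932" is recognized.

```r
x <- "\u65e5\u672c\u8a9e"      # "日本語", marked as UTF-8 by the parser
utf8_bytes <- charToRaw(x)     # e6 97 a5 e6 9c ac e8 aa 9e

# Drop the encoding mark: the bytes are unchanged, but R now treats the
# string as being in the native encoding (CP932 in the reporter's locale).
y <- x
Encoding(y) <- "unknown"
same_bytes <- identical(charToRaw(y), utf8_bytes)

# Emulate what a CP932 reader makes of those UTF-8 bytes.
# Assumption: this iconv build knows the name "CP932"; sub = "?" replaces
# bytes that cannot be interpreted as CP932.
garbled <- iconv(list(utf8_bytes), from = "CP932", to = "UTF-8", sub = "?")

# Restoring the mark is enough to make the string correct again.
Encoding(y) <- "UTF-8"
restored <- identical(charToRaw(y), utf8_bytes) && Encoding(y) == "UTF-8"

print(Encoding(x))
print(same_bytes)
print(garbled)
print(restored)
```

The stored bytes are never damaged in this sketch; only their interpretation changes, which is why marking the string correctly is enough to repair the output.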

I will try to track down and fix both problems, but it may take a while.  In the meantime, if you want to display your functions properly, you can simply remove source references and I think you will avoid both problems.  For example,

> Sys.setlocale("LC_CTYPE", "Japanese")
> test <- function(){"日本語"}
> test
> test <- removeSource(test)
> test
function () 
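
The workaround can also be checked without looking at console output. The sketch below assumes only base R and utils; the names f and g are illustrative. It parses a function with keep.source = TRUE so that it carries a source reference, then confirms that removeSource() drops the reference while the function's result stays the same.

```r
# Parse with keep.source = TRUE so the function carries a source reference,
# as functions typed at the Rgui console do by default.
f <- eval(parse(text = 'function(){"\\u65e5\\u672c\\u8a9e"}', keep.source = TRUE))
has_srcref_before <- !is.null(attr(f, "srcref"))

# removeSource() returns a copy of the function without the "srcref"
# attribute, so printing it deparses the code instead of echoing the
# stored source text.
g <- removeSource(f)
has_srcref_after <- !is.null(attr(g, "srcref"))

# The function body itself is untouched.
same_result <- identical(f(), g())

print(c(before = has_srcref_before, after = has_srcref_after, same = same_result))
```

Setting options(keep.source = FALSE) before defining functions avoids creating source references in the first place.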
Comment 4 Duncan Murdoch 2017-06-22 21:38:13 UTC
Both problems should now be fixed in R-devel and R-patched (to become 3.4.1 soon).  Please test quickly; if there are problems, I don't want them to make it into the release.
Comment 5 Michio Oguro 2017-06-23 16:06:05 UTC
Thank you for the reply and fix.
I tried both the 64-bit and 32-bit versions of R 2017-06-21 r72824, but unfortunately it still produces the same garbage.

> function(){"日本語"}

> removeSource(function(){"日本語"})
function () 
Comment 6 Duncan Murdoch 2017-06-23 16:30:33 UTC
Sorry, I should have said:  you need R-devel r72839 or R-patched (3.4.1 beta) r72840.  The next nightly build should have the fix.
Comment 7 Michio Oguro 2017-06-25 08:22:26 UTC
I tried R-devel 2017-06-23 r72852, both the 32-bit and 64-bit versions.
Now it works well!!

> function(){"日本語"}

Thanks for your great effort!