Bug 16732 - Unicode characters are garbled when printing function
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific
Version: R 3.2.3
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 minor
Assignee: Martin Maechler
Reported: 2016-02-27 10:48 UTC by Michio Oguro
Modified: 2017-10-06 16:03 UTC

Transcript from RGui session with very recent R-devel (1.51 KB, text/plain)
2017-07-22 18:53 UTC, Martin Maechler

Description Michio Oguro 2016-02-27 10:48:15 UTC
In the Windows version of RGui, Unicode characters (I tested with Japanese) in a function are garbled when the function is printed in the console.


# This works
> print("日本語")
[1] "日本語"

# This produces garbled characters
> function(){"日本語"}
function() {'譌・譛ャ隱
Comment 1 Duncan Murdoch 2017-05-22 14:03:13 UTC
This is a response to your bug posting in Feb, 2016.

Would you be able to test your examples in a current version of R-devel?  When I run them on MacOS things are fine, and in Windows in a Latin1 locale, I get

> function(){"日本語"}

which I think is an acceptable result, because those escapes match the Japanese characters.  If I run that function it prints the result as 

[1] "日本語"

which is what you'd want.  However, you are probably running Windows in a different locale, so it would be helpful to see what results you get.
Comment 2 Michio Oguro 2017-05-25 12:00:50 UTC
Thank you for your reply.

I tried the following code on R Under development (unstable) (2017-05-22 r72718) -- "Unsuffered Consequences" on Windows 10.
It still prints garbled characters.

> test = function(){"日本語"}
> test()
[1] "日本語"
> test

My locale information is as follows:

> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"

If you need other information on my system, please feel free to ask me.
Comment 3 Duncan Murdoch 2017-06-20 18:13:50 UTC
I think there are likely two problems here.

First, in your locale, Japanese characters aren't stored in UTF-8; a different encoding, code page 932, is used.  According to Wikipedia, this is a one- or two-byte encoding based on Shift JIS.

The first problem is that R's parser is not counting bytes properly.  It is counting each character as a single byte, when the Japanese chars each take two bytes.

The second problem is with the source references.  At some point the string is converted to UTF-8, but not marked as UTF-8, so the print routines interpret it as code page 932, and you get the garbage output.
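
Both effects can be reproduced from the raw bytes alone, independent of the session locale. The following is a minimal sketch, not code from the R sources: the variable names are illustrative, and it assumes the platform's iconv recognises the encoding name "CP932", which varies by system.

```r
# UTF-8 bytes of the three Japanese characters, given numerically so the
# script does not depend on the encoding of the file it is saved in.
utf8_bytes <- as.raw(c(0xE6, 0x97, 0xA5, 0xE6, 0x9C, 0xAC, 0xE8, 0xAA, 0x9E))

s <- rawToChar(utf8_bytes)
Encoding(s) <- "UTF-8"               # declare what the bytes really are

n_chars <- nchar(s, type = "chars")  # 3 characters
n_bytes <- nchar(s, type = "bytes")  # 9 bytes in UTF-8

# The same text in code page 932 takes two bytes per character
# (assumption: iconv knows the name "CP932" on this platform).
cp932 <- iconv(s, from = "UTF-8", to = "CP932", toRaw = TRUE)[[1]]
length(cp932)                        # 6, if the conversion is available

# Misreading the UTF-8 bytes as CP932: the kind of garbage shown in the
# report. Bytes that cannot be interpreted are replaced by "?".
garbled <- iconv(rawToChar(utf8_bytes), from = "CP932", to = "UTF-8", sub = "?")
garbled
```

The character count and the byte count differ in either encoding, which is why a parser that counts one byte per character ends up with wrong source positions.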

I will try to track down and fix both problems, but it may take a while.  In the meantime, if you want to display your functions properly, you can simply remove source references and I think you will avoid both problems.  For example,

> Sys.setlocale("LC_CTYPE", "Japanese")
> test <- function(){"日本語"}
> test
> test <- removeSource(test)
> test
function () 
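
What removeSource() changes can be checked directly: it drops the "srcref" attribute, so printing falls back to deparsing the function instead of echoing the stored source text. A small sketch, using an ASCII-only body so that it runs in any locale; the names f and g are illustrative:

```r
# Parse with source references kept, as an interactive session does by default.
f <- eval(parse(text = 'function(){"abc"}', keep.source = TRUE))

has_srcref_before <- !is.null(attr(f, "srcref"))
as.character(attr(f, "srcref"))  # the source text exactly as it was typed

# removeSource() returns a copy of the function without source references.
g <- utils::removeSource(f)
has_srcref_after <- !is.null(attr(g, "srcref"))

print(f)  # printed from the stored source text
print(g)  # printed by deparsing the function
```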
Comment 4 Duncan Murdoch 2017-06-22 21:38:13 UTC
Both problems should now be fixed in R-devel and R-patched (to become 3.4.1 soon).  Please test quickly; if there are problems, I don't want them to make it into the release.
Comment 5 Michio Oguro 2017-06-23 16:06:05 UTC
Thank you for the reply and fix.
I tried both the 64-bit and 32-bit versions of R 2017-06-21 r72824, but unfortunately it still produces the same garbage.

> function(){"日本語"}

> removeSource(function(){"日本語"})
function () 
Comment 6 Duncan Murdoch 2017-06-23 16:30:33 UTC
Sorry, I should have said:  you need R-devel r72839 or R-patched (3.4.1 beta) r72840.  The next nightly build should have the fix.
Comment 7 Michio Oguro 2017-06-25 08:22:26 UTC
I tried R-devel 2017-06-23 r72852, both the 32-bit and 64-bit versions.
Now it works well!

> function(){"日本語"}

Thanks for your great effort!
Comment 8 Martin Maechler 2017-07-22 18:53:48 UTC
Created attachment 2277
Transcript from RGui session with very recent R-devel

Unfortunately, the patch that fixed this broke other print() functionality, arguably more important functionality, so part of that patch has been reverted.

Using the most current R-devel for Windows (64-bit), where the reversion is already active, reveals a behavior different from the one you saw, but a "broken" one, too.
To me it looks like the bug now shows only when the source ref is used while printing a function, but there is still something more we should do to get this fixed.
Comment 9 Suharto Anggono 2017-10-06 16:03:17 UTC
If I am not mistaken, a fix is to use 'translateChar' instead of 'CHAR' in the 'for' loop of function 'PrintLanguageEtc' in print.c. Elements of the 'deparse1w' result are in the current locale encoding, but elements obtained by applying 'as.character' to the "srcref" attribute may not be.
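
The difference between reading a string's bytes as they are and translating them according to their declared encoding can also be seen at the R level. The sketch below is only an analogy to the C-level 'CHAR' versus 'translateChar' distinction, not the proposed patch, and its variable names are illustrative:

```r
# UTF-8 bytes of the three Japanese characters from the report.
utf8_bytes <- as.raw(c(0xE6, 0x97, 0xA5, 0xE6, 0x9C, 0xAC, 0xE8, 0xAA, 0x9E))

# Without a mark, R assumes the bytes are in the native encoding.
unmarked <- rawToChar(utf8_bytes)
enc_unmarked <- Encoding(unmarked)   # "unknown"

# Marking declares the encoding; the bytes themselves are unchanged.
marked <- unmarked
Encoding(marked) <- "UTF-8"
enc_marked <- Encoding(marked)       # "UTF-8"
same_bytes <- identical(charToRaw(unmarked), charToRaw(marked))  # TRUE

# enc2native() translates a marked string into the session's native
# encoding; the resulting bytes depend on the locale.
native <- enc2native(marked)
charToRaw(native)
```

Printing code that takes the bytes of an unmarked or untranslated string at face value behaves like the `unmarked` case here, which is consistent with the garbled output earlier in this report.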