Bug 16732 - Unicode characters are garbled when printing function
Status: REOPENED
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific
Version: R 3.2.3
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 minor
Assignee: Martin Maechler
 
Reported: 2016-02-27 10:48 UTC by Michio Oguro
Modified: 2017-07-22 18:53 UTC


Attachments
Transcript from RGui session with very recent R-devel (1.51 KB, text/plain)
2017-07-22 18:53 UTC, Martin Maechler

Description Michio Oguro 2016-02-27 10:48:15 UTC
In the Windows version of Rgui, Unicode characters in a function (I tested with Japanese) are garbled when the function is printed in the console.

e.g.

# This works
> print("日本語")
[1] "日本語"

# This produces garbled characters
> function(){"日本語"}
function() {'譌・譛ャ隱
Comment 1 Duncan Murdoch 2017-05-22 14:03:13 UTC
This is a response to your bug posting in Feb, 2016.

Would you be able to test your examples in a current version of R-devel?  When I run them on MacOS things are fine, and in Windows in a Latin1 locale, I get

> function(){"日本語"}
function(){"\u65e5\u672c\u8a9e"}

which I think is an acceptable result, because those escapes match the Japanese characters.  If I run that function it prints the result as 

[1] "日本語"


which is what you'd want.  However, you are probably running Windows in a different locale, so it would be helpful to see what results you get.
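
A minimal sketch of why the escaped form is equivalent, assuming only base R. The variable names `x`, `code_points`, and `escapes` are illustrative and not from the report. It looks up the code point of each character and formats it the way the deparsed function shows it.

```r
# Build the string from escapes so the snippet itself stays ASCII-only.
x <- "\u65e5\u672c\u8a9e"

# utf8ToInt() returns the Unicode code point of each character.
code_points <- utf8ToInt(x)

# Format each code point as a \uXXXX escape, as seen in the deparsed function.
escapes <- sprintf("\\u%04x", code_points)

# Prints \u65e5\u672c\u8a9e
cat(paste(escapes, collapse = ""), "\n")
```

The escapes and the literal characters denote the same string, so only the display differs, not the value the function returns.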
Comment 2 Michio Oguro 2017-05-25 12:00:50 UTC
Thank you for your reply.

I tried the following code on R Under development (unstable) (2017-05-22 r72718) -- "Unsuffered Consequences" on Windows 10.
It still prints garbled characters.

> test = function(){"日本語"}
> test()
[1] "日本語"
> test
function(){"譌・譛ャ隱

My locale information is as follows:

> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"

If you need other information on my system, please feel free to ask me.
Comment 3 Duncan Murdoch 2017-06-20 18:13:50 UTC
I think there are likely two problems here.

In your locale, Japanese characters aren't stored in UTF-8; a different encoding, code page 932, is used.  This is a one- or two-byte encoding based on Shift JIS, according to Wikipedia.

The first problem is that R's parser is not counting bytes properly.  It is counting each character as a single byte, when the Japanese chars each take two bytes.
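
A minimal sketch of the character/byte mismatch, not of the parser itself. It assumes the platform's iconv knows the encoding name "CP932"; the variable names are illustrative.

```r
# The three characters from the report, written as escapes (stored as UTF-8).
x <- "\u65e5\u672c\u8a9e"

# Re-encode to code page 932 and keep the raw bytes.
cp932_bytes <- iconv(x, from = "UTF-8", to = "CP932", toRaw = TRUE)[[1]]

n_chars <- nchar(x, type = "chars")  # number of characters
n_bytes <- length(cp932_bytes)       # number of bytes in code page 932

cat("characters:", n_chars, " CP932 bytes:", n_bytes, "\n")
```

Any position bookkeeping that advances one byte per character would fall behind by one byte for every such character.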

The second problem is with the source references.  At some point the string is converted to UTF-8, but not marked as UTF-8, so the print routines interpret it as code page 932, and you get the garbage output.
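
The misinterpretation can be sketched outside the print routines by decoding UTF-8 bytes as code page 932 on purpose. This assumes the platform's iconv supports "CP932"; it is an illustration of the mechanism, not the code path R actually takes.

```r
x <- "\u65e5\u672c\u8a9e"

# UTF-8 bytes of the string: e6 97 a5 e6 9c ac e8 aa 9e
utf8_bytes <- charToRaw(enc2utf8(x))

# Deliberately decode those bytes as CP932. Only the first eight bytes are
# used: the ninth (0x9e) is a CP932 lead byte with no trail byte after it,
# so on its own it is an incomplete sequence.
garbled <- iconv(list(utf8_bytes[1:8]), from = "CP932", to = "UTF-8")

print(garbled)
```

The dangling lead byte at the end is a plausible reason the closing quote and brace are missing from the garbled output above: the byte would be paired with the character that follows it.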

I will try to track down and fix both problems, but it may take a while.  In the meantime, if you want to display your functions properly, you can simply remove source references and I think you will avoid both problems.  For example,

> Sys.setlocale("LC_CTYPE", "Japanese")
> test <- function(){"日本語"}
> test
function(){"譌・譛ャ隱
> test <- removeSource(test)
> test
function () 
{
    "日本語"
}
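
A small sketch for checking whether a function still carries a source reference, assuming base R and the utils package. The function body "abc" and the names `f` and `g` are placeholders, and parse(keep.source = TRUE) is used so the example also works in a non-interactive session.

```r
# Create a function that keeps its source reference.
f <- eval(parse(text = 'function(){"abc"}', keep.source = TRUE))
has_srcref_before <- !is.null(attr(f, "srcref"))

# removeSource() drops the reference, so printing falls back to deparsing.
g <- utils::removeSource(f)
has_srcref_after <- !is.null(attr(g, "srcref"))

cat("srcref before:", has_srcref_before, " after:", has_srcref_after, "\n")

print(f)  # printed from the stored source text
print(g)  # printed by deparsing the function
```

Setting options(keep.source = FALSE) before defining functions should have a similar effect for functions created afterwards.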
Comment 4 Duncan Murdoch 2017-06-22 21:38:13 UTC
Both problems should now be fixed in R-devel and R-patched (to become 3.4.1 soon).  Please test quickly; if there are problems, I don't want them to make it into the release.
Comment 5 Michio Oguro 2017-06-23 16:06:05 UTC
Thank you for the reply and fix.
I tried both the 64-bit and 32-bit versions of R 2017-06-21 r72824, but unfortunately it still produces the same garbage.

> function(){"日本語"}
function(){"譌・譛ャ隱

> removeSource(function(){"日本語"})
function () 
{
    "日本語"
}
Comment 6 Duncan Murdoch 2017-06-23 16:30:33 UTC
Sorry, I should have said:  you need R-devel r72839 or R-patched (3.4.1 beta) r72840.  The next nightly build should have the fix.
Comment 7 Michio Oguro 2017-06-25 08:22:26 UTC
I tried R-devel 2017-06-23 r72852 in both the 32-bit and 64-bit versions.
Now it works well!

> function(){"日本語"}
function(){"日本語"}

Thanks for your great effort!
Comment 8 Martin Maechler 2017-07-22 18:53:48 UTC
Created attachment 2277
Transcript from RGui session with very recent R-devel

Unfortunately, the patch that fixed this broke other print() functionality,
arguably even more important functionality, so part of that patch has been reverted.

The most current R-devel for Windows (64-bit), where the reversion is already active, shows behavior different from what you saw,
but it is still broken.
To me it looks as if the bug now shows only when the source reference is used while printing a function, but there is still more we should do to get this fixed.