Bug 17329 - translateCharUTF8 broken on Windows
Summary: translateCharUTF8 broken on Windows
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific
Version: R 3.4.1
Hardware: Other Other
Importance: P5 minor
Assignee: R-core
Reported: 2017-08-24 17:31 UTC by Patrick Perry
Modified: 2017-08-24 21:41 UTC

Description Patrick Perry 2017-08-24 17:31:47 UTC
On Windows,

enc2utf8("ü")

yields "|".

It's telling that the UTF-16 representation of the input is 00 FC, and the
UTF-8 representation of the output is 7C.

I think that sysutils.c line 1001:
 inbuf = ans; inb = strlen(inbuf);

 (https://github.com/wch/r-source/blob/trunk/src/main/sysutils.c#L1001)


should be
 inbuf = ans; inb = LENGTH(x);

like the analogous line in do_iconv (https://github.com/wch/r-source/blob/trunk/src/main/sysutils.c#L680).
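
To make the difference between the two length computations concrete, here is a minimal standalone C sketch. It is not R's source: `counted_string`, `length_from_strlen`, `length_from_field`, `low_seven_bits` and `run_example` are hypothetical names used only for illustration, and the struct is only a loose stand-in for an object whose length is stored separately (as `LENGTH(x)` reports for a CHARSXP). The sketch shows that `strlen` stops at the first zero byte while a stored length does not. It also checks the arithmetic relation between the two byte values quoted above: clearing the high bit of FC gives 7C. Whether either effect is the actual cause of this bug is an assumption, not something the sketch establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string object that carries its own length. */
typedef struct {
    const char *data;   /* byte buffer, NUL-terminated for safety */
    size_t length;      /* number of bytes, stored independently of the data */
} counted_string;

/* Length as strlen() sees it: counts bytes up to the first zero byte. */
size_t length_from_strlen(const counted_string *s)
{
    return strlen(s->data);
}

/* Length taken from the stored field: unaffected by embedded zero bytes. */
size_t length_from_field(const counted_string *s)
{
    return s->length;
}

/* Clear the high bit of a byte, keeping only the low seven bits. */
unsigned char low_seven_bits(unsigned char byte)
{
    return (unsigned char)(byte & 0x7F);
}

/* Small usage example with the byte values quoted in the report. */
void run_example(void)
{
    /* Bytes 00 FC, followed by a terminating zero byte. */
    static const char bytes[] = { 0x00, (char)0xFC, 0x00 };
    counted_string s = { bytes, 2 };

    assert(length_from_strlen(&s) == 0);  /* stops at the leading zero byte */
    assert(length_from_field(&s) == 2);   /* stored length covers both bytes */
    assert(low_seven_bits(0xFC) == 0x7C); /* 0x7C is the ASCII code of '|' */
}
```

For buffers without embedded zero bytes the two length functions agree, so the proposed change only matters when the input can contain such bytes.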
Comment 1 Patrick Perry 2017-08-24 17:32:41 UTC
More info: https://github.com/juliasilge/tidytext/issues/80
Comment 2 Patrick Perry 2017-08-24 21:41:29 UTC
Even more info:

https://github.com/patperry/r-corpus/issues/5


And a work-around implementation for translateCharUTF8:

https://github.com/patperry/r-corpus/blob/master/src/utf8.c#L755