Bug 16098 - Windows doesn't handle high Unicode code points
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.1.2
Hardware: Other Windows 64-bit
Importance: P5 enhancement
Assignee: Duncan Murdoch
Reported: 2014-12-04 21:27 UTC by Duncan Murdoch
Modified: 2014-12-07 17:18 UTC

Description Duncan Murdoch 2014-12-04 21:27:36 UTC
On Windows,

 as.hexmode(utf8ToInt("\U1d4d0"))

returns

[1] "d4d0"

rather than "1d4d0", because the parser stores the value 0x1d4d0 in a wchar_t variable, which is only 16 bits wide on Windows, so the bits above the low 16 are lost.
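
The arithmetic of the truncation can be reproduced on any platform with a small sketch. It does not call the parser at all; it only masks the code point to its low 16 bits, on the assumption that this is what assignment to a 16-bit wchar_t amounts to. The helper name truncate_to_16_bits is made up for illustration and is not part of R.

```r
# Hypothetical helper (not part of R): keep only the low 16 bits of a
# code point, mimicking what happens when the value is stored in a
# 16-bit wchar_t.
truncate_to_16_bits <- function(code_point) {
  bitwAnd(as.integer(code_point), 0xFFFFL)
}

code_point <- 0x1D4D0L                  # the code point written as "\U1d4d0"
truncated  <- truncate_to_16_bits(code_point)

format(as.hexmode(code_point))          # "1d4d0", the expected value
format(as.hexmode(truncated))           # "d4d0", the value reported above
```

Code points at or below 0xFFFF pass through the mask unchanged, which fits the report that only high code points are affected.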
Comment 1 Richard Cotton 2014-12-07 17:18:26 UTC
This behaviour is mentioned in the R Language Definition (R-lang), section 10.3.1:
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Literal-constants

> \Unnnnnnnn \U{nnnnnnnn}
>     (where multibyte locales are supported and not on Windows, otherwise an error)

but not in ?Quotes, so an initial fix may be to simply update the documentation on that page.