Bug 16807 - sprintf seems to ignore some special characters in OSX and Linux
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: System-specific
Version: R 3.2.4 revised
Hardware: x86_64/x64/amd64 (64-bit) OS X Mavericks
Importance: P5 enhancement
Assignee: R-core
URL: https://stackoverflow.com/questions/3...
Reported: 2016-04-08 13:33 UTC by Robin
Modified: 2016-04-11 14:52 UTC
Description Robin 2016-04-08 13:33:47 UTC
sprintf seems to ignore special characters when padding, seemingly anything with an accent. The problem reproduces on 64-bit Mac (El Capitan) and 64-bit Ubuntu, but not on the Windows machines tested. This demonstrates:

> nchar(sprintf("%-20s", "Sao Paulo"))
[1] 20
> nchar(sprintf("%-20s", "São Paulo"))
[1] 19

Background SO question: https://stackoverflow.com/questions/36500467/sprintf-seems-to-ignore-some-special-characters
Comment 1 Peter Dalgaard 2016-04-08 14:23:00 UTC
Confirmed.

Variation on the same theme:

> sprintf("%13s", "blåbærgrød")
[1] "blåbærgrød"
> sprintf("%13s", "blabergrod")
[1] "   blabergrod"

It looks like something is counting bytes instead of characters.

As the docs say, this wraps the system sprintf C function and that is out of our control. However, it is a question whether we ought to be using the locale-extended version (sprintf_l), but that could be opening a can of worms if we want it to work cross-platform.
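
A minimal sketch of the bytes-versus-characters explanation, using only base R's nchar. The string is written with \u escapes (an assumption made here so the example does not depend on how the file is saved), and the field width of 13 is taken from the example above. It does not call sprintf itself; it only shows the arithmetic that would leave no room for padding if the width is measured in bytes.

```r
# "blåbærgrød" written with \u escapes so the literal is UTF-8 regardless of
# the encoding of this file (an assumption made for portability).
x <- "bl\u00e5b\u00e6rgr\u00f8d"

n_chars <- nchar(x, type = "chars")  # number of characters
n_bytes <- nchar(x, type = "bytes")  # number of bytes in the UTF-8 encoding

# Each of the three non-ASCII letters takes two bytes in UTF-8, so the byte
# count exceeds the character count by three.
cat("chars:", n_chars, "bytes:", n_bytes, "\n")

# Padding left over for a field width of 13 under each interpretation.
padding_if_bytes <- max(0, 13 - n_bytes)
padding_if_chars <- max(0, 13 - n_chars)
cat("padding if bytes:", padding_if_bytes, "if chars:", padding_if_chars, "\n")
```

With the width counted in bytes there is nothing left to pad, which matches the unpadded output shown above.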
Comment 2 Martin Maechler 2016-04-08 14:37:55 UTC
The help page of sprintf points out that encodings matter.
The help page of nchar likewise explains that there are different types of count (bytes, chars, width).

As a consequence, I see the following (on Linux, R 3.3.0 beta):

> nchars <- function(x) vapply(c("bytes","chars","width"), function(typ) nchar(x, type=typ), 1)
> sp <- "São Paulo"
> Encoding(sp)
[1] "UTF-8"
> nchars(sp)
bytes chars width 
   10     9     9 
> nchars(sprintf("%-20s", sp))
bytes chars width 
   20    19    19 
> 


So I'm claiming there is no bug at all. ... (but am not yet closing the report)
Comment 3 Suharto Anggono 2016-04-11 14:52:17 UTC
The help page of 'sprintf', near the end of the "Details" section, says:
"Field widths and precisions of %s conversions are interpreted as bytes, not characters, as described in the C standard."
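
Given that documented behaviour, one possible workaround is to pad by character count outside sprintf. The sketch below is only an illustration: pad_right_chars is a hypothetical helper name, not a base R function, and it relies on strrep, which is available from R 3.3.0. It counts with nchar(type = "chars"), so it does not account for characters that display wider than one column.

```r
# Hypothetical helper (not part of base R): left-justify x in a field of
# `width` characters, counting characters rather than bytes.
pad_right_chars <- function(x, width) {
  n <- nchar(x, type = "chars")
  # pmax() keeps the repeat count at zero when x is already wide enough.
  paste0(x, strrep(" ", pmax(0, width - n)))
}

sp <- "S\u00e3o Paulo"  # "São Paulo", written with a \u escape
padded <- pad_right_chars(sp, 20)

nchar(padded, type = "chars")  # character count of the padded string
nchar(padded, type = "bytes")  # one more than chars, because of the two-byte ã
```

For pure ASCII input the helper should agree with sprintf("%-20s", x), since bytes and characters coincide there.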