Bug 17362 - Character corruption on installation (Japanese)
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Language
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 normal
Assignee: R-core
Reported: 2017-12-05 09:58 UTC by Ryota Suzuki
Modified: 2017-12-10 10:52 UTC

Attachments
Screenshots that contain corruptions. R 3.4.2 on the left (correct) and 3.4.3 on the right (corrupted). (286.05 KB, application/x-zip-compressed)
2017-12-05 09:58 UTC, Ryota Suzuki
Screenshots with the 3.4.2 installer using the same installation system (29.42 KB, image/png)
2017-12-05 11:21 UTC, Ryota Suzuki
Changed but still corrupted in another way (32.70 KB, image/png)
2017-12-06 04:00 UTC, Ryota Suzuki
Screenshot after fix on the installation script (29.85 KB, image/png)
2017-12-06 10:59 UTC, Ryota Suzuki

Description Ryota Suzuki 2017-12-05 09:58:48 UTC
Created attachment 2303
Screenshots that contain corruptions. R 3.4.2 on the left (correct) and 3.4.3 on the right (corrupted).

This bug was found in the released R-3.4.3 64-bit installer, but there was no matching version choice in this Bugzilla, so I chose R-devel (trunk).

Many (but not all) Japanese strings in the installer are displayed as corrupted characters. Please see the attached screenshots of the corruption I found: R 3.4.2 is on the left (correct) and R 3.4.3 on the right (corrupted).
Comment 1 Jeroen Ooms 2017-12-05 10:54:41 UTC
Hi Ryota, thank you for reporting. We recently switched to another build system and this may be a side effect.

Could you help us narrow down the problem by testing if this also appears in the R 3.4.2 installer that we built using the new system? You can download a copy from https://ftp.opencpu.org/archive/r-release/R-3.4.2/

Moreover, could you report whether you find other internationalization issues in R 3.4.3, or whether it was only the installer that had the problem?
Comment 2 Ryota Suzuki 2017-12-05 10:58:32 UTC
Hi Jeroen, thanks for your prompt answer.

Sure, I'll try the 3.4.2 version soon. As far as I have tested, only the installer has the i18n problem, but I will report any other issue if I find one.
Comment 3 Ryota Suzuki 2017-12-05 11:21:37 UTC
Created attachment 2304
Screenshots with the 3.4.2 installer using the same installation system

The same problem occurs with the new 3.4.2 installer.
Comment 4 Ryota Suzuki 2017-12-05 11:23:55 UTC
It seems that the corrupted text is actually UTF-8 encoded but is being read as CP932 (the default on Japanese Windows systems). For example, the code below reproduces the result shown in 1.png from my first attachment.

> scan(file = "test.txt", what = character(), fileEncoding = "UTF-8")
Read 1 item
[1] "利用者向けインストール"

> scan(file = "test.txt", what = character(), fileEncoding = "cp932")
Read 1 item
[1] "蛻ゥ逕ィ閠"
Warning message:
In scan(file = "test.txt", what = character(), fileEncoding = "cp932") :
  invalid input found on input connection 'test.txt'

So the problem should be fixed either by converting the affected files from UTF-8 to CP932, or by changing the build system settings so that they are read as UTF-8.
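
The same misread can be reproduced without a test file. The following is a minimal sketch, not part of the installer build: it uses Python rather than R only because Python's built-in codecs behave the same on every platform, and the variable names are made up for illustration.

```python
# Illustrative sketch: reproduce the misread with Python's built-in codecs.
original = "利用者向けインストール"
utf8_bytes = original.encode("utf-8")

# Wrong: interpret the UTF-8 bytes as CP932. Bytes that do not form a valid
# CP932 sequence are replaced with U+FFFD instead of raising an error.
misread_cp932 = utf8_bytes.decode("cp932", errors="replace")

# Right: interpret the same bytes as UTF-8.
read_utf8 = utf8_bytes.decode("utf-8")

print(misread_cp932)
print(read_utf8)
```

The misread string begins with 蛻, as in the scan() output above, while decoding the same bytes as UTF-8 returns the intended text.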

If there's anything I can help with, I'm happy to try.
Comment 5 Jeroen Ooms 2017-12-05 12:39:08 UTC
Turns out there are two flavors of InnoSetup: standard and unicode: http://www.jrsoftware.org/isdl.php

I have been using the standard version, which may be the problem. I am going to recreate the R installer using the unicode version of InnoSetup. This takes a while; will get back to you later today.
Comment 6 Brian Ripley 2017-12-05 12:49:11 UTC
From the manual:

'To make the installer package (R-3.4.3-win.exe) we currently require the Unicode version of Inno Setup 5.3.7 or later from http://jrsoftware.org/. This is not included in Rtools*.exe.'
Comment 7 Jeroen Ooms 2017-12-05 13:01:02 UTC
OK, that's clear, so hopefully this fixes the issue. I had only tested locally in English and Dutch, so encoding issues did not show up.

The good thing is that we may be able to update the R-3.4.3 installer on CRAN without a version bump, because the installed version of R is identical. Only the installer text is affected.
Comment 8 Jeroen Ooms 2017-12-05 17:20:53 UTC
OK, we have fresh builds of r-release, r-patched and r-devel. Ryota, could you try this new installer and see if the problem is resolved: https://ftp.opencpu.org/archive/r-release/R-3.4.3/

Alternatively you can try r-devel or r-patched from: https://ftp.opencpu.org/current/
Comment 9 Ryota Suzuki 2017-12-06 04:00:27 UTC
Created attachment 2305
Changed but still corrupted in another way

Thank you. The output has changed, but it is still corrupted, now in a different way.

It seems to be identical to UTF-8 text read as Latin-1 (ISO-8859-1):

> scan(file = "test.txt", what = character(), fileEncoding = "latin1")
Read 1 item
[1] "å\u0088©ç\u0094¨è\u0080\u0085å\u0090\u0091ã\u0081\u0091ã\u0082¤ã\u0083³ã\u0082¹ã\u0083\u0088ã\u0083¼ã\u0083«"

(The above was run on Linux since it failed on Windows)
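
A platform-independent way to check this guess is sketched below in Python (an illustration with made-up variable names, not something taken from the build system). Latin-1 assigns a character to every byte value, so a UTF-8 file read as Latin-1 never raises an error, and the damage can be undone by reversing the two steps.

```python
# Illustrative sketch: UTF-8 bytes read as Latin-1, then repaired.
original = "利用者向けインストール"

# Misread: every UTF-8 byte becomes one Latin-1 character, with no error.
misread_latin1 = original.encode("utf-8").decode("latin-1")

# Each of the 11 characters takes 3 bytes in UTF-8, hence 33 characters here.
misread_length = len(misread_latin1)

# Repair: turn the characters back into bytes, then decode them as UTF-8.
repaired = misread_latin1.encode("latin-1").decode("utf-8")

print(ascii(misread_latin1))
print(repaired)
```

Unlike the CP932 case, where some byte sequences are invalid, no information is lost here, which is why the round trip recovers the original string.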
Comment 10 Ryota Suzuki 2017-12-06 04:06:33 UTC
I found information that Inno Setup treats a file as UTF-8 if it has a BOM.

https://stackoverflow.com/questions/38968230/inno-setup-reading-file-in-ansi-and-unicode-encoding/38969655#38969655

It's just a guess, but it could explain why some sentences are displayed correctly while others are corrupted. Alternatively, there could be another setting or parameter involved.
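
If the BOM is indeed the deciding factor, a check along the following lines could tell which files would be affected. This is only a sketch in Python under that assumption: the helper names and the file name Japanese.isl are hypothetical, and it is not the change that was made to the R build script, which is not shown in this report.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def ensure_utf8_bom(path):
    """Prepend the UTF-8 BOM to the file unless it is already there."""
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        Path(path).write_bytes(codecs.BOM_UTF8 + data)


# Demonstration on a temporary file (the file name is hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Japanese.isl"
    sample.write_bytes("利用者向けインストール".encode("utf-8"))

    bom_before = has_utf8_bom(sample)
    ensure_utf8_bom(sample)
    bom_after = has_utf8_bom(sample)

    # "utf-8-sig" strips the BOM on read, so the text itself is unchanged.
    text_after = sample.read_text(encoding="utf-8-sig")
```

Calling ensure_utf8_bom a second time leaves the file untouched, so it is safe to run over a whole directory of message files.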
Comment 11 Jeroen Ooms 2017-12-06 10:36:26 UTC
That may be a correct guess. I have pushed a small fix to the build script and ran yet another build just now. Could you try the new installer from https://ftp.opencpu.org/archive/r-release/R-3.4.3/ (with timestamp 2017-12-06 10:28)?
Comment 12 Ryota Suzuki 2017-12-06 10:54:39 UTC
Hi Jeroen, it works perfectly!
Comment 13 Ryota Suzuki 2017-12-06 10:59:27 UTC
Created attachment 2306
Screenshot after fix on the installation script
Comment 14 Jeroen Ooms 2017-12-06 11:53:55 UTC
OK, fantastic. Thank you, Ryota, for reporting this and for your help in debugging it. I'll try to publish the fixed installer to CRAN asap.
Comment 15 Ryota Suzuki 2017-12-07 15:20:40 UTC
Thank you Jeroen, I really appreciate the fact that Windows installers are still available on CRAN.