Bug 14935 - xgettext2pot() extracts wrong strings from code to template
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Misc
Version: R 2.15.0 patched
Hardware: x86_64/x64/amd64 (64-bit) Linux
Importance: P5 minor
Assigned To: R-core
 
Reported: 2012-06-01 18:05 UTC by Mikko Korpela
Modified: 2014-04-15 20:04 UTC (History)


Attachments
Minimal R package "Foo1" (1.68 KB, application/x-gzip), 2012-06-01 18:05 UTC, Mikko Korpela
Minimal R package "Foo2" (1.69 KB, application/x-gzip), 2012-06-01 18:06 UTC, Mikko Korpela
Minimal R package "Foo3" (1.68 KB, application/x-gzip), 2012-06-01 18:07 UTC, Mikko Korpela
Minimal R package "Foo4" (1.69 KB, application/x-gzip), 2012-06-01 18:07 UTC, Mikko Korpela
Minimal R package "Foo5" (1.69 KB, application/x-gzip), 2012-06-01 18:07 UTC, Mikko Korpela
Minimal R package "Foo6" (999 bytes, application/x-gzip), 2012-06-06 15:16 UTC, Mikko Korpela
Proposed patch (2.16 KB, patch), 2012-06-06 15:17 UTC, Mikko Korpela

Description Mikko Korpela 2012-06-01 18:05:21 UTC
Created attachment 1301 [details]
Minimal R package "Foo1"

Overview:

When creating translation templates with the method described in section "1.9.4 Makefile support" of the "Writing R Extensions" manual, wrong strings are found in the resulting template (.pot) file. Sometimes, the value of the "domain" parameter (which should not be translated) is found in addition to the actual text (which should be translated). Sometimes, the actual text is not found in the template at all, but the spurious "domain" is. The problem can be traced to function xgettext2pot(), and further to xgettext(), in package "tools" (base package).

Steps to Reproduce:

Packages Foo1, Foo2, Foo3, Foo4, and Foo5 (attached) are dummy packages with one function, foo(). The source code of the functions, different in each package, is attached below. Things that vary between the packages:
- gettext() or gettextf()
- gettext() / gettextf() inside or outside stop()
- order of arguments in the call to gettextf()
* Foo1:
foo <- function() {
    abc <- gettext("Foo", domain="R-Foo1")
    stop(abc, domain=NA)
}
* Foo2:
foo <- function() {
    stop(gettext("Foo", domain="R-Foo2"), domain=NA)
}
* Foo3:
foo <- function() {
    stop(gettextf("Foo %d", 5, domain="R-Foo3"), domain=NA)
}
* Foo4:
foo <- function() {
    stop(gettextf(domain="R-Foo4", fmt="Foo %d", 5), domain=NA)
}
* Foo5:
foo <- function() {
    abc <- gettextf(domain="R-Foo5", fmt="Foo %d", 5)
    stop(abc, domain=NA)
}

To reproduce, follow the instructions in "1.9.4 Makefile support":
Run
  make pkg-update PKG=pkg PKGDIR=pkgdir
in
  R_BUILD_DIR/po
for PKG in c("Foo1", "Foo2", "Foo3", "Foo4", "Foo5").
In the attached packages, the resulting files "po/R-FooX.pot", X in 1:5, are already included, produced by R version 2.15.0 Patched (2012-05-31 r59485). If you want to reproduce the bug, the files in the package archives must be extracted, and "make pkg-update" run with PKGDIR set to the location of the extracted archive.
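
For a quicker check that does not need the R build tree, the extraction step can also be called directly from R. The following is only a sketch: it recreates the Foo2 source in a temporary directory (the directory layout here is my own stand-in, not the attached archive) and asks tools::xgettext() for the strings it finds, using asCall = FALSE.

```r
# Recreate a minimal "Foo2"-like package layout in a temporary directory.
# NOTE: this layout is an assumption for illustration; it is not the attachment.
pkgdir <- file.path(tempdir(), "Foo2")
dir.create(file.path(pkgdir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c('foo <- function() {',
             '    stop(gettext("Foo", domain="R-Foo2"), domain=NA)',
             '}'),
           file.path(pkgdir, "R", "foo.R"))

# Collect every string the extractor considers translatable.
found <- unique(unlist(tools::xgettext(pkgdir, asCall = FALSE),
                       use.names = FALSE))
print(found)
```

On an affected R, I would expect "R-Foo2" to show up in `found` next to "Foo"; on a fixed R, only "Foo" should remain.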

Only the msgid lines of the actual messages are shown in the results below.

Actual Results:
* R-Foo1.pot:
msgid "Foo"

* R-Foo2.pot:
msgid "Foo"
msgid "R-Foo2"

* R-Foo3.pot:
msgid "Foo %d"

* R-Foo4.pot:
msgid "R-Foo4"

* R-Foo5.pot:
msgid "Foo %d"

Expected Results (how they differ from Actual Results):
* R-Foo2.pot (remove spurious occurrence of the domain):
msgid "Foo"

* R-Foo4.pot (replace spurious domain with the actual text):
msgid "Foo %d"

In the other cases, actual results were as expected.
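
To illustrate what the Foo4 expectation amounts to: the format string should be found by argument name, not by position. One way to do that in plain R is match.call() against the formals of gettextf(). This is a sketch only; the helper name translatable_fmt() is hypothetical, and I am not claiming this is how tools::xgettext() or the attached patch does it.

```r
# Hypothetical helper: return the 'fmt' string of a gettextf() call,
# regardless of the order in which the arguments were written.
translatable_fmt <- function(call) {
    stopifnot(is.call(call), identical(call[[1L]], as.name("gettextf")))
    # match.call() assigns each supplied argument to its formal name.
    fmt <- match.call(gettextf, call)[["fmt"]]
    # Only literal strings can go into a template; anything else is skipped.
    if (is.character(fmt)) fmt else character()
}

foo3_fmt <- translatable_fmt(quote(gettextf("Foo %d", 5, domain = "R-Foo3")))
foo4_fmt <- translatable_fmt(quote(gettextf(domain = "R-Foo4", fmt = "Foo %d", 5)))
print(foo3_fmt)
print(foo4_fmt)
```

Both calls give "Foo %d", which is the msgid expected for Foo3 and Foo4 alike; the domain never enters the result.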

Build Date & Platform:
> R.version
               _                                           
platform       x86_64-unknown-linux-gnu                    
arch           x86_64                                      
os             linux-gnu                                   
system         x86_64, linux-gnu                           
status         Patched                                     
major          2                                           
minor          15.0                                        
year           2012                                        
month          05                                          
day            31                                          
svn rev        59485                                       
language       R                                           
version.string R version 2.15.0 Patched (2012-05-31 r59485)
nickname       Easter Beagle
Comment 1 Mikko Korpela 2012-06-01 18:06:36 UTC
Created attachment 1302 [details]
Minimal R package "Foo2"
Comment 2 Mikko Korpela 2012-06-01 18:07:00 UTC
Created attachment 1303 [details]
Minimal R package "Foo3"
Comment 3 Mikko Korpela 2012-06-01 18:07:22 UTC
Created attachment 1304 [details]
Minimal R package "Foo4"
Comment 4 Mikko Korpela 2012-06-01 18:07:41 UTC
Created attachment 1305 [details]
Minimal R package "Foo5"
Comment 5 Mikko Korpela 2012-06-06 15:16:11 UTC
Created attachment 1310 [details]
Minimal R package "Foo6"
Comment 6 Mikko Korpela 2012-06-06 15:17:09 UTC
Created attachment 1311 [details]
Proposed patch
Comment 7 Mikko Korpela 2012-06-06 15:18:39 UTC
I attached a proposed bug fix which solves the problems reported above. I believe that the patch does not break anything. The affected file is src/library/tools/R/xgettext.R

I found and fixed a related bug in the extraction of arguments "msg1" and "msg2" of ngettext() by xgettext2pot(). Let's call this bug 2. It was marked with a "FIXME" comment in xgettext.R.

Also, a duplicate entry with an empty msgid would be produced if "ordinary translations", i.e. not due to ngettext(), were not found. Let's call this bug 3. The patch also corrects this.

A combined example of bugs 2 and 3, in the style of the previously attached minimal example packages, is in the attached package "Foo6". Results of running "make pkg-update PKG=Foo6 PKGDIR=/path/to/pkg":

Actual Results:
msgid ""
msgstr ""

Expected Results:
msgid        "message 1"
msgid_plural "message 2"

In addition to the unexpected results in the .pot file, the following error message was produced by a non-patched R:

creating R-Foo6.pot and translations
  en@quot:/usr/bin/msgfmt: R-en@quot.po: warning: PO file header missing or invalid
                               warning: charset conversion will not work
/usr/bin/msgfmt: found 1 fatal error
 done.

After applying the bug fix, the error is eliminated.
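
The idea behind extracting "msg1" and "msg2" can be sketched in a few lines of plain R. The helper name plural_messages() and the example calls are hypothetical (they are not the contents of Foo6 and not the patch itself); the sketch matches the call against the formals of ngettext() and keeps the pair only if both arguments are literal strings.

```r
# Hypothetical helper: pull msg1/msg2 out of an ngettext() call by name.
plural_messages <- function(call) {
    stopifnot(is.call(call), identical(call[[1L]], as.name("ngettext")))
    matched <- match.call(ngettext, call)
    msg1 <- matched[["msg1"]]
    msg2 <- matched[["msg2"]]
    # Variables or expressions are not translatable literals, so skip them.
    if (is.character(msg1) && is.character(msg2))
        c(msgid = msg1, msgid_plural = msg2)
    else
        NULL
}

# Arguments deliberately given out of order and by name.
literal_pair <- plural_messages(
    quote(ngettext(n, msg2 = "message 2", msg1 = "message 1", domain = "R-Foo6")))
# Here msg1 and msg2 are symbols, so nothing should be extracted.
symbol_pair <- plural_messages(quote(ngettext(n, a, b)))
print(literal_pair)
print(symbol_pair)
```

The first call yields msgid "message 1" and msgid_plural "message 2"; the second yields NULL, since symbols would otherwise put spurious entries into the template.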

Build Date & Platform (relevant parts of R were unchanged between r59485 and r59517):
> R.version
               _                                           
platform       x86_64-unknown-linux-gnu                    
arch           x86_64                                      
os             linux-gnu                                   
system         x86_64, linux-gnu                           
status         Patched                                     
major          2                                           
minor          15.0                                        
year           2012                                        
month          06                                          
day            04                                          
svn rev        59517                                       
language       R                                           
version.string R version 2.15.0 Patched (2012-06-04 r59517)
nickname       Easter Beagle
Comment 8 Duncan Murdoch 2012-06-13 14:06:50 UTC
Thanks for the patches.  

BTW, we generally apply patches to the trunk version (R-devel), and then port them to the branches.  There were several changes to that file in the trunk since the version you were working from, but they didn't affect the patch.
Comment 9 Mikko Korpela 2012-06-15 15:29:46 UTC
Thanks for fixing this!

Good that you noticed an error in my patch: when arguments 'msg1' and 'msg2' are correctly identified, of course it is still necessary to check that they have character values. Sorry. This would have introduced more spurious strings to the translation template.

I noticed that no changes were made to find_strings2(), up to and including snapshot R-devel_2012-06-14, svn revision 59564. Therefore, the bugs demonstrated in packages "Foo2" and "Foo4" still remain. I am just trying to make sure you did not miss that part of the patch. Thanks again!
Comment 10 Duncan Murdoch 2012-06-15 20:59:40 UTC
I thought that I applied the patch, but perhaps some of it was missed.  
I'll take another look later.

Duncan Murdoch


Comment 11 Mikko Korpela 2012-06-25 09:03:12 UTC
I see that it has been completely fixed for about one week already. Thanks for the quick response and good work!

- Mikko