Bug 15087 - sub() behavior on HP-UX prevents installation
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Installation
Version: R 2.15.1 patched
Hardware: Other Other
Importance: P5 minor
Assigned To: R-core
Reported: 2012-10-26 15:02 UTC by Bret Musser
Modified: 2014-03-11 06:26 UTC
CC List: 5 users



Attachments
Incomplete test case (757 bytes, text/x-csrc)
2013-01-07 23:09 UTC, Johannes Ranke

Description Bret Musser 2012-10-26 15:02:12 UTC
The sub() function works differently on HP-UX 11.31 than on other platforms, leading to a failure to build a complete R 2.14.2 or 2.15.2-prerelease distribution.

Example: 
HP-UX
> sub(".*\\.", "", "/path/to/file.ext", perl=FALSE) 
[1] "/path/to/fileext"
> sub(".*\\.", "", "/path/to/file.ext", perl=TRUE) 
[1] "ext"

As compared to MacOS:
> sub(".*\\.", "", "/path/to/file.ext", perl=FALSE) 
[1] "ext"
> sub(".*\\.", "", "/path/to/file.ext", perl=TRUE) 
[1] "ext"

The above excerpt was taken from the data() function, where the consequence is that example data files can't be found by the system.  This also affects the compilation of packages in install.R because SHLIB_LIBADD and SHLIB_LIBEXT are not properly defined.  
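For comparison outside R, the result expected from this call can be reproduced with the POSIX <regex.h> interface. This is a minimal sketch under stated assumptions, not R's code: R's default engine (perl=FALSE) is its bundled copy of TRE, whereas this uses whatever regcomp()/regexec() the system C library provides, and the helper name sub_empty() is hypothetical. It deletes the leftmost match of an extended regular expression, which is what sub(pattern, "", x) does for a single string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper emulating R's sub(pattern, "", subject):
 * removes the leftmost match of an extended POSIX regex.
 * Returns a newly allocated string (caller frees), or NULL on error. */
char *sub_empty(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    size_t len;
    char *out;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    len = strlen(subject);
    out = malloc(len + 1);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }

    if (regexec(&re, subject, 1, &m, 0) == 0) {
        /* Keep the text before the match, then the text after it. */
        size_t start = (size_t)m.rm_so;
        size_t end = (size_t)m.rm_eo;
        memcpy(out, subject, start);
        memcpy(out + start, subject + end, len - end + 1); /* copies the NUL */
    } else {
        memcpy(out, subject, len + 1); /* no match: unchanged copy */
    }

    regfree(&re);
    return out;
}
```

Called as sub_empty(".*\\.", "/path/to/file.ext"), this should return "ext" on a correctly working regex library; a result of "/path/to/fileext" would reproduce the HP-UX behaviour. Since TRE also offers a POSIX-style API (tre_regcomp(), tre_regexec()), a harness like this could, in principle, be adapted to compare regex libraries directly.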

This is presumably an HP-UX issue. I'm not sure which library provides the regexp parsing, but if there is a GNU library that I can substitute, please tell me, as that may be easier than reworking the code.

Platform is HP-UX 11.31, gcc 4.2.3, Intel F90.
Comment 1 The Written Word 2012-11-11 16:12:31 UTC
This is not only an HP-UX issue. Linux/arm has problems as well:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679180
  https://bugs.launchpad.net/raspbian/+bug/1007014

The problem is the updated src/extra/tre in R-2.15. If you replace it with a copy of src/extra/tre from R-2.14 or older, the problem goes away.
Comment 2 Brian Ripley 2012-12-03 17:44:12 UTC
If someone provides a tested patch containing a minimal set of changes needed we may have some progress.  There is no indication that this is not a compiler/OS error as all the common and uncommon R platforms work.

No evidence is presented that the Debian ARM issue is the same one.
Comment 3 Johannes Ranke 2012-12-08 01:39:28 UTC
(In reply to comment 2)
> No evidence is presented that the Debian ARM issue is the same one.

Here is some evidence: I just ran into the same bug on Debian armel:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=695411

Thanks,

Johannes
Comment 4 Johannes Ranke 2012-12-08 01:49:50 UTC
Well, not evidence; an indication, rather.
Comment 5 Johannes Ranke 2012-12-13 21:38:04 UTC
(In reply to comment #2)
> If someone provides a tested patch containing a minimal set of changes needed
> we may have some progress.  There is no indication that this is not a
> compiler/OS error as all the common and uncommon R platforms work.

Here is evidence that the bug may depend on the compiler. I am showing the output of R 2.15.2 compiled on Debian stable and unstable using the armel platform.



R 2.15.2 compiled on Debian stable:

> sub(".*\\.", "", "/path/to/file.ext")
[1] "ext"

ranke@qnap:~/svn/r-backports/r-base-2.15.2$ gcc -v
Using built-in specs.
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --disable-sjlj-exceptions --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8) 


R 2.15.2 compiled in a Debian unstable chroot:

> sub(".*\\.", "", "/path/to/file.ext")
[1] "/path/to/fileext"

(sid)ranke@qnap:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.6/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-14' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv4t --with-float=soft --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-14)
Comment 6 Johannes Ranke 2012-12-13 22:12:45 UTC
(In reply to comment #5)

> Here is evidence that the bug may depend on the compiler. I am showing the
> output of R 2.15.2 compiled on Debian stable and unstable using the armel
> platform.

Another note. This does not depend on the locale.
Comment 7 Brian Ripley 2013-01-02 17:01:58 UTC
Still no one is submitting a minimal patch to make this work.  We cannot fix an unreproducible bug without a patch.
Comment 8 Johannes Ranke 2013-01-07 23:09:08 UTC
Created attachment 1401
Incomplete test case
Comment 9 Johannes Ranke 2013-01-07 23:12:39 UTC
> Incomplete test case

Any help is appreciated; my abilities in C are very limited. I would like to test different versions of libtre, and possibly different compiler versions.
Comment 10 Johannes Ranke 2013-08-28 20:57:13 UTC
I confirm that this bug is still valid, using R-patched from 2013-08-18 on the arm architecture. It may have seemed unreproducible because it depends on the LANG environment variable:

ranke@qnap:~/tmp/R-patched/bin$ LANG= ./R -q -e 'data(Loblolly); head(Loblolly)'
> data(Loblolly); head(Loblolly)
Warning message:
In data(Loblolly) : data set 'Loblolly' not found
Error in head(Loblolly) : object 'Loblolly' not found
Execution halted

ranke@qnap:~/tmp/R-patched/bin$ LANG=de_DE.UTF-8 ./R -q -e 'data(Loblolly); head(Loblolly)'
> data(Loblolly); head(Loblolly)
   height age Seed
1    4.51   3  301
15  10.89   5  301
29  28.72  10  301
43  41.74  15  301
57  52.70  20  301
71  60.92  25  301

The problem is also obvious in the Debian build logs for armel and armhf (search for "data set"), for example:

https://buildd.debian.org/status/fetch.php?pkg=r-base&arch=armel&ver=3.0.1-6&stamp=1375156616

Other Debian architectures are not affected. The problem is the behaviour of sub, exactly as shown in the original report, but it does depend on the value of $LANG:

ranke@qnap:~/tmp/R-patched/bin$ LANG= ./R -q -e 'sub(".*\\.", "", "/path/to/file.ext")'
> sub(".*\\.", "", "/path/to/file.ext", perl=FALSE)
[1] "/path/to/fileext"

I see two ways forward: 
a) locate the bug in TRE and fix it (again, I am sorry not to be of much help here)
b) make perl=TRUE the default on ARM

Adding perl=TRUE to the relevant parts of the data() function would be pretending that the problem was solved and would cause headaches for future R users on arm. But maybe it could still be considered in combination with a warning in the regex help pages.
Comment 11 Johannes Ranke 2013-08-28 21:02:14 UTC
I am not able to set the status of the bug correctly. I think it should be set to "confirmed".
Comment 12 Orion Poplawski 2014-01-31 20:39:22 UTC
I've been trying to get this fixed in Fedora.  My preferred solution would be for the tre library to accept the R modifications so that R could be built against a system tre library (which is a core goal of the Fedora project).  I've made this request here:

https://github.com/laurikari/tre/pull/14

This does not encompass all of the R changes to tre, but it does add the missing routines so that R can be compiled against it.

I built this version of the tre library on arm, built R against it, and confirmed that the problem is then fixed.

Another solution might be to merge the current upstream tre library into R's copy, as the upstream tre library does not appear to suffer from this problem. It is also possible that some other R modification to tre causes this issue.
Comment 13 Radford Neal 2014-02-05 23:26:59 UTC
This bug report is probably related to a fix done in pqR (which is based on R-2.15.0), which is documented as follows in the pqR NEWS file:

    o Fixed (by a kludge, not a proper fix) a bug in the "tre" package
      for regular expression matching (eg, in sub), which shows up when
      WCHAR_MAX doesn't fit in an "int".  The kludge reduces WCHAR_MAX
      to fit, but really the "int" variables ought to be bigger.  (This
      problem showed up on a Raspberry Pi running Raspbian.)
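Whether a platform is in the situation described in this NEWS entry can be checked with standard C99 headers alone. The sketch below tests whether WCHAR_MAX fits in an int and shows a clamping helper in the spirit of the kludge; both function names are hypothetical, and this is not pqR or TRE code. One plausible but unconfirmed reading of the symptom: if the upper bound of the character range that "." matches wrapped to a negative value, "." would match no character, ".*" only the empty string, and sub() would delete just the literal dot, which is consistent with the "/path/to/fileext" output reported above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Nonzero if WCHAR_MAX can be stored in an int without changing value.
 * On glibc for 32-bit ARM (EABI), wchar_t is an unsigned 32-bit type,
 * so WCHAR_MAX is 0xFFFFFFFF and this returns 0; on x86 Linux, wchar_t
 * is a signed 32-bit int and this returns 1. */
int wchar_max_fits_in_int(void)
{
    return (uintmax_t)WCHAR_MAX <= (uintmax_t)INT_MAX;
}

/* Hypothetical clamp in the spirit of the pqR kludge: reduce an upper
 * bound to INT_MAX before storing it in an int. A plain (int) cast of an
 * out-of-range value is implementation-defined; GCC reduces it modulo
 * 2^N, so 0xFFFFFFFF stored in a 32-bit int becomes -1. */
int clamp_to_int(uintmax_t bound)
{
    return bound > (uintmax_t)INT_MAX ? INT_MAX : (int)bound;
}
```

A check like wchar_max_fits_in_int() returning 0 on a build host would flag that the clamp (or, as the NEWS entry suggests, wider variables) is needed there.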
Comment 14 Orion Poplawski 2014-02-28 18:45:01 UTC
FYI - Upstream tre has now merged in a minimal set of R changes to the tre library to allow using it: https://github.com/laurikari/tre/pull/14

If there are any other R changes to the tre library that are beneficial, please submit them upstream.
Comment 15 Brian Ripley 2014-03-03 07:41:26 UTC
R-devel has been synced with the current TRE git repository. I believe this removes some misguided changes merged in around the time of R 2.12.x.

We do not have access to any of the affected systems (and no one who does has submitted a patch).  Please test.
Comment 16 Johannes Ranke 2014-03-05 13:02:39 UTC
I dismantled my arm system recently, but will set it up again over the next few weeks. When it is back up, I will be happy to test and produce arm binaries of the upcoming R 3.0.3 for Debian stable on CRAN.
Comment 17 Johannes Ranke 2014-03-11 05:38:15 UTC
I can now confirm that this is fixed in current r-devel compiled on arm, i.e. sub works as expected, regardless of the setting of the LANG variable.
Comment 18 Brian Ripley 2014-03-11 06:26:51 UTC
Thanks for the confirmation.