Bug 14114 - Crash with Unicode and sub
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: Low-level
Version: old
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 normal
Assigned To: Jitterbug compatibility account
Reported: 2009-12-07 17:10 UTC by Jitterbug compatibility account
Modified: 2009-12-08 20:31 UTC

Description Jitterbug compatibility account 2009-12-07 17:10:22 UTC
From: g.russell@eos-solutions.com
Full_Name: George Russell
Version: 2.10.0
OS: Windows XP Version 2002 SP 2
Submission from: (NULL) (217.111.3.131)


The following typed into R --vanilla induces a crash:
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
v %in% ""
-- cut here --
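For comparison with the crash, here is a sketch of what the reproduction should return on an unaffected build. It drops gctorture() so that it runs quickly (assumption: the input itself is valid and the failure depends on GC torture exposing memory corruption on the affected Windows builds). The expected values follow from the input: 100 + 1 + 100 characters, minus the one removed space.

```r
# Same input as in the report, but without gctorture().
u <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
v <- rep(u, 1e2)
v <- sub(" ", "", v)

nchar(u)             # 201 characters before sub()
nchar(v[1])          # 200 characters: the single space is gone
all(v == v[1])       # TRUE: all 100 elements are the same string
any(v %in% "")       # FALSE: no element is the empty string
```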

sessionInfo() says:

-- cut here --
R version 2.10.0 (2009-10-26) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base
-- cut here --

I apologise for not testing this with R-2.10.1, but as far as I can see there are
only source releases available so far, which I am not able to compile.

Best wishes and thanks,

George Russell

Comment 1 Jitterbug compatibility account 2009-12-08 16:24:50 UTC
From: Peter Dalgaard <P.Dalgaard@biostat.ku.dk>

2.10.1 RC is available now. Please check. It does seem to be
reproducible with the Windows build (or at least it takes a very long
time), but for me that means running under Wine on SUSE. I don't see the
effect with the Linux build.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907

Comment 2 Jitterbug compatibility account 2009-12-08 17:09:33 UTC
From: g.russell@eos-solutions.com
Dear Peter Dalgaard,

I have now installed R-2.10.1 RC (sessionInfo() says "R version 2.10.1 RC 
(2009-12-06 r50684)", the rest I believe is as before). The following code 
always brings R --vanilla down (with a crash, not a normal exit):
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
v %in% ""
q()
-- cut here --

I've tried this several times now, with different effects. Sometimes R
crashes after 'v %in% ""'. Sometimes it survives that command, but crashes
during the q(). I have also had the error message "Error in match(x,
table, nomatch = 0L) > 0L : comparison (6) is possible only for atomic and
list types" from that command (so the match seems to be the problem); even
then, R still crashes when I type q().
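The error names match() rather than %in% because %in% is a thin wrapper around match(): the ?match help page describes it as match(x, table, nomatch = 0) > 0, and the error above shows the same expression. A minimal check of that equivalence on toy data (the vector w is illustrative only, not from the report):

```r
# %in% reduces to match(): a value is "in" the table when match() finds
# a position for it (nomatch = 0L turns misses into 0, which is not > 0).
w <- c("a", "", "b")

via_in    <- w %in% ""
via_match <- match(w, "", nomatch = 0L) > 0L

print(via_in)                       # [1] FALSE  TRUE FALSE
stopifnot(identical(via_in, via_match))
```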

Best wishes,

George Russell | KG EOS Holding GmbH & Co

Tel: +49 40 2850 – 1574  | g.russell@eos-solutions.com

EOS. With head and heart in finance

KG EOS Holding GmbH & Co | Steindamm 71, 20099 Hamburg | AG Hamburg HRA 95 748
Personally liable partner | EOS Holding GmbH | AG Hamburg HRB 78 748
Managing directors | Hans-Werner Scherer, Klaus Engberding, Justus
Hecking-Veltman, Paul Leary sen., Christos Savvides, Dr. Andreas Witzig
Chairman of the advisory board | Jürgen Schulte-Laggenbeck

Save a tree. Don’t print this email unless it’s really necessary.

This email may contain confidential and/or privileged information.
If you are not the intended recipient or have received this email in 
error, please notify the sender immediately and destroy this email.
Any unauthorized copying, disclosure or distribution of the material in 
this email is strictly forbidden.

Comment 3 Jitterbug compatibility account 2009-12-08 20:31:57 UTC
From: Martin Maechler <maechler@stat.math.ethz.ch>
>>>>> "PD" == Peter Dalgaard <P.Dalgaard@biostat.ku.dk>
>>>>>     on Tue, 08 Dec 2009 11:24:50 +0100 writes:

    PD> 2.10.1 RC is available now. Please check. 

I just did, on our
 "Windows Server 2003 R2 \\ Standard x64 edition" 

with 
   > sessionInfo()
   R version 2.10.1 RC (2009-12-06 r50684) 
   i386-pc-mingw32 

   locale:
   [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252   
   [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
   [5] LC_TIME=German_Switzerland.1252    

   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base     
   > 

It does "crash", i.e. you get a popup window about an exception
with a hex code.
And indeed, I don't see a problem on Linux.

Martin Maechler, ETH Zurich

Comment 4 Jitterbug compatibility account 2009-12-09 15:35:09 UTC
From: g.russell@eos-solutions.com
Hello Peter,

I have now installed R-2.10.1 RC (sessionInfo() says "R version 2.10.1 RC 
(2009-12-06 r50684)", the rest I believe is as before). The following code 
always brings R --vanilla down (with a crash, not a normal exit):
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
v %in% ""
q()
-- cut here --

I've tried this several times now, with different effects. Sometimes R
crashes after 'v %in% ""'. Sometimes it survives that command, but crashes
during the q(). I have also had the error message "Error in match(x,
table, nomatch = 0L) > 0L : comparison (6) is possible only for atomic and
list types" from that command (so the match seems to be the problem); even
then, R still crashes when I type q().

Best wishes,

George Russell | KG EOS Holding GmbH & Co

Comment 5 Jitterbug compatibility account 2009-12-10 13:00:36 UTC
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
It seems (from the debugger output) that this is corruption in the R 
memory allocation routines.  Such things can usually be tracked down 
via valgrind and a valgrind-instrumented build of R, but I cannot 
trigger this on any system with valgrind.  I've tried 64- and 32-bit 
versions, and Latin-1 locales as well as UTF-8.

So I am inclined to think this is Windows-specific.  One thing that
is specific to Windows is UCS-2 (16-bit) wide characters, which might
be the issue.  But we simply don't have the tools on Windows that we
do on other platforms.
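To make the UCS-2 remark concrete, the reproduction string can be measured in both encodings. This only illustrates sizes; nothing in this thread establishes that a difference between the UTF-8 and UCS-2 forms is the cause. U+03E8 (code point 1000) needs two bytes in UTF-8 and one 16-bit unit in UCS-2.

```r
u  <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
cp <- utf8ToInt(u)

length(cp)                   # 201 code points
all(cp <= 0xFFFF)            # TRUE: all in the BMP, so one UCS-2 unit each
nchar(u, type = "bytes")     # 401: 200 x 2 bytes (U+03E8) + 1 byte (space)
2L * length(cp)              # 402: bytes as UCS-2, not counting a terminator
```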

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Comment 6 Jitterbug compatibility account 2009-12-10 14:43:41 UTC
From: g.russell@eos-solutions.com
I don't know about the technicalities, but Peter Dalgaard said the 
offending code also causes R to come to a stop using SUSE + WINE. Is it 
possible to run that lot on top of valgrind? Of course, it will probably 
take all day ...

If not, I have a clue which might help. The problem seems to lie in the
"sub" routine. In the original report I used
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
v %in% ""
-- cut here --

I've tried reducing this a bit more. Replacing intToUtf8 with a direct 
assignment writing out the string with Unicode escapes seems to make no 
difference. The %in% can be replaced with "match", leaving the following:
-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
match(v,"")
-- cut here --
This also crashes R-2.10.0 and R-2.10.1 RC (2009-12-06 r50684).
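Since 1e3 is 1000 = 0x3E8, the "Unicode escapes" form of the input uses \u03e8. A sketch of that equivalence, assuming a current R (strrep() needs R >= 3.3.0, which postdates the releases discussed here; paste(rep(...), collapse = "") does the same job on older versions):

```r
# 1e3 == 1000 == 0x3E8, so each repeated character is U+03E8.
stopifnot(1e3 == 0x3E8)

u_int <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
u_esc <- paste0(strrep("\u03e8", 100), " ", strrep("\u03e8", 100))

identical(u_int, u_esc)                        # TRUE
identical(utf8ToInt(u_int), utf8ToInt(u_esc))  # TRUE
```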

The sub line is essential, as far as I can see: without it we don't get
the crash. If we add "perl = TRUE" this seems to make no difference (there
is still a crash). If instead we use "fixed = TRUE", the result is strange
and differs between R-2.10.0 and R-2.10.1 RC. This is especially strange,
because in a bug-free R, the value of v returned by sub should be the
same with either fixed = TRUE or perl = TRUE.
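The expectation that fixed = TRUE and perl = TRUE should agree can be checked directly: on a healthy build, all three matchers return the same strings. This sketch runs without gctorture(), so it says nothing about the corruption itself.

```r
u <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
v <- rep(u, 1e2)

# The pattern " " contains no regex metacharacters, so the default (TRE),
# PCRE and fixed-string matchers should all remove the same single space.
r_default <- sub(" ", "", v)
r_perl    <- sub(" ", "", v, perl = TRUE)
r_fixed   <- sub(" ", "", v, fixed = TRUE)

identical(r_default, r_perl)    # TRUE
identical(r_default, r_fixed)   # TRUE
```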

R-2.10.0 pauses several seconds, then produces the enigmatic output:
> match(v,"")
  [1] 00 00 06 9d 78 9c cd 54 5d 4f 83 30 14 2d ec 9b a9 33 99 2f fe 89 65 1a e3
 [26] c3 de 8c 26 be 38 7d d5 c7 4a af 0c 57 ca 42 cb 8c bf dc 18 93 61 29 1d 83
 [51] 8e 7d c4 18 23 49 a1 f4 de 9e 7b 4e ef 81 47 07 21 64 23 5b 3e ec 1a 42 56
 [76] 4b de ea b6 5c b3 e4 e8 c8 d1 f2 80 41 e4 bb 32 7e 5c 58 ae f3 49 f8 66 a6
[101] ce b0 3b c5 1e c8 69 31 b5 15 80 98 84 84 cb e9 22 db 61 27 1b 53 4a 80 0d
[126] 2f 0a e3 99 9c f4 91 ba 4a 41 67 8e 69 0c d7 14 73 ae d1 cc 8c 0e f7 3d 86
[151] 45 1c 81 41 be 19 3e bf 82 2b 4c 40 ee 07 33 0a 0f 8c be a7 6f 3a 62 ad 68
[176] af e8 12 78 c1 31 15 c8 aa 9d 9a a1 5c 89 dd 57 cb eb 27 da 14 38 f2 20 dd
[201] 5c 45 ba c1 70 00 9b 14 35 dc 4c 6e 49 4d 51 e6 8e d3 4d 95 54 aa f1 19 90
[226] 32 21 27 29 73 e8 26 bf 53 d6 32 71 0a 4e da 01 51 49 e3 84 48 77 ce 81 dc
[251] 64 2d 19 ab 0d 7b 33 42 9f 99 42 64 af a7 a8 5e 9d 0f ce 86 83 a1 d9 c1 a5
[276] 7f d0 97 86 69 16 a2 dd 54 90 a6 93 21 1f 25 ba c2 d2 54 68 25 c8 31 49 d6
[301] ae ee 9f 2a ba d4 96 a6 89 03 60 22 83 2b 3b 17 53 3a ce 79 17 3e 96 b5 d3
[326] ea ea b4 3b 9f 8b 9f b9 a5 cd a7 40 41 84 4c a9 ce dd dd 49 fe c6 3e 07 7f
[351] 54 e7 7f d9 f4 50 77 5c 19 a9 e0 b9 5e b2 dd 60 79 6c 13 ad 1e 17 98 11 1c
[376] 91 db e5 5f 7e df 0f e7 43 57 c9 7d 01 6c 3e 1a 5d 0c 2f ab 99 6e a9 a8 88
[401] 57 1c 35 5a 7c 03 73 22 e4 b1

R-2.10.1 RC produces the following equally enigmatic output:
> match(v,"")
NULL
Error: 'getEncChar' must be called on a CHARSXP

So my provisional guess is that the bug is somewhere in the part of the
internal code for sub that is invoked regardless of the value of fixed or
perl. It is strange, though, that it makes a difference whether you specify
fixed = TRUE or not.

George Russell
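Both outputs above are the wrong kind of object: match() returns an integer vector with one element per element of its first argument (NA where there is no match), not a vector of hex bytes or NULL. Here is what a healthy session returns for the same input, without gctorture():

```r
u <- intToUtf8(c(rep(1e3, 1e2), 32, rep(1e3, 1e2)))
v <- sub(" ", "", rep(u, 1e2))

res <- match(v, "")
typeof(res)        # "integer"
length(res)        # 100, one entry per element of v
all(is.na(res))    # TRUE: no element of v equals ""
```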

Comment 7 Jitterbug compatibility account 2009-12-10 20:15:09 UTC
From: Peter Dalgaard <P.Dalgaard@biostat.ku.dk>
g.russell@eos-solutions.com wrote:
> SSBkb24ndCBrbm93IGFib3V0IHRoZSB0ZWNobmljYWxpdGllcywgYnV0IFBldGVyIERhbGdhYXJk
> IHNhaWQgdGhlIA0Kb2ZmZW5kaW5nIGNvZGUgYWxzbyBjYXVzZXMgUiB0byBjb21lIHRvIGEgc3Rv
> cCB1c2luZyBTVVNFICsgV0lORS4gSXMgaXQgDQpwb3NzaWJsZSB0byBydW4gdGhhdCBsb3Qgb24g
[...Argh!, Jitterbug must die....]


For those who cannot read base64-coded mails by eye, the decoded contents are the same as George Russell's message in Comment 6 (an unmangled version reached r-devel, but probably not r-bugs).

Comment 8 Jitterbug compatibility account 2009-12-15 14:50:00 UTC
NOTES:
 Fixed for 2.11.0.
Comment 9 Jitterbug compatibility account 2009-12-15 14:50:04 UTC
Audit (from Jitterbug):
Wed Dec  9 12:17:46 2009	ripley	changed notes
Tue Dec 15 08:50:04 2009	ripley	changed notes
Tue Dec 15 08:50:04 2009	ripley	moved from incoming to Low-level-fixed