res <- c(a=raw(2), raw(2^31-1))
Error: attempt to set index 18446744071562067968/2147483649 in SET_STRING_ELT
Successful example (producing correct 'res'):
res <- c(raw(2), raw(2^31-1))
res <- c(a=raw(2), raw(2^31-2))
Output of sessionInfo() :
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
attached base packages:
 stats graphics
 grDevices utils
 datasets methods
other attached packages:
loaded via a namespace (and not attached):
It was in RStudio, https://my.datascientistworkbench.com/tools/rstudio-ide/ .
I suspect that the culprit is this part in function 'NewExtractNames' in bind.c.
SET_STRING_ELT(data->ans_names, (data->ans_nnames)++, namei);
In function 'NewExtractNames', 'data' is a pointer to struct BindData. In struct BindData, field 'ans_nnames' has type int. When data->ans_nnames is (2^31-1) (maximum int),
leads to overflow. The overflowed value is passed to 'SET_STRING_ELT' in the next call.
It is still like that in R devel r72807.
The error message in the failed example above is also not quite right. The index that is passed to 'SET_STRING_ELT' is not 18446744071562067968.
Don't do this on your laptop/desktop unless you have 20 GB free memory !!
Confirmed indeed on latest R-devel (64 bit; Fedora 24 Linux)
> system.time(res <- c(a = raw(2), raw(2^31-1)))
Error in system.time(res <- c(a = raw(2), raw(2^31 - 1))) :
attempt to set index 18446744071562067968/2147483649 in SET_STRING_ELT
Timing stopped at: 41.63 9.898 51.86
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 375829 20.1 10000000 534.1 1095554 58.6
Vcells 805118 6.2 2581033758 19691.8 2685229083 20486.7
Your diagnosis seems correct to me.
However this just points again to the issue that bind.c's NewExtractNames()
is unnecessarily inefficient.... as you have noticed in PR#17284
I won't time/stamina to address this myself currently.
I was wrong. This is not fixed by your 2nd patch (
for PR # 17284
I'm still committing that now, so we can work to fix this bug, using the newer code.
Created attachment 2275 [details]
Against R devel 72947, bind.c
Not quite right error message from function 'SET_STRING_ELT' in memory.c, like "attempt to set index 18446744071562067968/2147483649 in SET_STRING_ELT" in the example, can be fixed by using an appropriate printf format specifier.
Created attachment 2276 [details]
Against R devel r72947, bind.c, for nameData->seqno overflow
In function 'NewExtractNames' in bind.c, another thing that potentially overflows is nameData->seqno. An example that triggers it:
res <- c(a = list(rep(c(b=raw(1)), 2^31-2), raw(2)), recursive=TRUE)
Result (36 GB of memory was used):
Error: could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'
The example was successful when (2^31-2) was replaced by (2^31-3).
Casting 'seqno' to double doesn't result in loss of precision if seqno <= R_XLEN_T_MAX.
Alternatively, an appropriate printf format specifier can be used.
(In reply to Suharto Anggono from comment #4)
> Casting 'seqno' to double doesn't result in loss of precision if seqno <=
> Alternatively, an appropriate printf format specifier can be used.
I see that function 'VectorIndex' in printutils.c uses %ld. Using %ld in function 'NewName' in bind.c doesn't give the desired result when 'seqno' is 2^31 or more for 64-bit R on Windows. On Windows, long is 32-bit.
(In reply to Suharto Anggono from comment #5)
> (In reply to Suharto Anggono from comment #4)
> > Casting 'seqno' to double doesn't result in loss of precision if seqno <=
> > R_XLEN_T_MAX.
> > Alternatively, an appropriate printf format specifier can be used.
> I see that function 'VectorIndex' in printutils.c uses %ld. Using %ld in
> function 'NewName' in bind.c doesn't give the desired result when 'seqno' is
> 2^31 or more for 64-bit R on Windows. On Windows, long is 32-bit.
printf() specifiers: Yes, that's a bit of an unfortunate state I think.
As we require C99 nowadays for building R, I would like to use portable C99 types and PRI* format specifiers such as PRId64 from the C99 inttypes.h ,
Alternatively, but I think not more portably, we could use long long and %ll
which should be 64-bit also on 32-windows. Currently I think that finding consistent and future proof (and still nice to read)
printf format specifiers in the R sources is something that should be addressed in a wider context.
Closing the bug --
finding portable printf() format strings needs a new PR with its own subject.
Just some "final notes" here, before I forget it:
Actually, we configure check for working PRI macros,
checking for inttypes.h... yes
checking for unsigned long long int... yes
checking for inttypes.h... (cached) yes
checking whether the inttypes.h PRIxNN macros are broken... no
Now if we made this (working PRIxNN macros) a strict requirement we could use them here and in quite a few more places.
Now checked: We have this check for broken PRIxNN since 2005 (!)
in m4/gettext.m4 where the comment says
# Define PRI_MACROS_BROKEN if <inttypes.h> exists and defines the PRI*
# macros to non-string values. This is the case on AIX 4.3.3.
... so it may be time now to be able to require C compiler/library setup where
<inttypes.h> correctly implements the PRInXX macros (as by the C99 standard).