Bug 17182 - write.table of matrix of 2^31 elements fails with wrong message
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.3.0
Hardware: Other Linux-Debian
Importance: P5 minor
Assignee: R-core
Reported: 2016-11-17 16:40 UTC by Suharto Anggono
Modified: 2016-11-17 16:40 UTC

Description Suharto Anggono 2016-11-17 16:40:55 UTC
This example was run in RStudio on Data Scientist Workbench.

R> m <- matrix(raw(1), 2^30, 2)

The following call failed after a short time.

R> write.table(m, file="m.txt", row.names=FALSE, col.names=FALSE)
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  : 
  corrupt matrix -- dims not not match length

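The message "corrupt matrix" is wrong: the matrix is not corrupt, and its dims do match its length. Also, the message text says "not not match", which reads as "match"; presumably "do not match" was intended.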

R> prod(dim(m)) == length(m)
[1] TRUE

R> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_US.UTF-8      
 [2] LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                 
 [9] LC_ADDRESS=C              
[10] LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] SparkR_1.6.1

loaded via a namespace (and not attached):
[1] tools_3.3.1


In the R-devel source (r71661) at
https://svn.r-project.org/R/trunk/src/library/utils/src/io.c
the error message comes from this part of the function 'writetable':
	if(XLENGTH(x) != (R_len_t)nr * nc)
	    error(_("corrupt matrix -- dims not not match length"));

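Because R_len_t is int, the product
(R_len_t)nr * nc
is computed in int arithmetic. In the above example nr is 2^30 and nc is 2, so the product 2^31 exceeds the largest value of a 32-bit int, 2^31 - 1, and the multiplication overflows.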
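
The overflow can be illustrated outside of R with a small, self-contained C sketch. This is not the code in io.c and not a proposed patch. The typedefs are stand-ins (R_len_t as int, as stated above; xlen_t is a hypothetical 64-bit type standing in for a length type wide enough for long vectors), and both function names are invented for illustration. It assumes a platform where int is 32 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-ins, not R's headers (assumption): R_len_t is int, as in the report;
   xlen_t is a hypothetical 64-bit length type for long vectors. */
typedef int R_len_t;
typedef int64_t xlen_t;

/* Returns 1 if nr * nc cannot be represented as R_len_t, else 0.
   The product is formed in 64 bits, so the test itself never overflows. */
static int product_overflows_len_t(R_len_t nr, R_len_t nc)
{
    int64_t wide = (int64_t)nr * (int64_t)nc;
    return wide > INT_MAX || wide < INT_MIN;
}

/* Hypothetical form of the dims-versus-length check: cast to the wide
   type before multiplying, so that 2^30 * 2 is computed exactly. */
static int dims_match_length(xlen_t len, R_len_t nr, R_len_t nc)
{
    assert(nr >= 0 && nc >= 0);  /* dims are never negative */
    return len == (xlen_t)nr * (xlen_t)nc;
}
```

With nr = 1 << 30 and nc = 2, product_overflows_len_t returns 1, while dims_match_length((xlen_t)1 << 31, nr, nc) returns 1, which agrees with prod(dim(m)) == length(m) being TRUE above.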