Bug 16243 - unzip() fails on ZIP archives with files larger than 2^32 bytes
Summary: unzip() fails on ZIP archives with files larger than 2^32 bytes
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Linux-Fedora
Importance: P5 normal
Assignee: R-core
 
Reported: 2015-03-09 15:58 UTC by Hannes Mühleisen
Modified: 2015-03-12 13:54 UTC (History)

Description Hannes Mühleisen 2015-03-09 15:58:13 UTC
When unpacking a ZIP archive that contains a file larger than 2^32 bytes, only the first 2^32 - 1 bytes are unpacked. The bug appeared on Fedora 20 and was also confirmed on Mac OS X 10.10. Both R 3.1.2 and trunk are affected. However, unzip() works correctly on Windows (wow).


Steps to reproduce:

$ dd if=/dev/zero of=manyzeros bs=1m count=10000 # create large file (BSD dd syntax; GNU dd needs bs=1M)
$ ls -la manyzeros
-rw-r--r--  1 hannes  staff  10485760000 Mar  9 16:19 manyzeros

$ 7za a manyzeros.zip manyzeros
$ mv manyzeros manyzeros.org

$ R -e "unzip('manyzeros.zip',exdir='.')"

$ ls -la manyzeros
-rw-r--r--  1 hannes  staff  4294967295 Mar  9 16:52 manyzeros

As you can see, the file is much smaller than before. Only the first 2^32 - 1 bytes (2^32 - 1 = 4294967295) are unpacked. The ZIP format seems to have a limitation there [1], but ?unzip states that "It does have support for files of more than 4GB" [2]. I have made the test file available online [3].
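
The size of the truncated file can be checked against the 32-bit limit directly in the shell. This is only a minimal sketch, assuming a shell with 64-bit arithmetic; the helper name size_of is hypothetical and not part of the steps above.

```shell
# 2^32 and 2^32 - 1, computed with 64-bit shell arithmetic
two_pow_32=$((1 << 32))
max_u32=$((two_pow_32 - 1))
echo "$two_pow_32"   # 4294967296
echo "$max_u32"      # 4294967295, the size reported by ls for the extracted file

# size_of is a hypothetical helper: print a file's size in bytes
size_of() {
    wc -c < "$1" | tr -d ' '
}

# Compare the extracted file with the original that was renamed to manyzeros.org
if [ -f manyzeros ] && [ -f manyzeros.org ]; then
    if [ "$(size_of manyzeros)" -eq "$(size_of manyzeros.org)" ]; then
        echo "sizes match"
    else
        echo "size mismatch: $(size_of manyzeros) vs $(size_of manyzeros.org)"
    fi
fi
```

With the files from the steps above, the comparison should report a mismatch of 4294967295 vs 10485760000 bytes.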


[1] http://en.wikipedia.org/wiki/Zip_%28file_format%29#Limits
[2] https://stat.ethz.ch/R-manual/R-devel/library/utils/html/unzip.html
[3] http://homepages.cwi.nl/~hannes/manyzeros.zip
Comment 1 Hannes Mühleisen 2015-03-09 17:20:42 UTC
Traced this to the call to inflate() from zlib.h (in dounzip.c/unzReadCurrentFile()), which stops reading prematurely.
Comment 2 Brian Ripley 2015-03-12 13:54:20 UTC
'files' means zip files, not the files they contain.  I'll change the comment.