Bugzilla – Bug 14462
unzip() doesn't create directories properly
Last modified: 2014-02-16 11:42:54 UTC
The unzip() function (tested in 2.12.0 and 2.11.1) doesn't seem to be creating nested directories properly when extracting. Here's a simple test case:
In the shell:
[/tmp] % mkdir -p foo/bar/baz
[/tmp] % echo "hi mom" > foo/bar/baz/file.txt
[/tmp] % zip zipfile foo/bar/baz/file.txt
adding: foo/bar/baz/file.txt (stored 0%)
[/tmp] % rm -rf foo
Then in R (started in /tmp):
> unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt")
Error in unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt") :
cannot open file './foo/bar/baz/file.txt': No such file or directory
> unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt", exdir="/tmp")
Error in unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt", exdir = "/tmp") :
cannot open file '/tmp/foo/bar/baz/file.txt': No such file or directory
By contrast, if I create the directory path manually, it succeeds:
> dir.create("/tmp/foo/bar/baz", recursive=TRUE)
> (unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt", exdir="/tmp"))
As for why this behavior is undesirable: when unzipping a zip archive, the directory structure inside the archive is typically not known in advance. One could in theory work around this by querying the zip archive for all its file names, iterating through them to create their dirname() directories, and then unzipping each file from the archive. But obviously that's not a very convenient workaround.
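The workaround described above could look roughly like the following. This is only a sketch: the helper name unzip_with_dirs() is made up for illustration, and it assumes that unzip(list = TRUE) returns the member names in a column called Name and that directory entries, if present, end in "/".

```r
# Hypothetical helper (not part of R): create every directory implied by the
# archive's member names, then extract the file members.
unzip_with_dirs <- function(zipfile, exdir = ".") {
  # List the archive members without extracting anything.
  members <- unzip(zipfile, list = TRUE)$Name

  # Assumption: explicit directory entries end in "/".
  is_dir <- grepl("/$", members)
  files <- members[!is_dir]

  # Collect explicit directory entries plus the parent directory of each file.
  dirs <- unique(c(sub("/$", "", members[is_dir]), dirname(files)))
  for (d in file.path(exdir, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Extract only the file members; their directories now exist.
  if (length(files) > 0) {
    unzip(zipfile, files = files, exdir = exdir)
  }
  invisible(file.path(exdir, files))
}
```

With the test case from this report, the call would be unzip_with_dirs("/tmp/zipfile.zip", exdir = "/tmp").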
Additionally, as for why this is important: the 'ff' package uses zip as its underlying storage format, and since the built-in unzip() function doesn't work as expected, it still uses calls like
system(paste('unzip -Z -1 "', zipfile, '"', sep=""), intern=TRUE) and manually inspects the system output to get the extracted file names, etc. Not ideal.
If the zip() and unzip() functions were updated to handle large files (see http://sourceforge.net/projects/infozip/files/), then the 'zip' format is actually a surprisingly good way to manage large archive files of data, since individual members can be accessed or extracted without reading through the entire data file (something .tar.gz can't offer).
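To illustrate per-member access, here is a minimal sketch that reads one text member directly from an archive through R's unz() connection, without extracting anything to disk. The helper name read_zip_member() is made up for illustration, and the sketch says nothing about large-file support.

```r
# Hypothetical helper (not part of R): read the lines of a single text member
# of a zip archive without unpacking the archive.
read_zip_member <- function(zipfile, member) {
  con <- unz(zipfile, member)  # connection to just this member
  on.exit(close(con))          # release the connection when done
  readLines(con)
}
```

For the archive in this report, that would be read_zip_member("/tmp/zipfile.zip", "foo/bar/baz/file.txt").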
The third-party code used did not support zipfiles which did not contain all the appropriate directory entries, but this was patched in 2002 to create a single missing leading directory for a file entry -- this example has three. It is unclear if such zipfiles are really valid, and R only contains an internal unzip() for use on Windows, in particular to unpack Windows binary packages.
system() can always be used for an external zip or unzip utility: we will never try to duplicate all their functionality, such as support for > 2GB files. But we cannot assume that all Windows boxes have unzip.exe.
R 2.12.1 patched will shortly create more missing leading directories, including from directory entries in the file.
Thanks Brian. Glad to hear this particular issue will be resolved soon.
I do think it's prudent to improve unzip() to be functional on all [reasonable] platforms, including large-file support, but that's probably a discussion for a mailing list instead of a bug tracker.