Bug 14462 - unzip() doesn't create directories properly
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 2.12.0
Hardware: Other Mac OS X v10.6
Importance: P5 minor
Assignee: R-core
Reported: 2010-12-16 16:52 UTC by Ken Williams
Modified: 2014-02-16 11:42 UTC

Description Ken Williams 2010-12-16 16:52:43 UTC
The unzip() function (tested in 2.12.0 and 2.11.1) doesn't seem to be creating nested directories properly when extracting.  Here's a simple test case:

In the shell:
[/tmp] % mkdir -p foo/bar/baz
[/tmp] % echo "hi mom" > foo/bar/baz/file.txt
[/tmp] % zip zipfile foo/bar/baz/file.txt 
  adding: foo/bar/baz/file.txt (stored 0%)
[/tmp] % rm -rf foo

In R:
> unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt")
Error in unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt") : 
  cannot open file './foo/bar/baz/file.txt': No such file or directory

> unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt", exdir="/tmp")
Error in unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt", exdir = "/tmp") : 
  cannot open file '/tmp/foo/bar/baz/file.txt': No such file or directory

By contrast, if I create the directory path manually, it succeeds:

In R:
> dir.create("/tmp/foo/bar/baz", recursive=TRUE)
> (unzip("/tmp/zipfile.zip", "foo/bar/baz/file.txt", exdir="/tmp"))
[1] "/tmp/foo/bar/baz/file.txt"

Comment 1 Ken Williams 2010-12-16 21:05:23 UTC
To explain why this behavior is undesirable: when unzipping a zip archive, the directory structure inside the archive is typically unknown.  In theory one could work around this by querying the zip archive for all its file names, creating the dirname() directory of each one, and then unzipping each file from the archive.  But obviously that's not a very convenient workaround.
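
For illustration, the workaround described above might be sketched roughly as follows. This is only a sketch: the helper name unzip_with_dirs is hypothetical (it is not part of R), and it assumes the archive lists directory entries, if any, with a trailing "/". It relies on the list, files and exdir arguments of unzip().

```r
# Hypothetical helper (not part of R): pre-create directories, then extract.
unzip_with_dirs <- function(zipfile, exdir = ".") {
  # list = TRUE returns a data frame describing the members; nothing is extracted
  members <- unzip(zipfile, list = TRUE)$Name

  # Assumption: directory entries end in "/"; only file entries are extracted
  files <- members[!grepl("/$", members)]

  # Create the parent directory of every file before extracting
  for (d in unique(dirname(files))) {
    dir.create(file.path(exdir, d), recursive = TRUE, showWarnings = FALSE)
  }

  # Extract one member at a time and collect the resulting paths
  extracted <- vapply(
    files,
    function(f) unzip(zipfile, files = f, exdir = exdir),
    character(1)
  )
  unname(extracted)
}
```

With the test archive from the description, a call such as unzip_with_dirs("/tmp/zipfile.zip", exdir = "/tmp") would be expected to succeed without a prior manual dir.create().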
Comment 2 Ken Williams 2011-01-03 16:46:22 UTC
Additionally, to show why this matters: the 'ff' package uses zip as its underlying storage format, and since the built-in unzip() function doesn't work as expected, it still uses calls like
system(paste('unzip -Z -1 "', zipfile, '"', sep=""), intern=TRUE)
and manually inspects the system output to get the extracted file names, etc.  Not ideal.

If the zip() and unzip() functions were updated to handle large files (see http://sourceforge.net/projects/infozip/files/), then the 'zip' format is actually a surprisingly good way to manage large archive files of data, since it can access/extract individual members without seeking across the entire data file (something .tar.gz can't offer).
Comment 3 Brian Ripley 2011-01-12 12:10:24 UTC
The third-party code used did not support zipfiles which did not contain all the appropriate directory entries, but this was patched in 2002 to create a single missing leading directory for a file entry -- this example has three.  It is unclear if such zipfiles are really valid, and R only contains an internal unzip() for use on Windows, in particular to unpack Windows binary packages.

system() can always be used for an external zip or unzip utility: we will never try to duplicate all their functionality, such as support for > 2GB files.  But we cannot assume that all Windows boxes have unzip.exe.

R 2.12.1 patched will shortly create more missing leading directories, including from directory entries in the file.
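
As a rough sketch of the point about external utilities: code that wants the member names could prefer an external unzip when one is on the PATH and fall back to the internal unzip() otherwise. The helper name list_zip_members is hypothetical, and the "-Z -1" options are the ones quoted in Comment 2.

```r
# Hypothetical helper (not part of R): list the member names of a zip archive.
list_zip_members <- function(zipfile) {
  # Sys.which() returns "" when no unzip executable is found on the PATH
  unzip_bin <- unname(Sys.which("unzip"))

  if (nzchar(unzip_bin)) {
    # "-Z -1" (zipinfo mode) prints one member name per line
    system2(unzip_bin, c("-Z", "-1", shQuote(zipfile)), stdout = TRUE)
  } else {
    # No external utility available: use R's internal implementation
    unzip(zipfile, list = TRUE)$Name
  }
}
```

Checking for the executable first avoids assuming that unzip.exe is installed on a given Windows machine.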
Comment 4 Ken Williams 2011-01-12 23:34:32 UTC
Thanks Brian.  Glad to hear this particular issue will be resolved soon.

I do think it's prudent to improve unzip() to be functional on all [reasonable] platforms, including large-file support, but that's probably a discussion for a mailing list instead of a bug tracker.