The function `tar()` in current R-devel (and seemingly for many months or years) does not work. It does not work as documented with default arguments and in many cases it does not work at all. (1) If a `files` value is specified, an empty tar archive is created and no files are added. (2) If a `files` value is not specified, an empty tar archive is created and the function errors. The solution: Moving one line of code, changing it slightly, and making it conditional on the value of `files`. The first line of the function should be: if(is.null(files)) files <- list.files(recursive = TRUE, all.files = TRUE, full.names = TRUE, include.dirs = TRUE) This would replace (lines 396-387 in /src/library/utils/tar.R): files <- list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE, include.dirs = TRUE) The current behavior creates the error in (2) from calling `list.files(NULL)` (an error), or `list.files(filename)` (which is always `character(0)` thus adding nothing to the archive. The patch cannot simply remove the first argument, as it would then copy the empty initialized tar archive into itself. Here's an extended example showing the problems: d <- tempfile() dir.create(d) setwd(d) # Current `tar()` function ## tar w/ specified files write.csv(iris, file = "iris.csv") tar("test1.tar", files = "iris.csv") unlink("iris.csv") untar("test1.tar", list = TRUE) # tar is empty ## [1] "/usr/bin/tar: Record size = 2 blocks" readBin("test1.tar", what = "raw") # tar is empty ## [1] 00 unlink("test1.tar") ## tar w/o specified files write.csv(iris, file = "iris.csv") tar("test1.tar") # writes empty tar file; errors ## Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE, : ## invalid 'path' argument unlink("iris.csv") dir() # [1] "test1.tar" unlink("test1.tar") And here is the behavior, as documented, with the corrected function (here `tar2()`): # corrected `tar()` function ## tar w/ specified files write.csv(iris, file = "iris.csv") tar2("test2.tar", files = "iris.csv") unlink("iris.csv") untar("test2.tar", list = TRUE) # file added correctly ## [1] "/usr/bin/tar: Record size = 13 blocks" "iris.csv" untar("test2.tar") dir() ## [1] "iris.csv" "test2.tar" unlink("test2.tar") ## tar w/o specified files write.csv(iris, file = "iris.csv") tar2("test2.tar") unlink("iris.csv") untar("test2.tar", list = TRUE) ## [1] "/usr/bin/tar: Record size = 13 blocks" "./iris.csv" untar("test2.tar") dir() ## [1] "iris.csv" "test2.tar" unlink("iris.csv") unlink("test2.tar") Here is sessionInfo(): R Under development (unstable) (2016-02-15 r70179) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base
(In reply to Thomas J. Leeper from comment #0) > The function `tar()` in current R-devel (and seemingly for many months or > years) does not work. It does not work as documented with default arguments > and in many cases it does not work at all. > > (1) If a `files` value is specified, an empty tar archive is created and no > files are added. not always: E.g., tar() is used by 'R CMD build ..' and there it does work fine... and if I implement your proposed change, 'R CMD build' is broken. > (2) If a `files` value is not specified, an empty tar archive is created and > the function errors. That is of course "not according to documentation". > > The solution: Moving one line of code, changing it slightly, and making it > conditional on the value of `files`. The first line of the function should > be: See above, this is *not* the solution.... and two more iterations I tried did also not provide "the" solution. One question is really if the bug is more in the documentation or more in the implementation and if 'files' should really typically be *file* names, or rather *directory* ("folder") names : The current (buggy code) does -- as you mention -- call files <- list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE, include.dirs = TRUE) whereas the first argument of list.files() is 'path' and gives an empty result when you pass it simple files names (from the current directory), and this is indeed the crux of the problem. tar() assumes that 'files' can contain file names and or directory names where list.files() only considers directory names. So this is more complicated than hope and won't be fixed for R 3.2.4 anymore, I think. > if(is.null(files)) files <- list.files(recursive = TRUE, all.files = TRUE, > full.names = TRUE, include.dirs = TRUE) > > This would replace (lines 396-387 in /src/library/utils/tar.R): > > files <- list.files(files, recursive = TRUE, all.files = TRUE, > full.names = TRUE, include.dirs = TRUE) > > The current behavior creates the error in (2) from calling > `list.files(NULL)` (an error), or `list.files(filename)` (which is always > `character(0)` thus adding nothing to the archive. The patch cannot simply > remove the first argument, as it would then copy the empty initialized tar > archive into itself. > > Here's an extended example showing the problems: > > d <- tempfile() > dir.create(d) > setwd(d) > > # Current `tar()` function > ## tar w/ specified files > write.csv(iris, file = "iris.csv") > tar("test1.tar", files = "iris.csv") > unlink("iris.csv") > untar("test1.tar", list = TRUE) # tar is empty > ## [1] "/usr/bin/tar: Record size = 2 blocks" > readBin("test1.tar", what = "raw") # tar is empty > ## [1] 00 > unlink("test1.tar") > > ## tar w/o specified files > write.csv(iris, file = "iris.csv") > tar("test1.tar") # writes empty tar file; errors > ## Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names > = TRUE, : > ## invalid 'path' argument > unlink("iris.csv") > dir() > # [1] "test1.tar" > unlink("test1.tar") > > > And here is the behavior, as documented, with the corrected function (here > `tar2()`): > > # corrected `tar()` function > ## tar w/ specified files > write.csv(iris, file = "iris.csv") > tar2("test2.tar", files = "iris.csv") > unlink("iris.csv") > untar("test2.tar", list = TRUE) # file added correctly > ## [1] "/usr/bin/tar: Record size = 13 blocks" "iris.csv" > untar("test2.tar") > dir() > ## [1] "iris.csv" "test2.tar" > unlink("test2.tar") > > ## tar w/o specified files > write.csv(iris, file = "iris.csv") > tar2("test2.tar") > unlink("iris.csv") > untar("test2.tar", list = TRUE) > ## [1] "/usr/bin/tar: Record size = 13 blocks" "./iris.csv" > untar("test2.tar") > dir() > ## [1] "iris.csv" "test2.tar" > unlink("iris.csv") > unlink("test2.tar") > > > > Here is sessionInfo(): > > R Under development (unstable) (2016-02-15 r70179) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United > Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United > Kingdom.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base
(In reply to Martin Maechler from comment #1) Martin, thanks for looking into this! It does seem complicated. I guess the easiest fix is to change the documentation, at least the definition of the `files` argument, which is currently: > A character vector of filepaths to be archived: the default is to archive all > files under the current directory. I guess that should instead say "a path to a directory"? In the long term, I think the code should be changed but it doesn't seem like it will be a simple fix. As some more context, I came across this issue when porting some code that used `zip()` to do something similar with `tar()`. `zip()` works as expected with a `files` argument (with no default value) that is defined as: > A character vector of recorded filepaths to be included. It would see somewhat logical to have these two functions have similar arguments and behavior (as seems to be the intention given argument naming).
I have now fixed your '2)' -- the default case of 'files = NULL', but "only" for R-devel (to become R 3.3.0 in a bit more than 1 month), as R 3.2.4 is already "release candidate". Because of the bigger issue .. I'd call the "confusion" of 'files' with 'paths' we do need more changes including to the documentation which I did *not* change yet.. rather the "big issue" change would generalize the API and then document that. Your point about tar() working similar to zip() is a very reasonable one and so ideally, we should adapt the implementation rather than (only) the documenation.
This bug report is still relevant. tar() has the argument > files: A character vector of filepaths to be archived but the internal tar() method silently ignores any *non-directory* 'files'. For example, setwd(tempdir()) file.create("afile") tar("test1.tar", files = "afile") untar("test1.tar", list = TRUE) # character(0) results in an empty archive! In contrast, my external GNU tar does include the file: tar("test2.tar", files = "afile", tar = "tar") untar("test2.tar", list = TRUE) # [1] "afile" Another difference between the two tar methods is that the external GNU tar visibly fails if a path does not exist, whereas the internal tar simply ignores such paths (via list.files). This does not seem to be intended as the internal code contains a "file '%s' not found" warning later on. So maybe the internal method should be amended to apply list.files() only on the dir.exists(files) subset and keep the other filepaths asis?
Created attachment 2639 [details] do not skip non-directory 'files' Currently, R's default tar() simply ignores 'files' which are not paths to directories, silently, undocumented. Something needs to be done and I've made an attempt in the attached patch. This patch for the internal tar ensures that non-directory paths in 'files' are processed. Thus, regular 'files' are included, invalid paths give warnings. I have amended the existing regression tests for this bug in reg-tests-1c.R (rather than adding new test code in reg-tests-1d.R); make check-devel passes. I wonder if this fix is too easy to be "the" solution.
(In reply to Sebastian Meyer from comment #5) > Created attachment 2639 [details] > do not skip non-directory 'files' > ... > I wonder if this fix is too easy to be "the" solution. Thank you Sebastian, I plan to have a look at your patch soonish, and apply a version of it. As a matter of fact, we have e-talked once and also thought that the solution may be simple.
(In reply to Martin Maechler from comment #6)
I plan to have a look at your patch soonish, and apply a version of it. As a matter of fact, we have e-talked once and also thought that the solution may be simple.
that plan took a while to be effectuated ...
Finally committed (+ NEWS, +- cosmetic) as svn rev 79223 (R-devel only, to be ported after a few days) with thanks to Sebastian