Bug 16716 - tar does not add any files to compressed directory
Summary: tar does not add any files to compressed directory
Status: ASSIGNED
Alias: None
Product: R
Classification: Unclassified
Component: Language (show other bugs)
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
: P3 major
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-02-17 12:29 UTC by Thomas J. Leeper
Modified: 2016-03-04 11:00 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas J. Leeper 2016-02-17 12:29:07 UTC
The function `tar()` in current R-devel (and seemingly for many months or years) does not work. It does not work as documented with default arguments and in many cases it does not work at all.

(1) If a `files` value is specified, an empty tar archive is created and no files are added.

(2) If a `files` value is not specified, an empty tar archive is created and the function errors.

The solution: Moving one line of code, changing it slightly, and making it conditional on the value of `files`. The first line of the function should be:

if(is.null(files)) files <- list.files(recursive = TRUE, all.files = TRUE, 
        full.names = TRUE, include.dirs = TRUE)

This would replace (lines 396-387 in /src/library/utils/tar.R):

files <- list.files(files, recursive = TRUE, all.files = TRUE, 
        full.names = TRUE, include.dirs = TRUE)

The current behavior creates the error in (2) from calling `list.files(NULL)` (an error), or `list.files(filename)` (which is always `character(0)` thus adding nothing to the archive. The patch cannot simply remove the first argument, as it would then copy the empty initialized tar archive into itself.

Here's an extended example showing the problems:

d <- tempfile()
dir.create(d)
setwd(d)

# Current `tar()` function
## tar w/ specified files
write.csv(iris, file = "iris.csv")
tar("test1.tar", files = "iris.csv")
unlink("iris.csv")
untar("test1.tar", list = TRUE) # tar is empty
## [1] "/usr/bin/tar: Record size = 2 blocks"
readBin("test1.tar", what = "raw") # tar is empty
## [1] 00
unlink("test1.tar")

## tar w/o specified files
write.csv(iris, file = "iris.csv")
tar("test1.tar") # writes empty tar file; errors
## Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE,  : 
##  invalid 'path' argument
unlink("iris.csv")
dir()
# [1] "test1.tar"
unlink("test1.tar")


And here is the behavior, as documented, with the corrected function (here `tar2()`):

# corrected `tar()` function
## tar w/ specified files
write.csv(iris, file = "iris.csv")
tar2("test2.tar", files = "iris.csv")
unlink("iris.csv")
untar("test2.tar", list = TRUE) # file added correctly
## [1] "/usr/bin/tar: Record size = 13 blocks" "iris.csv"
untar("test2.tar")
dir()
## [1] "iris.csv"  "test2.tar"
unlink("test2.tar")

## tar w/o specified files
write.csv(iris, file = "iris.csv")
tar2("test2.tar")
unlink("iris.csv")
untar("test2.tar", list = TRUE)
## [1] "/usr/bin/tar: Record size = 13 blocks" "./iris.csv"
untar("test2.tar")
dir()
## [1] "iris.csv"  "test2.tar"
unlink("iris.csv")
unlink("test2.tar")



Here is sessionInfo():

R Under development (unstable) (2016-02-15 r70179)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Martin Maechler 2016-03-03 17:54:36 UTC
(In reply to Thomas J. Leeper from comment #0)
> The function `tar()` in current R-devel (and seemingly for many months or
> years) does not work. It does not work as documented with default arguments
> and in many cases it does not work at all.
> 
> (1) If a `files` value is specified, an empty tar archive is created and no
> files are added.

not always: E.g.,  tar() is used by  'R CMD build ..'  and there it does work fine... and if I implement your proposed change,  'R CMD build'  is broken.


> (2) If a `files` value is not specified, an empty tar archive is created and
> the function errors.

That is of course "not according to documentation".

> 
> The solution: Moving one line of code, changing it slightly, and making it
> conditional on the value of `files`. The first line of the function should
> be:

See above, this is *not* the solution.... and two more iterations I tried did also not provide "the" solution.

One question is really if the bug is more in the documentation or more in the implementation and if 'files' should really typically be *file* names, or rather
*directory* ("folder") names :
The current (buggy code) does -- as you mention --
call

  files <- list.files(files, recursive = TRUE, all.files = TRUE,
                        full.names = TRUE, include.dirs = TRUE)

whereas the first argument of list.files() is  'path'  and gives an empty result when you pass it simple files names (from the current directory), and this is indeed the crux of the problem.

tar() assumes that 'files' can contain file names and or directory names
where list.files() only considers directory names.

So this is more complicated than hope and won't be fixed for R 3.2.4 anymore, I think.

 

> if(is.null(files)) files <- list.files(recursive = TRUE, all.files = TRUE, 
>         full.names = TRUE, include.dirs = TRUE)
> 
> This would replace (lines 396-387 in /src/library/utils/tar.R):
> 
> files <- list.files(files, recursive = TRUE, all.files = TRUE, 
>         full.names = TRUE, include.dirs = TRUE)
> 
> The current behavior creates the error in (2) from calling
> `list.files(NULL)` (an error), or `list.files(filename)` (which is always
> `character(0)` thus adding nothing to the archive. The patch cannot simply
> remove the first argument, as it would then copy the empty initialized tar
> archive into itself.
> 
> Here's an extended example showing the problems:
> 
> d <- tempfile()
> dir.create(d)
> setwd(d)
> 
> # Current `tar()` function
> ## tar w/ specified files
> write.csv(iris, file = "iris.csv")
> tar("test1.tar", files = "iris.csv")
> unlink("iris.csv")
> untar("test1.tar", list = TRUE) # tar is empty
> ## [1] "/usr/bin/tar: Record size = 2 blocks"
> readBin("test1.tar", what = "raw") # tar is empty
> ## [1] 00
> unlink("test1.tar")
> 
> ## tar w/o specified files
> write.csv(iris, file = "iris.csv")
> tar("test1.tar") # writes empty tar file; errors
> ## Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names
> = TRUE,  : 
> ##  invalid 'path' argument
> unlink("iris.csv")
> dir()
> # [1] "test1.tar"
> unlink("test1.tar")
> 
> 
> And here is the behavior, as documented, with the corrected function (here
> `tar2()`):
> 
> # corrected `tar()` function
> ## tar w/ specified files
> write.csv(iris, file = "iris.csv")
> tar2("test2.tar", files = "iris.csv")
> unlink("iris.csv")
> untar("test2.tar", list = TRUE) # file added correctly
> ## [1] "/usr/bin/tar: Record size = 13 blocks" "iris.csv"
> untar("test2.tar")
> dir()
> ## [1] "iris.csv"  "test2.tar"
> unlink("test2.tar")
> 
> ## tar w/o specified files
> write.csv(iris, file = "iris.csv")
> tar2("test2.tar")
> unlink("iris.csv")
> untar("test2.tar", list = TRUE)
> ## [1] "/usr/bin/tar: Record size = 13 blocks" "./iris.csv"
> untar("test2.tar")
> dir()
> ## [1] "iris.csv"  "test2.tar"
> unlink("iris.csv")
> unlink("test2.tar")
> 
> 
> 
> Here is sessionInfo():
> 
> R Under development (unstable) (2016-02-15 r70179)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
> 
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
> Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                            LC_TIME=English_United
> Kingdom.1252    
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
Comment 2 Thomas J. Leeper 2016-03-03 23:38:55 UTC
(In reply to Martin Maechler from comment #1)

Martin, thanks for looking into this! It does seem complicated. I guess the easiest fix is to change the documentation, at least the definition of the `files` argument, which is currently:

> A character vector of filepaths to be archived: the default is to archive all files under the current directory.

I guess that should instead say "a path to a directory"?

In the long term, I think the code should be changed but it doesn't seem like it will be a simple fix. As some more context, I came across this issue when porting some code that used `zip()` to do something similar with `tar()`. `zip()` works as expected with a `files` argument (with no default value) that is defined as:

> A character vector of recorded filepaths to be included.

It would see somewhat logical to have these two functions have similar arguments and behavior (as seems to be the intention given argument naming).
Comment 3 Martin Maechler 2016-03-04 11:00:24 UTC
I have now fixed  your '2)'  -- the default case of  'files = NULL',
but "only" for R-devel (to become R 3.3.0 in a bit more than 1 month), as R 3.2.4 is already "release candidate".


Because of the bigger issue .. I'd call the "confusion" of 'files' with 'paths'
we do need more changes including to the documentation which I did *not* change yet.. rather the "big issue" change would generalize the API and then document that.

Your point about tar() working similar to zip() is a very reasonable one and so ideally, we should adapt the implementation rather than (only) the documenation.