Bug 16716 - tar does not add any files to compressed directory
Summary: tar does not add any files to compressed directory
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Language (show other bugs)
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
: P3 major
Assignee: R-core
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-17 12:29 UTC by Thomas J. Leeper
Modified: 2020-09-17 15:35 UTC (History)
2 users (show)

See Also:


Attachments
do not skip non-directory 'files' (2.50 KB, patch)
2020-06-26 11:37 UTC, Sebastian Meyer
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas J. Leeper 2016-02-17 12:29:07 UTC
The function `tar()` in current R-devel (and seemingly for many months or years) does not work. It does not work as documented with default arguments and in many cases it does not work at all.

(1) If a `files` value is specified, an empty tar archive is created and no files are added.

(2) If a `files` value is not specified, an empty tar archive is created and the function errors.

The solution: Moving one line of code, changing it slightly, and making it conditional on the value of `files`. The first line of the function should be:

if(is.null(files)) files <- list.files(recursive = TRUE, all.files = TRUE, 
        full.names = TRUE, include.dirs = TRUE)

This would replace (lines 396-387 in /src/library/utils/tar.R):

files <- list.files(files, recursive = TRUE, all.files = TRUE, 
        full.names = TRUE, include.dirs = TRUE)

The current behavior creates the error in (2) from calling `list.files(NULL)` (an error), or `list.files(filename)` (which is always `character(0)` thus adding nothing to the archive. The patch cannot simply remove the first argument, as it would then copy the empty initialized tar archive into itself.

Here's an extended example showing the problems:

d <- tempfile()
dir.create(d)
setwd(d)

# Current `tar()` function
## tar w/ specified files
write.csv(iris, file = "iris.csv")
tar("test1.tar", files = "iris.csv")
unlink("iris.csv")
untar("test1.tar", list = TRUE) # tar is empty
## [1] "/usr/bin/tar: Record size = 2 blocks"
readBin("test1.tar", what = "raw") # tar is empty
## [1] 00
unlink("test1.tar")

## tar w/o specified files
write.csv(iris, file = "iris.csv")
tar("test1.tar") # writes empty tar file; errors
## Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE,  : 
##  invalid 'path' argument
unlink("iris.csv")
dir()
# [1] "test1.tar"
unlink("test1.tar")


And here is the behavior, as documented, with the corrected function (here `tar2()`):

# corrected `tar()` function
## tar w/ specified files
write.csv(iris, file = "iris.csv")
tar2("test2.tar", files = "iris.csv")
unlink("iris.csv")
untar("test2.tar", list = TRUE) # file added correctly
## [1] "/usr/bin/tar: Record size = 13 blocks" "iris.csv"
untar("test2.tar")
dir()
## [1] "iris.csv"  "test2.tar"
unlink("test2.tar")

## tar w/o specified files
write.csv(iris, file = "iris.csv")
tar2("test2.tar")
unlink("iris.csv")
untar("test2.tar", list = TRUE)
## [1] "/usr/bin/tar: Record size = 13 blocks" "./iris.csv"
untar("test2.tar")
dir()
## [1] "iris.csv"  "test2.tar"
unlink("iris.csv")
unlink("test2.tar")



Here is sessionInfo():

R Under development (unstable) (2016-02-15 r70179)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Martin Maechler 2016-03-03 17:54:36 UTC
(In reply to Thomas J. Leeper from comment #0)
> The function `tar()` in current R-devel (and seemingly for many months or
> years) does not work. It does not work as documented with default arguments
> and in many cases it does not work at all.
> 
> (1) If a `files` value is specified, an empty tar archive is created and no
> files are added.

not always: E.g.,  tar() is used by  'R CMD build ..'  and there it does work fine... and if I implement your proposed change,  'R CMD build'  is broken.


> (2) If a `files` value is not specified, an empty tar archive is created and
> the function errors.

That is of course "not according to documentation".

> 
> The solution: Moving one line of code, changing it slightly, and making it
> conditional on the value of `files`. The first line of the function should
> be:

See above, this is *not* the solution.... and two more iterations I tried did also not provide "the" solution.

One question is really if the bug is more in the documentation or more in the implementation and if 'files' should really typically be *file* names, or rather
*directory* ("folder") names :
The current (buggy code) does -- as you mention --
call

  files <- list.files(files, recursive = TRUE, all.files = TRUE,
                        full.names = TRUE, include.dirs = TRUE)

whereas the first argument of list.files() is  'path'  and gives an empty result when you pass it simple files names (from the current directory), and this is indeed the crux of the problem.

tar() assumes that 'files' can contain file names and or directory names
where list.files() only considers directory names.

So this is more complicated than hope and won't be fixed for R 3.2.4 anymore, I think.

 

> if(is.null(files)) files <- list.files(recursive = TRUE, all.files = TRUE, 
>         full.names = TRUE, include.dirs = TRUE)
> 
> This would replace (lines 396-387 in /src/library/utils/tar.R):
> 
> files <- list.files(files, recursive = TRUE, all.files = TRUE, 
>         full.names = TRUE, include.dirs = TRUE)
> 
> The current behavior creates the error in (2) from calling
> `list.files(NULL)` (an error), or `list.files(filename)` (which is always
> `character(0)` thus adding nothing to the archive. The patch cannot simply
> remove the first argument, as it would then copy the empty initialized tar
> archive into itself.
> 
> Here's an extended example showing the problems:
> 
> d <- tempfile()
> dir.create(d)
> setwd(d)
> 
> # Current `tar()` function
> ## tar w/ specified files
> write.csv(iris, file = "iris.csv")
> tar("test1.tar", files = "iris.csv")
> unlink("iris.csv")
> untar("test1.tar", list = TRUE) # tar is empty
> ## [1] "/usr/bin/tar: Record size = 2 blocks"
> readBin("test1.tar", what = "raw") # tar is empty
> ## [1] 00
> unlink("test1.tar")
> 
> ## tar w/o specified files
> write.csv(iris, file = "iris.csv")
> tar("test1.tar") # writes empty tar file; errors
> ## Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names
> = TRUE,  : 
> ##  invalid 'path' argument
> unlink("iris.csv")
> dir()
> # [1] "test1.tar"
> unlink("test1.tar")
> 
> 
> And here is the behavior, as documented, with the corrected function (here
> `tar2()`):
> 
> # corrected `tar()` function
> ## tar w/ specified files
> write.csv(iris, file = "iris.csv")
> tar2("test2.tar", files = "iris.csv")
> unlink("iris.csv")
> untar("test2.tar", list = TRUE) # file added correctly
> ## [1] "/usr/bin/tar: Record size = 13 blocks" "iris.csv"
> untar("test2.tar")
> dir()
> ## [1] "iris.csv"  "test2.tar"
> unlink("test2.tar")
> 
> ## tar w/o specified files
> write.csv(iris, file = "iris.csv")
> tar2("test2.tar")
> unlink("iris.csv")
> untar("test2.tar", list = TRUE)
> ## [1] "/usr/bin/tar: Record size = 13 blocks" "./iris.csv"
> untar("test2.tar")
> dir()
> ## [1] "iris.csv"  "test2.tar"
> unlink("iris.csv")
> unlink("test2.tar")
> 
> 
> 
> Here is sessionInfo():
> 
> R Under development (unstable) (2016-02-15 r70179)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
> 
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
> Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                            LC_TIME=English_United
> Kingdom.1252    
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
Comment 2 Thomas J. Leeper 2016-03-03 23:38:55 UTC
(In reply to Martin Maechler from comment #1)

Martin, thanks for looking into this! It does seem complicated. I guess the easiest fix is to change the documentation, at least the definition of the `files` argument, which is currently:

> A character vector of filepaths to be archived: the default is to archive all
> files under the current directory.

I guess that should instead say "a path to a directory"?

In the long term, I think the code should be changed but it doesn't seem like it will be a simple fix. As some more context, I came across this issue when porting some code that used `zip()` to do something similar with `tar()`. `zip()` works as expected with a `files` argument (with no default value) that is defined as:

> A character vector of recorded filepaths to be included.

It would see somewhat logical to have these two functions have similar arguments and behavior (as seems to be the intention given argument naming).
Comment 3 Martin Maechler 2016-03-04 11:00:24 UTC
I have now fixed  your '2)'  -- the default case of  'files = NULL',
but "only" for R-devel (to become R 3.3.0 in a bit more than 1 month), as R 3.2.4 is already "release candidate".


Because of the bigger issue .. I'd call the "confusion" of 'files' with 'paths'
we do need more changes including to the documentation which I did *not* change yet.. rather the "big issue" change would generalize the API and then document that.

Your point about tar() working similar to zip() is a very reasonable one and so ideally, we should adapt the implementation rather than (only) the documenation.
Comment 4 Sebastian Meyer 2020-06-25 18:13:50 UTC
This bug report is still relevant. tar() has the argument

> files: A character vector of filepaths to be archived

but the internal tar() method silently ignores any *non-directory* 'files'.
For example,

setwd(tempdir())
file.create("afile")
tar("test1.tar", files = "afile")
untar("test1.tar", list = TRUE)  # character(0)

results in an empty archive!
In contrast, my external GNU tar does include the file:

tar("test2.tar", files = "afile", tar = "tar")
untar("test2.tar", list = TRUE)  # [1] "afile"

Another difference between the two tar methods is that the external GNU tar visibly fails if a path does not exist, whereas the internal tar simply ignores such paths (via list.files). This does not seem to be intended as the internal code contains a "file '%s' not found" warning later on.

So maybe the internal method should be amended to apply list.files() only on the dir.exists(files) subset and keep the other filepaths asis?
Comment 5 Sebastian Meyer 2020-06-26 11:37:27 UTC
Created attachment 2639 [details]
do not skip non-directory 'files'

Currently, R's default tar() simply ignores 'files' which are not paths to directories, silently, undocumented. Something needs to be done and I've made an attempt in the attached patch.

This patch for the internal tar ensures that non-directory paths in 'files' are processed. Thus, regular 'files' are included, invalid paths give warnings. I have amended the existing regression tests for this bug in reg-tests-1c.R (rather than adding new test code in reg-tests-1d.R); make check-devel passes.

I wonder if this fix is too easy to be "the" solution.
Comment 6 Martin Maechler 2020-06-27 07:27:42 UTC
(In reply to Sebastian Meyer from comment #5)
> Created attachment 2639 [details]
> do not skip non-directory 'files'
> 
...
> I wonder if this fix is too easy to be "the" solution.

Thank you Sebastian,

I plan to have a look at your patch soonish, and apply a version of it.  As a matter of fact, we have e-talked once and also thought that the solution may be simple.
Comment 7 Martin Maechler 2020-09-17 15:35:10 UTC

(In reply to Martin Maechler from comment #6)

I plan to have a look at your patch soonish, and apply a version of it. As
a matter of fact, we have e-talked once and also thought that the solution
may be simple.

that plan took a while to be effectuated ...

Finally committed (+ NEWS, +- cosmetic) as svn rev 79223 (R-devel only, to be ported after a few days)
with thanks to Sebastian