Bug 17293 - utils::tar fails on large folders if "tar" argument is used
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: 3.4.0
Hardware: x86_64/x64/amd64 (64-bit) Linux
Importance: P5 major
Assignee: R-core
Reported: 2017-06-20 19:37 UTC by meik michalke
Modified: 2017-11-06 12:13 UTC

Description meik michalke 2017-06-20 19:37:36 UTC
if you use the tar() function from utils in R 3.4 on a directory with many files (e.g., one that includes a git repository) and set the "tar" argument to anything other than an empty string, the function does nothing; it does not even throw a warning or an error.

the problem seems to be that internally, tar() appends every single file recursively to the system call, which then gets so long that the system tar command just fails.

i managed to cat() a generated system call of tar() to a temporary file -- it was 158 kilobytes(!) long.
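
To illustrate how quickly a command that names every file grows, here is a minimal sketch. The helper `estimate_cmd_length()`, the example paths, and the 128 KiB threshold are assumptions for illustration and are not taken from the report. The threshold is the per-argument limit on Linux (MAX_ARG_STRLEN), which may be the relevant one here because system() hands the whole command to the shell as a single string. The 158 kilobytes mentioned above would exceed it.

```r
# Hypothetical helper (not part of utils): estimate the byte length of a
# shell command that names every file individually.
estimate_cmd_length <- function(files, prefix = "tar -cf archive.tar") {
  cmd <- paste(c(prefix, shQuote(files)), collapse = " ")
  nchar(cmd, type = "bytes")
}

# Assumed limit: 128 KiB for a single argument on Linux (MAX_ARG_STRLEN).
assumed_limit <- 131072

# Made-up paths for illustration; nothing is created on disk.
short_files <- file.path("/tmp/demo", sprintf("file_%04d", 1:1000))
long_files  <- file.path("/tmp/demo",
                         sprintf("%s_%04d", strrep("x", 200), 1:1000))

cmd_len       <- estimate_cmd_length(short_files)
exceeds_short <- cmd_len > assumed_limit
exceeds_long  <- estimate_cmd_length(long_files) > assumed_limit
```

With the same number of files, only the long names push the estimate past the assumed limit. This matches the later observation in this thread that shortening the file names lets more files through.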
Comment 1 meik michalke 2017-06-22 13:39:04 UTC
here are instructions to reproduce the problem (replace "touch" on non-unix systems):

demoDir <- file.path(tempdir(), "a_rather_long_path_name_to_make_tar_faint_more_quickly_because_it_will_repeat_this_string_for_each_file")
dir.create(demoDir)
# let's create 1000 empty dummy files
for (thisFile in 1:1000) {
  system(paste(Sys.which("touch"), file.path(demoDir, paste0("a_long_file_name_to_make_tar_faint_more_quickly_because_it_will_append_each_single_file_internally_", thisFile))))
}
# doesn't work:
tar(file.path(tempdir(), "tar_fail.tar"), files=demoDir, tar="/bin/tar")
dir(tempdir())
# still works:
tar(file.path(tempdir(), "tar_nofail.tar"), files=demoDir, tar="")
dir(tempdir())


on my system, 600 files were enough to trigger the bug. when i shorten the file names, more files are accepted.
Comment 2 meik michalke 2017-07-02 12:03:23 UTC
still broken in R 3.4.1
Comment 3 meik michalke 2017-10-05 19:37:40 UTC
still broken in R 3.4.2
Comment 4 Tomas Kalibera 2017-10-23 14:45:29 UTC
Thanks for the report, partially fixed in 73589: now an error will be reported on Unix when an external command cannot be executed.
Comment 5 meik michalke 2017-10-23 18:04:43 UTC
thanks for picking this up.

i just checked with some older R versions. this problem was introduced with 3.3 (it's not present up until 3.2.5). i compared both implementations of tar(), there are only minor differences.

but obviously the replacement of "files" with a long vector of individual files ("files <- list.files(...)") was moved way up to the start of the function code, so it is evaluated before the system call now. 

you should check if that order of operations is really what was intended. if you can omit this replacement when "tar" is set, the problem should be solved.
Comment 6 Tomas Kalibera 2017-11-06 12:13:40 UTC
Please note that "tar" returns the return code from "system" when you use an external tar command. "system" returns code 127 when the command could not be run for any reason. This is documented behavior and existing code depends on it, so it can't be changed to an error. Instead, we have modified "system" to also display a warning when the command could not be executed.
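
Since the status is returned rather than raised, callers who want a hard failure need to check it themselves. The following is a minimal sketch of such a check. `describe_tar_status()` and `tar_or_stop()` are hypothetical helpers and not part of utils. The only facts taken from this comment are that tar() returns the status from system() and that 127 means the command could not be run.

```r
# Hypothetical helper: turn the status returned by tar()/system()
# into a readable message.
describe_tar_status <- function(status) {
  status <- as.integer(status)
  if (status == 0L) {
    "ok"
  } else if (status == 127L) {
    "external tar command could not be run"
  } else {
    sprintf("external tar exited with status %d", status)
  }
}

# Hypothetical wrapper: stop on a non-zero status or a missing archive
# instead of relying on the caller to inspect the return value.
tar_or_stop <- function(tarfile, files, tar = Sys.getenv("tar"), ...) {
  status <- utils::tar(tarfile, files = files, tar = tar, ...)
  if (status != 0L) {
    stop("tar() failed: ", describe_tar_status(status))
  }
  if (!file.exists(tarfile)) {
    stop("tar() reported success but ", tarfile, " was not created")
  }
  invisible(status)
}

msg <- describe_tar_status(127L)
```

The extra file.exists() check is a design choice for this sketch: it also catches cases like the one originally reported, where no archive appeared.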

The implementation of utils::tar was improved by Brian Ripley so that it (a) leaves the search for files in the specified directory to the external tar command, and (b) uses -T for very long lists of file names with GNU tar and libarchive tar (some other tar implementations do not support it).
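
The -T idea can be sketched as follows. This is not the actual utils::tar implementation. `build_tar_args()` and `tar_from_list()` are hypothetical helpers, and the sketch assumes a GNU or libarchive tar is available under the name given in `tar`. The file names are written to a temporary list file, one per line, so the command line stays short however many files there are.

```r
# Hypothetical helper: arguments for "tar -cf <tarfile> -T <listfile>".
build_tar_args <- function(tarfile, listfile) {
  c("-cf", shQuote(tarfile), "-T", shQuote(listfile))
}

# Hypothetical wrapper, assuming `tar` is a GNU or libarchive tar
# that supports -T (read the names to archive from a file).
tar_from_list <- function(tarfile, files, tar = "tar") {
  listfile <- tempfile(fileext = ".txt")
  on.exit(unlink(listfile))
  writeLines(files, listfile)  # one path per line
  system2(tar, build_tar_args(tarfile, listfile))
}

args <- build_tar_args("out.tar", "files.txt")
```

A call such as `tar_from_list("out.tar", list.files("some_dir", full.names = TRUE))` would then return the exit status of the external command. File names that contain newlines would need extra handling, which this sketch leaves out.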