Bugzilla – Bug 14241
read.fwf issues warning when n is specified
Last modified: 2010-04-01 14:22:53 UTC
Consider the last example from the read.fwf section in the R Reference Index,
slightly modified so the file contains four two-line records instead of one:
ff <- tempfile()
Reading only three records from the file, by adding the parameter n=3 to the read.fwf call, gives:
read.fwf(ff, widths=list(c(1,0, 2,3), c(2,2,2)), n=3)
  V1 V2 V3  V4 V5 V6 V7
1  1 NA 11 111 22 22 22
2  4 NA 44 444 55 55 55
3  6 NA 66 666 66 66 66
Note the missing 333333 and the duplicate use of the 666666 line.
Also, this call produces the warning:
last record incomplete, 1 lines discarded
and in fact any value n > 0 triggers one or more of these warnings.
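To see how both symptoms can arise, here is a minimal R sketch of the line-buffering loop as it plausibly behaved before the fix, built around the min(buffersize, n) line quoted below. replay_buggy is a hypothetical helper, not part of R: it reads from a textConnection instead of a file, skips the field splitting, and counts warnings instead of emitting them. The file contents are also an assumption: eight lines 111111 to 888888, inferred from the output above.

```r
# Hypothetical helper, not part of R: replays only the line-buffering loop of
# read.fwf as it appears to have behaved before the fix (no field splitting).
replay_buggy <- function(lines, recordlength, n, buffersize = 2000) {
  buffersize <- (buffersize %/% recordlength) * recordlength
  con <- textConnection(lines)
  on.exit(close(con))
  records <- character(0)
  n_warnings <- 0L
  repeat {
    if (n == 0L) break
    # The suspect line: n counts records, but is used as a number of lines
    thisblock <- if (n == -1L) buffersize else min(buffersize, n)
    raw <- readLines(con, n = thisblock)
    nread <- length(raw)
    if (recordlength > 1L && nread %% recordlength) {
      # When no complete record was read, 1L:0 is c(1L, 0L),
      # so one line is kept instead of being discarded
      raw <- raw[1L:(nread - nread %% recordlength)]
      n_warnings <- n_warnings + 1L  # stands in for "last record incomplete"
    }
    if (recordlength > 1L) {
      # matrix() recycles a lone line to fill the record (the doubled 666666)
      raw <- matrix(raw, nrow = recordlength)
      raw <- apply(raw, 2L, paste, collapse = "")
    }
    records <- c(records, raw)
    if (nread < thisblock) break
    if (n > 0L) n <- n - length(raw)
  }
  list(records = records, n_warnings = n_warnings)
}

lines <- strrep(as.character(1:8), 6)  # "111111", "222222", ..., "888888"
res <- replay_buggy(lines, recordlength = 2L, n = 3L)
res$records     # "111111222222" "444444555555" "666666666666"
res$n_warnings  # 2
```

With n = 3 the first block holds three lines, so 333333 is dropped as an incomplete record; n then falls by one record per block, and the final one-line block is recycled into a doubled 666666 record, matching the three rows shown above.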
This behaviour is the same on Windows and Linux (2.10.1 Patched (2010-03-07 r51225)), so I assume this is an 'all platforms' feature.
Looking at the code of read.fwf, I have the impression that the line:
else thisblock <- min(buffersize, n)
may have to be replaced by:
else thisblock <- min(buffersize, recordlength)
because in the light of the next line:
raw <- readLines(file, n = thisblock)
which reads that many lines, it makes more sense to pass readLines the number of lines in one record than the number of records left to read.
This patch solves the above problems (both data frame values and warnings), but I don't have the expertise to do regression tests.
I would appreciate it if this patch could be included in the upcoming 2.11 release.
I've confirmed the bug. The fix is not what was suggested: the problem was that it read n lines of text when it should have read n records.
I'll commit the changes to R-devel and R-2-11-branch.
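A minimal R sketch of the buffering loop with the fix described in the last comment, under the assumption that reading n records amounts to requesting n * recordlength lines per block (the committed change itself is not shown in this report). replay_fixed is a hypothetical helper, not part of R, and reads from a textConnection. Unlike the patch proposed earlier, which requests one record's worth of lines per iteration, this keeps reading in buffer-sized blocks; using seq_len() to drop a trailing partial record is an extra guard not described in the report.

```r
# Hypothetical helper, not part of R: the buffering loop with n treated as a
# number of records, so every full block contains whole records only.
replay_fixed <- function(lines, recordlength, n, buffersize = 2000) {
  buffersize <- (buffersize %/% recordlength) * recordlength
  con <- textConnection(lines)
  on.exit(close(con))
  records <- character(0)
  repeat {
    if (n == 0L) break
    # Assumed fix: n records span n * recordlength lines
    thisblock <- if (n == -1L) buffersize else min(buffersize, n * recordlength)
    raw <- readLines(con, n = thisblock)
    nread <- length(raw)
    extra <- nread %% recordlength
    if (extra > 0L) {
      # Now only a short final record at end of input can land here;
      # seq_len() drops it cleanly even when no full record remains
      warning(sprintf("last record incomplete, %d line(s) discarded", extra))
      raw <- raw[seq_len(nread - extra)]
    }
    if (length(raw) > 0L) {
      # Paste each group of recordlength lines into one record
      raw <- apply(matrix(raw, nrow = recordlength), 2L, paste, collapse = "")
      records <- c(records, raw)
    }
    if (nread < thisblock) break
    if (n > 0L) n <- n - length(raw)
  }
  records
}

lines <- strrep(as.character(1:8), 6)  # "111111", "222222", ..., "888888"
replay_fixed(lines, recordlength = 2L, n = 3L)
# "111111222222" "333333444444" "555555666666"
```

Because each block size is a multiple of recordlength, a record can no longer be split across blocks, so neither the lost 333333 line nor the recycled 666666 line occurs, and no warning is raised for a well-formed file.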