Bug 14241 - read.fwf issues warning when n is specified
Status: RESOLVED FIXED
Product: R
Classification: Unclassified
Component: I/O
Version: R 2.10.1 patched
Hardware: ix86 (32-bit) All
Importance: P5 normal
Assigned To: R-core
Reported: 2010-03-26 15:18 UTC by read.fwf
Modified: 2010-04-01 14:22 UTC

Description read.fwf 2010-03-26 15:18:49 UTC
Consider the last example from the read.fwf section in the R Reference Index,
slightly modified so the file contains four two-line records instead of one:

    ff <- tempfile()
    cat(file=ff, "111111",
                 "222222",
                 "333333",
                 "444444",
                 "555555",
                 "666666",
                 "777777",
                 "888888", sep="\n")

Reading only three records from the file by adding the parameter n=3 in the
read.fwf call:

    read.fwf(ff, widths=list(c(1,0, 2,3), c(2,2,2)), n=3)
    unlink(ff)

displays:

      V1 V2 V3  V4 V5 V6 V7
    1  1 NA 11 111 22 22 22
    2  4 NA 44 444 55 55 55
    3  6 NA 66 666 66 66 66

Note that the 333333 line is missing and the 666666 line is used twice. With
n=3, the three rows should be built from lines 1-2, 3-4 and 5-6 of the file.

Also, this call produces the warning:

    last record incomplete, 1 lines discarded

and in fact any value of n > 0 produces one or more of these warnings.
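
For comparison, the result one would expect from n=3 can be worked out by hand
without calling read.fwf at all. The following is only a minimal sketch: the
helper name split_fixed is hypothetical, and unlike read.fwf it keeps every
field as a character string, so the zero-width field comes out as "" rather
than NA.

```r
# Hypothetical helper: cut one line into fixed-width fields.
split_fixed <- function(line, widths) {
  ends   <- cumsum(widths)
  starts <- ends - widths + 1
  substring(line, starts, ends)   # a zero-width field yields ""
}

lines  <- c("111111", "222222", "333333", "444444",
            "555555", "666666", "777777", "888888")
widths <- list(c(1, 0, 2, 3), c(2, 2, 2))
n <- 3                            # records wanted
recordlength <- length(widths)    # lines per record (2 in this example)

# Three two-line records need the first n * recordlength = 6 lines.
wanted <- lines[seq_len(n * recordlength)]
first  <- wanted[seq(1, length(wanted), by = recordlength)]  # lines 1, 3, 5
second <- wanted[seq(2, length(wanted), by = recordlength)]  # lines 2, 4, 6

# One row per record: fields of the first line, then fields of the second.
expected <- cbind(
  t(sapply(first,  split_fixed, widths = widths[[1]])),
  t(sapply(second, split_fixed, widths = widths[[2]]))
)
dimnames(expected) <- list(NULL, paste0("V", seq_len(ncol(expected))))
print(expected)
```

Here the second row is built from the 333333 and 444444 lines, and the third
from 555555 and 666666, which is what the read.fwf output above fails to show.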


This behaviour is the same on Windows and Linux (2.10.1 Patched (2010-03-07 r51225)), so I assume this is an 'all platforms' feature.

POSSIBLE SOLUTION
=================

Looking at the code of read.fwf, I have the impression that the line:

    else thisblock <- min(buffersize, n)

may have to be replaced by:

    else thisblock <- min(buffersize, recordlength)

because in the light of the next line:

    raw <- readLines(file, n = thisblock)

which reads the lines that make up one record, it makes more sense for the n passed to readLines to be the number of lines in one record than the number of records left to read.


This patch solves the above problems (both data frame values and warnings), but I don't have the expertise to do regression tests.


I would appreciate it if this patch could be included in the upcoming 2.11 release.
Comment 1 Duncan Murdoch 2010-04-01 14:22:53 UTC
I've confirmed the bug.  The fix is not what was suggested; the problem was that it read n lines of text when it should have read n records of text.

I'll commit the changes to R-devel and R-11-branch.
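
The lines-versus-records distinction can be illustrated with the data from the
description. This is a sketch of the arithmetic only, not the change that was
committed; the variable names are illustrative, and recordlength is taken to
mean the number of lines per record, as in the description.

```r
txt <- c("111111", "222222", "333333", "444444",
         "555555", "666666", "777777", "888888")
n <- 3             # records requested
recordlength <- 2  # lines per record in this example

# Treating n as a line count: 3 lines, i.e. one and a half records.
con <- textConnection(txt)
as_lines <- readLines(con, n = n)
close(con)
leftover <- length(as_lines) %% recordlength   # lines that cannot form a full record

# Treating n as a record count: n * recordlength lines, i.e. three full records.
con <- textConnection(txt)
as_records <- readLines(con, n = n * recordlength)
close(con)
```

With n treated as a line count, one line is left over after the first complete
record, which is consistent with the "last record incomplete, 1 lines
discarded" warning in the report.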