Bug 9252 - Wish: change behaviour of header in read.fwf
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: old
Hardware: All All
Importance: P5 normal
Assignee: Jitterbug compatibility account
Reported: 2006-09-26 05:01 UTC by Jitterbug compatibility account
Modified: 2006-09-26 05:01 UTC

Description Jitterbug compatibility account 2006-09-26 05:01:06 UTC
From: Gregor Gorjanc <gregor.gorjanc@bfro.uni-lj.si>
Hello!

In my opinion, the way read.fwf() handles the header argument is not really
useful. Say I have the following data:

col1  col2  col3
 123   123   123
   a           b
1234    12  1234
      65.4   4.5

Now if I want to read this data into R, I cannot use read.table because of
the missing fields.

read.table(file="test.txt")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
	line 3 did not have 3 elements

However, read.fwf() can help me.

read.fwf(file="test.txt", widths=c(5, 6, 5))
     V1     V2    V3
1 col1   col2   col3
2  123    123    123
3    a             b
4 1234     12   1234
5        65.4    4.5

Oops, I need to specify header, and the help page says that header fields
must be separated by sep. The sep entry of the help page says:

     sep: character; the separator used internally; should be a
          character that does not occur in the file (except in the
          header).
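
As a minimal sketch of what the help text describes: assuming the header line can be edited so that its fields are separated by the default sep (a tab), header=TRUE should be accepted. The temporary file below is made up for illustration and mirrors the example data, with only the header line changed.

```r
# Illustrative file: same data as above, but the header fields are
# tab-separated, as the help page requires.
path <- tempfile(fileext = ".txt")
writeLines(c("col1\tcol2\tcol3",
             " 123   123   123",
             "   a           b",
             "1234    12  1234",
             "      65.4   4.5"), path)

# sep is left at its default ("\t"), which does not occur in the data lines.
dat <- read.fwf(path, widths = c(5, 6, 5), header = TRUE)
str(dat)
```

This only works because the header was rewritten by hand, which is exactly the extra step the report objects to.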

This is quite limiting because I never know in advance which characters
do not occur in a data file, and even if I do, I have to modify the
header in the file accordingly before import. Naive use of read.fwf returns an error:

read.fwf(file="test.txt", widths=c(5, 6, 5), header=TRUE, sep=" ")
Error in read.table(file = FILE, header = header, sep = sep, as.is = as.is,  :
	more columns than column names

read.fwf(file="test.txt", widths=c(5, 6, 5), header=TRUE, sep="  ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
	invalid 'sep' value: must be one byte

I got lost reading the source of read.fwf, but I think that the following
idea should be easy to implement, and it would also be similar to
read.table's behaviour.

<ideaCode>

if(header) {
  ## sep is from read.fwf call
  header <- unlist(strsplit(readLines(con=file, n=1), split=sep))
}
...
## tweaks related to issues with length(header), row.names, ncol(), ...
read.table(..., col.names=header, ...)

</ideaCode>
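
The idea above can be approximated today with a small wrapper. This is only a sketch: read_fwf_with_header and its header_sep argument are hypothetical names, not part of R, and it assumes a single-line record format with one header field per positive width.

```r
# Hypothetical helper (not part of R): split the first line on `header_sep`
# (a regular expression) and pass the result to read.fwf as col.names.
read_fwf_with_header <- function(file, widths, header_sep = "[[:space:]]+", ...) {
  first_line <- readLines(file, n = 1)
  col_names <- strsplit(trimws(first_line), header_sep)[[1]]
  # Negative widths mark skipped columns, so only positive ones need a name.
  stopifnot(length(col_names) == sum(widths > 0))
  # skip = 1 makes read.fwf ignore the header line when reading the data.
  read.fwf(file, widths = widths, skip = 1, col.names = col_names, ...)
}

# Illustrative data: the example file from this report.
path <- tempfile(fileext = ".txt")
writeLines(c("col1  col2  col3",
             " 123   123   123",
             "   a           b",
             "1234    12  1234",
             "      65.4   4.5"), path)
dat <- read_fwf_with_header(path, widths = c(5, 6, 5))
names(dat)
```

Splitting on whitespace breaks down if a column name itself contains spaces; in that case a different header_sep would be needed.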

I know that FWF is not used much these days, but I would find the proposed
change really useful.

-- 
Lep pozdrav / With regards,
    Gregor Gorjanc
----------------------------------------------------------------------
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si

SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888

----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.

Comment 1 Jitterbug compatibility account 2006-11-29 00:24:00 UTC
NOTES:
 Easy to read the header separately!
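
One possible reading of this note, as a sketch rather than the maintainers' recommended code: slice the header line at the same column boundaries as the data, then read the rest with skip=1. It assumes all widths are positive; the temporary file reproduces the reporter's example data.

```r
# Illustrative data only: the example file from the report.
path <- tempfile(fileext = ".txt")
writeLines(c("col1  col2  col3",
             " 123   123   123",
             "   a           b",
             "1234    12  1234",
             "      65.4   4.5"), path)
widths <- c(5, 6, 5)

# Cut the first line at the fixed-width boundaries and trim the padding.
last  <- cumsum(widths)
first <- last - widths + 1
col_names <- trimws(substring(readLines(path, n = 1), first, last))

# Skip the header line and supply the names explicitly.
dat <- read.fwf(path, widths = widths, skip = 1,
                col.names = col_names, strip.white = TRUE)
str(dat)
```

Unlike splitting on a separator, this variant does not depend on which characters occur in the header.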
Comment 2 Jitterbug compatibility account 2006-11-29 01:24:41 UTC
Audit (from Jitterbug):
Mon Oct  2 20:02:18 2006	thomas	moved from incoming to wishlist
Tue Nov 28 19:24:41 2006	ripley	changed notes