Bug 14522 - scan(strip.white=TRUE) only removes trailing white space with quote characters
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: I/O
Version: R 2.12.2
Hardware: Other Other
Importance: P5 normal
Assigned To: R-core
Depends on:
Blocks:
 
Reported: 2011-03-07 20:58 UTC by Stefan Widgren
Modified: 2011-03-21 20:57 UTC
CC List: 1 user

See Also:


Attachments
testcase to reproduce the error (41 bytes, application/vnd.ms-excel)
2011-03-07 20:58 UTC, Stefan Widgren

Description Stefan Widgren 2011-03-07 20:58:25 UTC
Created attachment 1181
testcase to reproduce the error

strip.white=TRUE doesn't remove the leading white space from a file with the following content:

c1;c2;c3
" A"; B;C
"A ";B ;C
"A";B;C

> testcase <- read.csv2('testcase.csv', strip.white=TRUE)
> str(testcase)
'data.frame':   3 obs. of  3 variables:
 $ c1: Factor w/ 2 levels " A","A": 1 2 2
 $ c2: Factor w/ 1 level "B": 1 1 1
 $ c3: Factor w/ 1 level "C": 1 1 1

I expected c1 to have 1 level.
Comment 1 Simon Urbanek 2011-03-07 21:35:56 UTC
This is really an issue with scan(), which has an asymmetry with respect to white-space removal in quoted strings:

> scan(textConnection('" A"\n"A "\n" A "\n"A"'),list(""),strip.white=T)[[1]]
Read 4 records
[1] " A" "A"  " A" "A" 

From the scan() sources, it seems the intention is that only trailing white space is removed:

 donefill:
    /* strip trailing white space, if desired and if item is non-null */

So either the documentation needs to be adjusted to say that only trailing white space is removed, or the code needs to remove leading white space as well. I'm not sure which of the two is better.
Comment 2 Brian Ripley 2011-03-07 21:44:48 UTC
DO try to write an intelligible subject line!  This is one of the
least informative we have ever seen.
Comment 3 Brian Ripley 2011-03-20 19:38:11 UTC
It is not supposed to remove white space inside quoted strings ....
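Since quoted fields are deliberately left alone, one possible workaround is to trim character columns after reading. The following is a minimal sketch, assuming a current R rather than the 2.12.2 in this report: the `text =` argument to read.table()/read.csv2() and base `trimws()` both arrived in later versions. The inline copy of the testcase and the column-wise trimming are illustrative only, not part of the fix.

```r
# Inline copy of the reporter's testcase, so no attachment is needed.
csv_text <- 'c1;c2;c3
" A"; B;C
"A ";B ;C
"A";B;C'

# strip.white = TRUE only affects unquoted fields; quoted ones are kept as-is.
testcase <- read.csv2(text = csv_text, strip.white = TRUE,
                      stringsAsFactors = FALSE)

# Assumption: trimws() requires R >= 3.2.0 (not available in R 2.12.2).
# Trim leading and trailing white space in every character column,
# leaving any non-character columns untouched.
testcase[] <- lapply(testcase, function(col) {
  if (is.character(col)) trimws(col) else col
})

str(testcase)
```

With this post-processing, c1 should end up with a single distinct value, which is what the original report expected.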
Comment 4 Stefan Widgren 2011-03-21 20:57:01 UTC
Makes sense to keep quoted strings unchanged.

Regards
Stefan