Bug 15971 - Inconsistent treatment of character vectors with read.table or read.csv
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.1.1
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance: P5 major
Assignee: R-core
Reported: 2014-09-10 21:55 UTC by Joe Ritter
Modified: 2014-09-10 21:55 UTC

Attachments
tiny csv file 1 (65 bytes, application/vnd.ms-excel)
2014-09-10 21:55 UTC, Joe Ritter

Description Joe Ritter 2014-09-10 21:55:33 UTC
Created attachment 1657
tiny csv file 1

I attach a tiny .csv file, na1.csv. I created na2.csv by deleting the first column of na1.csv. (I can attach only one file, so the contents of both are pasted below.)

na1.csv ====================
a, b,  c
1, "b", 1
2, "", 2
 , "b", 3
4,    , 4
5, "NA", 5
===========================

na2.csv ===================
b,  c
"b", 1
"", 2
"b", 3
   , 4
"NA", 5
==========================
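
For anyone who wants to reproduce this without the attachment, the two files can be recreated from the listings above. This is only a sketch: the variable names (na1_lines, na2_lines, na1_path, na2_path) and the use of tempdir() are assumptions made for illustration, and the attached file may differ from the pasted text in line endings or trailing whitespace.

```r
# Recreate na1.csv and na2.csv from the listings in this report.
# Assumption: the pasted text matches the attachment byte for byte
# apart from line endings; that has not been checked here.
na1_lines <- c(
  'a, b,  c',
  '1, "b", 1',
  '2, "", 2',
  ' , "b", 3',
  '4,    , 4',
  '5, "NA", 5'
)
na2_lines <- c(
  'b,  c',
  '"b", 1',
  '"", 2',
  '"b", 3',
  '   , 4',
  '"NA", 5'
)

# Write into the session's temporary directory so the working
# directory is left untouched.
na1_path <- file.path(tempdir(), "na1.csv")
na2_path <- file.path(tempdir(), "na2.csv")
writeLines(na1_lines, na1_path)
writeLines(na2_lines, na2_path)
```

The files can then be read with read.csv(na1_path) and read.csv(na2_path) in place of the bare file names used below.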




Here is what I get when I read them into data frames:

> df1 <- read.csv("na1.csv")
> df1
   a    b c
1  1    b 1
2  2      2
3 NA    b 3
4  4      4
5  5   NA 5
> df2 <- read.csv("na2.csv")
> df2
     b c
1    b 1
2      2
3    b 3
4      4
5 <NA> 5
> df1$b==df2$b
Error in Ops.factor(df1$b, df2$b) : level sets of factors are different
> levels(df1$b)
[1] " "    "    " " b"   " NA" 
> levels(df2$b)
[1] ""    "   " "b"  

Reading the files with as.is=TRUE gives the same result: the values in df1$b keep their leading spaces, and df1$b[5] is the string " NA" rather than a missing value (NA).
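
A sketch of how the as.is=TRUE comparison might be inspected is given here. It reads the same text through the text= argument of read.csv instead of from files, and uses dput(), which shows leading spaces that print() can hide. The variable names are assumptions for illustration. The last call tries strip.white=TRUE, a documented read.table argument; whether it changes the result for these quoted fields has not been verified here.

```r
# File contents copied from the listings in this report.
na1_lines <- c('a, b,  c', '1, "b", 1', '2, "", 2',
               ' , "b", 3', '4,    , 4', '5, "NA", 5')
na2_lines <- c('b,  c', '"b", 1', '"", 2',
               '"b", 3', '   , 4', '"NA", 5')

# as.is = TRUE keeps column b as a character vector instead of a factor.
df1 <- read.csv(text = na1_lines, as.is = TRUE)
df2 <- read.csv(text = na2_lines, as.is = TRUE)

# dput() prints each string with its quotes, so whitespace is visible.
dput(df1$b)
dput(df2$b)

# Which entries were read as missing values in each data frame.
is.na(df1$b)
is.na(df2$b)

# Untested idea: ask read.csv to strip white space around fields and
# look at the result. No claim is made about what this returns.
df1_sw <- read.csv(text = na1_lines, as.is = TRUE, strip.white = TRUE)
dput(df1_sw$b)
```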

I can't see why this would be "correct" behavior.  I apologize if I've missed something here.

Thanks for your great work on R!

Best regards,

Joe Ritter