Bug 16478 - read.table() argument colClasses needs to be ordered even when named
Summary: read.table() argument colClasses needs to be ordered even when named
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.2.0
Hardware: Other OS X Mavericks
Importance: P5 minor
Assignee: R-core
Reported: 2015-07-21 13:44 UTC by Andreas Leha
Modified: 2015-08-04 11:38 UTC

Description Andreas Leha 2015-07-21 13:44:19 UTC
(This has been discussed on R-help [1])


When the colClasses argument to read.table() is named and has as
many elements as there are columns, it must still be given in column
order: the names are ignored.  This is not in the documentation.

Here is an MWE:

--8<---------------cut here---------------start------------->8---
kkk <- c("a\tb",
         "3.14\tx")

## works without specifying colClasses
read.table(textConnection(kkk),
           sep="\t",
           header = TRUE)

cclasses=c(b="character",
           a="numeric")

## works with a shorter, named colClasses (matched by name)
read.table(textConnection(kkk),
           sep="\t",
           header = TRUE,
           colClasses = cclasses[1])

## works with colClasses reordered to match the header (alphabetical here)
read.table(textConnection(kkk),
           sep="\t",
           header = TRUE,
           colClasses = cclasses[order(names(cclasses))])

## error: full-length colClasses is matched by position, so column b ("x") is read as numeric
read.table(textConnection(kkk),
           sep="\t",
           header = TRUE,
           colClasses = cclasses)
--8<---------------cut here---------------end--------------->8---



In the R-help thread, Henrik Bengtsson provided a patch, which I
include inline here:


[HB-X201]{hb}: svn diff src\library\utils\R\readtable.R
Index: src/library/utils/R/readtable.R
===================================================================
--- src/library/utils/R/readtable.R     (revision 68642)
+++ src/library/utils/R/readtable.R     (working copy)
@@ -139,7 +139,7 @@
     if (rlabp) col.names <- c("row.names", col.names)

     nmColClasses <- names(colClasses)
-    if(length(colClasses) < cols)
+    if(length(colClasses) <= cols)
         if(is.null(nmColClasses)) {
             colClasses <- rep_len(colClasses, cols)
         } else {
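
To make the effect of the one-character change concrete, here is a simplified model of the colClasses expansion step. It is a sketch, not the utils source: expand_colClasses() and its use_names_when_equal switch are hypothetical names introduced for illustration, and the name-matching branch is modelled on the behaviour described in this report rather than copied from readtable.R.

```r
## Hypothetical, simplified model of how read.table() expands colClasses;
## NOT the actual utils source. 'use_names_when_equal' toggles between the
## original test (length < cols) and the patched one (length <= cols).
expand_colClasses <- function(colClasses, col.names,
                              use_names_when_equal = FALSE) {
  cols <- length(col.names)
  nm <- names(colClasses)
  n <- length(colClasses)
  short <- if (use_names_when_equal) n <= cols else n < cols
  ## long enough: names are ignored, entries are used by position
  if (!short) return(unname(colClasses))
  ## too short and unnamed: recycle to the number of columns
  if (is.null(nm)) return(rep_len(colClasses, cols))
  ## too short and named: place each entry at the column with that name,
  ## leaving NA (type.convert default) for columns not mentioned
  out <- rep_len(NA_character_, cols)
  i <- match(nm, col.names, 0L)
  out[i[i > 0L]] <- colClasses[i > 0L]
  out
}

cclasses <- c(b = "character", a = "numeric")
expand_colClasses(cclasses, c("a", "b"))
## returns c("character", "numeric"): column a gets "character"
expand_colClasses(cclasses, c("a", "b"), use_names_when_equal = TRUE)
## returns c("numeric", "character"): matched by name
```

With the original `<` test, the equal-length named vector falls through to positional matching, so column b is assigned "numeric" and reading "x" fails, as in the MWE.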



Thanks,
Andreas

[1] http://permalink.gmane.org/gmane.comp.lang.r.general/321919
Comment 1 Brian Ripley 2015-08-04 09:15:05 UTC
Although I have altered this, it was the intentional behaviour (if minimally documented).

For colClasses vectors long enough (including too long), positional matching was used.  Only for too-short vectors were names used to identify which entries had been omitted.
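
For R versions that still match a full-length colClasses by position, one workaround is to reorder the named vector to follow the header before calling read.table(). A minimal sketch, assuming the header names are used unchanged (i.e. check.names does not rewrite them); align_colClasses() is a hypothetical helper, not a utils function:

```r
## Hypothetical helper, not part of utils: reorder a named colClasses
## vector into header order, filling unmentioned columns with NA
## (meaning "let read.table() guess the class").
## Assumes the header names are used as-is (check.names changes nothing).
align_colClasses <- function(colClasses, col.names) {
  stopifnot(!is.null(names(colClasses)))
  unknown <- setdiff(names(colClasses), col.names)
  if (length(unknown))
    stop("colClasses names not in header: ", paste(unknown, collapse = ", "))
  out <- rep(NA_character_, length(col.names))
  names(out) <- col.names
  out[names(colClasses)] <- colClasses
  out
}

kkk <- c("a\tb",
         "3.14\tx")
cclasses <- c(b = "character", a = "numeric")

## Read just the header line to learn the column order
con <- textConnection(kkk)
hdr <- scan(con, what = "", sep = "\t", nlines = 1, quiet = TRUE)
close(con)

df <- read.table(textConnection(kkk), sep = "\t", header = TRUE,
                 colClasses = align_colClasses(cclasses, hdr))
str(df)
```

Because the reordered vector agrees with the header both by position and by name, it gives the same result whichever matching rule read.table() applies.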
Comment 2 Henrik Bengtsson 2015-08-04 11:38:53 UTC
Thank you for clarifying and for updating.