Bug 15337 - str(factor) is slow
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.0.0
Hardware: x86_64/x64/amd64 (64-bit) Linux
Importance: P5 major
Assignee: R-core
Reported: 2013-06-06 19:33 UTC by Sam Steingold
Modified: 2013-07-30 17:09 UTC

Description Sam Steingold 2013-06-06 19:33:50 UTC
str(factor) is extremely slow (it can take minutes!) for long vectors with many unused levels.
This is a major violation of the contract ("give reasonable output for *any* R object") because this speed makes the function virtually unusable.
To reproduce:
> words <- sapply(1:1e7,function(i)paste(sample(letters,8),collapse=""))
> uids <- as.factor(sample(words[1:1e5],1e7,replace=TRUE))
> levels(uids) <- words
> system.time(str(uids))
 Factor w/ 9999170 levels "ucfztmbv","eqfsohly",..: 59269 86271 6298 58634 41938 95895 71648 41311 63157 67683 ...
             user            system           elapsed 
15.752 (15.75sec)  0.128 (128.00ms) 15.915 (15.91sec) 
(your timing may vary because of the effects of randomness)
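For anyone on an R version without a fix, the following is a scaled-down sketch of the same situation. It is not part of the original report: the sizes are reduced, `make_words` is a hypothetical helper name, and `droplevels()` is offered only as a possible workaround, on the assumption (taken from the description above) that the cost is driven by the large number of unused levels.

```r
set.seed(1)  # only to make the sketch repeatable

# Hypothetical helper: build n random 8-letter "words".
make_words <- function(n) {
  vapply(seq_len(n),
         function(i) paste(sample(letters, 8), collapse = ""),
         character(1))
}

words <- make_words(1e5)  # far fewer than the 1e7 used in the report

# Values are drawn from the first 1000 words only, so most levels are unused.
uids <- factor(sample(words[1:1e3], 1e5, replace = TRUE),
               levels = unique(words))

# Assumed workaround: drop the unused levels before inspecting the object.
used <- droplevels(uids)

nlevels(uids)  # all distinct words
nlevels(used)  # at most 1000

system.time(str(uids))  # may be slow on affected versions
system.time(str(used))  # operates on the reduced level set
```

Note that `droplevels()` changes the integer codes but not the values, so `as.character(used)` is the same as `as.character(uids)`; the output of `str()` will therefore differ in the level count and the codes shown.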
Comment 1 Martin Maechler 2013-06-08 13:39:23 UTC
Thank you, Sam!

Indeed, this has been unfortunate, and I have fixed it,
together with another efficiency infelicity with *very* large objects, by a proposal from Luke Tierney,
in R-devel for now (svn rev 62902).
I will backport it to 3.0.1 patched after a few days,
just in case I had overlooked a use case which would show up on CRAN or elsewhere.