Bug 15337 - str(factor) is slow
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.0.0
Hardware: x86_64/x64/amd64 (64-bit) Linux
Importance: P5 major
Assigned To: R-core
Reported: 2013-06-06 19:33 UTC by Sam Steingold
Modified: 2013-07-30 17:09 UTC
Description Sam Steingold 2013-06-06 19:33:50 UTC
str(factor) is extremely slow (it can take minutes!) for long vectors with many unused levels.
This is a major violation of the contract ("give reasonable output for *any* R object"), because at this speed the function is virtually unusable.
To reproduce:
------------------------------------------------------------------
> words <- sapply(1:1e7,function(i)paste(sample(letters,8),collapse=""))
> uids <- as.factor(sample(words[1:1e5],1e7,replace=TRUE))
> levels(uids) <- words
> system.time(str(uids))
 Factor w/ 9999170 levels "ucfztmbv","eqfsohly",..: 59269 86271 6298 58634 41938 95895 71648 41311 63157 67683 ...
             user            system           elapsed 
15.752 (15.75sec)  0.128 (128.00ms) 15.915 (15.91sec) 
------------------------------------------------------------------
(Your timings may vary because the data are randomly generated.)
Comment 1 Martin Maechler 2013-06-08 13:39:23 UTC
Thank you, Sam!

Indeed, this has been unfortunate, and I have fixed it,
together with another efficiency infelicity with *very* large objects, following a proposal from Luke Tierney,
in R-devel for now (svn rev 62902).
I will backport it to 3.0.1 patched after a few days,
just in case I have overlooked a use case that would show up on CRAN or elsewhere.

Martin