Bug 16088 - subset on names has quadratic run time if many items don't match
Summary: subset on names has quadratic run time if many items don't match
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Low-level (show other bugs)
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Linux
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2014-11-26 08:55 UTC by Kirill Müller
Modified: 2014-11-26 08:57 UTC (History)
0 users

See Also:


Attachments
R source to reproduce the problem (1011 bytes, text/x-r-source)
2014-11-26 08:55 UTC, Kirill Müller
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kirill Müller 2014-11-26 08:55:45 UTC
Created attachment 1693 [details]
R source to reproduce the problem

I have noticed quadratic behavior of `[` when called for a vector where many elements don't have a corresponding entry in the table. The attached script illustrates the behavior, the results of a larger experiment set are on RPubs [1]. Tested with R 3.1.2 and R-devel r67059.

The reason could be a nested loop that kicks in to postprocess mismatches [2].

The lookup process is non-interruptible, the only way out is to kill R entirely. This is especially surprising because a lookup operation is supposed to be cheap.

[1] http://rpubs.com/krlmlr/quadratic_subset
[2] https://github.com/wch/r-source/blob/20b10862635/src/main/subscript.c#L758
Comment 1 Kirill Müller 2014-11-26 08:57:17 UTC
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_1.0.0.99 plyr_1.8.1      

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     grid_3.1.2       gtable_0.1.2     MASS_7.3-35      munsell_0.4.2   
 [7] proto_0.3-10     Rcpp_0.11.3      reshape2_1.4     scales_0.2.4     stringr_0.6.2    tools_3.1.2     
[13] ulimit_0.0-2