Bug 17118 - Wishlist: Make sapply(x, f) not much slower for named 'x' and long f(x[[i]])
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: R 3.3.*
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 enhancement
Assignee: R-core
URL: http://stackoverflow.com/questions/12...
Depends on:
Blocks:
 
Reported: 2016-07-12 16:41 UTC by Suharto Anggono
Modified: 2016-07-14 09:45 UTC

See Also:


Attachments
simplify2array: use unlist(use.names=FALSE) for common.len>1 (883 bytes, patch)
2016-07-12 16:41 UTC, Suharto Anggono

Description Suharto Anggono 2016-07-12 16:41:05 UTC
Created attachment 2126
simplify2array: use unlist(use.names=FALSE) for common.len>1

Timings in the answer by mnel at http://stackoverflow.com/questions/12188509/cleaning-inf-values-from-an-r-dataframe demonstrate that using 'sapply' may slow things down noticeably.

A modified example:

R> dat <- list(a = rep(c(1,Inf), 1e5), b = rep(c(Inf,2), 1e5),
R+ c = rep(c('a','b'), 1e5), d = rep(c(1,Inf), 1e5),
R+ e = rep(c(Inf,2), 1e5))
R> system.time(sapply(dat, is.infinite))
   user  system elapsed
   3.27    0.05    3.34
R> system.time(sapply(dat, is.infinite))
   user  system elapsed
   2.05    0.01    2.07
R> system.time(lapply(dat, is.infinite))
   user  system elapsed
   0.01    0.00    0.02
R> system.time(lapply(dat, is.infinite))
   user  system elapsed
   0.03    0.00    0.03
R> system.time(do.call(cbind, lapply(dat, is.infinite)))
   user  system elapsed
   0.04    0.00    0.03
R> system.time(do.call(cbind, lapply(dat, is.infinite)))
   user  system elapsed
   0.05    0.00    0.03
R> system.time(vapply(dat, is.infinite, logical(length(dat[[1]]))))
   user  system elapsed
   0.03    0.00    0.03
R> system.time(vapply(dat, is.infinite, logical(length(dat[[1]]))))
   user  system elapsed
   0.03    0.00    0.03
R> dat2 <- dat; names(dat2) <- NULL
R> system.time(sapply(dat2, is.infinite))
   user  system elapsed
   0.01    0.00    0.03
R> system.time(sapply(dat2, is.infinite))
   user  system elapsed
   0.04    0.00    0.03

When applied to 'dat2', which has no names, 'sapply' is much faster.
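
Until 'sapply' itself is changed, one possible workaround (a sketch only, not part of the attached patch) is to strip the names before calling 'sapply' and put them back on the result. The small list 'small' below is a made-up stand-in for 'dat', and the sketch assumes f returns unnamed vectors of a common length > 1, so that the result is a matrix:

```r
# Illustrative data; 'small' is a hypothetical stand-in for 'dat' above.
small <- list(a = c(1, Inf), b = c(Inf, 2), c = c("a", "b"))

# Strip the names first, so the simplification step has no names to build ...
fast <- sapply(unname(small), is.infinite)
# ... then restore them as column names of the resulting matrix.
colnames(fast) <- names(small)

# Reference result from the plain (slow for long vectors) call.
slow <- sapply(small, is.infinite)
identical(fast, slow)
```

For this input the two results should be identical; only the time spent differs when the vectors are long.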


R> system.time(unlist(lapply(dat, is.infinite),
R+ recursive = FALSE))
   user  system elapsed
   2.85    0.00    2.95
R> system.time(unlist(lapply(dat, is.infinite),
R+ recursive = FALSE))
   user  system elapsed
   2.26    0.00    2.31
R> system.time(unlist(lapply(dat, is.infinite),
R+ recursive = FALSE, use.names = FALSE))
   user  system elapsed
   0.03    0.00    0.03
R> system.time(unlist(lapply(dat, is.infinite),
R+ recursive = FALSE, use.names = FALSE))
   user  system elapsed
   0.03    0.00    0.03

The timings above suggest that the time goes into 'unlist' when it has to build names.
'sapply' calls 'simplify2array', and the code of 'simplify2array' uses 'unlist'.
unlist(use.names = FALSE) is much faster.
In 'simplify2array', for common.len > 1L, unlist(use.names = FALSE) could be used instead: when simplification is done, function 'array' is applied afterwards, and the names in the 'unlist' result are not used.
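
To illustrate the argument, here is a minimal sketch of the matrix case. 'simplify_sketch' is a hypothetical helper written for this report, not the actual 'simplify2array' source; it only mimics the common.len > 1L branch, where the dimnames are taken from the list itself rather than from the 'unlist' result:

```r
# Hypothetical helper, NOT the base R source: mimics only the
# common.len > 1 matrix branch of simplify2array().
simplify_sketch <- function(answer, use.names) {
  common.len <- unique(lengths(answer))
  stopifnot(length(common.len) == 1L, common.len > 1L)
  # Dimnames come from the list and its first element,
  # not from the names of the unlist() result.
  n1 <- names(answer[[1L]])
  n2 <- names(answer)
  dn <- if (!is.null(n1) || !is.null(n2)) list(n1, n2)
  array(unlist(answer, recursive = FALSE, use.names = use.names),
        dim = c(common.len, length(answer)),
        dimnames = dn)
}

res <- list(a = c(TRUE, FALSE, TRUE), b = c(FALSE, FALSE, TRUE))

# With use.names = TRUE, unlist() builds one name per element:
unlist_names <- names(unlist(res))   # "a1" "a2" "a3" "b1" "b2" "b3"

with_names    <- simplify_sketch(res, use.names = TRUE)
without_names <- simplify_sketch(res, use.names = FALSE)
identical(with_names, without_names)
```

In this sketch both calls give the same matrix, because 'array' does not carry the element names over; the per-element names are built only to be thrown away, which for 1e6 elements is presumably where the time goes.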


R> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.1
Comment 1 Martin Maechler 2016-07-14 09:45:52 UTC
Thank you, Suharto, this is much appreciated!

I've checked the proposal, including against all recommended packages, and I can't think of a case where it fails.

Committed to R-devel (only).