Bug 16640

Summary: 2X speedup for tapply
Product: R
Reporter: Peter Haverty <phaverty>
Component: Low-level
Assignee: R-core <R-core>
Status: REOPENED
Severity: enhancement
CC: maechler, ryanramadhan25corps
Priority: P5
Version: R-devel (trunk)
Hardware: Other
OS: Other
Attachments: refactoring of tapply for speed
3 versions of tapply() - plus some testing examples
revised second part
revised second part, more change
revised second part, more change
4 versions of tapply() - plus some testing examples
alternative first part
alternative first part, with integer overflow check
alternative first part, with integer overflow check
alternative first part using 'lapply'
alternative first part using 'lapply', with integer overflow check
alternative first part using 'lapply', without specializing one factor
essence of first part (minimal diff)
alternative first part using 'lapply' and index matrix
alternative first part using 'lapply' and index matrix
6 versions of tapply() - plus some testing & benchmark examples
consider integer overflow while there is a factor with zero levels
first part, original + specializing one factor + integer overflow check
test of timing in extreme case
second part, passing factor to 'split' only when all groups are present
old second part with little change
old second part with little change
9 versions of tapply() - plus some testing & benchmark examples
Output of benchmarking (for 'nb-mm4')
second part, reusing variable 'group'
choosing between passing factor to 'split' and not
second part, splitting in two stage for large 'ngroup'
second part, splitting in two stage for large 'ngroup'
11 versions of tapply() - plus some testing & benchmark examples
Output of benchmarking (for 'nb-mm4')
Timing new and changed old version for various length and number of levels

Description Peter Haverty 2015-12-23 00:38:06 UTC
Created attachment 1950 [details]
refactoring of tapply for speed

> microbenchmark(old = tapply(x,y,sum), new=tapply2(x,y,sum))
 Unit: microseconds
  expr     min       lq      mean  median       uq     max neval cld
   old 123.267 128.3620 132.62360 130.819 133.2620 293.802   100   b
   new  63.515  69.9605  73.67165  74.185  76.6655  97.960   100  a
> 

Some small changes remove about half the overhead from tapply. It might be worth having a fast path for a single factor in the "INDEX" too.
Comment 1 Martin Maechler 2015-12-26 16:10:57 UTC
Created attachment 1954 [details]
3 versions of tapply() - plus some testing examples
Comment 2 Martin Maechler 2015-12-26 16:14:40 UTC
(In reply to Martin Maechler from comment #1)
> Created attachment 1954 [details]
> 3 versions of tapply() - plus some testing examples

(First comment got lost:)

The first part of your patch is fine, makes sense and gains about 3% speed for the (not so interesting) case  FUN = NULL.

However the important part, the line  'group = structure(.............)' is incorrect.

I've tried two versions that *are* correct, but unfortunately there is basically no speed gain left;
see attachment 1954 [details], with three versions of tapply() and realistic testing code.
Comment 3 Suharto Anggono 2015-12-31 02:11:06 UTC
Created attachment 1961 [details]
revised second part
Comment 4 Suharto Anggono 2015-12-31 02:12:02 UTC
Created attachment 1962 [details]
revised second part, more change
Comment 5 Suharto Anggono 2015-12-31 02:55:27 UTC
Created attachment 1963 [details]
revised second part, more change
Comment 6 Suharto Anggono 2015-12-31 05:13:38 UTC
The check
    if (!nI) stop("'INDEX' is of length zero")
cannot be removed. The user can supply INDEX = list().
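
A small illustration of why the check matters, assuming a version of R in which tapply() still contains it: an empty list is a legal value to pass as INDEX, so tapply() has to reject it itself. Only the fact that an error is raised is relied on here, not its exact wording.

```r
# An empty list is accepted syntactically as INDEX,
# so the length check inside tapply() is what stops the call.
res <- tryCatch(
  tapply(1:3, list(), sum),
  error = function(e) "error"
)
res
```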
Comment 7 Martin Maechler 2016-01-01 18:58:41 UTC
Patch 1963 ("revised second part, more change") is really good, speed wise.
No factor of 2 though... but I see pretty uniform improvement.

For now, I'm replacing my attachment with one that has 4 versions of tapply(),
showing the above improvement *and* passing the new checks that I had added
(which the very first proposal failed).

Thank you very much Suharto Anggono!
Comment 8 Martin Maechler 2016-01-01 19:00:47 UTC
Created attachment 1964 [details]
4 versions of tapply() - plus some testing examples

Now incorporates the latest proposal (from Dec 31) and demonstrates that it is clearly superior.
Comment 9 Suharto Anggono 2016-01-02 02:40:44 UTC
Created attachment 1965 [details]
alternative first part
Comment 10 Suharto Anggono 2016-01-02 02:41:35 UTC
Created attachment 1966 [details]
alternative first part, with integer overflow check
Comment 11 Suharto Anggono 2016-01-02 02:45:36 UTC
Created attachment 1967 [details]
alternative first part, with integer overflow check
Comment 12 Suharto Anggono 2016-01-02 04:09:47 UTC
Created attachment 1968 [details]
alternative first part using 'lapply'
Comment 13 Suharto Anggono 2016-01-02 04:10:36 UTC
Created attachment 1969 [details]
alternative first part using 'lapply', with integer overflow check
Comment 14 Suharto Anggono 2016-01-02 06:01:42 UTC
Created attachment 1970 [details]
alternative first part using 'lapply', without specializing one factor
Comment 15 Suharto Anggono 2016-01-02 08:02:42 UTC
Created attachment 1971 [details]
essence of first part (minimal diff)
Comment 16 Suharto Anggono 2016-01-02 08:05:54 UTC
In my experiment with a small number of factors, putting the length check outside the loop is not better.
Comment 17 Martin Maechler 2016-01-02 11:36:19 UTC
(In reply to Suharto Anggono from comment #16)
> In my experiment with small number of factors, putting length check outside
> the loop is not better.

Well, "of course".  The typical tapply usage is with one factor and then two, probably quite rarely more than two.
That's why my tests look at these cases and even more importantly that's why the original authors of  tapply()  did not care about vectorizing the  for() loop.

OTOH, I do like your lapply() version including the if(ni > 1L)  
and I also agree that the integer overflow check should get into tapply() ..
So, I'll look into attachment 1969 [details] ('tapply1diff2.txt').
Comment 18 Suharto Anggono 2016-01-02 13:40:39 UTC
Created attachment 1973 [details]
alternative first part using 'lapply' and index matrix
Comment 19 Suharto Anggono 2016-01-02 15:02:11 UTC
Created attachment 1974 [details]
alternative first part using 'lapply' and index matrix
Comment 20 Martin Maechler 2016-01-02 21:27:33 UTC
Created attachment 1975 [details]
6 versions of tapply() - plus some testing & benchmark examples
Comment 21 Martin Maechler 2016-01-02 21:31:28 UTC
(In reply to Martin Maechler from comment #20)
> Created attachment 1975 [details]
> 6 versions of tapply() - plus some testing & benchmark examples

Attachment 1969 [details] is very good.  
The new (attachment 1974 [details]) is clearly slower for one factor only, 
but good for two factors and slightly better for three.

Still, I'd like to commit the 1969 one, i.e. 'tapply5()' from my attachment 1975 [details]
to R-devel.

If you want to continue, we'll work with diffs against the new R-devel.
Comment 22 Suharto Anggono 2016-01-03 00:57:33 UTC
(In reply to Martin Maechler from comment #21)
> (In reply to Martin Maechler from comment #20)
> > Created attachment 1975 [details]
> > 6 versions of tapply() - plus some testing & benchmark examples
> 
> Attachment 1969 [details] is very good.  
> The new (attachment 1974 [details]) is clearly slower for one factor only, 
> but good for two factors and slightly better for three.
> 
> Still, I'd like to commit the 1969 one, i.e. 'tapply5()' from my attachment
> 1975 [details]
> to R-devel.
> 
> If you want to continue, we'll work with diffs against the new R-devel.

"Matrices are restricted to less than 2^31 rows and columns even on 64-bit systems.". So, "alternative first part using 'lapply' and index matrix" (attachment 1974 [details]) can be ignored because it makes 'tapply' not applicable to long vectors.
Comment 23 Suharto Anggono 2016-01-03 01:40:27 UTC
By using 'split.default' instead of 'split', if 'tapply' is applied to an object of class "Date" or "POSIXct", its 'split' method, which is faster than 'split.default', is not picked up.
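
The dispatch difference can be sketched as follows. The data are made up for illustration, and the sketch only shows which function is reached; it makes no timing claim.

```r
d <- as.Date("2016-01-01") + 0:5   # hypothetical Date vector
f <- rep(c("a", "b"), 3)

# The generic dispatches on the class of 'd' and can find a "Date" method ...
has_date_method <- is.function(utils::getS3method("split", "Date", optional = TRUE))

# ... whereas a direct call to split.default() bypasses dispatch entirely.
via_generic <- split(d, f)
via_default <- split.default(d, f)

# Both routes keep the "Date" class; they differ in how the work is done.
ok_generic <- vapply(via_generic, inherits, logical(1), what = "Date")
ok_default <- vapply(via_default, inherits, logical(1), what = "Date")
```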
Comment 24 Suharto Anggono 2016-01-03 11:06:17 UTC
Created attachment 1976 [details]
consider integer overflow while there is a factor with zero levels

If there is a factor with zero levels, the final number of groups is zero. However, in that case, integer overflow can still happen in an intermediate step. Proceeding as usual gives the correct result, but with the warning "NAs introduced by coercion to integer range". This change avoids the warning.

Error message is changed to use the word "cell" as mentioned in the documentation.
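
A minimal sketch of the situation, not the code of the attachment: the level counts, the helper name 'ncells' and the error wording are all made up for illustration, and the overflow is shown with plain integer multiplication rather than with the coercion that produces the warning quoted above.

```r
# Hypothetical level counts for three factors; the last one has no levels.
nlev <- c(100000L, 100000L, 0L)

# Multiplying step by step in integer arithmetic overflows in the middle,
# although the final number of cells is 0.
step <- suppressWarnings(nlev[1L] * nlev[2L])   # NA: integer overflow

# One possible guard: handle a zero extent first, otherwise compare the
# double-precision product with the integer limit before converting.
ncells <- function(nlev) {
  if (any(nlev == 0L)) return(0L)
  n <- prod(nlev)                      # prod() returns a double
  if (n > .Machine$integer.max)
    stop("total number of cells >= 2^31")   # illustrative wording only
  as.integer(n)
}

ncells(nlev)          # zero-level factor: no overflow, no warning
ncells(c(3L, 4L))     # ordinary case
```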
Comment 25 Suharto Anggono 2016-01-03 13:22:43 UTC
Created attachment 1977 [details]
first part, original + specializing one factor + integer overflow check

Seems to be slightly faster
Comment 26 Martin Maechler 2016-01-04 17:22:30 UTC
(In reply to Suharto Anggono from comment #23)
> By using 'split.default' instead of 'split', if 'tapply' is applied to
> object of class "Date" or "POSIXct", its 'split' method, which is faster
> than 'split.default', is not picked up.

You are right.  -> R-devel's version changed back to use split()
Comment 27 Peter Haverty 2016-01-04 17:26:25 UTC
(In reply to Suharto Anggono from comment #23)
> By using 'split.default' instead of 'split', if 'tapply' is applied to
> object of class "Date" or "POSIXct", its 'split' method, which is faster
> than 'split.default', is not picked up.

Good point, thanks for catching this.
Comment 28 Martin Maechler 2016-01-05 14:24:44 UTC
(In reply to Suharto Anggono from comment #25)
> Created attachment 1977 [details]
> first part, original + specializing one factor + integer overflow check
> 
> Seems to be slightly faster

yes, in most cases.. but only slightly. ... and the code is more complicated.
For now, I'm keeping what we have in R-devel since yesterday (which corresponds to 'tapply5' in attachment 1975 [details]).

Hence I'm closing this for now.
We did make considerable progress, thank you Suharto and Pete!
Comment 29 Suharto Anggono 2016-01-05 17:26:33 UTC
Created attachment 1991 [details]
test of timing in extreme case

This uses extreme case from attachment 1975 [details].

If FUN is not null, the version of 'tapply' in current R-devel (using second part based on attachment 1963 [details]) is worse than the old version of 'tapply' when there are many cells without data.

> memory.limit()
[1] 502
> for (nI in 7:11) print(system.time(tapply1(x, I32[1:nI], sum)))
   user  system elapsed
   0.06    0.00    0.06
   user  system elapsed
   0.08    0.00    0.08
   user  system elapsed
   0.09    0.00    0.10
   user  system elapsed
   0.22    0.05    0.28
   user  system elapsed
   0.33    0.06    0.39
> for (nI in 7:11) print(system.time(tapply5(x, I32[1:nI], sum)))
   user  system elapsed
   0.03    0.00    0.03
   user  system elapsed
   0.11    0.00    0.11
   user  system elapsed
   0.91    0.00    0.91
   user  system elapsed
   4.78    0.07    4.88
   user  system elapsed
  22.55    1.14   37.37
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods
[8] base
Comment 30 Suharto Anggono 2016-01-09 05:54:24 UTC
Created attachment 1994 [details]
second part, passing factor to 'split' only when all groups are present

This is an attempt to make 'tapply' not too bad in extreme case.
Comment 31 Suharto Anggono 2016-01-09 15:08:34 UTC
Created attachment 1995 [details]
old second part with little change
Comment 32 Suharto Anggono 2016-01-09 15:10:16 UTC
Created attachment 1996 [details]
old second part with little change
Comment 33 Martin Maechler 2016-01-09 21:36:41 UTC
Created attachment 1997 [details]
9 versions of tapply() - plus some testing & benchmark examples

This now has  tapply1 .. tapply9, with the latest two corresponding to attachment 1994 [details] and attachment 1996 [details].
It shows that these two do well for the extreme cases -- HOWEVER -- they are somewhat poor (compared to tapply5, say, the current R-devel version) for the
much more important cases of one factor, two factors,...
tapply9 is embarrassingly bad there, whereas  tapply8  seems ok (still losing 10% compared to tapply5).

Overall, tapply8 seems the best compromise currently... but I'd hope for something better for a bit.
Comment 34 Martin Maechler 2016-01-09 21:39:21 UTC
Created attachment 1998 [details]
Output of benchmarking (for 'nb-mm4')

This is the output from running the tapply-speed.R script from attachment 1997 [details] on my (relatively new Intel Core i7) notebook, under R-devel.
Comment 35 Suharto Anggono 2016-01-11 16:38:55 UTC
For a quite large 'ngroup', as.character(seq_len(ngroup)) takes time.
In the version of 'tapply' in current R-devel (using second part based on attachment 1963 [details]), levels(group) actually could be anything.
A hack like
    attributes(group) <- list(levels = character(ngroup), class = "factor")
can be used. But the duplicated levels are not desired, are they?
intToUtf8(., multiple = TRUE) is faster than as.character(.), but still takes time.
For the case of one factor only, split(X, INDEX[[1L]]) could be used instead of split(X, group).

Using the hack, the time taken is not too bad. For the case where there are very many cells and only a tiny fraction of them have data, the old version of 'tapply' is still faster because 'split' with very many splits takes time.
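
For the one-factor case, the equivalence can be sketched as below. The data are hypothetical, and the levels are built with as.character(seq_len(ngroup)), that is, the slower but duplicate-free labels rather than the hack.

```r
X <- c(10, 20, 30, 40, 50)                      # hypothetical data
f <- factor(c("a", "b", "b", "d", "a"),
            levels = c("a", "b", "c", "d"))     # level "c" has no data

ngroup <- nlevels(f)
group <- as.integer(f)                          # plain integer codes 1..ngroup
attributes(group) <- list(levels = as.character(seq_len(ngroup)),
                          class = "factor")

by_group <- split(X, group)    # names are "1", "2", "3", "4"
by_factor <- split(X, f)       # names are "a", "b", "c", "d"

# Apart from the names, both calls produce the same pieces,
# including the empty piece for the level without data.
same_pieces <- identical(unname(by_group), unname(by_factor))
```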
Comment 36 Martin Maechler 2016-01-11 17:47:13 UTC
(In reply to Suharto Anggono from comment #35)
I do agree that the current R-devel  tapply()  is not "optimal" and should be
improved for "the extreme case".

Hence, I have formally reopened the bug.
OTOH, the current R-devel  tapply() is considerably faster in all "reasonable" situations, and hence I'd rather keep it than replace it with something that's considerably slower for all typical cases (of one, or two, or maybe 3 factors).
Comment 37 Peter Haverty 2016-01-11 17:54:39 UTC
Perhaps we should special-case tapply with a single factor?
Comment 38 Suharto Anggono 2016-01-11 22:17:35 UTC
Created attachment 2001 [details]
second part, reusing variable 'group'

This helps to throw away levels(group) when no longer needed.
Comment 39 Suharto Anggono 2016-01-14 14:42:34 UTC
Testing tapply(x, f, FUN) for non-null 'FUN' may use the following cases.
f <- factor(rep(1, n), 1:m)  # good for old, bad for new
f <- factor(rep(1:m, length = n), 1:m)  # good for new, bad for old

To see the effect of slow 'split', tapply(f, f, length), where f is a factor, may be tested.
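
The test cases above can be set up as in the following sketch. The sizes 'n' and 'm' are small, arbitrary values chosen so that it runs quickly; for timing, larger values and system.time() would be needed.

```r
n <- 1000L   # hypothetical vector length
m <- 100L    # hypothetical number of levels
x <- seq_len(n)

f_sparse <- factor(rep(1, n), 1:m)                  # one cell has all the data
f_dense <- factor(rep(1:m, length.out = n), 1:m)    # every cell has data

r_sparse <- tapply(x, f_sparse, length)
r_dense <- tapply(x, f_dense, length)
r_self <- tapply(f_dense, f_dense, length)          # factor split by itself

# Number of cells that actually received data in the sparse case.
n_filled <- sum(!is.na(r_sparse))
```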
Comment 40 Suharto Anggono 2016-01-14 16:26:39 UTC
Created attachment 2003 [details]
choosing between passing factor to 'split' and not

This falls back to the old version of 'tapply' in "unsafe" cases.

To be efficient, it is decided in advance whether to proceed like the version of 'tapply' in current R-devel or like the old version of 'tapply'. It is not guaranteed that the chosen branch will be faster than the other, but maybe not too bad.

The variable name 'group' is used in place of 'index'.

It is also possible simply to put the part from attachment 1963 [details] and the part from attachment 1996 [details] on the branches.
Comment 41 Suharto Anggono 2016-01-23 07:11:48 UTC
Created attachment 2006 [details]
second part, splitting in two stage for large 'ngroup'
Comment 42 Suharto Anggono 2016-01-23 07:20:33 UTC
Created attachment 2007 [details]
second part, splitting in two stage for large 'ngroup'
Comment 43 Suharto Anggono 2016-01-26 16:41:13 UTC
(In reply to Suharto Anggono from comment #40)
> Created attachment 2003 [details]
> choosing between passing factor to 'split' and not
> 
> This falls back to the old version of 'tapply' in "unsafe" cases.

Another variant of the condition:
    if (allg <- ngroup <= length(X))
length(X) might already be saved in a variable before.
It proceeds like the old version of 'tapply' when the number of available slots is less than the number of available groups, so that the groups cannot all be present.
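
The branching idea can be sketched roughly as follows. This is not the code of any attachment: 'cell_sums' is a hypothetical helper that only sums numeric data within integer cell codes, with the condition above deciding between the two routes.

```r
# Hypothetical helper: sums X within integer cell codes 1..ngroup.
cell_sums <- function(X, codes, ngroup) {
  out <- rep(NA_real_, ngroup)
  if (ngroup <= length(X)) {
    # "new" route: hand split() a factor that carries every level
    group <- structure(codes, levels = as.character(seq_len(ngroup)),
                       class = "factor")
    parts <- split(X, group)
    filled <- lengths(parts) > 0L
    out[filled] <- vapply(parts[filled], sum, numeric(1))
  } else {
    # "old" route: split on the bare codes, so only non-empty cells appear
    parts <- split(X, codes)
    out[as.integer(names(parts))] <- vapply(parts, sum, numeric(1))
  }
  out
}

X <- c(1, 2, 3, 4)
codes <- c(1L, 3L, 3L, 1L)
few <- cell_sums(X, codes, 3L)     # ngroup <= length(X): "new" route
many <- cell_sums(X, codes, 10L)   # ngroup >  length(X): "old" route
```

Both routes fill the same cells; they differ only in whether empty cells are materialised by 'split'.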
Comment 44 Martin Maechler 2016-01-27 22:25:23 UTC
Created attachment 2009 [details]
11 versions of tapply() - plus some testing & benchmark examples

The benchmarking R script ... including "tapply11", which corresponds to attachment 2007 [details].

Interestingly, the latest two (t..10 and t..11) are not so good if we move towards the extreme case;  "t.8" and "t.9" seem better.
Comment 45 Martin Maechler 2016-01-27 22:26:42 UTC
Created attachment 2010 [details]
Output of benchmarking (for 'nb-mm4')

Output of attachment 2009 [details] on my 'nb-mm4' (using very recent R-devel !!)
Comment 46 Suharto Anggono 2016-01-30 15:48:21 UTC
Created attachment 2012 [details]
Timing new and changed old version for various length and number of levels

Result:

Length: 100
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.00 0.02 0.06  0.52  7.23
  1 oldc 0.02 0.02 0.03  0.03  0.32
  2 new  0.00 0.01 0.05  0.50  6.86
  2 oldc 0.01 0.01 0.00  0.03  0.25
Length: 1000
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.02 0.01 0.08  0.52  7.05
  1 oldc 0.02 0.03 0.13  0.14  0.44
  2 new  0.01 0.01 0.05  0.47  6.98
  2 oldc 0.00 0.02 0.02  0.04  0.33
Length: 10000
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.03 0.03 0.11  0.87  7.49
  1 oldc 0.06 0.08 0.19  1.39  1.91
  2 new  0.02 0.03 0.06  0.56  7.06
  2 oldc 0.06 0.07 0.06  0.08  0.41
Length: 1e+05
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.18 0.22 0.30  1.28 12.69
  1 oldc 0.84 0.86 0.89  2.77 20.12
  2 new  0.19 0.23 0.28  0.67  6.54
  2 oldc 0.84 0.86 0.80  0.79  0.87
Length: 1e+06
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  2.53 2.41 2.70  3.94 18.65
  1 oldc 7.67 7.87 9.03 12.45 43.20
  2 new  2.55 2.58 2.46  3.33  9.22
  2 oldc 7.94 7.96 7.94  8.09  9.14
R version 3.2.3 (2015-12-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods  
[8] base
Comment 47 Suharto Anggono 2016-01-31 00:38:14 UTC
(In reply to Suharto Anggono from comment #46)
> Created attachment 2012 [details]
> Timing new and changed old version for various length and number of levels
> 

This uses
system.time(for (i in 1:10) tapply5(f, f, length))
system.time(for (i in 1:10) tapply9(f, f, length))

Length: 100
        Number of Levels
           10  100 1000 10000
  1 new  0.00 0.06 0.56 32.69
  1 oldc 0.02 0.07 0.03  0.05
  2 new  0.01 0.05 0.54 33.72
  2 oldc 0.02 0.00 0.00  0.02
Length: 1000
        Number of Levels
           10  100 1000 10000
  1 new  0.02 0.05 1.00 34.08
  1 oldc 0.01 0.06 0.72  0.66
  2 new  0.00 0.05 0.52 32.42
  2 oldc 0.00 0.01 0.00  0.02
Length: 10000
        Number of Levels
           10  100 1000 10000
  1 new  0.00 0.05 0.95 34.97
  1 oldc 0.02 0.06 0.51 35.98
  2 new  0.00 0.05 0.56 36.81
  2 oldc 0.01 0.02 0.01  0.03
Length: 1e+05
        Number of Levels
           10  100 1000 10000
  1 new  0.16 0.12 0.69 36.61
  1 oldc 0.25 0.32 1.14 38.40
  2 new  0.21 0.17 0.57 35.53
  2 oldc 0.28 0.33 0.27  0.36
Length: 1e+06
        Number of Levels
           10  100 1000 10000
  1 new  2.01 2.30 1.99 36.17
  1 oldc 3.05 2.90 3.73 41.73
  2 new  1.56 1.06 2.03 39.92
  2 oldc 2.42 2.28 2.57  2.69
R version 3.2.3 (2015-12-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods  
[8] base     


P.S.:
Identical <- function(...) Reduce(
function(x, y) list(x[[1]] && identical(x[[2]], y), y),
list(...)[-1], list(TRUE, ..1))[[1]]
Comment 48 Suharto Anggono 2016-01-31 11:07:59 UTC
(In reply to Suharto Anggono from comment #46)
> Created attachment 2012 [details]
> Timing new and changed old version for various length and number of levels
> 
Using R-devel r70026:

Length: 100
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.00 0.02 0.07  0.50  6.58
  1 oldc 0.02 0.03 0.02  0.03  0.14
  2 new  0.00 0.00 0.04  0.69  6.48
  2 oldc 0.00 0.01 0.00  0.01  0.13
Length: 1000
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.01 0.02 0.09  0.56  6.69
  1 oldc 0.02 0.04 0.14  0.17  0.26
  2 new  0.00 0.02 0.05  0.47  6.45
  2 oldc 0.02 0.01 0.00  0.02  0.14
Length: 10000
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.03 0.03 0.09  0.86  7.08
  1 oldc 0.06 0.08 0.19  1.35  1.51
  2 new  0.02 0.02 0.07  0.48  7.53
  2 oldc 0.07 0.06 0.06  0.08  0.28
Length: 1e+05
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  0.23 0.21 0.49  1.61 13.00
  1 oldc 1.10 1.02 0.95  3.88 25.38
  2 new  0.23 0.24 0.30  0.72  8.48
  2 oldc 1.09 1.18 1.17  0.75  1.01
Length: 1e+06
        Number of Levels
           10  100 1000 10000 1e+05
  1 new  2.30 2.47 2.78  4.30 21.12
  1 oldc 8.18 8.50 9.89 15.25 51.19
  2 new  2.41 2.30 2.73  2.84 10.30
  2 oldc 8.96 9.07 8.15  8.29  9.08
R Under development (unstable) (2016-01-27 r70026)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods  
[8] base     


The following uses
system.time(for (i in 1:10) tapply5(f, f, length))
system.time(for (i in 1:10) tapply9(f, f, length))

Length: 100
        Number of Levels
           10  100 1000 10000
  1 new  0.00 0.07 0.94 32.11
  1 oldc 0.01 0.04 0.07  0.04
  2 new  0.00 0.06 0.92 33.75
  2 oldc 0.00 0.01 0.00  0.00
Length: 1000
        Number of Levels
           10  100 1000 10000
  1 new  0.01 0.06 0.72 36.56
  1 oldc 0.01 0.05 0.55  0.66
  2 new  0.01 0.04 0.65 30.42
  2 oldc 0.01 0.01 0.00  0.00
Length: 10000
        Number of Levels
           10  100 1000 10000
  1 new  0.02 0.05 0.53 33.78
  1 oldc 0.03 0.05 0.61 33.42
  2 new  0.02 0.03 0.58 29.89
  2 oldc 0.02 0.03 0.03  0.03
Length: 1e+05
        Number of Levels
           10  100 1000 10000
  1 new  0.09 0.14 0.80 33.24
  1 oldc 0.25 0.28 1.08 30.66
  2 new  0.11 0.16 0.79 32.92
  2 oldc 0.26 0.19 0.31  0.20
Length: 1e+06
        Number of Levels
           10  100 1000 10000
  1 new  1.95 1.64 2.04 32.33
  1 oldc 3.01 3.48 3.92 40.78
  2 new  0.90 1.24 1.44 33.36
  2 oldc 2.65 2.95 2.98  3.11
R Under development (unstable) (2016-01-27 r70026)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods  
[8] base
Comment 49 Suharto Anggono 2016-02-14 09:30:38 UTC
While on this, I found two places where copying or allocation could be avoided:
- storage.mode(f) <- "integer" in 'split.default'
- 'ansmat'

* storage.mode(f) <- "integer" in 'split.default'
If 'f' is already a factor with storage mode "integer", no change to 'f' is required. But, apparently, in such a case,
storage.mode(f) <- "integer"
results in copying.
In a similar instance in the function 'structure', the assignment to storage.mode is made conditionally.

* 'ansmat'
- In the no-simplification case
A list of the correct length is created. Then it is passed to the function 'array', which apparently results in reallocation.
Assigning dim and dimnames directly apparently doesn't copy.
- In the simplification case
Initially, 'ansmat' is of mode "logical".
In
ansmat[index] <- ans
a copy is made if 'ans' is not of mode "logical". In principle, 'ansmat' can be created with the correct storage mode from the start, which avoids copying.
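
The suggestions above could look roughly like this. It is a sketch with made-up extents and values, not a patch against 'split.default' or 'tapply', and it does not itself measure whether copies are avoided.

```r
# storage.mode: assign only when a change is actually needed.
f <- factor(c("a", "b", "a"))
if (storage.mode(f) != "integer") storage.mode(f) <- "integer"

# No-simplification case: set dim and dimnames on the list directly
# instead of passing it through array().
extent <- c(2L, 3L)                                   # hypothetical extents
namelist <- list(c("a", "b"), c("x", "y", "z"))       # hypothetical dimnames
ansmat <- vector("list", prod(extent))
dim(ansmat) <- extent
dimnames(ansmat) <- namelist

# Simplification case: create the NA-filled result with the type of 'ans',
# so that the later assignment needs no type conversion.
ans <- c(5, 7)                                        # hypothetical results
index <- c(1L, 4L)                                    # hypothetical cell positions
ansmat2 <- rep(as.vector(NA, mode = typeof(ans)), prod(extent))
dim(ansmat2) <- extent
dimnames(ansmat2) <- namelist
ansmat2[index] <- ans
```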