Bug 15215

Summary: Error of plot on dendrogram object
Product: R Reporter: limingc
Component: GraphicsAssignee: R-core <R-core>
Status: CLOSED FIXED    
Severity: normal CC: greg, luke, maechler, om, simon.urbanek, sm, suharto_anggono
Priority: P3    
Version: R 3.0.0   
Hardware: x86_64/x64/amd64 (64-bit)   
OS: Windows 64-bit   
Attachments: Example data for dendrogram that shows error
Code that converts a byte-code function to an interpreted-code function in the original environment
Patch to dendrogram.R that makes 'plotNode' non-recursive
Patch to dendrogram.R that makes 'plotNode' non-recursive
Patch to dendrogram.R that makes 'plotNode' non-recursive

Description limingc 2013-02-20 19:00:34 UTC
The following code reproduces the problem. I tested it on both 64-bit Windows & Linux machines.

> set.seed(1234)
> x <- matrix(rnorm(1000*40), nr=1000)
> plot(as.dendrogram(hclust(dist(x[1:1000, ]), "average")))
> plot(as.dendrogram(hclust(dist(x[1:1000, ]), "single")))
Error in lapply(args, is.character) : node stack overflow
Error in dev.flush() : node stack overflow
> plot(as.dendrogram(hclust(dist(x[1:100, ]), "single")))
> 

It seems the error happened under the situation of large sample size & 'single' method.
Comment 1 Oleksandr Moskalenko 2013-03-01 15:53:55 UTC
This error also comes up when labels are extracted from a dendrogram object of about 27000 nodes.

> table[,3]=labels(h)
Error in match.fun(FUN) : node stack overflow
Calls: labels ... lapply -> FUN -> lapply -> FUN -> lapply -> match.fun
Execution halted

This is not an issue with the available memory as test runs were done on a very large memory system (1TB).
Comment 2 Oleksandr Moskalenko 2013-03-01 16:01:47 UTC
The case I described is in a command-line mode on x86_64 Linux by the wayt.
Comment 3 Oleksandr Moskalenko 2013-05-16 13:37:12 UTC
This issue is still present in the latest R code. Why can't 64-bit R handle a 30000 node dendrogram?
Comment 4 Simon Urbanek 2013-05-16 13:55:09 UTC
This comes from the byte code, so if your machine can handle large stacks, then either disable byte-compilation or increase R_BCNODESTACKSIZE (in Defn.h). Note that you will also need to increase other recursion limits (see expressions in ?options).
Comment 5 Oleksandr Moskalenko 2013-05-16 15:30:19 UTC
(In reply to comment #4)
> This comes from the byte code, so if your machine can handle large stacks, then
> either disable byte-compilation or increase R_BCNODESTACKSIZE (in Defn.h). Note
> that you will also need to increase other recursion limits (see expressions in
> ?options).

Thanks for the suggestion Simon.

I'm trying both "--disable-byte-compiled-packages" configure switch and

larger R_BCNODESTACKSIZE plus increased --max-ppsize X and options(expressions=X).

The build log shows "** byte-compile and prepare package for lazy loading" that even with "--disable-byte-compiled-packages", though.
Comment 6 Greg Warnes 2014-09-16 19:47:22 UTC
This issue also affects heatmap() and heatmap.2() with large deeply-nested dendrograms (example attached as 'dat.csv'), and appears to be a  consequence of two issues:
(1) stats:::plotNode is implemented as a recursive algorithm, which triggers recursion limits on certain large dendrogram objects, and
(2) the recursion stack for byte-compiled code is fixed at compile time.

Example Code:

    dat <- as.matrix(read.csv(file="dat.csv", row.names=1))
    dist2 <- function(x) as.dist(1-cor(t(x), method="pearson"))
    hclust1 <- function(x) hclust(x, method = "single")

    distance <- dist2(dat)
    cluster  <- hclust1(distance)

    dend <- as.dendrogram(cluster)
    plot(dend)
    ## Error in structure(NextMethod("[["), class = "dendrogram") :
    ##   node stack overflow
    ## Error in dev.flush() : node stack overflow

Previously, when plotNode was an interpreted function, the user could adjust the recursion limit using
   options(expression=...)
to enable handling of deeply-nested dendrograms.  

Now that stats:::plotNode is byte-compiled, the byte-code node stack is set at compile time, it is not simple to adjust the recursion limit to handle these objects.

It seems that the best long-term approach would be to modify the code in plotNode to be iterative rather than recursive, which would remove the recursion depth issue entirely.

As a short term solution, this ***hack*** has worked for me:

    source('unByteCode.R') ## attached
    dat <- as.matrix(read.csv(file="dat.csv", row.names=1))
    dist2 <- function(x) as.dist(1-cor(t(x), method="pearson"))
    hclust1 <- function(x) hclust(x, method = "single")

    distance <- dist2(dat)
    cluster  <- hclust1(distance)

    dend <- as.dendrogram(cluster)

    ## convert stats:::plotNode from byte-code to interpreted-code
    unByteCodeAssign(stats:::plotNode)
  
    options("expressions"=5e4)  # increase recursion limit
    plot(dend)
Comment 7 Greg Warnes 2014-09-16 19:48:17 UTC
Created attachment 1659 [details]
Example data for dendrogram that shows error
Comment 8 Greg Warnes 2014-09-16 19:49:56 UTC
Created attachment 1660 [details]
Code that converts a byte-code function to an interpreted-code function in the original environment
Comment 9 Luke Tierney 2014-09-17 10:49:04 UTC
The recursive solution is going to run into a hard limit one way or another, whether it is the byte code stack limit or the C stack limit. We can raise the
byte code stack size -- both examples pass with a factor of 4 increase, but I'm not sure that is warranted by this case alone.
Comment 10 Greg Warnes 2014-09-17 14:46:46 UTC
I agree that raising the (fixed) bytecode stack limit is not the solution, although it could be helpful if could be done at run-time to allow users to work around the issue when it does occur.  

Ultimately, the solution for this particular case is to modify stats:::plotNode to  use an iterative algorithm rather than a recursive one.
Comment 11 Suharto Anggono 2015-05-02 02:05:37 UTC
Created attachment 1818 [details]
Patch to dendrogram.R that makes 'plotNode' non-recursive

This retains the original code.

When 'plotNode' here is executed, there is a difference in order of actions compared to the original. Here, all lines to all childs are drawn first. Then, each child is drawn.
Comment 12 Suharto Anggono 2015-05-02 04:55:04 UTC
Created attachment 1819 [details]
Patch to dendrogram.R that makes 'plotNode' non-recursive

This retains the original code.

When 'plotNode' here is executed, there is a difference in order of actions compared to the original. Here, all lines to all children are drawn first. Then, each child is drawn.
Comment 13 Suharto Anggono 2015-05-02 08:33:55 UTC
Created attachment 1820 [details]
Patch to dendrogram.R that makes 'plotNode' non-recursive

This retains the original code.

When 'plotNode' here is executed, there is a difference in order of actions compared to the original. Here, all lines to all children are drawn first. Then, each child is drawn.
Comment 14 Martin Maechler 2015-05-02 12:27:38 UTC
(In reply to Suharto Anggono from comment #13)
> Created attachment 1820 [details]
> Patch to dendrogram.R that makes 'plotNode' non-recursive
> 
> This retains the original code.
> 
> When 'plotNode' here is executed, there is a difference in order of actions
> compared to the original. Here, all lines to all children are drawn first.
> Then, each child is drawn.

I think that should not be a problem in principle.   
At least with the tests I've tried,  you did solve the problem of rewriting the original code ... and still keep all its parts ...

Thank you very much Suharto Anggono for you work.
I'm committing it to R-devel now  {and to R-patched if no problems surface}.
Comment 15 Suharto Anggono 2015-05-02 18:04:01 UTC
The particular issue with 'labels.dendrogram' may be solved by using rapply(...) instead of unlist(dendrapply(...)).
Comment 16 Suharto Anggono 2015-05-14 14:29:19 UTC
(In reply to Martin Maechler from comment #14)
> (In reply to Suharto Anggono from comment #13)
> > Created attachment 1820 [details]
> > Patch to dendrogram.R that makes 'plotNode' non-recursive
> > 
> > This retains the original code.
> > 
> > When 'plotNode' here is executed, there is a difference in order of actions
> > compared to the original. Here, all lines to all children are drawn first.
> > Then, each child is drawn.
> 
> I think that should not be a problem in principle.   
> At least with the tests I've tried,  you did solve the problem of rewriting
> the original code ... and still keep all its parts ...
> 
> Thank you very much Suharto Anggono for you work.
> I'm committing it to R-devel now  {and to R-patched if no problems surface}.

The documentation of 'dendrogram' in Warning section says: "Some operations on dendrograms (including plotting) make use of recursion." The part "(including plotting)" no longer holds with the 'plotNode' code change.
Comment 17 lissacoffey 2015-07-06 05:35:44 UTC
Error in lapply(args, is.character) : node stack overflow
Error in dev.flush() : node stack overflow
http://dataverse.scholarsportal.info/dvn/dv/blog
http://arc.irss.unc.edu/dvn/dv/blog
Comment 18 Martin Maechler 2015-07-06 16:50:04 UTC
(In reply to Suharto Anggono from comment #15)
> The particular issue with 'labels.dendrogram' may be solved by using
> rapply(...) instead of unlist(dendrapply(...)).

Yes, indeed.... I'm still testing it {which is slow for these very large clusterings},  but indeed, it seems rapply(..) is enough for this case.

If the tests succeed well, I'll commit the change to R-devel ... to be ported to R-patched not much later.
Comment 19 Martin Maechler 2015-07-08 11:10:33 UTC
(In reply to Martin Maechler from comment #18)
> (In reply to Suharto Anggono from comment #15)
> > The particular issue with 'labels.dendrogram' may be solved by using
> > rapply(...) instead of unlist(dendrapply(...)).
> 
> Yes, indeed.... I'm still testing it {which is slow for these very large
> clusterings},  but indeed, it seems rapply(..) is enough for this case.
> 
> If the tests succeed well, I'll commit the change to R-devel ... to be
> ported to R-patched not much later.

That was not good enough... it was fast and correct in the large examples I tested, but it broke several CRAN packages.  so has been reverted.
--> reopened the bug {the part about making  `labels.dendrogram()`  non-recursive}.
Comment 20 Martin Maechler 2015-07-18 16:30:18 UTC
(In reply to Martin Maechler from comment #19)
> (In reply to Martin Maechler from comment #18)
> > (In reply to Suharto Anggono from comment #15)
> > > The particular issue with 'labels.dendrogram' may be solved by using
> > > rapply(...) instead of unlist(dendrapply(...)).
> > 
> > Yes, indeed.... I'm still testing it {which is slow for these very large
> > clusterings},  but indeed, it seems rapply(..) is enough for this case.
> > 
> > If the tests succeed well, I'll commit the change to R-devel ... to be
> > ported to R-patched not much later.
> 
> That was not good enough... it was fast and correct in the large examples I
> tested, but it broke several CRAN packages.  so has been reverted.
> --> reopened the bug {the part about making  `labels.dendrogram()` 
> non-recursive}.

This has been fixed both in R-devel and R-patched now:
labels.dendrogram now works via rapply(.) when possible.