Bug 16424 - stats::heatmap fails with node stack overflow error when several identical rows exist in data
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Graphics
Version: R 3.2.0
Hardware: Other Linux
Importance: P5 normal
Assignee: R-core
Reported: 2015-06-15 12:46 UTC by Gökcen Eraslan
Modified: 2016-02-09 07:53 UTC

Attachments
sample file to reproduce the error (335.99 KB, application/gzip)
2015-06-15 12:46 UTC, Gökcen Eraslan
Patch to dendrogram.R that makes 'oV' and 'setmid' non-recursive (3.16 KB, patch)
2016-02-06 03:23 UTC, Suharto Anggono
Patch to dendrogram.R that makes 'oV' and 'setmid' non-recursive (3.14 KB, patch)
2016-02-06 10:15 UTC, Suharto Anggono

Description Gökcen Eraslan 2015-06-15 12:46:29 UTC
Created attachment 1844
sample file to reproduce the error

The stats::heatmap() function fails with the following error when the data contain several duplicate rows:

heatmap(as.matrix(read.csv('example.csv.gz')))

Error in structure(r, class = "dendrogram") : node stack overflow
Error during wrapup: node stack overflow

The error can be reproduced with the attached sample file. Filtering out the duplicate rows works as a crude workaround:

x <- as.matrix(read.csv('example.csv.gz'))
heatmap(x[!duplicated(x),])

However, it should also work without filtering out identical rows. It is just a 2499x56 matrix...
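
The filtering step in the workaround above can be tried without the attachment. The following is a minimal sketch on synthetic data; the matrix, its size, and the variable names are assumptions made for illustration and are not taken from the attached file.

```r
# Synthetic stand-in for the attached file (hypothetical data).
set.seed(1)
base <- matrix(rnorm(20 * 5), nrow = 20)

# Append 30 exact copies of the first row to create duplicate rows.
x <- rbind(base, base[rep(1, 30), ])

# duplicated() on a matrix flags rows that repeat an earlier row;
# drop = FALSE keeps the result a matrix even if one row remains.
x_unique <- x[!duplicated(x), , drop = FALSE]

dim(x)         # 50 rows, 5 columns
dim(x_unique)  # 20 rows, 5 columns

# heatmap(x_unique)  # the workaround call, left commented out here
```

Note that this drops the repeated rows from the plot entirely, so it changes what the heatmap shows.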

PS: This seems similar to #15215, but plot(as.dendrogram(hclust(dist(as.matrix(read.csv('example.csv.gz')))))) works fine.

> sessionInfo()
R version 3.2.1 RC (2015-06-14 r68515)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Antergos

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8    
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.1
Comment 1 Gökcen Eraslan 2015-06-15 13:19:45 UTC
It seems the recursive oV function within reorder.dendrogram is causing the error. An easier way to reproduce the bug:

x <- as.matrix(read.csv('example.csv.gz'))
reorder(as.dendrogram(hclust(dist(x))), rowMeans(x))

Rewriting oV (https://github.com/wch/r-source/blob/trunk/src/library/stats/R/dendrogram.R#L628) as a non-recursive function should solve the problem.
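
As a rough illustration of the general technique (not the actual oV code and not the patch attached to this bug), a recursive walk over a nested list can be replaced by a loop over an explicit stack of pending nodes. The function name count_leaves_iter and the plain nested-list test data are hypothetical.

```r
# Hypothetical helper (not part of stats): count the leaves of a nested
# list without recursion, using an explicit stack of pending nodes.
count_leaves_iter <- function(tree) {
  stack <- list(tree)
  n <- 0L
  while (length(stack) > 0L) {
    # Pop the last node from the stack.
    node <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    if (is.list(node)) {
      # Inner node: push its children for later processing.
      for (child in node) stack[[length(stack) + 1L]] <- child
    } else {
      # Leaf: count it.
      n <- n + 1L
    }
  }
  n
}

# A deeply nested chain: 5000 levels, one extra leaf per level.
deep <- 1L
for (i in 1:5000) deep <- list(deep, i)

count_leaves_iter(deep)  # 5001
```

Because the pending nodes live in an ordinary R list rather than in nested function calls, the nesting depth of the tree does not translate into call depth.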
Comment 2 Suharto Anggono 2016-02-06 03:23:38 UTC
Created attachment 2017
Patch to dendrogram.R that makes 'oV' and 'setmid' non-recursive

For the example, for byte-compiled functions 'reorder.dendrogram' and 'midcache.dendrogram', just making function 'setmid' in 'midcache.dendrogram' non-recursive is enough to avoid "node stack overflow"; making just function 'oV' in 'reorder.dendrogram' non-recursive still gives "node stack overflow".

Some differences from the original: this uses 'vapply'; for 'setmid' in 'midcache.dendrogram', there is a place where this uses 'r' and the original uses 'd'.
Comment 3 Suharto Anggono 2016-02-06 10:15:14 UTC
Created attachment 2018
Patch to dendrogram.R that makes 'oV' and 'setmid' non-recursive
Comment 4 Martin Maechler 2016-02-08 20:34:36 UTC
(In reply to Suharto Anggono from comment #3)
> Created attachment 2018
> Patch to dendrogram.R that makes 'oV' and 'setmid' non-recursive

Thank you, Suharto. That patch looks good to me, and I have already applied it locally. The plan is to commit it tomorrow.
Comment 5 Martin Maechler 2016-02-09 07:53:01 UTC
Now fixed in R-devel and R 3.2.3 patched.