Bug 15648 - as.hclust after merging dendrograms leads to an incorrect hclust object
Summary: as.hclust after merging dendrograms leads to an incorrect hclust object
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses (show other bugs)
Version: R 3.0.2
Hardware: x86_64/x64/amd64 (64-bit) Linux
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2014-01-27 13:10 UTC by Jonas Klasen
Modified: 2015-12-14 13:47 UTC (History)
2 users (show)

See Also:


Attachments
Patch to merge.dendrogram (2.99 KB, patch)
2014-09-17 21:03 UTC, Greg Warnes
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonas Klasen 2014-01-27 13:10:30 UTC
### R:
a <- matrix(runif(100), 10, 10)
colnames(a) <- rownames(a) <- paste0("A", 1:10)
a.dist <- as.dist(a)
den.a <- as.dendrogram(hclust(a.dist))

b <- matrix(runif(100), 10, 10)
colnames(b) <- rownames(b) <- paste0("B", 1:10)
b.dist <- as.dist(b)
den.b <- as.dendrogram(hclust(b.dist))

d.ab <- merge(den.a, den.b)

# if the merged dendrogram is transformed to hclust, 'order' etc is wrong
hc.ab <- as.hclust(d.ab)
unclass(hc.ab)
Comment 1 Jonas Klasen 2014-01-29 14:44:25 UTC
The problem in within merge.dendrogram. The indexes are not updated. 
This fix (indicated by ### FIX) worked for me.




merge.dendrogram <- function(x, y, ..., height) {
  stopifnot(inherits(x,"dendrogram"), inherits(y,"dendrogram"))
  
  ### FIX
  inx.add <- function(inx, add) {
    if(is.leaf(inx)) {
      inx <- inx + add
    }
    return(inx)
  }
  y <- dendrapply(y,  inx.add, add=max(unlist(x)))
  ### FIX
  
  r <- list(x,y)
  if(length(xtr <- list(...))) {
    xpr <- substitute(c(...))
    if(!all(is.d <- vapply(xtr, inherits, NA, what="dendrogram"))) {
      nms <- sapply(xpr[-1][!is.d], deparse, nlines = 1L)
      ## do not simplify: xgettext needs this form
      msg <- ngettext(length(nms),
                      "extra argument %s is not of class \"%s\"",
                      "extra arguments %s are not of class \"%s\"s")
      stop(sprintf(msg, paste(nms, collapse=", "), "dendrogram"),
           domain = NA)
    }
    r <- c(r, xtr)
  }
  attr(r, "members") <- sum(vapply(r, attr, 0L, which="members"))
  h.max <- max(vapply(r, attr, 0., which="height"))
  if(missing(height) || is.null(height))
    height <- 1.1 * h.max
  else if(height < h.max) {
    msg <- gettextf("'height' must be at least %g, the maximal height of its components", h.max)
    stop(msg, domain = NA)
  }
  attr(r, "height") <- height
  class(r) <- "dendrogram"
  midcache.dendrogram(r, quiet=TRUE)
}
Comment 2 Greg Warnes 2014-09-17 21:03:10 UTC
I've also just hit the bug with merge.dendrogram not properly updating indexes.  

Jonas' code change doesn't go quite far enough, since it doesn't handle the case when there more than two dendrograms are being merged.   Below is the revised function, which resolves this.  I'm also uploading a patch against r-devel as of 2014-09-17:

merge.dendrogram <- function(x, y, ..., height) {
  stopifnot(inherits(x,"dendrogram"), inherits(y,"dendrogram"))

  ### FIX
  inx.add <- function(inx, add) {
    if(is.leaf(inx)) {
      inx <- inx + add
    }
    return(inx)
  }
  y <- dendrapply(y,  inx.add, add=max(unlist(x)))
  ### FIX

  r <- list(x,y)
  if(length(xtr <- list(...))) {
    if(!all(is.d <- vapply(xtr, inherits, NA, what="dendrogram"))) {
        xpr <- substitute(c(...))
        nms <- sapply(xpr[-1][!is.d], deparse, nlines = 1L)
        ## do not simplify: xgettext needs this form
        msg <- ngettext(length(nms),
                        "extra argument %s is not of class \"%s\"",
                        "extra arguments %s are not of class \"%s\"s")
        stop(sprintf(msg, paste(nms, collapse=", "), "dendrogram"),
             domain = NA)
    }
    ## <GRW>
    for(i in 1:length(xtr))
        {
            add <- max(c(unlist(r), unlist(xtr)))
            print(add)
            xtr[[i]] <- dendrapply(xtr[[i]], inx.add, add=add)
        }
    ## </GRW>
    r <- c(r, xtr)
  }
  attr(r, "members") <- sum(vapply(r, attr, 0L, which="members"))
  h.max <- max(vapply(r, attr, 0., which="height"))
  if(missing(height) || is.null(height))
    height <- 1.1 * h.max
  else if(height < h.max) {
    msg <- gettextf("'height' must be at least %g, the maximal height of its components", h.max)
    stop(msg, domain = NA)
  }
  attr(r, "height") <- height
  class(r) <- "dendrogram"
  midcache.dendrogram(r, quiet=TRUE)
}
Comment 3 Greg Warnes 2014-09-17 21:03:55 UTC
Created attachment 1661 [details]
Patch to merge.dendrogram
Comment 4 Martin Maechler 2014-09-18 12:48:04 UTC
Thank you, Jonas and Greg  for both your patches.
My change to merge.dendrogram used both, after simplification (and renaming) of you inx.add().
Greg's patch was not quite correct because the max(.) has to be taken sequentially.  

The patch goes to both R-patched and R-devel.

Thanks once more!
Martin