Bug 17249 - Bug in nlm(): wrong Cholesky decomposition of analytic Hessian
Summary: Bug in nlm(): wrong Cholesky decomposition of analytic Hessian
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses (show other bugs)
Version: R 3.3.*
Hardware: Other Windows 64-bit
: P5 enhancement
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2017-03-29 12:48 UTC by Marie Boehnstedt
Modified: 2017-04-26 07:33 UTC (History)
1 user (show)

See Also:


Attachments
Details on the bug in nlm() which results from a wrong Cholesky decomposition of the analytic Hessian (202.91 KB, application/pdf)
2017-03-29 12:48 UTC, Marie Boehnstedt
Details
R functions used in the report (15.52 KB, text/x-matlab)
2017-03-31 07:30 UTC, Marie Boehnstedt
Details
Extension to Marie's R functions script - "proving" that fixing choldc() helps half way (18.92 KB, text/plain)
2017-04-10 17:09 UTC, Martin Maechler
Details
Extension to Marie's R functions script - "proving" that fixing choldc() helps half way (19.61 KB, text/x-r)
2017-04-19 12:48 UTC, Martin Maechler
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marie Boehnstedt 2017-03-29 12:48:17 UTC
Created attachment 2241 [details]
Details on the bug in nlm() which results from a wrong Cholesky decomposition of the analytic Hessian

Hi,
I would like to report a bug in nlm().

The following erroneous behavior was observed in several test settings: nlm() converges if only an analytic gradient is specified by the user, but it does not converge if additionally the analytic Hessian is provided. This is due to an erroneous implementation of a modified Cholesky decomposition of the analytic Hessian in function choldc() in uncmin.c.

To see the nonconvergence, run this example of optimization of the Rosenbrock banana valley function (as in demo(nlm)):

fg <- function(x){  # analytic gradient only
  gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1)-2*(1-x1), 200*(x2 - x1*x1)) 
  x1 <- x[1]; x2 <- x[2]
  res<- 100*(x2 - x1*x1)^2 + (1-x1)^2 
  attr(res, "gradient") <- gr(x1, x2)
  return(res)
} 
nlm.fg <- nlm(fg, c(-1.2, 1))

fgh <- function(x){  #analytic gradient and Hessian
  gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1)) 
  h <- function(x1, x2){
    a11 <- 2 - 400*x2 + 1200*x1*x1
    a21 <- -400*x1 
    matrix(c(a11, a21, a21, 200), 2, 2)
  } 
  x1 <- x[1];  x2 <- x[2];  res<- 100*(x2 - x1*x1)^2 + (1-x1)^2 
  attr(res, "gradient") <- gr(x1, x2)
  attr(res, "hessian") <- h(x1, x2) 
  return(res)
}
nlm.fgh <- nlm(fgh, c(-1.2,1))

If only the analytic gradient is provided, the algorithm terminates after 24 iterations. If additionally the analytic Hessian is provided, the algorithm terminates after the maximum number of 100 iterations at a point far from the optimum.
I found that this is due to the bug in choldc() in uncmin.c. A more detailed explanation is attached.

Best wishes,
Marie
Comment 1 Martin Maechler 2017-03-30 19:55:02 UTC
Thank you for the report!
What you write (in the attached 11 page pdf report) makes much sense to me as reader.

However your remedy R functions chlhsnDS() and choldcDS()  are - as you acknowledge more or less -  quite a bit further away from the current code than what would strictly be needed, right? ... notably as I think I understood that the current code in "only" wrong in choldc() and not in chlhsn().
OTOH Dennis and Schnabel (1996) is almost surely more recent than what was the basis of nlm() and it might be advantageous to do the switch of algorithm details (and from rowwise to columnwise, in choldc()).
OTOH, I'd like to be cautious in a change and hence, ideally I'd like to see a *minimal* change, i.e. only change the *order* of filling the entries  (first H_ij for j < i, then H_ii) in the 2nd part of choldc().
Shouldn't that be already sufficient?

If you provided your R functions in 2nd attachment,  your report would be reproducible and we could experiment with such a minimal change to  choldc() only.

Thank you once more for the very detailed report!
Comment 2 Marie Boehnstedt 2017-03-31 07:30:58 UTC
Created attachment 2242 [details]
R functions used in the report
Comment 3 Marie Boehnstedt 2017-03-31 07:32:50 UTC
Thank you for taking a closer look at the report.
It's true that the actual error is only in choldc(), so one might start with the minimal change you suggest (changing the order of filling the entries).
However, it seems that the checks for positive definiteness in chlhsn() also differ a bit (e.g., Dennis and Schnabel (1996) additionally check for large off-diagonal elements) and I haven't examined what effect this has...
I attached an R script with the code from the report, so you can experiment.

(In reply to Martin Maechler from comment #1)
Comment 4 Martin Maechler 2017-04-07 16:28:08 UTC
Thank you.

I've found the following (already last weekend):

Fixing  choldc() alone, by reversing the two tasks at the end (filling 'L'),
choldc() itself then does work correctly.
==> The Rosenbrock banana example _also_ works for the "+Hessian" case.
but interestingly (or not) it does *not* converge faster for the example. Rather it uses also 24 iteration and even ends a bit further away (but still close to) the solution (1 1).

This coincides BTW with a finding by Doug Bates about 8 years ago that using very sophisticated analytical derivatives (I think that were even only gradients) for LMM (Linear mixed effect models) did not help much .. and in his case even took 

Unfortunately that does *not* solve the wrong convergence for the 4d Wood example.... and I have used  `deriv3()` to check that indeed your manually derived Hessian function computes the same numbers as the automatic derivative from `deriv3()`.

My tentative conclusion: 
Dennis Schnabel (1996) definitely use a better way to determine how to "add to the diagonal" when the cholesky would be deficient... and so 
your chlhsnDS() does work whereas  chlhsn() does not... even when using a corrected choldc().

I'll append a modified version of your R script, allowing a new argument 'neworder'
to choldc() and chlhsn()  and  neworder=TRUE  uses the fixed choldc() whereas the default  `neworder=FALSE` uses the old broken choldc().
Comment 5 Martin Maechler 2017-04-10 17:09:07 UTC
Created attachment 2244 [details]
Extension to Marie's R functions script - "proving" that fixing choldc() helps half way

Belongs to my "Comment #4"
Comment 6 Martin Maechler 2017-04-19 12:48:41 UTC
Created attachment 2246 [details]
Extension to Marie's R functions script - "proving" that fixing choldc() helps half way

Amended version of R script (which was invalid; notably _lacking_ the woodf.gh() definition, using deriv3() !)
Comment 7 Martin Maechler 2017-04-19 21:42:26 UTC
The first (and main) part of this bug, choldc() in C code is now fixed (in R-devel, svn rev 72555).

As mentioned, this solves convergence for the Rosenbrock with analytic Hessian, but not the other (4d) example.
Comment 8 Martin Maechler 2017-04-26 07:33:34 UTC
I'm closing the bug -- with thanks to Marie Boehnstedt!

The fact that Dennis & Schnabel actually propose a slightly different "perturbed cholesky" algorithm -- and one that is at least in the 4D Wood example better --
is not really anymore a bug in nlm().

I still would want to provide the alternative chlhsnDS() method as a new method in nlm() ... thanks to Marie's excellent analysis - and translation of the published algorithms to R code, but am closing the bug report - about the wrong Cholesky now.