Created attachment 2241 [details]
Details on the bug in nlm() which results from a wrong Cholesky decomposition of the analytic Hessian

Hi,

I would like to report a bug in nlm(). The following erroneous behavior was
observed in several test settings: nlm() converges if only an analytic
gradient is specified by the user, but it does not converge if the analytic
Hessian is provided as well. This is due to an erroneous implementation of a
modified Cholesky decomposition of the analytic Hessian in the function
choldc() in uncmin.c.

To see the non-convergence, run this example of optimization of the
Rosenbrock banana valley function (as in demo(nlm)):

    fg <- function(x) { # analytic gradient only
        gr <- function(x1, x2)
            c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1))
        x1 <- x[1]; x2 <- x[2]
        res <- 100*(x2 - x1*x1)^2 + (1-x1)^2
        attr(res, "gradient") <- gr(x1, x2)
        return(res)
    }
    nlm.fg <- nlm(fg, c(-1.2, 1))

    fgh <- function(x) { # analytic gradient and Hessian
        gr <- function(x1, x2)
            c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1))
        h <- function(x1, x2) {
            a11 <- 2 - 400*x2 + 1200*x1*x1
            a21 <- -400*x1
            matrix(c(a11, a21, a21, 200), 2, 2)
        }
        x1 <- x[1]; x2 <- x[2]
        res <- 100*(x2 - x1*x1)^2 + (1-x1)^2
        attr(res, "gradient") <- gr(x1, x2)
        attr(res, "hessian") <- h(x1, x2)
        return(res)
    }
    nlm.fgh <- nlm(fgh, c(-1.2, 1))

If only the analytic gradient is provided, the algorithm terminates after 24
iterations. If the analytic Hessian is provided as well, the algorithm
terminates after the maximum number of 100 iterations at a point far from
the optimum.

I found that this is due to the bug in choldc() in uncmin.c. A more detailed
explanation is attached.

Best wishes,
Marie

Thank you for the report! What you write (in the attached 11-page PDF report) makes much sense to me as a reader. However, your remedy R functions chlhsnDS() and choldcDS() are, as you more or less acknowledge, quite a bit further away from the current code than what would strictly be needed, right? ... notably, as I think I understood, the current code is wrong *only* in choldc() and not in chlhsn(). On the other hand, Dennis and Schnabel (1996) is almost surely more recent than what nlm() was based on, and it might be advantageous to switch the algorithm details (and go from row-wise to column-wise in choldc()).

Still, I'd like to be cautious about a change, and hence, ideally, I'd like to see a *minimal* change, i.e. only change the *order* of filling the entries (first H_ij for j < i, then H_ii) in the 2nd part of choldc(). Shouldn't that already be sufficient?

If you provided your R functions in a 2nd attachment, your report would be reproducible and we could experiment with such a minimal change to choldc() only.

Thank you once more for the very detailed report!

Created attachment 2242 [details]
R functions used in the report

(In reply to Martin Maechler from comment #1)

Thank you for taking a closer look at the report. It's true that the actual error is only in choldc(), so one might start with the minimal change you suggest (changing the order of filling the entries). However, it seems that the checks for positive definiteness in chlhsn() also differ a bit (e.g., Dennis and Schnabel (1996) additionally check for large off-diagonal elements), and I haven't examined what effect this has...

I have attached an R script with the code from the report, so you can experiment.

Thank you. I've found the following (already last weekend):

Fixing choldc() alone, by reversing the two tasks at its end (filling 'L'), choldc() itself then works correctly.
==> The Rosenbrock banana example _also_ works for the "+Hessian" case, but interestingly (or not) it does *not* converge faster for that example. Rather, it also uses 24 iterations and even ends a bit further away from (but still close to) the solution (1 1). This coincides, BTW, with a finding by Doug Bates about 8 years ago that using very sophisticated analytical derivatives (I think those were even only gradients) for LMMs (linear mixed-effects models) did not help much .. and in his case even took longer.

Unfortunately, that does *not* solve the wrong convergence for the 4d Wood example.... I have used `deriv3()` to check that your manually derived Hessian function does indeed compute the same numbers as the automatic derivative from `deriv3()`.

My tentative conclusion: Dennis and Schnabel (1996) definitely use a better way to determine how to "add to the diagonal" when the Cholesky factorization would be deficient... and so your chlhsnDS() does work whereas chlhsn() does not, even when using a corrected choldc().

I'll append a modified version of your R script, adding a new argument 'neworder' to choldc() and chlhsn(): neworder=TRUE uses the fixed choldc(), whereas the default neworder=FALSE uses the old broken choldc().

Created attachment 2244 [details]
Extension to Marie's R functions script - "proving" that fixing choldc() helps half way

Belongs to my "Comment #4".

Created attachment 2246 [details]
Extension to Marie's R functions script - "proving" that fixing choldc() helps half way

Amended version of the R script (the previous one was invalid; notably, it was _lacking_ the woodf.gh() definition, which uses deriv3()!).

The first (and main) part of this bug, choldc() in the C code, is now fixed (in R-devel, svn rev 72555). As mentioned, this solves convergence for the Rosenbrock example with analytic Hessian, but not for the other (4d) example.

I'm closing the bug -- with thanks to Marie Boehnstedt! The fact that Dennis & Schnabel actually propose a slightly different "perturbed Cholesky" algorithm -- and one that is better, at least in the 4d Wood example -- is no longer really a bug in nlm(). I would still like to provide the alternative chlhsnDS() method as a new method in nlm() ... thanks to Marie's excellent analysis and her translation of the published algorithms to R code, but I am closing the bug report about the wrong Cholesky now.