Bugzilla – Bug 14105

dchisq tail accuracy

Last modified: 2010-04-11 14:17:36 UTC

From: Jerry.Lewis@biogenidec.com Full_Name: Jerry W. Lewis Version: 2.10.0 OS: Windows XP Professional Submission from: (NULL) (198.180.131.16) dchisq is inaccurate in the extreme tails. For instance, dchisq(1510,2,952) returns 4.871004e-18 which is almost 15 times smaller than the correct value of 7.053889e-17. A better approach for ncp>0 would be besselI(sxn,d2,TRUE) * (x/ncp)^(d2/2) * exp(sxn-(x+ncp)/2)/2 where logs may prevent over/underflow in extreme calculations.

From: Jerry Lewis <jerry.lewis@biogenidec.com> The undefined variables in the original post are d2 <- df/2-1 sxn <- sqrt(x*ncp) Jerry [[alternative HTML version deleted]]

Audit (from Jitterbug): Thu Dec 3 13:10:01 2009 ripley moved from incoming to Accuracy

From: Jerry Lewis <jerry.lewis@biogenidec.com> The issue seems to be that the infinite sum is truncated too early when x is in the extreme upper tail. An easily validated improvement to to dnchisq.c would be to add an additional requirement in the upper tail while condition, that the summation should continue while the additive term remains too big a fraction of the saddlepoint density. Here is replacement code that partially offsets the time required for addtional terms in the sum by calculating x*ncp2 only once instead of repeating it in every iteration of both loops. x2 = x * ncp2 /* upper tail */ term = mid; df = dfmid; i = imax; s = 0.5 - (dx+sqrt(dx*dx+4*ncp/x))*0.25 /* saddlepoint */ s2 = 1-2*s termlimit = exp(ncp*s/s2 - log(s2)*df*0.5 - s*x - log((df+2*ncp+4*ncp*s/s2)/(s2*s2)*pi) - log(2)*2**54) /* saddlepoint density * 2^-53 */ do { i++; q = x2 / i / df; df += 2; term *= q; sum += term; } while (q >= 1 || term * q > (1-q)*eps || term > termlimit); /* lower tail */ term = mid; df = dfmid; i = imax; while ( i ){ df -= 2; q = i * df / x2; i--; term *= q; sum += term; if (q < 1 && term * q <= (1-q)*eps) break; } Jerry [[alternative HTML version deleted]]

Patches should at least be valid C: this is not. dx is undefined, no reference is given to find out what might have been meant. Surely it is basic knowledge that compilers are very good at extracting invariant expressions from loops: no evidence is given of the purported performance improvement by doing so manually. A much simpler fix seems to work well enough, and has been put in 2.12.0.