Bug 14105 - dchisq tail accuracy
dchisq tail accuracy
Status: RESOLVED FIXED
Product: R
Classification: Unclassified
Component: Accuracy
old
ix86 (32-bit) Windows 32-bit
: P5 normal
Assigned To: Jitterbug compatibility account
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-12-02 22:51 UTC by Jitterbug compatibility account
Modified: 2010-04-11 14:17 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jitterbug compatibility account 2009-12-02 22:51:06 UTC
From: Jerry.Lewis@biogenidec.com
Full_Name: Jerry W. Lewis
Version: 2.10.0
OS: Windows XP Professional
Submission from: (NULL) (198.180.131.16)


dchisq is inaccurate in the extreme tails.  For instance, dchisq(1510,2,952)
returns 4.871004e-18 which is almost 15 times smaller than the correct value of
7.053889e-17.  A better approach for ncp>0 would be
  besselI(sxn,d2,TRUE) * (x/ncp)^(d2/2) * exp(sxn-(x+ncp)/2)/2
where logs may prevent over/underflow in extreme calculations.

Comment 1 Jitterbug compatibility account 2009-12-02 23:14:45 UTC
From: Jerry Lewis <jerry.lewis@biogenidec.com>
The undefined variables in the original post are
   d2 <- df/2-1
   sxn <- sqrt(x*ncp)

Jerry
	[[alternative HTML version deleted]]

Comment 2 Jitterbug compatibility account 2009-12-03 19:10:01 UTC
Audit (from Jitterbug):
Thu Dec  3 13:10:01 2009	ripley	moved from incoming to Accuracy
Comment 3 Jitterbug compatibility account 2010-01-04 02:54:51 UTC
From: Jerry Lewis <jerry.lewis@biogenidec.com>
The issue seems to be that the infinite sum is truncated too early when x 
is in the extreme upper tail.  An easily validated improvement to to 
dnchisq.c would be to add an additional requirement in the upper tail 
while condition, that the summation should continue while the additive 
term remains too big a fraction of the saddlepoint density.  Here is 
replacement code that partially offsets the time required for addtional 
terms in the sum by calculating x*ncp2 only once instead of repeating it 
in every iteration of both loops.

    x2 = x * ncp2
    /* upper tail */
    term = mid; df = dfmid; i = imax;
    s = 0.5 - (dx+sqrt(dx*dx+4*ncp/x))*0.25  /* saddlepoint */
    s2 = 1-2*s
    termlimit = exp(ncp*s/s2 - log(s2)*df*0.5 - s*x - 
log((df+2*ncp+4*ncp*s/s2)/(s2*s2)*pi) - log(2)*2**54) /* saddlepoint 
density * 2^-53 */
    do {
        i++;
        q = x2 / i / df;
        df += 2;
        term *= q;
        sum += term;
    } while (q >= 1 || term * q > (1-q)*eps || term > termlimit);
    /* lower tail */
    term = mid; df = dfmid; i = imax;
    while ( i ){
        df -= 2;
        q = i * df / x2;
        i--;
        term *= q;
        sum += term;
        if (q < 1 && term * q <= (1-q)*eps) break;
    }

Jerry
	[[alternative HTML version deleted]]

Comment 4 Brian Ripley 2010-04-11 14:17:36 UTC
Patches should at least be valid C: this is not.

dx is undefined, no reference is given to find out what might have been meant.

Surely it is basic knowledge that compilers are very good at
extracting invariant expressions from loops: no evidence is given of
the purported performance improvement by doing so manually.

A much simpler fix seems to work well enough, and has been put in 2.12.0.