Bug 15635 - pchisq(1e-5, 100, 1) == 0 due to underflow in dpois_raw
Summary: pchisq(1e-5, 100, 1) == 0 due to underflow in dpois_raw
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy (show other bugs)
Version: R-devel (trunk)
Hardware: Other Other
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2014-01-10 01:03 UTC by Roby Joehanes
Modified: 2015-12-14 13:46 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Roby Joehanes 2014-01-10 01:03:23 UTC
When pchisq(1e-5, 100, 1, T, F) is invoked, the code goes through pgamma_smallx, which subsequently invokes dpois_raw. However, near the end of dpois_raw invocation:

M_2PI*x == 314.1592653589793
-stirlerr(x)-bd0(x,lambda) == -755.9064541923631

The code then executes exp(-stirlerr(x)-bd0(x,lambda)) / sqrt(M_2PI*x) which causes underflow error. The invocation returns to pgamma_smallx, with f2 == 0. The subroutine returns the value of f1 * f2, which is 0. This causes the for loop in pnchisq_raw to be always zero (since the largest term of sum is 0).

pchisq(1e-5, 100, 1, log=T) also yielded -Inf. Is there any way to get a good answer, at least in the log scale? Thanks.
Comment 1 Martin Maechler 2014-01-13 22:07:34 UTC
(In reply to Roby Joehanes from comment #0)
> When pchisq(1e-5, 100, 1, T, F) is invoked, the code goes through
> pgamma_smallx, which subsequently invokes dpois_raw. However, near the end
> of dpois_raw invocation:
> 
> M_2PI*x == 314.1592653589793
> -stirlerr(x)-bd0(x,lambda) == -755.9064541923631
> 
> The code then executes exp(-stirlerr(x)-bd0(x,lambda)) / sqrt(M_2PI*x) which
> causes underflow error. The invocation returns to pgamma_smallx, with f2 ==
> 0. The subroutine returns the value of f1 * f2, which is 0. This causes the
> for loop in pnchisq_raw to be always zero (since the largest term of sum is
> 0).
> 
> pchisq(1e-5, 100, 1, log=T) also yielded -Inf. Is there any way to get a
> good answer, at least in the log scale? Thanks.

Your analysis is correct... and you are right that it would be desirable to be better on the log scale, as empirically, very clearly   
        log F(x, df, ncp)  ~=  A + B * log(x)   (asymptotically, x -> 0)
I had been investigating this for hours 
several years ago.  I hadn't come to a conclusion then about an accurate approximation for  A  and B  above which must be functions of (ncp,df).

Alternatively,  we should use  "logspace_add()" in  (line 101 of pnchisq.c):

	    sum += pr * pchisq(x, f+2*i, lower_tail, FALSE);

i.e. work with  log(pr)  & pchisq(...., TRUE)
as the set of i is maximally in  0:110  we could just store things then add when see where it's maximal.... so after all,
it's just a programming exercise...

patches are welcome... {or I will get to it eventually .. not very soon though}.
Comment 2 Roby Joehanes 2014-01-14 18:24:41 UTC
It looks like the modification is non-trivial. Essentially, pnchisq_raw needs an additional Rboolean log_p parameter on it. Then, the entire function needs to be adjusted. I am not very familiar to the "if (theta >= 80)" branch yet. On top of that, qnchisq needs to be modified to afford the new parameter.

The code inside if(theta < 80) branch is modified as follows (it works):

if (theta < 80) {
	LDOUBLE sum = ML_NEGINF, sum2 = ML_NEGINF, lambda = 0.5*theta, pr = -lambda, ans;
	/* we need to renormalize here: the result could be very close to 1 */
	for(int i = 0; i < 110;  pr += LOG(lambda) - LOG(++i)) {
		sum2 = logspace_add(sum2, pr);
		sum = logspace_add(sum, pr + pchisq(x, f+2*i, lower_tail, TRUE));
		if (EXP(sum2) >= 1-1e-15) break;
	}
	ans = sum - sum2;
	return log_p ? ((double) ans) : EXP(ans);
}

And, on top:
#ifdef HAVE_LONG_DOUBLE
# define EXP expl
# define FABS fabsl
# define LOG logl
#else
# define EXP exp
# define FABS fabs
# define LOG log
#endif
Comment 3 Martin Maechler 2014-04-08 10:42:56 UTC
My solution was to only use  logspace arithmetic when necessary
as that is slower and less accurate when entirely unneeded,
and for back compatibility in cases where there is no "bad" underflow.

svn rev 65384
Comment 4 Roby Joehanes 2014-04-11 22:29:14 UTC
Hi Martin: Thank you. I could not find your fix in the 3.1.0 release source. Is there a problem (regression)? Thanks.
Comment 5 Peter Dalgaard 2014-04-12 07:06:57 UTC
3.1.0 was in code freeze when Martin fixed this. At that stage, only very serious issues can make us change the code base. It is fixed in R-devel, presumably with the intention of porting it to the release branch later.