When pchisq(1e-5, 100, 1, T, F) is invoked, the code goes through pgamma_smallx, which subsequently invokes dpois_raw. However, near the end of dpois_raw invocation: M_2PI*x == 314.1592653589793 -stirlerr(x)-bd0(x,lambda) == -755.9064541923631 The code then executes exp(-stirlerr(x)-bd0(x,lambda)) / sqrt(M_2PI*x) which causes underflow error. The invocation returns to pgamma_smallx, with f2 == 0. The subroutine returns the value of f1 * f2, which is 0. This causes the for loop in pnchisq_raw to be always zero (since the largest term of sum is 0). pchisq(1e-5, 100, 1, log=T) also yielded -Inf. Is there any way to get a good answer, at least in the log scale? Thanks.

(In reply to Roby Joehanes from comment #0) > When pchisq(1e-5, 100, 1, T, F) is invoked, the code goes through > pgamma_smallx, which subsequently invokes dpois_raw. However, near the end > of dpois_raw invocation: > > M_2PI*x == 314.1592653589793 > -stirlerr(x)-bd0(x,lambda) == -755.9064541923631 > > The code then executes exp(-stirlerr(x)-bd0(x,lambda)) / sqrt(M_2PI*x) which > causes underflow error. The invocation returns to pgamma_smallx, with f2 == > 0. The subroutine returns the value of f1 * f2, which is 0. This causes the > for loop in pnchisq_raw to be always zero (since the largest term of sum is > 0). > > pchisq(1e-5, 100, 1, log=T) also yielded -Inf. Is there any way to get a > good answer, at least in the log scale? Thanks. Your analysis is correct... and you are right that it would be desirable to be better on the log scale, as empirically, very clearly log F(x, df, ncp) ~= A + B * log(x) (asymptotically, x -> 0) I had been investigating this for hours several years ago. I hadn't come to a conclusion then about an accurate approximation for A and B above which must be functions of (ncp,df). Alternatively, we should use "logspace_add()" in (line 101 of pnchisq.c): sum += pr * pchisq(x, f+2*i, lower_tail, FALSE); i.e. work with log(pr) & pchisq(...., TRUE) as the set of i is maximally in 0:110 we could just store things then add when see where it's maximal.... so after all, it's just a programming exercise... patches are welcome... {or I will get to it eventually .. not very soon though}.

It looks like the modification is non-trivial. Essentially, pnchisq_raw needs an additional Rboolean log_p parameter on it. Then, the entire function needs to be adjusted. I am not very familiar to the "if (theta >= 80)" branch yet. On top of that, qnchisq needs to be modified to afford the new parameter. The code inside if(theta < 80) branch is modified as follows (it works): if (theta < 80) { LDOUBLE sum = ML_NEGINF, sum2 = ML_NEGINF, lambda = 0.5*theta, pr = -lambda, ans; /* we need to renormalize here: the result could be very close to 1 */ for(int i = 0; i < 110; pr += LOG(lambda) - LOG(++i)) { sum2 = logspace_add(sum2, pr); sum = logspace_add(sum, pr + pchisq(x, f+2*i, lower_tail, TRUE)); if (EXP(sum2) >= 1-1e-15) break; } ans = sum - sum2; return log_p ? ((double) ans) : EXP(ans); } And, on top: #ifdef HAVE_LONG_DOUBLE # define EXP expl # define FABS fabsl # define LOG logl #else # define EXP exp # define FABS fabs # define LOG log #endif

My solution was to only use logspace arithmetic when necessary as that is slower and less accurate when entirely unneeded, and for back compatibility in cases where there is no "bad" underflow. svn rev 65384

Hi Martin: Thank you. I could not find your fix in the 3.1.0 release source. Is there a problem (regression)? Thanks.

3.1.0 was in code freeze when Martin fixed this. At that stage, only very serious issues can make us change the code base. It is fixed in R-devel, presumably with the intention of porting it to the release branch later.