Bug 16332 - pbeta(0.555555, 1925.74, 33.7179, log=TRUE) returns -Inf
Summary: pbeta(0.555555, 1925.74, 33.7179, log=TRUE) returns -Inf
Status: ASSIGNED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy (show other bugs)
Version: R 3.2.0
Hardware: All All
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2015-04-24 18:54 UTC by cossio
Modified: 2015-04-28 17:41 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description cossio 2015-04-24 18:54:46 UTC
This is a numeric bug with the pbeta function with log=TRUE. I only have one example where it is broken, but probably the problem goes deeper.

Executing:

pbeta(0.555555, 1925.74, 33.7179, log=TRUE)

returns 

-Inf

The correct answer is -1165.999736.
Comment 1 cossio 2015-04-24 18:55:58 UTC
Moreover, there's no warning of numeric error, or overflow or anything.
Comment 2 Martin Maechler 2015-04-27 08:37:17 UTC
Yes, the underlying algorithm is not good enough (also is inaccurate in a vicinity) as it has not been written for the 'log=TRUE' case at all.

How did you compute the "correct" answer?
Comment 3 cossio 2015-04-28 13:22:49 UTC
I used Mathematica to compute the "correct" asnwer, which is supposed to have arbitrary precision. Too bad it is not open source.
Comment 4 cossio 2015-04-28 14:02:42 UTC
Note that the "correct" answer computed above with Mathematica refers to the unregularized Beta function. pbeta is supposed to compute the regularized Beta function, so you have to subtract/add the logarithm of B(a,b) to translate Mathematica value's to R's and viceversa.
Comment 5 cossio 2015-04-28 14:46:19 UTC
What's the problem with just using 

log(Beta(x,a,b)) = log(hyperg_2F1(a+b,1,a+1,x)) + a*log(x)+b*log(1-x)-log(a) 

or another identity involving hypergeometric functions, where hyperg_2F1 refers to an implementation of the Hypergeometric function, which I assume is available?
Comment 6 Ben Bolker 2015-04-28 16:35:41 UTC
Just pointing out that this has been discussed further at http://stats.stackexchange.com/questions/148192/logarithm-of-incomplete-beta-function-for-large-alpha-beta

and Stéphane Laurent has proposed a solution using the GSL interface:

library(gsl)
logpbeta <- function(x,a,b) log(hyperg_2F1(a+b,1,a+1,x)) +
                             a*log(x)+b*log(1-x)-log(a) - lbeta(a,b)

Maybe not easy to implement within base R, but at least it's open source ...

(I can reproduce on MacOS 10.6 64-bit version 3.1.2, but not on 32-bit Ubuntu r-devel from 24 April 2015 ...)
Comment 7 Martin Maechler 2015-04-28 17:41:28 UTC
(In reply to Ben Bolker from comment #6)
> Just pointing out that this has been discussed further at
> http://stats.stackexchange.com/questions/148192/logarithm-of-incomplete-beta-
> function-for-large-alpha-beta
> 
> and Stéphane Laurent has proposed a solution using the GSL interface:
> 
> library(gsl)
> logpbeta <- function(x,a,b) log(hyperg_2F1(a+b,1,a+1,x)) +
>                              a*log(x)+b*log(1-x)-log(a) - lbeta(a,b)
> 
> Maybe not easy to implement within base R, but at least it's open source ...
> 
> (I can reproduce on MacOS 10.6 64-bit version 3.1.2, but not on 32-bit
> Ubuntu r-devel from 24 April 2015 ...)

Thank you, Ben.

Well, it's definitely not as easy as the above function definition
(losing precision for extreme x very small),
but that's a good pointer.

Further: ?pbeta (-> "Source") mentions the algorithm, the famous TOMS 788, i.e.  Didonato, A. and Morris, A., Jr, (1992) 

which was not at all written to work on the log-scale. Since then, we have taken that, translated to C, and added improvements in many cases for the log-scale case. 
The current example is one where we are using an algorithm which can suffer from bad cancellation on the log-scale.