This is a numeric bug with the pbeta function with log=TRUE. I only have one example where it is broken, but probably the problem goes deeper. Executing: pbeta(0.555555, 1925.74, 33.7179, log=TRUE) returns -Inf The correct answer is -1165.999736.

Moreover, there's no warning of numeric error, or overflow or anything.

Yes, the underlying algorithm is not good enough (also is inaccurate in a vicinity) as it has not been written for the 'log=TRUE' case at all. How did you compute the "correct" answer?

I used Mathematica to compute the "correct" asnwer, which is supposed to have arbitrary precision. Too bad it is not open source.

Note that the "correct" answer computed above with Mathematica refers to the unregularized Beta function. pbeta is supposed to compute the regularized Beta function, so you have to subtract/add the logarithm of B(a,b) to translate Mathematica value's to R's and viceversa.

What's the problem with just using log(Beta(x,a,b)) = log(hyperg_2F1(a+b,1,a+1,x)) + a*log(x)+b*log(1-x)-log(a) or another identity involving hypergeometric functions, where hyperg_2F1 refers to an implementation of the Hypergeometric function, which I assume is available?

Just pointing out that this has been discussed further at http://stats.stackexchange.com/questions/148192/logarithm-of-incomplete-beta-function-for-large-alpha-beta and Stéphane Laurent has proposed a solution using the GSL interface: library(gsl) logpbeta <- function(x,a,b) log(hyperg_2F1(a+b,1,a+1,x)) + a*log(x)+b*log(1-x)-log(a) - lbeta(a,b) Maybe not easy to implement within base R, but at least it's open source ... (I can reproduce on MacOS 10.6 64-bit version 3.1.2, but not on 32-bit Ubuntu r-devel from 24 April 2015 ...)

(In reply to Ben Bolker from comment #6) > Just pointing out that this has been discussed further at > http://stats.stackexchange.com/questions/148192/logarithm-of-incomplete-beta- > function-for-large-alpha-beta > > and Stéphane Laurent has proposed a solution using the GSL interface: > > library(gsl) > logpbeta <- function(x,a,b) log(hyperg_2F1(a+b,1,a+1,x)) + > a*log(x)+b*log(1-x)-log(a) - lbeta(a,b) > > Maybe not easy to implement within base R, but at least it's open source ... > > (I can reproduce on MacOS 10.6 64-bit version 3.1.2, but not on 32-bit > Ubuntu r-devel from 24 April 2015 ...) Thank you, Ben. Well, it's definitely not as easy as the above function definition (losing precision for extreme x very small), but that's a good pointer. Further: ?pbeta (-> "Source") mentions the algorithm, the famous TOMS 788, i.e. Didonato, A. and Morris, A., Jr, (1992) which was not at all written to work on the log-scale. Since then, we have taken that, translated to C, and added improvements in many cases for the log-scale case. The current example is one where we are using an algorithm which can suffer from bad cancellation on the log-scale.