Dear all, It seems that rhyper() gives out wrong results randomly when m << n. rhyper(nn,m,n,k) generates hypergeometric distribution random numbers. according to theory, the mean of the random numbers generated should be around m/(m+n)*k, In my example, I keep m+n=1e9,k=1e6 and changing m. Here is what I get: > # this result is reasonable, the mean is around 1. > mean(rhyper(100,1000,1e9-1000,1e6)) [1] 0.91 > # ok > mean(rhyper(100,2000,1e9-2000,1e6)) [1] 2.04 > # the mean should be around 5 > mean(rhyper(100,5000,1e9-5000,1e6)) [1] 0.64 > # the mean should be around 6 > mean(rhyper(100,6000,1e9-6000,1e6)) [1] 1.55 > # the mean should be around 8,but it gives out a 0! > mean(rhyper(100,8000,1e9-8000,1e6)) [1] 0 > # the mean should be around 9 > mean(rhyper(100,9000,1e9-9000,1e6)) [1] 0.45 > #back to normal > mean(rhyper(100,10000,1e9-10000,1e6)) [1] 10.46 > #good > mean(rhyper(100,20000,1e9-20000,1e6)) [1] 20.14 I get the first two right, but the ones in the middle are wrong, and strangely, it comes to normal again. Best regards Mia

Confirmed on OS X. Curiously, it doesn't seem to affect the inverse distribution function method: > mean(rhyper(100,8000,1e9-8000,1e6)) [1] 0 > mean(qhyper(runif(100),8000,1e9-8000,1e6)) [1] 8.48 ...which may also offer a passable workaround.

(In reply to Peter Dalgaard from comment #1) > Confirmed on OS X. Curiously, it doesn't seem to affect the inverse > distribution function method: > > > mean(rhyper(100,8000,1e9-8000,1e6)) > [1] 0 > > mean(qhyper(runif(100),8000,1e9-8000,1e6)) > [1] 8.48 > > ...which may also offer a passable workaround. Yes.... Note that we are in a situation of large integer parameters (total = 1e9 *is* large), notably for the cases the algorithms were defined for, *and* we already had patched qhyper() in the past for these. ((though it seems to me that the "large" branch there is also improvable). BTW: It *is* branch II in the rhyper() algorithm, but there is no underflow to 0 in w = exp(...); w is actually pretty large (because of 'con')

(In reply to Martin Maechler from comment #2) > (In reply to Peter Dalgaard from comment #1) > > Confirmed on OS X. Curiously, it doesn't seem to affect the inverse > > distribution function method: > > > > > mean(rhyper(100,8000,1e9-8000,1e6)) > > [1] 0 > > > mean(qhyper(runif(100),8000,1e9-8000,1e6)) > > [1] 8.48 > > > > ...which may also offer a passable workaround. > > Yes.... Note that we are in a situation of large integer parameters (total > = 1e9 *is* large), notably for the cases the algorithms were defined for, > *and* we already had patched qhyper() in the past for these. > > ((though it seems to me that the "large" branch there is also improvable). > > BTW: It *is* branch II in the rhyper() algorithm, but there is no underflow > to 0 in w = exp(...); w is actually pretty large (because of 'con') In the end, it was just an integer multiplication overflow.. Committed to R-devel and will be ported to "R 3.2.2 patched".