rchisq produces NaN, and a warning that NAs are produced, on zero ncp and df: > rchisq(1,df=0,ncp=0) [1] NaN Warning message: In rchisq(1, df = 0, ncp = 0) : NAs produced The correct answer is produced when ncp is not given: > rchisq(1,df=0) [1] 0 I am guessing fixing this line in nmath/rnchisq.c (sorry, no line number) would do the trick: if (df == 0.) ML_ERR_return_NAN; It should probably be: if (df == 0.) return 0.;

you are perfectly right. I will fix this. As I see there seem related "problems" with the density also giving NaN much too quickly. Thank you for your report and analysis! Martin

(Be careful with the density, Martin. That one really is undefined if df==0.)

(In reply to Peter Dalgaard from comment #2) > (Be careful with the density, Martin. That one really is undefined if df==0.) Ok, let's "think" together in public: I would argue for the moment that this depends on "the kind of mathematics" you apply here. In R, we've always included +- Inf in our number range and tried to not give NaN's in cases where limits exist unambigously. So in this case, I'm looking at the case ncp > 0, lim_{ncp -> 0}, for fixed df == 0. By that argument the density of chisq_{ncp = 0, df = 0} is a point mass at zero. Consequently, dchisq(x, df=0, ncp=0) would return Inf for x=0 and 0 otherwise. *AND* I get the same limit for the {df > 0, ncp = 0, lim_{df -> 0} } branch. In this sense, I don't see any NaN return values here. Did I overlook something?

We've been here before, haven't we? A density for the df=0 case has to be with respect to the sum of Lebesgue measure on (0, infty) and a point measure at 0. This could in principle be implemented as f(x), x>=0, with f(0)=P(X==0) P(X > 0)=int_0^infty f(x) dx but I believe we refrained from doing it that way since the potential for confusion was too large. ("Oi! your density doesn't integrate to 1") For the df=0, ncp=0 case, the continuous component vanishes and we just have a unit point mass at 0. That could be interpreted as a discrete distribution and we would then just let the density be 1, but again, I think it would confuse users.

(In reply to Peter Dalgaard from comment #4) > We've been here before, haven't we? Well, yes, I vaguely recollect similar discussion > > A density for the df=0 case has to be with respect to the sum of Lebesgue > measure on (0, infty) and a point measure at 0. Yes, you are right, that is one possibility, and maybe the best from a pure math point of view. OTOH, it is one possibility of defining behavior.... what I paraphrased as 'depends on the "kind of mathematics"' .. Another priniciple was the one which I emphasized: Namely that for border cases, continuity is very very useful. E.g. for optimization (e.g. Maximum Likelihood) you are glad if functions involved do not easily give NaN but rather are continuous. > > This could in principle be implemented as f(x), x>=0, with > > f(0)=P(X==0) > P(X > 0)=int_0^infty f(x) dx > > but I believe we refrained from doing it that way since the potential for > confusion was too large. ("Oi! your density doesn't integrate to 1") > > For the df=0, ncp=0 case, the continuous component vanishes and we just have > a unit point mass at 0. Yes, I think I also said that .. or implied. The same happens with the gamma when the scale goes to zero, and there we know of real problems {discrete double precision numbers, etc}, But we also don't revert to NaN's More closer to home you may remember that we distinguish between dchisq(x, df) and dchisq(x, df, ncp=0) {using the two different algorithms..} but typically want to get the same results. Here, we see > dchisq((0:4)/4, df = 1e-322) [1] Inf 1.778636e-322 7.905050e-323 4.446591e-323 2.964394e-323 > dchisq((0:4)/4, df = 1e-325) [1] Inf 0 0 0 0 > dchisq((0:4)/4, df = 1e-322, ncp= 0) [1] Inf 1.778636e-322 7.905050e-323 4.446591e-323 2.964394e-323 > dchisq((0:4)/4, df = 1e-325, ncp= 0) [1] NaN NaN NaN NaN NaN Warning message: In dchisq((0:4)/4, df = 0, ncp = 0) : NaNs produced > and I just want the last one to return the much more natural Inf 0 0 0 0 > That could be interpreted as a discrete distribution > and we would then just let the density be 1, but again, I think it would > confuse users. I agree on that. For a mixed (discrete-continuous) distribution, where the discrete part is just a boarder case, it does not make sense to change the "simple-minded switch" (between density and discrete probabilities) into a density with respect ... measure interpreation

The main issue to me is that a limiting function like function(x) ifelse (x==0, Inf, 0) is not a satisfactory representation of a point mass at 0. Depending on your choice of integration theory, the integral can be various things: 0, Inf, NaN, but usually not 1. It certainly has no way of distinguishing between point masses of p for varying values of p. It's not really an argument that other things misbehave under limit operations due to numerical issues, is it? I see > dchisq(2.4703e-324,df=1e-325) [1] Inf > dchisq(2.4704e-324,df=1e-325) [1] 0 so it actually isn't a density... (Not that I think it is something that we can be expected to fix.) I think the only consistent way out would be to return the base measure as an attribute of the result. That wouldn't in itself be too hard, but since densities are used all over the place, it seems unavoidable that we would get in trouble with some applications.

(In reply to Peter Dalgaard from comment #6) > The main issue to me is that a limiting function like > > function(x) ifelse (x==0, Inf, 0) > > is not a satisfactory representation of a point mass at 0. Depending on your > choice of integration theory, the integral can be various things: 0, Inf, > NaN, but usually not 1. It certainly has no way of distinguishing between > point masses of p for varying values of p. > > It's not really an argument that other things misbehave under limit > operations due to numerical issues, is it? Well, yes, I'd say it is: Because of the limitation of computer arithmetic (and also because d*() functions have been used to always just return numeric vectors , and not further information ... see your point below), many of our d*() functions are not densities mathematically .. only just because of underflow to 0 and overflow to Inf. These d*() functions should be "best" approximations to the true underlying mathematical density function.... and these best approximations sometimes must be pretty far from the true because of the inherent limitations... and we use the log=TRUE option as one way to come closer to the truth at least for the under- and over-flow cases. > I see > > > dchisq(2.4703e-324,df=1e-325) > [1] Inf > > dchisq(2.4704e-324,df=1e-325) > [1] 0 > > so it actually isn't a density... (Not that I think it is something that we > can be expected to fix.) > Indeed, all our d*() are not densities if you look at it with a pure math eye. > > I think the only consistent way out would be to return the base measure as > an attribute of the result. That wouldn't in itself be too hard, but since > densities are used all over the place, it seems unavoidable that we would > get in trouble with some applications. yes, ... and I don't even know if we'd help solving a real problem. I think for those who want distribution function objects much richer and closer to the underlying math, there's the excellent "distr" collection of CRAN packages.