dnorm uses the obvious formula c*exp(-0.5*x*x) (ignoring mu and sigma here). That is fine for moderate arguments, but exp has a nasty habit of blowing up rounding errors, such as those incurred in x*x, when the argument is large. For example, with x = 35 + 2^-46:

    dnorm(x, 0, 1) --> 3.940396277134232207447585696367e-267
    Mathematica    --> 3.940396277134064456725435240768312101152712969120510158e-267

That's only about 13 correct digits. That's not a catastrophe if dnorm is the end product, but dnorm is used as a building block in countless places.

Suggestion for large x, non-log case:

1. Drop the sign (to ensure symmetry).
2. Split x into w+f with |f| <= 1/2.
3. Use c*exp(-0.5*w*w)*exp(-0.5*f*f)*exp(-w*f).

(See the sketch below.) This helps because w*w has no rounding error and w*f has ~5 extra bits of accuracy because it isn't lumped together with the large w*w. It's not clear what makes x "large", but something like >15 is probably fine. This would be 3x slower for large x. If that is not acceptable, one could reduce it to 2x by tabulating the first exp (with c folded in for a tiny bit of extra accuracy).
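A minimal sketch of this suggestion, assuming IEEE doubles; the function name is illustrative, c = 1/sqrt(2*pi) is written out, and the >15 cutoff is just the guess above:

    #include <math.h>

    #define INV_SQRT_2PI 0.398942280401432677939946059934 /* c = 1/sqrt(2*pi) */

    /* Sketch of the suggested split x = w + f with w = round(x), |f| <= 1/2.
     * w is then a small integer (dnorm underflows long before w*w can round),
     * so w*w is exact, and the cross term w*f is computed separately instead
     * of being absorbed into the rounded x*x. */
    double dnorm_wf_split(double x)
    {
        x = fabs(x);                       /* 1. drop the sign */
        if (x <= 15)                       /* "large" cutoff guessed above */
            return INV_SQRT_2PI * exp(-0.5 * x * x);
        double w = nearbyint(x);           /* 2. x = w + f, |f| <= 1/2 */
        double f = x - w;
        /* 3. c * exp(-w*w/2) * exp(-f*f/2) * exp(-w*f) */
        return INV_SQRT_2PI * exp(-0.5 * w * w) * exp(-0.5 * f * f) * exp(-w * f);
    }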

Actually, we can do even better. dnorm underflows around x=37; with long double it would underflow around 150. So the whole part will have no more than 8 interesting bits. We can cram 26 bits into the first part without rounding error, so code like this ought to do better:

    x = fabs (x);
    if (x >= 256) {
        /* The main formula might produce a NaN in this case. */
        return 0;
    } else {
        /* x1 will have only 24 or fewer bits. */
        x1 = floor (x * 65536 + 0.5) / 65536;
        x2 = x - x1;
        return c*exp(-0.5*x1*x1)*exp(-0.5*x2*x2)*exp(-x1*x2);
    }

All you can assume about long double is that it is no less accurate than double. See the C99/C11 standards.
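If one wants to act on that guarantee, the actual precision can be inspected at compile time via <float.h> (a portability sketch, not part of any proposed patch):

    #include <float.h>

    /* C99/C11 only guarantee LDBL_MANT_DIG >= DBL_MANT_DIG.  On x86 it is
     * often 64 (extended precision), but on some targets long double is
     * exactly double, so switching types buys no accuracy there. */
    #if LDBL_MANT_DIG > DBL_MANT_DIG
    /* long double genuinely carries extra precision on this target */
    #endif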

This seems to work for me without making assumptions about the precision of "double".

    x = fabs (x);
    if (x >= 2 * sqrt (DOUBLE_MAX))
        return R_D__0;
    if (give_log)
        return -(M_LN_SQRT_2PI + 0.5 * x * x + log(sigma));
    else if (x < 5)
        return M_1_SQRT_2PI * exp(-0.5 * x * x) / sigma;
    else {
        /*
         * Split x into two parts, x = x1 + x2, such that |x2| <= 2^-16.
         * Assuming that we are using IEEE doubles, that means that
         * x1*x1 is error free for x < 1024 (above which we will underflow
         * anyway).  If we are not using IEEE doubles then this is
         * still an improvement over the naive formula.
         */
        double x1 = floor (x * 65536 + 0.5) / 65536;
        double x2 = x - x1;
        return M_1_SQRT_2PI / sigma *
            (exp(-0.5 * x1 * x1) * exp((-0.5 * x2 - x1) * x2));
    }
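To try the fragment outside R's sources, here is a hedged standalone rendering: the constant values match R's Rmath.h, but DBL_MAX stands in for DOUBLE_MAX, the give_log ? -inf : 0 return stands in for R_D__0, and the wrapper and test driver are illustrative only:

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    #define M_1_SQRT_2PI  0.398942280401432677939946059934 /* 1/sqrt(2pi)    */
    #define M_LN_SQRT_2PI 0.918938533204672741780329736406 /* log(sqrt(2pi)) */

    /* x is assumed already standardized, i.e. (x - mu)/sigma. */
    static double dnorm_split16(double x, double sigma, int give_log)
    {
        x = fabs(x);
        if (x >= 2 * sqrt(DBL_MAX))
            return give_log ? -INFINITY : 0.0;          /* stand-in for R_D__0 */
        if (give_log)
            return -(M_LN_SQRT_2PI + 0.5 * x * x + log(sigma));
        else if (x < 5)
            return M_1_SQRT_2PI * exp(-0.5 * x * x) / sigma;
        else {
            double x1 = floor(x * 65536 + 0.5) / 65536; /* |x - x1| <= 2^-16 */
            double x2 = x - x1;
            return M_1_SQRT_2PI / sigma *
                   (exp(-0.5 * x1 * x1) * exp((-0.5 * x2 - x1) * x2));
        }
    }

    int main(void)
    {
        double x = 35 + ldexp(1.0, -46);   /* the example from the report */
        printf("naive: %.21e\n", M_1_SQRT_2PI * exp(-0.5 * x * x));
        printf("split: %.21e\n", dnorm_split16(x, 1.0, 0));
        /* Mathematica reference: 3.940396277134064456725...e-267 */
        return 0;
    }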

(In reply to M Welinder from comment #3)

I've implemented a version of this, in R-devel (actually, using ldexp(.) for faster multiplication with powers of 2), and then checked the result for both accuracy and time consumption. For accuracy checking, (my own) package 'Rmpfr' is very useful; for timing, I've used package 'microbenchmark'.

Summary:
- The new code uses 20% more CPU in a (somewhat) realistic setting.
- The accuracy is improved, indeed, but then really only for these very rare numbers such as <integer> +/- 2^-50 (you've used 35 + 2^-46).

I'm very hesitant to activate this, because the problem is triggered so rarely *and*, after all, the accuracy is still 13 digits in these rare cases, and we keep the full 2e-16 (or better) accuracy for all the normal cases.

Here's some R "log" for the timing {on one platform}:

    #### We provide pnorm() explicitly -- but with the price of warning about
    #### "shadowing pnorm() from R's 'stats' package".
    ###--> ~/R/Pkgs/Rmpfr/R/special-fun.R
    ## dnorm() is much simpler {and not part of 'Rmpfr'}:
    dnorm <- function (x, mean = 0, sd = 1, log = FALSE) {
        if(is.numeric(x) && is.numeric(mean) && is.numeric(sd))
            stats::dnorm(x, mean, sd, log=log)
        else if(is(x, "mpfr") || is(mean, "mpfr") || is(sd, "mpfr")) {
            stopifnot(length(log) == 1)
            rr <- x <- (as(x, "mpfr") - mean) / sd
            prec.x <- max(.getPrec(x))
            twopi <- 2*Const("pi", prec.x)
            ## f(x) = 1/(sigma*sqrt(2pi)) * exp(-1/2 x^2)
            if(log) ## log( f(x) ) = -[ log(sigma) + log(2pi)/2 + x^2 / 2]
                -(log(as(sd,"mpfr")) + (log(twopi) + x*x)/2)
            else exp(-x^2/2) / (as(sd,"mpfr")*sqrt(twopi))
        }
        else stop("invalid arguments (x,mean,sd)")
    }

    options(digits=21)
    dnorm(5)
    require("Rmpfr")
    all.equal(dnorm(mpfr(5, 100)), dnorm(5), tol= 1e-17) # 2.0187 e-17
    all.equal(dnorm(mpfr(6, 120)), dnorm(6), tol= 1e-17) # 9.8807 e-17
    all.equal(dnorm(mpfr(7, 120)), dnorm(7), tol= 1e-17) # 9.8807 e-17

    relErr <- function(target, current) as.numeric(1 - current/target)
    x0 <- seq(2, 40, by=1/4)
    x <- sort(c(x0, x0+2^-46, x0+2^-50, x0- 2^-54))
    system.time(## takes a couple of seconds:
        re.x <- sapply(x, function(.) relErr(dnorm(mpfr(., 150)), dnorm(.)))
    ) ## 7.5

    ## Now, I'm using a version of Morten Welinder's proposal for PR#15620
    ## https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15620
    ## in ~/R/D/r-devel/R/src/nmath/dnorm.c
    ##---> But this is *not* active by default: You need to recompile
    ##     with -DR_ACCURATE_dnorm to get that.
    ## I've kept both the logs (microbenchmark)
    ##   -> ./distr-ex.Rout.~Rdevel~
    ##   -> ./distr-ex.Rout
    ## and the plots
    ##   -> ./dnorm-relErr_R_3.0.2.pdf and
    ##   -> ./dnorm-relErr_R_3.1.0.pdf  <<- with the accurate (slow) code

    pdf(print(paste0("dnorm-relErr_R_",getRversion(),".pdf")))
    op <- par(mar=par("mar")+c(0,0,0,1.5))
    plot(abs(re.x) ~ x, subset=abs(re.x) < 1e-6, type="o", col="gray25",
         log="y", ylab = expression(group("|", rel.err, "|")),
         main = "| rel.Error( dnorm(x) ) |", yaxt="n"); sfsmisc::eaxis(2)
    col.e <- adjustcolor("tomato", 0.75)
    eps.c <- .Machine$double.eps
    f <- c(1,16, 256)
    (fE <- as.expression(lapply(f, function(.) if(. == 1) quote(epsilon[c])
                                else substitute(F*epsilon[c], list(F= .)))))
    abline(h = f*eps.c, lty=3, lwd=2, col=col.e)
    axis(4, at=f*eps.c, labels = fE, las=1, col=col.e, col.axis=col.e)
    par(op)
    abline(v=3:5, lty=2, col="gray")
    mtext(R.version.string, 1, adj=1, cex = 0.5, line=-1)

    detach("package:Rmpfr")
    rm(dnorm) # from above
    require(microbenchmark)
    mb <- microbenchmark(dnorm(x), dnorm(x, log=TRUE))
    mb

    ##---- with the accurate dnorm: -------- 2 calls -----
    ## Unit: microseconds
    ##                  expr     min      lq   median       uq      max neval
    ##              dnorm(x) 117.609 118.168 118.5875  125.711 1342.033   100
    ##  dnorm(x, log = TRUE)  76.264  77.102  77.5220   85.204  182.979   100
    ## Unit: microseconds
    ##                  expr     min      lq   median       uq      max neval
    ##              dnorm(x) 117.610 118.308 119.0060 140.9365  153.088   100
    ##  dnorm(x, log = TRUE)  76.265  76.824  77.3820  77.6620   88.556   100

    ##---- with the regular R 3.0.2 dnorm : -------- 2 calls -----
    ## Unit: microseconds
    ##                  expr    min     lq  median     uq     max neval
    ##              dnorm(x) 83.249 83.807 94.7025 96.937 107.274   100
    ##  dnorm(x, log = TRUE) 82.131 82.690 82.9690 83.249 498.094   100
    ## Unit: microseconds
    ##                  expr    min     lq  median     uq     max neval
    ##              dnorm(x) 83.528 84.087 96.6580 97.217 108.949   100
    ##  dnorm(x, log = TRUE) 82.410 82.690 83.2480 83.249  97.217   100

I think you are mistaken as to how often this happens -- I didn't discover it for constructed numbers like 35+2^-46 but for some random number. Here's another sample with only 13 correct digits:

    (1/Sqrt[2*Pi])*Exp[-(2486362294277461.5/2^46)^2/2]
    --> 3.196270108727368905046765648427127746702004721034032742... × 10^-272

R gets...

    > dnorm(2486362294277461.5/2^46, 0, 1)
    [1] 3.196270108727435e-272

(That's 35+1/3 rounded to nearest double, btw.)

Basically, it follows from the use of exp that this problem will exist for all large x for which x*x suffers rounding errors. The reason you don't see the problem is your test data:

    x0 <- seq(2, 40, by=1/4)

*All* these numbers have fewer than 10 significant bits, so x*x is error free. Do "by=1/10" and you will see the problem.
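One can watch the rounding error in x*x directly with a fused multiply-add (an illustration of the point above, assuming a correctly rounded fma; the helper name is made up):

    #include <math.h>
    #include <stdio.h>

    /* With a correctly rounded fma, fma(x, x, -p) returns the exact
     * difference x*x - p, i.e. the rounding error of the product. */
    static double sqr_error(double x)
    {
        double p = x * x;
        return fma(x, x, -p);
    }

    int main(void)
    {
        double grid = 35.25;              /* on the by=1/4 grid: few bits  */
        double rnd  = 2486362294277461.5 / 70368744177664.0; /* 35+1/3, /2^46 */
        printf("err(35.25)  = %g\n", sqr_error(grid)); /* exactly 0 */
        printf("err(35+1/3) = %g\n", sqr_error(rnd));  /* nonzero   */
        return 0;
    }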

(In reply to M Welinder from comment #5)
> I think you are mistaken as to how often this happens -- I didn't discover
> it for constructed numbers like 35+2^-46 but for some random number.

You are right; I had been mistaken there. Indeed it is typical that about 8 bits of precision are lost (on my 64-bit platform) for "most" numbers around 35. OTOH, the extra effort is not negligible (and you never claimed it would be). It is a border case of what "should happen"... as in other cases, we are worse than 13 digits of accuracy. After another round of experiments, I now tend to *do* the change (more accurate but slower for |x| > 5), but keep a compilation option which prefers speed.

Martin