Created attachment 2494 [details] patch Patch attached. when rounding the 8 lines below only two round to even. Recreated bug on windows 64bit arch and linux 64bit arch > round(55.55, 1) >[1] 55.5 > round(55.555, 2) >[1] 55.55 > round(55.5555, 3) >[1] 55.556 <- correct > round(55.55555, 4) >[1] 55.5555 > round(55.555555, 5) >[1] 55.55555 > round(55.5555555, 6) >[1] 55.555555 > round(55.55555555, 7) >[1] 55.5555556 <-correct > round(55.555555555, 8) >[1] 55.55555555 This is because of the error caused by the subtraction (x-intx) on line 60 fround.c during that subtraction the least signifigant digit is being changed from a 5 to a 4 This is causing the half way rules to no longer apply and and the number is rounding down because it is then .4 instead of to even. below is a simple c program that breaks out the line to make it apparent what is happening. but the attached is an optimized version of the code. > >#include <stdio.h> >#include <math.h> >#include <float.h> >int >main () >{ >#define LDOUBLE long double > > double R_pow_di (double x, int n) > { > double pow = 1.0; > > if (n != 0) > { > > if (n < 0) > { > n = -n; > x = 1 / x; > } > for (;;) > { > if (n & 01) > pow *= x; > if (n >>= 1) > x *= x; > else > break; > } > } > return pow; > } > > > double fround (double x, double digits) > { >#define MAX_DIGITS DBL_MAX_10_EXP > double dbx = x; > LDOUBLE pow10, sgn, intx; > int dig; > > > if (digits > MAX_DIGITS) > digits = MAX_DIGITS; > dig = (int) floor (digits + 0.5); > if (x < 0.) > { > sgn = -1.; > x = -x; > } > else > sgn = 1.; > if (dig == 0) > { > return (double) (sgn * nearbyint (x)); > } > else if (dig > 0) > { > pow10 = R_pow_di(10., dig); > intx = floor (dbx); > double decx = dbx - intx; > double decxp10 = decx * pow10; > double decx10dbl = (double) decxp10; > double decx10NI = nearbyint ( decx10dbl); > double decx_10 = decx10NI / pow10; > double decx_10intx = intx + decx_10; > double decx_10dblNIintxsgn = sgn * decx_10intx; > return (double) (decx_10dblNIintxsgn); > > } > else > { > pow10 = R_pow_di(10., -dig); > return (double) (sgn * nearbyint ((double) (x / pow10)) * > pow10); > } > } > > double res = fround (55.5555, 3); > printf ("%f", res); > return 0; > >} >Thanks >Adam Wheeler
Isn't the issue here that these numbers can't be represented exactly, and so the type of rounding that happens depends on whether the nearest floating point number used is above or below the 'requested' number? For example: > > options(digits = 22) > > vals <- c(5.5, 5.55, 5.555, 5.5555, 5.55555, 5.555555) > > vals > [1] 5.500000000000000000000 5.549999999999999822364 5.554999999999999715783 > [4] 5.555500000000000326850 5.555550000000000210321 5.555555000000000021032 > > round(vals, 2) > [1] 5.500000000000000000000 5.549999999999999822364 5.549999999999999822364 > [4] 5.559999999999999609201 5.559999999999999609201 5.559999999999999609201 > > options(digits = 7) > > round(vals, 2) > [1] 5.50 5.55 5.55 5.56 5.56 5.56
Hi Kevin, That is what I thought at first too. However, I realized that 55.555 can be represented close enough to not have this error, where as 0.555 cannot. The problem occurs when rounding 55.555 (and like numbers) when the whole part is, unnecessarily subtracted from the decimal part.
This is baffling indeed. Your patch just simplifies the code, removing the "offset calculation". That 'intx' "offset" has already been in a very early version of fround.c, more than two years *before* R 1.0.0: ------------------------------------------------------------------------ r574 | ihaka | 1998-01-14 00:32:17 +0100 (Wed, 14 Jan 1998) | 2 lines New IEEE aware math library. ------------------------------------------------------------------------ currently, I cannot find a case where the old code is better than the new one, and I am doing quite a few checks now even than those that are (implicitly) in R's "make check-all"
Yes, one other thing I for got to add into this patch was to remove the If statement. it is unnecessary as well if you just use: pow10 = R_pow_di(10., sgn * dig); Thanks, Adam Wheeler
(In reply to Martin Maechler from comment #3) > This is baffling indeed. Your patch just simplifies the code, removing the > "offset calculation". That 'intx' "offset" has already been in a very > early version of fround.c, more than two years *before* R 1.0.0: but in that version we had a relatively simplistic 'private_rint()' whereas now we (almost always) have a smart probably very close to optimal nearbyint() (C99 system) function. This explains why the "offsetting" has no longer been necessary, and now only has drawbacks.
Fixed (for R-devel only) in svn rev 77609, a few minutes ago.
Believe it or not, there is more to it. With the change, we now have > round(c(-2,2), digits = .Machine$integer.max) [1] -Inf Inf or also already > round(4, 400) [1] Inf which breaks the 'R CMD check' of at least one CRAN package. So one might consider using the old code in case of non-small 'digits'. However it looks to me that the old code was only "incidentally" correct in this case. For such large digits, the old code could give "nonsense" too: > e <- 5.555555555555555555555e-308 > round(e, 1000L) [1] 6e-308 ooops?? Rounding to 1000 digits after the decimal should not loose any digits ! > e - round(e, 1000L) [1] -4.444444e-309 > .... even though I've promised the CRAN team to fix this problem "now", I'm less sure what we really want, and I do doubt if we should allow digits = .Machine$integer.max even though that is used in at least one CRAN package's code...
Hmm... I should think a bit longer about this. We have had previously and still now if (digits > MAX_DIGITS) digits = MAX_DIGITS; where MAX_DIGITS is effectively 308. So no wonder we got what we got for > e <- 5.555555555555555555555e-308 > round(e, 1000L) and also, we can continue to just truncate the digits "down to MAX_DIGITS". Probably we should mention it in the help page at least.
It may be instructive to delve into what signif(), "the sister" of round(), does here. It's definitely better than round() ((though also not quite optimal, it seems for digits >= 16 or so)) [both R <= 3.6.x, and current R-devel]: > e <- 5.555555555555555555555e-308 > d <- 20:1 ; s.e <- signif(e, d) ; names(s.e) <- paste0("d", d) > d <- 310:305; r.e <- round (e, d) ; names(r.e) <- paste0("d", d) > op <- options(digits=20) > cbind(signif = c(e, s.e)) ##-- this always rounds up (= to even) signif 5.5555555555555559302e-308 d20 5.5555555555555559302e-308 d19 5.5555555555555559302e-308 d18 5.5555555555555549420e-308 d17 5.5555555555555559302e-308 d16 5.5555555555555569183e-308 d15 5.5555555555555569183e-308 d14 5.5555555555555974317e-308 d13 5.5555555555559966367e-308 d12 5.5555555555599965922e-308 d11 5.5555555555999971350e-308 d10 5.5555555559999966344e-308 d9 5.5555555599999975572e-308 d8 5.5555555999999969036e-308 d7 5.5555559999999972844e-308 d6 5.5555599999999971399e-308 d5 5.5555999999999966832e-308 d4 5.5559999999999970575e-308 d3 5.5599999999999968477e-308 d2 5.5999999999999977136e-308 d1 5.9999999999999974797e-308 > cbind( round = c(e, r.e)) round 5.5555555555555559302e-308 d310 5.9999999999999964916e-308 d309 5.9999999999999964916e-308 d308 5.9999999999999964916e-308 d307 9.9999999999999951407e-308 d306 0.0000000000000000000e+00 d305 0.0000000000000000000e+00 > options(op) >
New commit svn rev 77618 should address the remaining problems I've identified; there's at least two more cases in the new regression tests where the new code is better (clearly) than the R <= 3.6.x version.
There is more: 1) In semi-private communication, a) Jeroen Ooms package 'jsonlite' not fulfilling its own check, he noticed that new R-devel now behaves differently than R's sprintf() .. which is based on (GNU) libc. b) Uwe Ligges remarked that at least in some cases sprintf() [and hence R <= 3.6.x] behavior] was better than the current R-devel because it does round to the nearest. I did notice that in other cases the new R-devel behavior was still more correct. More on this, once my vacations are over. 2) Just found today when fooling around -- this is the same wrong behavior (in previous R and current R-devel) for digits < 0 for largish negative 'digits' : > M <- .Machine$double.xmax > options(digits=4) > round(M, -(1:400)) [1] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf Inf 1.798e+308 1.798e+308 [10] 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [19] 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [28] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 [37] 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [46] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [55] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [64] Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [73] 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [82] Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf Inf [91] 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [100] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 Inf [109] Inf Inf 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 [118] Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [127] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 [136] 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [145] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [154] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [163] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [172] 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf Inf 1.798e+308 [181] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 [190] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [199] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [208] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [217] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 Inf [226] Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 Inf [235] 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 Inf Inf 1.798e+308 1.798e+308 [244] 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [253] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf 1.798e+308 1.798e+308 [262] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [271] Inf Inf 1.798e+308 Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [280] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [289] 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 1.798e+308 [298] Inf Inf 1.798e+308 1.798e+308 1.798e+308 1.798e+308 Inf Inf Inf [307] Inf Inf NaN NaN NaN NaN NaN NaN NaN [316] NaN NaN NaN NaN NaN NaN NaN NaN NaN [325] NaN NaN NaN NaN NaN NaN NaN NaN NaN [334] NaN NaN NaN NaN NaN NaN NaN NaN NaN [343] NaN NaN NaN NaN NaN NaN NaN NaN NaN [352] NaN NaN NaN NaN NaN NaN NaN NaN NaN [361] NaN NaN NaN NaN NaN NaN NaN NaN NaN [370] NaN NaN NaN NaN NaN NaN NaN NaN NaN [379] NaN NaN NaN NaN NaN NaN NaN NaN NaN [388] NaN NaN NaN NaN NaN NaN NaN NaN NaN [397] NaN NaN NaN NaN Warning message: NaNs produced > ----------------------- the above with the maximal double number is of course border line, but similar less horrendous behavior is also seen here ------------------------------------------------------------------------------------------------------------------------------ > round(3141.5, -(0:400)) [1] 3142 3140 3100 3000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [22] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [43] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [64] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [85] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [127] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [148] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [169] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [190] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [211] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [232] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [253] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [274] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [295] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN [316] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [337] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [358] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [379] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [400] NaN NaN Warning message: NaNs produced > -------------- Note that here all numbers are correct; but the '0' should continue, not become NaN. Note 2, that round(x, -Inf) does work giving '0' for all 'x' .. because there's always been a special check in the C source for 'digits == -Inf'. -------------- This problem "2)" is almost completely orthogonal (i.e. unrelated) to the usual cases about which we talked here, where digits > 0
As Kevin noted, part of the problem is determining if the round-to-even rule applies in the first place for numbers that cannot be exactly represented as .5. I think the new code actually has a regression in some cases. In the unit test of jsonlite, the problematic number is 9.18665. In R < 4.0 the behavior of round() was consistent with libc/printf: x <- 9.18665 round(x, 4) # [1] 9.1867 format(x, digits = 5) # [1] "9.1867" But in the new R-devel behavior, 9.1867 gets rounded down and no longer matches printf: x <- 9.18665 round(x, 4) # [1] 9.1866 > format(x, digits = 5) # [1] "9.1867" The new behavior may seem more accurate according to round-to-even, but the problem is that 9.18665 cannot be exactly represented: print(9.18665, digits = 18) # [1] 9.1866500000000002 So if we take this into account the round-to-even does not apply and hence the old behavior (which also matches printf), is more correct?
(In reply to Jeroen Ooms from comment #12) > ..... ..... > The new behavior may seem more accurate according to round-to-even, but the > problem is that 9.18665 cannot be exactly represented: > > print(9.18665, digits = 18) > # [1] 9.1866500000000002 > > So if we take this into account the round-to-even does not apply and hence > the old behavior (which also matches printf), is more correct? It's not so easy (as I also wrote to you in another private thread). For this combination round(9.18665, 5) you are right that the new behavior is not good, and the previous behavior was better. In general, the printf() behavior is not more correct than the new behavior; actually according to one simulation run I did, printf() on average is both worse than the old and than the new behavior. First thing R of 2020 for me was writing an R package `round` which now waits for CRAN approval .. and needs changes before they look at it again .. which shows old and new algorithm and printf and more versions, with round :: roundX(). Before it makes its way to CRAN, it is available as remotes::install_gitlab("mmaechler/round") then run the smallish simulations via example(roundX) Given that I'm arguing that the "r3" method is "the correct" method for rounding to nearest + rounding to even in case of "equal distance", you see how somewhat surprisingly diverse the results are.
The current state of the art on the somewhat related issue of printing floating point numbers seems to be http://cseweb.ucsd.edu/~lerner/papers/fp-printing-popl16.pdf. This might be worth a look.
(In reply to Martin Maechler from comment #13) > First thing R of 2020 for me was writing an R package `round` which now > waits for CRAN approval .. and needs changes before they look at it again .. > which shows old and new algorithm and printf and more versions, with > round :: roundX(). Very nice package Martin... is this expected? > options(warn = 2) > M <- .Machine$double.xmax > dig <- -(1:314) > resM <- roundAll(M, dig) Error in roundX(x, digits, ver) : (converted from warning) NAs introduced by coercion > sessionInfo() R version 3.6.2 (2019-12-12) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.3 LTS Matrix products: default BLAS: /home/btyner/R362/lib/R/lib/libRblas.so LAPACK: /home/btyner/R362/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] round_0.11-0 loaded via a namespace (and not attached): [1] compiler_3.6.2 Matrix_1.2-18 tools_3.6.2 remotes_2.1.0 [5] grid_3.6.2 lattice_0.20-38 tcltk_3.6.2
(In reply to Benjamin Tyner from comment #15) ... is this expected? > > > options(warn = 2) > > M <- .Machine$double.xmax > > dig <- -(1:314) > > resM <- roundAll(M, dig) > Error in roundX(x, digits, ver) : > (converted from warning) NAs introduced by coercion Yes, in the same spirit as > tanpi(1/2) [1] NaN Warning message: In tanpi(1/2) : NaNs produced > and many other floating point operations in R that warn on producing NaNs. Why do think options(warn=2) is a good idea?
(In reply to Martin Maechler from comment #16) > Yes, in the same spirit as > > > tanpi(1/2) > [1] NaN > Warning message: > In tanpi(1/2) : NaNs produced > > > > and many other floating point operations in R that warn on producing NaNs. (On my machine) it's the "sprintf" and "r3" versions that produce the NaN; the other four versions do not. But of course you are right: rounding (especially to a negative number of digits in this fashion) involves floating-point operations! Thank you Martin. > > Why do think options(warn=2) is a good idea? Because in general (and especially when debugging!) I tend to treat warnings as "guilty until proven innocent"; in my experience, >95% of warnings are a sign of a bug, and I would like execution to stop upon the first instance so an investigation can be conducted. For the other <5% (upon proving them innocent, such as is the case here) I'll use suppressWarnings().
In the mean time, my 'round' package has made it to CRAN, (where a slightly newer version is on gitlab, notably with a better vignette, which you can also get from here (for now): https://stat.ethz.ch/~maechler/R/Rounding.html Main message: If we adhere to the principle (IEEE 754, etc) "rounding to nearest + rounding to even in case of tie" and really apply it also for 'digits != 0' (about which the IEEE/ISO standards don't say anything), then all three - R <= 3.6.x - sprintf() {as by the GNU libc's at least} - previous R-devel (up to few minutes ago!) are not good enough {BTW, R 3.6.x and sprintf are not at all equivalent, even though Jeroen thought so,because they gave the same in his example(s)} I now have committed --- svn rev 77727 --- what I had proposed for a while (in the 'round' vignette), hopefully the last time at least before R 4.0.0. Being optimistic, I now close the issue again as fixed.