Bug 17383 - mantelhaen.test() fails with integer overflow for tables containing large values
Summary: mantelhaen.test() fails with integer overflow for tables containing large values
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses (show other bugs)
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Linux
: P5 minor
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2018-01-27 14:55 UTC by Ben Bolker
Modified: 2018-04-09 10:09 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Bolker 2018-01-27 14:55:38 UTC
I've labeled this as "minor bug", which seems fair; it doesn't give an
*incorrect* answer, but it does fail to give an answer in a case where
a sensible one exists and can easily be computed ...

The following example gives an overflow warning/NA in foreign code error:

set.seed(101); n <- 500000
db <- data.frame(education=
                   factor(sample(1:3,replace=TRUE,size=n)),
                 score=
                   factor(sample(1:5,replace=TRUE,size=n)),
                 sex=
                   sample(c("M","F"),replace=TRUE,size=n))
mantelhaen.test(db$education, db$score, db$sex)

The problem occurs when internal calculations based on total numbers and
column/row sums of the table overflow (the table contains integers,
so all of these derived values are also integers).

The patch below solves the problem by coercing ntot to double before
multiplying. The performance impact should be negligible. I can't rule
out the possibility of some change to output in some very weird/extreme
edge cases, but it seems very unlikely.  A slightly more paranoid solution
would be to test the values to see if they would overflow, and only
coerce to double if necessary. A more drastic solution would coerce
ntot to double upstream (e.g. conceivably ntot^2*(ntot-1) could also
overflow ...)

===================================================================
--- mantelhaen.test.R	(revision 74171)
+++ mantelhaen.test.R	(working copy)
@@ -280,9 +280,9 @@
                                         # n_{.jk}, j = 1 to J-1
             n <- n + c(f[-I, -J])
             m <- m + c(outer(rowsums, colsums, "*")) / ntot
-            V <- V + (kronecker(diag(ntot * colsums, nrow = J - 1)
+            V <- V + (kronecker(diag(as.numeric(ntot) * colsums, nrow = J - 1)
                                 - outer(colsums, colsums),
-                                diag(ntot * rowsums, nrow = I - 1)
+                                diag(as.numeric(ntot) * rowsums, nrow = I - 1)
                                 - outer(rowsums, rowsums))
                       / (ntot^2 * (ntot - 1)))
         }
Comment 1 Ben Bolker 2018-04-08 17:46:24 UTC
bump ... ? (it's been a few months since I submitted this, still listed as "unconfirmed" ...)
Comment 2 Martin Maechler 2018-04-09 07:37:33 UTC
yeah... some things get buried in stacks of todos..

Your patch looks very sensible to me.

Note that  ntot^2*(ntot-1)   cannot overflow because '^2' is double also for integer input.