Bug 15593 - [Patch] Integer overflows in fft() with large input
Summary: [Patch] Integer overflows in fft() with large input
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses
Version: R 3.0.2
Hardware: All
OS: All
Importance: P5 minor
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2013-12-05 15:38 UTC by Mikko Korpela
Modified: 2013-12-10 13:03 UTC
CC List: 0 users

See Also:


Attachments
Patch for overflow problems, increased size of nfac (4.47 KB, patch)
2013-12-05 15:38 UTC, Mikko Korpela
Test program for factorization (6.38 KB, text/x-csrc)
2013-12-05 15:39 UTC, Mikko Korpela
Testing results of fft() and mvfft() (839 bytes, text/x-r-source)
2013-12-05 15:40 UTC, Mikko Korpela

Description Mikko Korpela 2013-12-05 15:38:15 UTC
Created attachment 1536
Patch for overflow problems, increased size of nfac

Overview

When computing fft() or mvfft() with very large input z, it is
possible that the setup phase (factorization and memory allocation)
will fail because results of some arithmetic operations are not
representable by variables of the C int type.  I have a proposed
solution for the problems (patch included).  Most of the problems will
only appear when the input size n is larger than what fits in the
memory of a typical desktop computer.  Therefore much of the testing
has been performed using an ad-hoc standalone program comparing
factorization results before and after the proposed patch.  In
addition to fixing the overflows, the patch increases the size of a
factor array from 15 to 20, so that the FFT can be computed for any
"int" input size (not long vectors), i.e. n <= 2147483647 (INT_MAX,
2^31 - 1).


Details and proposed solutions

There are three sections below, as follows:
1. fixing overflows in factorization
2. fixing overflows related to the size of an array
3. changing the size of a factor array (optional, not a bugfix)

Now to 1 and 2.  When investigating the C code implementing fft() and
mvfft(), I noticed that integer overflow is possible (at least) in the
following places (line numbers are from the development version
r64390, but the computational part has not changed between R 3.0.2 and
the development version):
* lines 778, 806 and 855 of src/library/stats/src/fft.c
* lines 83, 104 and 166 of src/library/stats/src/fourier.c

1. Overflows in factorization

Lines 778 and 806 of fft.c belong to function fft_factor().  Here the
integer overflows may result in an invalid factorization (overflow of
a signed integer causes undefined behavior).  An example using the
demo program for n = 2147439315:

Factorization before changes:
n = 2147439315, 3 factors: 364889 -557053 364889 
Factorization after changes:
n = 2147439315, 5 factors: 3 5 7 257 79579 

We can see that the first result is wrong.  In this case, on line 778,
odd values of j between 3 and 46339 were examined without any of the
squares jj = j * j being a factor of n.  Then j was increased to 46341
and squared, but the square 2147488281 > INT_MAX and cannot be stored
in jj.  My proposed fix for this is to replace the comparison of jj
and k with a comparison of j and sqrt(k).  The "after" result in this
example is a valid factorization.
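To illustrate the idea, here is a simplified sketch (not the actual
fft.c code and not the patch itself) of a square-factor search that
compares j against sqrt(k) instead of forming jj = j * j first:

#include <math.h>

/* Sketch: find the smallest odd j >= 3 whose square divides k, or
 * return 0 if there is none.  Because j is only squared while
 * j <= sqrt(k) <= 46340, the product j * j always fits in an int. */
static int smallest_odd_square_factor(int k)
{
    int limit = (int) sqrt((double) k);
    for (int j = 3; j <= limit; j += 2) {
        if (k % (j * j) == 0)   /* j * j <= 46340 * 46340 < INT_MAX */
            return j;
    }
    return 0;
}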

On line 806 of fft.c, the update of j will overflow if j > INT_MAX - 2
(a rare occurrence).  To trigger this bug, one could first fix the
previous bug and then use n = INT_MAX.  My proposed fix, probably not
the most elegant one possible, is to check whether j > INT_MAX - 2
each time before j is updated.
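A sketch of the corresponding guard (function name hypothetical, not
taken from the patch):

#include <limits.h>

/* Sketch: advance an odd trial divisor j by 2, but refuse to do so
 * when the addition would exceed INT_MAX, since overflowing a signed
 * int is undefined behaviour. */
static int advance_trial_divisor(int j)
{
    if (j > INT_MAX - 2)
        return -1;    /* no larger odd divisor representable in int */
    return j + 2;
}

The caller would then stop the search, or report an error, when -1 is
returned.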

2. Overflows related to the size of an array

On line 855 of src/library/stats/src/fft.c and lines 83, 104 and 166
of src/library/stats/src/fourier.c the problem is that k * maxf may
overflow, where k is 2, 3 or 4.  The proposed solution is to produce
an error if 4 * maxf is larger than what size_t can hold (prevents
requesting an incorrect amount of memory to be allocated) and convert
maxf to the size_t type before multiplication.
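As a rough sketch of that idea (names hypothetical, not the patch
itself):

#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Sketch: compute the number of work-array elements k * maxf in
 * size_t arithmetic, after checking that even the largest case,
 * 4 * maxf, is representable as a size_t. */
static size_t work_array_elements(int maxf, int k)   /* k is 2, 3 or 4 */
{
    if ((size_t) maxf > SIZE_MAX / 4)
        return 0;                        /* caller should raise an error */
    return (size_t) k * (size_t) maxf;   /* cannot overflow after check */
}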

3. Size of a factor array

Another proposed change to the factorization is to increase the size
of the factor array nfac from 15 to 20 elements, which should be
enough to hold the factorization of every n <= 2^31 - 1.  Here is a
number with 20 factors, as produced by the patched factorization
algorithm:

n = 1033121304, 20 factors: 3 3 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3

Now, to the best of my knowledge, the smallest number with 21 factors
is three times that (3 * 1033121304 = 3099363912 > INT_MAX).  This is not
really a bug fix, and can be considered an optional part of this
report and patch.  The limited size of the array is documented in the
source code: "the smallest number exceeding the 15 locations provided
is 12,754,584".  However, I see no harm in increasing the size of
nfac, which seems to be small only for historical reasons.  Therefore
this change is included in the patch file.


Steps to reproduce, expected vs actual results

The test program (source code fftfactors_demo.c is attached)
factorizes a user-provided positive integer with both unpatched
(function fft_factor_old) and patched code (function fft_factor_new).
The code of these functions is copied from the R source code (3.0.2
and development version r64390 agree for this part).  The main
function reads user input, calls the factorization functions and
prints the results.

The test program can be compiled with the following command line (gcc
assumed) in the directory where the source file is located:

gcc fftfactors_demo.c -lm -o fftfactors_demo

then, assuming a unix-like operating system, a number n can be
factorized with

./fftfactors_demo n
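For readers without the attachment, here is a small self-contained
sketch of that kind of driver.  Both the names and the factorization
routine below are stand-ins (plain trial division); this is not the
code in fftfactors_demo.c:

#include <stdio.h>
#include <stdlib.h>

static int factorize(long long n, long long factors[], int maxfac)
{
    int nfac = 0;
    /* p is long long so that p * p cannot overflow -- the very
     * problem this report is about. */
    for (long long p = 2; p * p <= n; p++)
        while (n % p == 0) {
            if (nfac == maxfac) return -1;   /* factor array too small */
            factors[nfac++] = p;
            n /= p;
        }
    if (n > 1) {
        if (nfac == maxfac) return -1;
        factors[nfac++] = n;
    }
    return nfac;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s n\n", argv[0]);
        return 1;
    }
    long long n = atoll(argv[1]);
    long long factors[64];
    int nfac = factorize(n, factors, 64);
    if (nfac < 0) {
        printf("factorization error\n");
        return 1;
    }
    printf("n = %lld, %d factors:", n, nfac);
    for (int i = 0; i < nfac; i++)
        printf(" %lld", factors[i]);
    printf("\n");
    return 0;
}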

The actual results (before the patch) and expected results (after the
patch) have already been shown above for n = 2147439315.  The following
values of n, among others, also produce different factorizations when
comparing the unpatched and patched functions: 2147303427 and
2147310038.  In some cases overflow happens but does not change the
factorization result.  This was tested on Linux x86-64.

./fftfactors_demo 2147303427
Factorization before changes:
n = 2147303427, 3 factors: 65533 -5461 65533 
Factorization after changes:
n = 2147303427, 5 factors: 3 43 53 127 2473 

./fftfactors_demo 2147310038
Factorization before changes:
n = 2147310038, 3 factors: 92645 -314 92645 
Factorization after changes:
n = 2147310038, 5 factors: 2 23 73 157 4073 

Some numbers that the patched version is able to factorize, thanks to
the increased size of the factor array, are 12754584 (first number
with 16 factors) and 38263752 (first number with 17 factors).

./fftfactors_demo 12754584
Factorization before changes:
factorization error
Factorization after changes:
n = 12754584, 16 factors: 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 

./fftfactors_demo 38263752
Factorization before changes:
factorization error
Factorization after changes:
n = 38263752, 17 factors: 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 

After building R 3.0.2 both with and without the patch (a modified
patch that differs from the attached patch file only in line numbers),
I also performed a quick test that fft() and mvfft() produce the same
results, using data sizes for which the factorization also works in
the unpatched version.  The test uses random data and
prints a digest (hash function) of the results.  By source()ing the
test file (fft_test.R) using both versions, I got identical results,
"ca2180993a164f3192b9c4c3cb74ce3d".  The test computer has
insufficient memory for processing an array of a size where the
overflows in factorization happen (close to INT_MAX items).  Therefore
neither fft() nor mvfft() was tested for those input sizes.  The
results of the patched version using input sizes with more than 15
factors were sanity checked as follows:

> foo <- abs(fft(rep.int(1, 12754584))) # 16 factors
> all.equal(length(foo), foo[1])
[1] TRUE
> all.equal(foo[-1], rep.int(0, length(foo)-1))
[1] TRUE
> foo <- abs(fft(rep.int(1, 38263752))) # 17 factors
> all.equal(length(foo), foo[1])
[1] TRUE
> all.equal(foo[-1], rep.int(0, length(foo)-1))
[1] TRUE

This means that the transform of a constant signal has a single peak
at zero frequency, and that the scale of the result depends on the
input length (the transform is unnormalized).

Overflow in memory allocation can be demonstrated as follows (results
on a 32-bit platform would probably be different).  Problems are to be
expected if the largest factor of the input size exceeds INT_MAX / 4
(line 83 of fourier.c).  Now, 536870923 is a prime number and larger
than INT_MAX / 4.  In the unpatched R 3.0.2:

> foo <- fft(rep.int(1, 536870923))
Error in fft(rep.int(1, 536870923)) : 
  cannot allocate memory block of size 134217728 Tb

where clearly 134217728 Tb (!) is more than would have actually been
needed.  With the patched version, the amount of memory required was
reasonable (but still too much for the computer):

> foo <- fft(rep.int(1, 536870923))
Error: cannot allocate vector of size 16.0 Gb

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fi_FI.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] digest_0.6.4
Comment 1 Mikko Korpela 2013-12-05 15:39:30 UTC
Created attachment 1537
Test program for factorization
Comment 2 Mikko Korpela 2013-12-05 15:40:58 UTC
Created attachment 1538
Testing results of fft() and mvfft()
Comment 3 Brian Ripley 2013-12-10 13:03:58 UTC
Thanks, incorporated in R-devel now.