Created attachment 1536 [details] Patch for overflow problems, increased size of nfac Overview When computing fft() or mvfft() with very large input z, it is possible that the setup phase (factorization and memory allocation) will fail because results of some arithmetic operations are not representable by variables of the C int type. I have a proposed solution for the problems (patch included). Most of the problems will only appear when the input size n is larger than what fits in the memory of a typical desktop computer. Therefore much of the testing has been performed using an ad-hoc standalone program comparing factorization results before and after the proposed patch. In addition to fixing the overflows, the patch increases the size of a factor array from 15 to 20, so that the FFT can be computed for any "int" input size (not long vectors), i.e. n <= 2147483647 (INT_MAX, 2^31 - 1). Details and proposed solutions There are three sections below, as follows: 1. fixing overflows in factorization 2. fixing overflows related to the size of an array 3. changing the size of a factor array (optional, not a bugfix) Now to 1 and 2. When investigating the C code implementing fft() and mvfft(), I noticed that integer overflow is possible (at least) in the following places (line numbers are from the development version r64390, but the computational part has not changed between R 3.0.2 and the development version): * lines 778, 806 and 855 of src/library/stats/src/fft.c * lines 83, 104 and 166 of src/library/stats/src/fourier.c 1. Overflows in factorization Lines 778 and 806 of fft.c belong to function fft_factor(). Here the integer overflows may result in an invalid factorization (overflow of a signed integer causes undefined behavior). An example using the demo program for n = 2147439315: Factorization before changes: n = 2147439315, 3 factors: 364889 -557053 364889 Factorization after changes: n = 2147439315, 5 factors: 3 5 7 257 79579 We can see that the first result is wrong. In this case, on line 778, odd values of j between 3 and 46339 were examined without any of the squares jj = j * j being a factor of n. Then j was increased to 46341 and squared, but the square 2147488281 > INT_MAX and cannot be stored in jj. My proposed fix for this is to replace the comparison of jj and k with a comparison of j and sqrt(k). The "after" result in this example is a valid factorization. On line 806 of fft.c, the result will overflow if j > INT_MAX - 2 (a rare occurrence). To trigger this bug, one could first fix the previous bug and then use n = INT_MAX. My proposed fix, probably not the most elegant one possible, is to check if j > INT_MAX - 2 each time before j is updated. 2. Overflows related to the size of an array On line 855 of src/library/stats/src/fft.c and lines 83, 104 and 166 of src/library/stats/src/fourier.c the problem is that k * maxf may overflow, where k is 2, 3 or 4. The proposed solution is to produce an error if 4 * maxf is larger than what size_t can hold (prevents requesting an incorrect amount of memory to be allocated) and convert maxf to the size_t type before multiplication. 3. Size of a factor array Another proposed change to the factorization is to increase the size of the factor array nfac from 15 to 20 elements, which should be enough to hold the factorization of every n <= 2^31 - 1. Here is a number with 20 factors, as produced by the patched factorization algorithm: n = 1033121304, 20 factors: 3 3 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 3 3 Now, to the best of my knowledge, the smallest number with 21 factors is three times that, which makes it larger than INT_MAX. This is not really a bug fix, and can be considered an optional part of this report and patch. The limited size of the array is documented in the source code: "the smallest number exceeding the 15 locations provided is 12,754,584". However, I see no harm in increasing the size of nfac, which seems to be small only for historical reasons. Therefore this change is included in the patch file. Steps to reproduce, expected vs actual results The test program (source code fftfactors_demo.c is attached) factorizes a user-provided positive integer with both unpatched (function fft_factor_old) and patched code (function fft_factor_new). The code of these functions is copied from the R source code (3.0.2 and development version r64390 agree for this part). The main function reads user input, calls the factorization functions and prints the results. The test program can be compiled with the following command line (gcc assumed) in the directory where the source file is located: gcc fftfactors_demo.c -lm -o fftfactors_demo then, assuming a unix-like operating system, a number n can be factorized with ./fftfactors_demo n The actual results (before the patch) and expected results (after the patch) have already been shown above for n = 2147439315. For example, also the following values of n produce different factorization when comparing the unpatched and patched function: 2147303427, 2147310038. In some cases, overflow happens but it doesn't change the factorization result. This was tested on Linux x86-64. ./fftfactors_demo 2147303427 Factorization before changes: n = 2147303427, 3 factors: 65533 -5461 65533 Factorization after changes: n = 2147303427, 5 factors: 3 43 53 127 2473 ./fftfactors_demo 2147310038 Factorization before changes: n = 2147310038, 3 factors: 92645 -314 92645 Factorization after changes: n = 2147310038, 5 factors: 2 23 73 157 4073 Some numbers that the patched version is able to factorize, thanks to the increased size of the factor array, are 12754584 (first number with 16 factors) and 38263752 (first number with 17 factors). ./fftfactors_demo 12754584 Factorization before changes: factorization error Factorization after changes: n = 12754584, 16 factors: 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3 3 ./fftfactors_demo 38263752 Factorization before changes: factorization error Factorization after changes: n = 38263752, 17 factors: 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 After building R 3.0.2 both with and without the patch (a modified patch where only line numbers are different than with the included patch file), I also performed a quick test that fft() and mvfft() produce the same results, with data sizes such that the factorization also works in the unpatched version. The test uses random data and prints a digest (hash function) of the results. By source()ing the test file (fft_test.R) using both versions, I got identical results, "ca2180993a164f3192b9c4c3cb74ce3d". The test computer has insufficient memory for processing an array of a size where the overflows in factorization happen (close to INT_MAX items). Therefore neither fft() nor mvfft() was tested for those input sizes. The results of the patched version using input sizes with more than 15 factors were sanity checked as follows: > foo <- abs(fft(rep.int(1, 12754584))) # 16 factors > all.equal(length(foo), foo[1]) [1] TRUE > all.equal(foo[-1], rep.int(0, length(foo)-1)) [1] TRUE > foo <- abs(fft(rep.int(1, 38263752))) # 17 factors > all.equal(length(foo), foo[1]) [1] TRUE > all.equal(foo[-1], rep.int(0, length(foo)-1)) [1] TRUE Which means that the transform of a constant signal has one peak, at zero frequency, and the scale of the result depends on the input length (is unnormalized). Overflow in memory allocation can be demonstrated as follows (results on a 32-bit platform would probably be different). Problems are to be expected if the largest factor of the input size exceeds INT_MAX / 4 (line 83 of fourier.c). Now, 536870923 is a prime number and larger than INT_MAX / 4. In the unpatched R 3.0.2: > foo <- fft(rep.int(1, 536870923)) Error in fft(rep.int(1, 536870923)) : cannot allocate memory block of size 134217728 Tb where clearly 134217728 Tb (!) is more than would have actually been needed. With the patched version, the amount of memory required was reasonable (but still too much for the computer): > foo <- fft(rep.int(1, 536870923)) Error: cannot allocate vector of size 16.0 Gb > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=fi_FI.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=fi_FI.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] digest_0.6.4

Created attachment 1537 [details] Test program for factorization

Created attachment 1538 [details] Testing results of fft() and mvfft()

Thanks, incorporated in R-devel now.