Bug 16862 - Apparent intermittent memory corruption
Summary: Apparent intermittent memory corruption
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Low-level
Version: R 3.2.4 revised
Hardware: x86_64/x64/amd64 (64-bit) Linux-Ubuntu
Importance: P3 major
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-05-03 16:11 UTC by PeterG
Modified: 2016-05-05 15:53 UTC

Description PeterG 2016-05-03 16:11:12 UTC
A user I support has reported intermittent errors with R 3.2.4 under Ubuntu 14.04 (GNU/Linux): inverting a matrix via Cholesky decomposition occasionally returns "NA" in some cells. He has been able to reduce this to the inversion of an identity matrix, as in:

-----------------------------------------
for (i in 1:20000) {
  v <- diag(1, nrow = 103, ncol = 103)
  chol_v <- chol(v)
  inv_v <- chol2inv(chol_v)  # v and chol_v coincide for the identity matrix
  if (any(is.na(inv_v))) stop()
  print(i)
}
-----------------------------------------

Run with:

-----------------------------------------
$ R --version | grep R.ver
R version 3.2.4 Revised (2016-03-16 r70336) -- "Very Secure Dishes"
-----------------------------------------

And the result is:

-----------------------------------------
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
Error: 
Execution halted
-----------------------------------------

Additional notes:

* The R binary is from the CRAN Ubuntu repository.

* I have tried matrix sizes other than 103 but could not reproduce the failure with them.

* Sometimes the failure happens on iterations other than 80.

* In the original failing script, adding tracing 'print' statements made the symptom go away.

* I was able to reproduce this on some systems with 3.2.2 but not with 3.2.5; it may nevertheless be version-independent and simply happen rarely.

This seems to be a bug that occasionally affects others; the following looks like a previous manifestation:

http://stackoverflow.com/questions/26745943/r-cor-returns-nan-sometimes
Comment 1 PeterG 2016-05-04 10:02:04 UTC
From 'R.Version()' under 3.2.2:

-----------------------------------------
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "2.2"

$year
[1] "2015"

$month
[1] "08"

$day
[1] "14"

$`svn rev`
[1] "69053"

$language
[1] "R"

$version.string
[1] "R version 3.2.2 (2015-08-14)"

$nickname
[1] "Fire Safety"
-----------------------------------------

Repeating the test (under that 3.2.2 instance):

-----------------------------------------
> n_na_values <- 0
> for (i in 1:20000) {
+   v <- diag(1, nrow = 103)
+   chol_v <- chol(v)
+   inv_v <- chol2inv(chol_v)
+   n_na_values <- n_na_values + as.numeric(any(is.na(inv_v)))
+ }
>
> expect_equal(n_na_values, 0)
-----------------------------------------

-----------------------------------------
Error: n_na_values not equal to 0
0 - 24 == -24
-----------------------------------------

To me this looks like a long-standing, as yet unfixed bug, probably related to memory being overwritten in R, in which some elements of matrices get overwritten with zero, random, or illegal values. It probably happens rarely, but still far more often than is apparent: most of the time the corruption goes undetected because the results still look plausible.
Comment 2 Simon Urbanek 2016-05-05 02:27:12 UTC
I cannot reproduce it on either R 3.2.4 or R 3.3.0 using CRAN Ubuntu R binaries on Ubuntu 14.04.4 LTS (libblas3 1.2.20110419-7, liblapack3 3.5.0-2ubuntu1).

I doubt this has anything to do with R itself; I would rather look at the BLAS libraries in use, as some of the accelerated ones tend to show non-deterministic behavior. You may have better luck first raising it on the Debian/Ubuntu lists to diagnose it a bit better, and/or providing exact details such as the actual packages (dpkg) used.
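
As a starting point for identifying the linear algebra libraries in use, the following minimal sketch queries R itself. It assumes that 'La_version()' is available (it is in the R 3.2.x series) and that the 'BLAS' and 'LAPACK' fields of 'sessionInfo()' exist only in later R releases (3.4.0 onwards, as far as I know); on older versions those fields are simply NULL. The variable names are illustrative only.

```r
# Version string of the LAPACK library that R is linked against.
lapack_version <- La_version()
print(lapack_version)

# Paths of the BLAS/LAPACK shared libraries; these fields are assumed to be
# present only in newer R releases and are NULL otherwise.
si <- sessionInfo()
blas_path <- si$BLAS
lapack_path <- si$LAPACK
print(blas_path)
print(lapack_path)
```

On an R 3.2.x installation only the first value is informative, so the system package manager would still be needed to see which BLAS package provides the library.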
Comment 3 PeterG 2016-05-05 15:07:24 UTC
This is an intermittent problem that is quite hard to reproduce. My guess is that it depends on the specific history of memory requests and layouts, and that the corruption does not necessarily produce easily detected 'NA'/'NaN' values but, I suspect, also random values. Intensive use of 'valgrind' might be needed.
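
A stricter version of the reproduction loop could also catch corruption into ordinary-looking values, not only 'NA'/'NaN'. The sketch below is one possible way to do that, not a verified reproducer: the function name 'check_identity_inverse' and its arguments are hypothetical, and it relies on the inverse of an identity matrix being exactly the identity matrix on a healthy system.

```r
# Hypothetical helper: repeat the chol/chol2inv round trip on an identity
# matrix and record the iterations where the result is wrong.
check_identity_inverse <- function(n_iter = 20000, n = 103) {
  expected <- diag(1, nrow = n)
  bad_na <- integer(0)     # iterations that produced NA/NaN cells
  bad_value <- integer(0)  # iterations that produced other wrong values
  for (i in seq_len(n_iter)) {
    inv_v <- chol2inv(chol(diag(1, nrow = n)))
    if (anyNA(inv_v)) {
      bad_na <- c(bad_na, i)
    } else if (!all(inv_v == expected)) {
      bad_value <- c(bad_value, i)
    }
  }
  list(na = bad_na, value = bad_value)
}

# Short run for illustration; the original report used 20000 iterations.
res <- check_identity_inverse(n_iter = 200)
print(lengths(res))
```

Both counts should be zero on an unaffected system; a non-empty 'value' element would indicate silent corruption that the 'is.na' test alone misses.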

As to BLAS and non-deterministic behaviour: that usually arises from floating-point rounding that depends on the order of execution when code is compiled for pure speed, and it typically manifests in ill-conditioned problems.

Here the non-deterministic behaviour happens during the steps ('chol', 'chol2inv') for inverting an identity matrix, and it is exceedingly unlikely that this involves floating-point approximation mishaps.
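
To illustrate the distinction, the following sketch contrasts an ill-conditioned input with the identity matrix. The Hilbert matrix and the helper name 'hilbert' are my own choices for illustration and do not come from the failing script. The expectation is that rounding error is visible for the Hilbert matrix, whereas every intermediate value for the identity matrix is exactly 0 or 1, so no rounding should occur.

```r
# Hypothetical helper building an n x n Hilbert matrix, a standard
# example of an ill-conditioned symmetric positive definite matrix.
hilbert <- function(n) outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))

# Ill-conditioned case: the computed inverse is only approximate.
h <- hilbert(8)
inv_h <- chol2inv(chol(h))
err_h <- max(abs(inv_h %*% h - diag(8)))

# Identity case: the result is expected to match the input exactly.
id <- diag(1, nrow = 103)
err_id <- max(abs(chol2inv(chol(id)) - id))

print(c(hilbert = err_h, identity = err_id))
```

A non-zero error in the identity case would therefore point away from ordinary rounding effects.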
Comment 4 PeterG 2016-05-05 15:11:55 UTC
Package versions:
  installed r-base-core:amd64 3.2.2-1trusty0
  installed r-base-core:amd64 3.2.4-revised-1trusty0
Comment 5 Brian Ripley 2016-05-05 15:53:56 UTC
Please try with a current version of R installed from the sources.  The symptoms point to a problem with external libraries (e.g. BLAS) linked in on your system, as Dr Urbanek also surmised.

I am unable to reproduce this on a Fedora 22 system using the BLAS/LAPACK shipped with R, in either R 3.2.5 or 3.3.0, including under valgrind.