Bug 16638 - Spearman correlation occasionally returns values below -1
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy
Version: R-devel (trunk)
Hardware: x86_64/x64/amd64 (64-bit) Linux
Importance: P5 normal
Assignee: R-core
Reported: 2015-12-19 20:47 UTC by Keith Hughitt
Modified: 2015-12-23 12:11 UTC


Attachments
Small reproducible example matrix. (340 bytes, application/x-r-data)
2015-12-19 20:47 UTC, Keith Hughitt

Description Keith Hughitt 2015-12-19 20:47:33 UTC
Created attachment 1948
Small reproducible example matrix.

While working with a Spearman-based correlation matrix, I noticed that some of the values of the matrix were outside of the range [-1, 1].

Upon further inspection, I found some values that at first glance appeared to be equal to -1, but actually were slightly lower.

I've attached a small matrix which can be used to reproduce the issue:

> load("test.RData")
> mat
         6961        6962       6963       6964       6966       6967       6968       6969
a  0.44052070 -0.44598314 -1.4587333  1.4641958 -0.7586740 -1.0223099  1.0590426  0.7219413
b -0.19429487  0.29103209  0.8106627 -0.9073999  0.3305041  0.3665862 -0.4325788 -0.2645114
c -0.03715713  0.09068725  0.4146191 -0.4681492  0.1675740  0.4106130 -0.3456950 -0.2324920
> sprintf("%0.50f", min(cor(t(mat), method='spearman')))
[1] "-1.00000000000000022204460492503130808472633361816406"
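
The reported minimum is -1 minus 2^-52, i.e. one `.Machine$double.eps` beyond the lower bound. The sketch below is one way to check that the exact answer should be -1. It rebuilds the matrix from the values printed above (an assumption: the printed digits are rounded, and the original test.RData attachment is not used) and shows that the ranks of row `a` are the exact reverse of the ranks of rows `b` and `c`. The variable names `ranks`, `rank_sums_ab` and `rank_sums_ac` are illustrative only.

```r
# Values copied from the printed matrix (rounded to the digits shown);
# this is not the original attachment.
mat <- rbind(
  a = c( 0.44052070, -0.44598314, -1.4587333,  1.4641958,
        -0.7586740,  -1.0223099,   1.0590426,  0.7219413),
  b = c(-0.19429487,  0.29103209,  0.8106627, -0.9073999,
         0.3305041,   0.3665862,  -0.4325788, -0.2645114),
  c = c(-0.03715713,  0.09068725,  0.4146191, -0.4681492,
         0.1675740,   0.4106130,  -0.3456950, -0.2324920)
)
colnames(mat) <- as.character(c(6961:6964, 6966:6969))

# Spearman correlation depends only on the within-row ranks.
ranks <- t(apply(mat, 1, rank))
print(ranks)

# With 8 untied observations, two rows are perfectly anti-monotone
# exactly when their ranks sum to 8 + 1 = 9 in every column.
rank_sums_ab <- ranks["a", ] + ranks["b", ]
rank_sums_ac <- ranks["a", ] + ranks["c", ]
print(rank_sums_ab)
print(rank_sums_ac)

# Size of the overshoot seen in the report: one machine epsilon.
print(.Machine$double.eps == 2^-52)
```

Because the ranks are exact reversals, any value other than -1 for the `a`/`b` and `a`/`c` entries comes from floating-point rounding in the computation rather than from the data.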

R Under development (unstable) (2015-12-04 r69737)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] setwidth_1.0-4 colorout_1.1-0

This was reproducible across several recent versions of R running on different Linux systems.
Comment 1 Duncan Murdoch 2015-12-22 17:18:02 UTC
I'll fix this to guarantee that cor() always returns a value in the [-1, 1] interval.
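
For R installations that predate the fix, a user-side guard could look like the sketch below. It is not the change committed to R; `clamp_cor` is a hypothetical helper name, and the small matrix `r` is made-up input that mimics the overshoot from the report.

```r
# Hypothetical helper: force correlation values back into [-1, 1].
# Assigning through r[] keeps the dimensions and dimnames of the input;
# NA entries are left as NA.
clamp_cor <- function(r) {
  r[] <- pmin(pmax(r, -1), 1)
  r
}

# Made-up 2 x 2 "correlation matrix" with off-diagonal entries that are
# one machine epsilon below -1, as in the report.
below <- -1 - .Machine$double.eps
r <- matrix(c(1, below, below, 1), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

print(min(r) < -1)      # the raw matrix violates the lower bound
fixed <- clamp_cor(r)
print(min(fixed) >= -1) # the clamped matrix does not
```

Clamping only hides the rounding error after the fact, so it is best treated as a stopgap for code that passes correlations to functions requiring values in [-1, 1], such as `acos()` or `atanh()`.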
Comment 2 Duncan Murdoch 2015-12-22 21:36:50 UTC
Now committed in R-devel.
Comment 3 Keith Hughitt 2015-12-23 12:11:58 UTC
Thanks for the quick fix!