Bug 16845 - There appears to be an error computing the CDF of a non-central T distributed random variable using the pt() function
Summary: There appears to be an error computing the CDF of a non-central T distributed...
Status: UNCONFIRMED
Alias: None
Product: R
Classification: Unclassified
Component: Accuracy (show other bugs)
Version: R 3.2.4 revised
Hardware: x86_64/x64/amd64 (64-bit) Windows 64-bit
: P5 major
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-04-19 18:37 UTC by flyonthemap
Modified: 2016-04-19 20:37 UTC (History)
1 user (show)

See Also:


Attachments
Describes problem and provides code to replicate the discontinuity in pt() and the disagreements noted (e.g. between cdf value computed using pt() and empirically using rt()) (48.84 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2016-04-19 18:37 UTC, flyonthemap
Details

Note You need to log in before you can comment on or make changes to this bug.
Description flyonthemap 2016-04-19 18:37:21 UTC
Created attachment 2065 [details]
Describes problem and provides code to replicate the discontinuity in pt() and the disagreements noted (e.g. between cdf value computed using pt() and empirically using rt())

Issue is visible as:

(1) a discontinuity in pt(q, df, ncp) when plotted against a function of ncp – the non-centrality parameter.   In the example presented later (Figure 1 of attachment), this function is the true %RSD = σ/µ * 100 = sqrt(n)/ncp * 100 where n is the sample size.   
(2) a disagreement with the CDF computed empirically using rt() 
(3) a disagreement with CDF values produced by both JMP and SAS

  The example in attachment (Problem Details) shows a disagreement in CDF values of 0.01457 (or 1.457%).    A discontinuity in CDF values as large as 0.0467 (4.67%) was also observed (see Appendix).

  Problem was present on R running under Windows 7, 10 and Red Hat Enterprise Linux Server release 6.6.  Present in R 3.2.4 Revised (64-bit and 32-bit versions).
Comment 1 Peter Dalgaard 2016-04-19 20:37:56 UTC
It would be helpful if you would 

(a) provide your code as plain text rather than inside a word document
(b) highlight exactly which values of q, df, ncp show the problem 

Notice that the documentation has

     ncp: non-centrality parameter delta; currently except for ‘rt()’,
          only for ‘abs(ncp) <= 37.62’.  If omitted, use the central t
          distribution.

and

     The code for non-zero ‘ncp’ is principally intended to be used for
     moderate values of ‘ncp’: it will not be highly accurate,
     especially in the tails, for large values.


So it is well known that the code has issues. Concretely, the code switches to a normal approximation at ncp = +- 37.62, and that transition is not quite smooth enough. However, the algorithm for smaller ncp underflows at that point. More mathematical analysis is needed to improve the approximation. Numerical examples that just display the discontinuity are informative, but not really helpful for improving the code.