Created attachment 1121 [details] r script with different uses of t.test t.test seems to deliver incorrect results if the data vectors contain missing values. Suppose, we have two vectors with numbers and some missing values that refer to the same individuals and that should therefore be evaluated with a paired t-test: > testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01); > testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA); Then > print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided", na.action="na.pass")) and > print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided", na.action="na.exclude")) deliver the same p value (0.1162). However, after combining the two vectors with > testdata <- c(testdata.A, testdata.B); and defining a criterion vector with > criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1); we get a different p-value (0.01453) with > print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided", na.action="na.exclude")) . The statement > print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided", na.action="na.pass")) however, delivers a p-value of 0.1162 again. With > print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, alternative="two.sided", na.action="na.exclude")) that imitates the first form, we get again a p value of 0.1162. Excel, StatView and KaleidaGraph all display a p-value of 0.1162.

I have just learned from Thomas Lumley that "if there is a bug it is only that t.test.formula doesn't throw an error when given the paired=TRUE option. The documentation says "The formula interface is only applicable for the 2-sample tests.", but there probably should be an explicit check -- I didn't see that in the documentation until I realized that t.test.formula couldn't work for paired tests because you don't tell it which observations are paired." Therefore, it would be advisable to implement the check for a paired test in the formula notation. The t.test function itself seems to be correct.

r-bugs@r-project.org wrote: > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14359 > > > > > > --- Comment #1 from jwdietrich <j.w.dietrich.bz@medizinische-kybernetik.de> 2010-08-13 18:51:20 --- > I have just learned from Thomas Lumley that "if there is a bug it is only that > t.test.formula doesn't throw an error when given the paired=TRUE option. Before anyone gets carried away, let me remind you of example(sleep). I.e., we're actually using the current feature, even though we document it not to exist... (I se that example(t.test) is still performing the incorrect test. I thought Martin was on the warpath agaist it at some point?) The other option is to document what currently happens. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd.mes@cbs.dk Priv: PDalgd@gmail.com

On Sat, Aug 14, 2010 at 01:44, Peter Dalgaard <pd.mes@cbs.dk> wrote: > r-bugs@r-project.org wrote: >> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14359 >> >> >> >> >> >> --- Comment #1 from jwdietrich <j.w.dietrich.bz@medizinische-kybernetik.de> Â 2010-08-13 18:51:20 --- >> I have just learned from Thomas Lumley that "if there is a bug it is only that >> t.test.formula doesn't throw an error when given the paired=TRUE option. > > Before anyone gets carried away, let me remind you of example(sleep). > I.e., we're actually using the current feature, even though we document > it not to exist... Yes. {note that I did not understand what you wrote before thinking about the 'sleep' example}. I think there are two issues here: 1) "sleep": A data frame of dim. 20 x 2 (variable 'extra' (=y) and 'group'). As the help file explains these are measurements on 10 (!) individuals, once with and once without "sleeping pill". *BUT* the data itself do not contain this information: You just have to know that obs. 1 & 11, 2 & 12,... each or of the same patient. I propose to add a third variable, an "ID" (factor), and also change the example, using that ID. *As* the sleep data is really about a paired experiment, I would feel very bad as a statistics professor, not to perform a paired test. But there's no need to do it using the formula interface. 2) t.test: documentation / implementation: a) NA handling: We have this on the help page Arguments: .......... na.action: a function which indicates what should happen when the data contain â€˜NAâ€™s. Defaults to â€˜getOption("na.action")â€™. ..... Details: ......... Missing values are removed (in pairs if â€˜pairedâ€™ is â€˜TRUEâ€™). and I'd say that 'Details' should be a bit more specific: I currently describes what happens in t.test.default(), and that is called from t.test.formula *after* it has applied na.action in building the model.frame. > (I se that example(t.test) is still performing the incorrect test. I > thought Martin was on the warpath agaist it at some point?) Yes! The sleep data is about a paired experiment, and we are bad teachers of statistics, if we advertize to do an unpaired test (we could do it *after* doing it paired, and *with* a comment saying that this is sub-optimal, losing power, ...,) > The other option is to document what currently happens. I would not want that... unless "we must" because of dozens of cases people use this unfortunately. The sleep data is a bad example for non-paired t-tests, see above, and, as long it's missing the 'ID' code, actually a bad example of a data set... and I still agree with whoever had decided that "paired=TRUE" and 'formula' should not be allowed to go toegether... Martin > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Phone: (+45)38153501 > Email: pd.mes@cbs.dk Â Priv: PDalgd@gmail.com > > _______________________________________________ > R-core list: https://stat.ethz.ch/mailman/listinfo/r-core > >

Martin Maechler wrote: > On Sat, Aug 14, 2010 at 01:44, Peter Dalgaard <pd.mes@cbs.dk> wrote: >> r-bugs@r-project.org wrote: >>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14359 >>> >>> >>> >>> >>> >>> --- Comment #1 from jwdietrich <j.w.dietrich.bz@medizinische-kybernetik.de> 2010-08-13 18:51:20 --- >>> I have just learned from Thomas Lumley that "if there is a bug it is only that >>> t.test.formula doesn't throw an error when given the paired=TRUE option. > >> Before anyone gets carried away, let me remind you of example(sleep). >> I.e., we're actually using the current feature, even though we document >> it not to exist... > > Yes. {note that I did not understand what you wrote before thinking > about the 'sleep' example}. > > I think there are two issues here: > 1) "sleep": A data frame of dim. 20 x 2 (variable 'extra' (=y) and 'group'). > As the help file explains these are measurements on 10 (!) > individuals, once with and once without > "sleeping pill". > *BUT* the data itself do not contain this information: You just > have to know that obs. 1 & 11, 2 & 12,... > each or of the same patient. > I propose to add a third variable, an "ID" (factor), and also change > the example, using that ID. It still remains that there is no formula interface for paired tests which is a bit of a nuisance, at least paedagogically. "y ~ g + strata(ID)" is a definite possibility for "long"-format data, with the paired test being the extreme case, but it doesn't carry too well to the wilcox.test, where the natural large-strata stratified analysism does not reduce to the signed-rank test. I don't really think it is that harmful to have to assume that the data set is sorted by pair (or pair within "group") and say so in the documentation. The NA handling mess-up is kind of serious, though. There must be a similar issue with subset=, but it is easier in that case to require user discretion. > > *As* the sleep data is really about a paired experiment, I would > feel very bad as a statistics professor, > not to perform a paired test. But there's no need to do it using > the formula interface. > > 2) t.test: documentation / implementation: > a) NA handling: We have this on the help page > > Arguments: > .......... > na.action: a function which indicates what should happen when the data > contain â€˜NAâ€™s. Defaults to â€˜getOption("na.action")â€™. Yes. I would suggest that a simple workaround for the observed behaviour is to zap this option and internally hardcode it to na.pass. t.test.default is doing NA removal anyway, so the only possible point of using getOption would be if it were set na.fail globally (does anyone ever do that?). (BTW, model.frame has na.action formally defaulting to na.fail, but internally circumvents it. Terribly confusing.) > ..... > Details: > ......... > Missing values are removed (in pairs if â€˜pairedâ€™ is â€˜TRUEâ€™). > > and I'd say that 'Details' should be a bit more specific: I currently > describes what happens in t.test.default(), > and that is called from t.test.formula *after* it has applied > na.action in building the model.frame. > > >> (I se that example(t.test) is still performing the incorrect test. I >> thought Martin was on the warpath agaist it at some point?) > > Yes! The sleep data is about a paired experiment, and we are bad > teachers of statistics, > if we advertize to do an unpaired test > (we could do it *after* doing it paired, and *with* a comment saying > that this is sub-optimal, > losing power, ...,) > >> The other option is to document what currently happens. > > I would not want that... unless "we must" because of dozens of cases > people use this unfortunately. > > The sleep data is a bad example for non-paired t-tests, see above, > and, as long it's missing the 'ID' code, actually a bad example of a > data set... > > and I still agree with whoever had decided that "paired=TRUE" and > 'formula' should not be allowed to go toegether... > > Martin > >> -- >> Peter Dalgaard >> Center for Statistics, Copenhagen Business School >> Phone: (+45)38153501 >> Email: pd.mes@cbs.dk Priv: PDalgd@gmail.com >> >> _______________________________________________ >> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core >> >> > -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd.mes@cbs.dk Priv: PDalgd@gmail.com

On Sat, Aug 14, 2010 at 10:34, Peter Dalgaard <pd.mes@cbs.dk> wrote: > Martin Maechler wrote: >> On Sat, Aug 14, 2010 at 01:44, Peter Dalgaard <pd.mes@cbs.dk> wrote: >>> r-bugs@r-project.org wrote: >>>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14359 >>>> >>>> >>>> >>>> >>>> >>>> --- Comment #1 from jwdietrich <j.w.dietrich.bz@medizinische-kybernetik.de> Â 2010-08-13 18:51:20 --- >>>> I have just learned from Thomas Lumley that "if there is a bug it is only that >>>> t.test.formula doesn't throw an error when given the paired=TRUE option. >> >>> Before anyone gets carried away, let me remind you of example(sleep). >>> I.e., we're actually using the current feature, even though we document >>> it not to exist... >> >> Yes. {note that I did not understand what you wrote before thinking >> about the 'sleep' example}. >> >> I think there are two issues here: >> 1) "sleep": Â A data frame of dim. Â 20 x 2 Â (variable 'extra' (=y) and 'group'). >> Â Â Â As the help file explains these are measurements on 10 (!) >> individuals, once with and once without >> Â Â "sleeping pill". >> Â Â *BUT* the data itself do not contain this information: Â You just >> have to know that obs. 1 & 11, 2 & 12,... >> Â Â each or of the same patient. >> Â I propose to add a third variable, an "ID" (factor), and also change >> the example, using that ID. .. I changed the sleep.Rd example .. (but not using the ID), using with(.., and t.test(x,y,..)) the example also in R-patched, so at least we do *not* work against the docs there. ... if needed you can always change that more > > It still remains that there is no formula interface for paired tests > which is a bit of a nuisance, at least paedagogically. > > "y ~ g + strata(ID)" is a definite possibility for "long"-format data, > with the paired test being the extreme case, but it doesn't carry too > well to the wilcox.test, where the natural large-strata stratified > analysism does not reduce to the signed-rank test. > > I don't really think it is that harmful to have to assume that the data > set is sorted by pair (or pair within "group") and say so in the > documentation. For the t-test (or other "paired" situations) I could -- just barely -- agree. But in a more general case, I think just assuming observations ordered in a certain way, and giving complete non-sense answers when they are ordered differently, is something very dangerous, and I would not want to support such unsafe practice by a formula notation. More natural (to me) than y ~ g + strata("ID"), would be to use y ~ g | ID in this case. I have to run ... for the rest of the day ... and I didn't understand the "problem" with wilcox.test. > The NA handling mess-up is kind of serious, though. There > must be a similar issue with subset=, but it is easier in that case to > require user discretion. > >> >> Â *As* the sleep data is really about a paired experiment, I would >> feel very bad as a statistics professor, >> Â not to perform a paired test. Â But there's no need to do it using >> the formula interface. >> >> 2) t.test: documentation / implementation: >> Â a) NA handling: Â We have this on the help page >> >> Arguments: >> Â .......... >> na.action: a function which indicates what should happen when the data >> Â Â Â Â Â contain â€˜NAâ€™s. Â Defaults to â€˜getOption("na.action")â€™. > > > Yes. I would suggest that a simple workaround for the observed behaviour > Â is to zap this option and internally hardcode it to na.pass. > t.test.default is doing NA removal anyway, so the only possible point of > using getOption would be if it were set na.fail globally (does anyone > ever do that?). Some people may.. As they have been allowed to use t.test that way for a very long time, it seems dangerous to change now > (BTW, model.frame has na.action formally defaulting to na.fail, but > internally circumvents it. Terribly confusing.) > >> ..... >> Details: >> Â ......... >> Â Â Missing values are removed (in pairs if â€˜pairedâ€™ is â€˜TRUEâ€™). >> >> and I'd say that 'Details' should be a bit more specific: I currently >> describes what happens in t.test.default(), >> and that is called from t.test.formula *after* it has applied >> na.action in building the model.frame. >> >> >>> (I se that example(t.test) is still performing the incorrect test. I >>> thought Martin was on the warpath agaist it at some point?) >> >> Yes! Â The sleep data is about a paired experiment, and we are bad >> teachers of statistics, >> if we advertize to do an unpaired test >> (we could do it *after* doing it paired, and *with* a comment saying >> that this is sub-optimal, >> Â losing power, ...,) >> >>> The other option is to document what currently happens. >> >> I would not want that... unless "we must" because of dozens of cases >> people use this unfortunately. >> >> The sleep data is a bad example for non-paired t-tests, see above, >> and, as long it's missing the 'ID' code, actually a bad example of a >> data set... >> >> and I still agree with whoever had decided that "paired=TRUE" and >> 'formula' Â should not be allowed to go toegether... >> >> Martin

Thomas Lumley wrote: > On Sat, 14 Aug 2010, Peter Dalgaard wrote: > >> r-bugs@r-project.org wrote: >>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14359 >>> >>> >>> >>> >>> >>> --- Comment #1 from jwdietrich <j.w.dietrich.bz@medizinische-kybernetik.de> 2010-08-13 18:51:20 --- >>> I have just learned from Thomas Lumley that "if there is a bug it is only that >>> t.test.formula doesn't throw an error when given the paired=TRUE option. >> Before anyone gets carried away, let me remind you of example(sleep). >> I.e., we're actually using the current feature, even though we document >> it not to exist... > > Um, no. = sleep) > asks for a two-sample t-test. It's conf> t.test(extra ~ group, data using, because the previous line in the example asks for a paired t-test on the same data, but it is not inconsistent with the documentation. Um, yes. In example(sleep), we do (well, did, now that Martin changed it.) >> (I se that example(t.test) is still performing the incorrect test. I >> thought Martin was on the warpath agaist it at some point?) > > The example is statistically incorrect but computationally correct: it is performing a two-sample t-test, and does not use paired=TRUE > >> The other option is to document what currently happens. > > I really don't think that's an option: > > "The formula interface with paired=TRUE will delete missing data and then assume that the non-missing data for the lhs and rhs still form pairs in the order in which they appear in the file" I was thinking more like "Beware that this is unreliable if there are missing values in data." > > > It would be an option to fix the missing-data handling so that paired=TRUE works when the data are correctly ordered, but I really don't like assuming a data frame just happens to be in the right order. Vince Carey's gee package assumes this and in my experience it does fairly often lead to errors. That is a point. On the other hand, it is hard to guard against users assuming R to have extra-sensory perception: If you give only y and g, how else would R know what the pairs are? (Like assuming R to infer that if g goes 1 2 1 2 1 1 2 2, then there are really 6 pairs, one with a missing 2 and one with a missing 1. I suppose that is the sort of thing happening with gee()?) > > A more helpful option would be to allow paired=id (or paired=~id) rather than paired=TRUE in the formula interface, where id is a variable identifying the pairs. Or perhaps outcome~group+cluster(id). or outcome ~ group | id, as Martin suggested. The paired = ~id has the perspective that we could actually fudge in paired = TRUE as equivalent to ~foo with foo set to unsplit(list(1:N,1:N), g). With a stern warning, of course. Even if we don't want that, it gives an opportunity to make paired=TRUE defunct ("paired must be a formula"). > > -thomas > > Thomas Lumley > Professor of Biostatistics > University of Washington, Seattle > > > -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd.mes@cbs.dk Priv: PDalgd@gmail.com

On Sat, Aug 14, 2010 at 19:35, Peter Dalgaard <pd.mes@cbs.dk> wrote: > Thomas Lumley wrote: >> On Sat, 14 Aug 2010, Peter Dalgaard wrote: >> >>> r-bugs@r-project.org wrote: >>>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14359 >>>> >>>> >>>> >>>> >>>> >>>> --- Comment #1 from jwdietrich <j.w.dietrich.bz@medizinische-kybernetik.de> Â 2010-08-13 18:51:20 --- >>>> I have just learned from Thomas Lumley that "if there is a bug it is only that >>>> t.test.formula doesn't throw an error when given the paired=TRUE option. >>> Before anyone gets carried away, let me remind you of example(sleep). >>> I.e., we're actually using the current feature, even though we document >>> it not to exist... >> >> Um, no. > = sleep) >> asks for a two-sample t-test. It's conf> t.test(extra ~ group, data > using, because the previous line in the example asks for a paired t-test > on the same data, but it is not inconsistent with the documentation. > > Um, yes. In example(sleep), we do (well, did, now that Martin changed it.) > > >>> (I se that example(t.test) is still performing the incorrect test. I >>> thought Martin was on the warpath agaist it at some point?) >> >> The example is statistically incorrect but computationally correct: it is performing a two-sample t-test, and does not use paired=TRUE Sure. But statistically incorrect in an example for the t.test (for many people *the* thing they learn from (inferential) statistics)) is what brought me "on the warpath" (PD). As I also said in a different place in the thread, we can leave it there, if we also do the paired test, and use comments explaining why/ that "paired" is what really makes sense. >> >>> The other option is to document what currently happens. >> >> I really don't think that's an option: >> >> "The formula interface with paired=TRUE will delete missing data and > then assume that the non-missing data for the lhs and rhs still form > pairs in the order in which they appear in the file" > > I was thinking more like "Beware that this is unreliable if there are > missing values in data." > >> >> >> It would be an option to fix the missing-data handling so that > paired=TRUE works when the data are correctly ordered, but I really > don't like assuming a data frame just happens to be in the right order. Exactly, that's the point. I agree fully. > Vince Carey's gee package assumes this and in my experience it does > fairly often lead to errors. > > That is a point. On the other hand, it is hard to guard against users > assuming R to have extra-sensory perception: If you give only y and g, > how else would R know what the pairs are? (Like assuming R to infer that > if g goes 1 2 1 2 1 1 2 2, then there are really 6 pairs, one with a > missing 2 and one with a missing 1. I suppose that is the sort of thing > happening with gee()?) > >> >> A more helpful option would be to allow paired=id (or paired=~id) rather > than paired=TRUE in the formula interface, where id is a variable > identifying the pairs. Or perhaps outcome~group+cluster(id). > > or outcome ~ group | id, as Martin suggested. The paired = ~id has the > perspective that we could actually fudge in paired = TRUE as equivalent > to ~foo with foo set to unsplit(list(1:N,1:N), g). With a stern warning, > of course. Even if we don't want that, it gives an opportunity to make > paired=TRUE defunct ("paired must be a formula"). I'm liking the idea ("paired = ~id ") quite a bit (noting that paired = TRUE/FALSE would still be there for t.test.default()). You'd get my vote for switching there. Martin

This seems to have become a wish to enhance the formula interface