2014-08-28
2015-08-11
bzive 2014-08-28
The description for the ToothGrowth dataset in {datasets} is not clearly written and, in my experience with a MOOC offered on Coursera, users have misconstrued it.

The description cites "the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C…with each of two delivery methods."  Some users take this to mean 10 guinea pigs each receiving all treatments (thus a paired-sample study), and others take it to mean 60 guinea pigs each receiving one of the treatments (thus an independent-samples study).

The data source, C. I. Bliss (1952) "The Statistics of Bioassay", actually cites the original study by Crampton, E. W., "The growth of the odontoblasts of the incisor tooth as a criterion of the vitamin C intake of the guinea pig", published in The Journal of Nutrition, vol. 33, issue 5, May 1947, pp. 491-504.  The Crampton paper makes it clear that these data come from 60 distinct guinea pigs, as odontoblast measurements were taken under a microscope for each guinea pig after the guinea pigs were sacrificed and had their teeth removed.

Perhaps the ToothGrowth description could be modified to read "The response is the length of odontoblasts (teeth) in each of 60 guinea pigs, 10 for each combination of dose level of Vitamin C (0.5, 1, and 2 mg) and delivery method (orange juice or ascorbic acid)".
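
For anyone who wants to check the layout, here is a minimal sketch in R, assuming the ToothGrowth data frame as shipped in {datasets}. It tabulates the observations per supplement-by-dose cell and then runs an independent-samples comparison at the lowest dose, which is the analysis the 60-animal reading implies. The names `cell_counts`, `low_dose`, and `independent_test` are made up for illustration only.

```r
# ToothGrowth ships with base R in the {datasets} package.
data(ToothGrowth)

# Count observations in each supplement-by-dose cell.
# Each of the 2 x 3 = 6 cells should hold 10 rows, 60 in total.
cell_counts <- with(ToothGrowth, table(supp, dose))
print(cell_counts)

# The data frame has no animal identifier column (only len, supp, dose),
# so any pairing of rows across cells would be arbitrary.
print(names(ToothGrowth))

# Independent-samples (Welch) comparison of the two delivery methods
# at the 0.5 mg dose. This treats the two groups of 10 as distinct animals.
low_dose <- subset(ToothGrowth, dose == 0.5)
independent_test <- t.test(len ~ supp, data = low_dose)
print(independent_test)
```

A paired analysis would need a way to match each orange juice measurement to an ascorbic acid measurement from the same animal, and the dataset carries no such information.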
jrkuehner 2015-04-13
Edward Kuns 2015-04-17
I came across this bug in the context of the same MOOC, I imagine.  With some research, I found a lot of information about this dataset, including a copy of the original study from which the data were derived.  You can view this original study, in its entirety, at:


This is the original study from which Bliss, in his textbook of 1952, took the data.  (The raw data is not included in the paper.)

I recommend changing the R documentation for this data set to something like this:


The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs, each receiving one of three dose levels of Vitamin C (0.5, 1.0, and 2.0 mg) with one of two delivery methods (orange juice or an aqueous solution of ascorbic acid).  No guinea pig received a dose of zero as they would acquire scurvy at that dose.




A data frame with 60 observations on 3 variables.
[,1] 	len 	numeric 	Odontoblast length in microns.
[,2] 	supp 	factor 		Supplement type (VC or OJ).
[,3] 	dose 	numeric 	Dose in milligrams. 


C. I. Bliss (1952) The Statistics of Bioassay. Academic Press.


McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.

Crampton, E. W. (1947) The Growth of the Odontoblast of the Incisor Teeth as a Criterion of Vitamin C Intake of the Guinea Pig. The Journal of Nutrition 33 (5): 491–504.


coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
       xlab = "ToothGrowth data: length vs dose, given type of supplement")
bzive 2015-04-17
Edward, the description you wrote is better than the one I proposed. The key here is to get across the fact that there are 60 guinea pigs represented in the dataset.

Yes, that link to the original study is the same study I found in the journal at the MIT libraries.
Enrique Pérez 2015-07-19
Another issue is that `dose` is not measured in mg but in mg/day.
Brian Ripley 2015-08-11

Fixed for R 3.2.2.
Brian Ripley 2015-08-11
Clarification: fixed for 3.2.2 patched -- 3.2.2 is in code freeze.