Bug 16614 - Predict problem nlme
Summary: Predict problem nlme
Status: ASSIGNED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses (show other bugs)
Version: R 3.1.3
Hardware: All All
: P5 enhancement
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2015-11-27 16:54 UTC by Clayton
Modified: 2016-04-17 14:17 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Clayton 2015-11-27 16:54:37 UTC
When I try to use the "predict" with a non-linear mixed model to a new dataset (newdata), the following error occurs:

 Error in t(b[[i]])[groups[[i]], rmapRel[[i]][[nm]], drop = FALSE] : 
  subscript out of bounds 
  
More details:

 getParsNlme(plist, object$map$fmap, object$map$rmapRel, object$map$bmap, 
    grpsRev, fix, ranVec, ran, level[i], N) 
7 data.frame(dataMix, getParsNlme(plist, object$map$fmap, object$map$rmapRel, 
    object$map$bmap, grpsRev, fix, ranVec, ran, level[i], N)) 
6 eval(modForm, data.frame(dataMix, getParsNlme(plist, object$map$fmap, 
    object$map$rmapRel, object$map$bmap, grpsRev, fix, ranVec, 
    ran, level[i], N)))
Comment 1 Martin Maechler 2015-12-01 08:58:37 UTC
Please provide a minimal reproducible example (i.e., self contained; using standarad data set; generated data; data via dput(..) and 'exdat <- ....' i.e., from beginning to end the exact R code to reproduce your problem.

In many cases,  predict(*, newdata=.)   does work fine for nlme() fitted models.
Comment 2 Clayton 2015-12-01 16:15:42 UTC
#CMR

data1 = structure(list(parcela = c(11457, 11570, 11570, 15450, 11526, 
14692, 13318, 11566, 12396, 14637, 15415, 15415, 14684, 14688, 
9926, 11566, 11567, 11575, 15401, 15415), dap1 = c(5, 5, 5, 5, 
5, 5, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.2, 5.2, 5.2, 
5.2, 5.2, 5.2), dap2 = c(5.1, 5.1, 5.1, 5.4, 5.2, 6.4, 7.7, 5.2, 
6.2, 8, 5.2, 5.4, 5.5, 5.2, 5.3, 5.3, 5.7, 5.5, 6.8, 5.1), idade1 = c(60.2546201232033, 
33.1, 57.3634496919918, 27.95509924709, 45.5, 26.3819301848049, 
23.7535934291581, 33.1, 23.9, 20.2381930184805, 25.4965027758751, 
51.0225872689938, 28.5831622176591, 45.4045174537988, 60.6, 45.305954825462, 
45.305954825462, 33.1, 22.9979466119097, 32.8213552361396), idade2 = c(72.3778234086242, 
45.305954825462, 69.4866529774127, 40.7063655030801, 58.217659137577, 
37.0112168225708, 35.7782340862423, 45.305954825462, 32.8542094455852, 
30.4558521560575, 32.8213552361396, 62.9486652977413, 39.2279260780287, 
61.4045174537988, 73.1334702258727, 57.3634496919918, 57.3634496919918, 
45.305954825462, 38.0123203285421, 51.0225872689938)), .Names = c("parcela", 
"dap1", "dap2", "idade1", "idade2"), row.names = c(NA, 20L), class = "data.frame")

mod1 = nlme(dap2 ~ dap1*exp(-b1*(idade2**b2-idade1**b2)),
	 control=lmeControl(nIterEM=15200, msMaxIter=15200, msVerbose=T),
	 fixed=b1+b2~1,
	 data=data1,
	 random=b2 ~1 | parcela,
	 start=c(b1=100,b2=-1),
	 na.action='na.exclude')
	 
data1.2 = structure(list(parcela = c(14653, 15450, 11526, 14678, 12436, 
12436, 14688, 14692, 9926, 11566, 15450, 15450, 15468, 12437, 
9926, 12351, 14637, 14637, 15450, 15450), dap1 = c(5.2, 5.2, 
5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.4, 
5.4, 5.4, 5.4, 5.4, 5.4), dap2 = c(5.2, 5.6, 5.8, 6.9, 7.5, 8.8, 
5.3, 5.7, 5.4, 5.4, 5.4, 5.4, 5.4, 6.4, 5.5, 7.4, 9, 8.4, 5.4, 
5.6), idade1 = c(23.5564681724846, 53.5523613963039, 58.217659137577, 
21.5852156057495, 24.7, 24.7, 61.4045174537988, 26.3819301848049, 
73.1334702258727, 57.3634496919918, 27.95509924709, 27.95509924709, 
25.380704616322, 24.7, 84.435318275154, 59.6960985626283, 20.2381930184805, 
20.2381930184805, 40.7063655030801, 53.5523613963039), idade2 = c(33.4784394250513, 
65.9712525667351, 69.0595482546201, 31.2443531827515, 37.0924024640657, 
37.0924024640657, 74.3490759753593, 37.0112168225708, 84.435318275154, 
69.4866529774127, 40.7063655030801, 40.7063655030801, 40.6735112936345, 
37.0924024640657, 98.0041067761807, 70.570841889117, 30.4558521560575, 
30.4558521560575, 53.5523613963039, 65.9712525667351)), .Names = c("parcela", 
"dap1", "dap2", "idade1", "idade2"), row.names = 21:40, class = "data.frame")

# Here ocorrs the error:
#Error in b[[i]][rmapRel[[i]][[nm]], groups[[i]]] : 
  #subscript out of bounds
  
data1.2$dap2est = predict(mod1, newdata=data1.2)
Comment 3 Martin Maechler 2015-12-01 17:22:42 UTC
Thank you.  That is well reproducible.

The "bug" is more with you than with `nlme`.... though I agree it should give a more helpful error message:

You take `parcela` as grouping variable, i.e. random effect.
Now the default  `predict()` call tries to also take the values of `parcela` into account when predicting {so it will predict MAP ..},
*and* it fails here, because in your `data1.2`  you have  `parcela` values that are not in the data with which the model was fit :

> with(data1, table(parcela))
parcela
 9926 11457 11526 11566 11567 11570 11575 12396 13318 14637 14684 14688 14692 15401 15415 
    1     1     1     2     1     2     1     1     1     1     1     1     1     1     3 
15450 
    1 
> with(data1.2, table(parcela))
parcela
 9926 11526 11566 12351 12436 12437 14637 14653 14678 14688 14692 15450 15468 
    2     1     1     1     2     1     2     1     1     1     1     5     1 
> 
-----------
Also, given that (in your data) you have so many `parcela` values which appear only once (and two appear twice, one three times); that  RE is a bit doubtful.

Nevertheless, if you want you *can* predict() from your fitted model, but must basically "not consider" parcela at all but just the fixed effects.
You can do that by asking for  'level = 0' prediction instead of the default which here is 'level = 1' :

> summary(
+     pred.0 <- predict(mod1, newdata = data1.2, level = 0)
+     )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  5.212   5.381   5.921   6.132   6.391   8.709 
> 

--------

So the only bug here is that the error message is not much of help to the average user.  
The `lme4` package is better here I think ((but  `nlmer()` may not be for prime-time use either)),
and I'll see if I can tweak predict.nlme() to give a better error message for such a case.
Comment 4 Clayton 2015-12-01 17:41:48 UTC
Good afternoon,

I believe that you reported is not the problem because the same situation occurred in this example works when treated by mixed linear models adjusted for function 'lme'. For these cases it works, see the CMR below.

Necessary for the function of the behavior to predict nlme be the same as the lme: predicts for the existing 'parcela' values and makes NA for others 'parcela' values not present in data1, for in them staying NA will use to predict function with "level = 0".

So what happens today with new data to predict with models fitted with nlme still a problem.

data1=structure(list(parcela=c(11457,11570,11570,15450,11526,14692,13318,11566,12396,14637,15415,15415,14684,14688,9926,11566,11567,11575,15401,15415),dap1=c(5,5,5,5,5,5,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.2,5.2,5.2,5.2,5.2,5.2),dap2=c(5.1,5.1,5.1,5.4,5.2,6.4,7.7,5.2,6.2,8,5.2,5.4,5.5,5.2,5.3,5.3,5.7,5.5,6.8,5.1),idade1=c(60.2546201232033,33.1,57.3634496919918,27.95509924709,45.5,26.3819301848049,23.7535934291581,33.1,23.9,20.2381930184805,25.4965027758751,51.0225872689938,28.5831622176591,45.4045174537988,60.6,45.305954825462,45.305954825462,33.1,22.9979466119097,32.8213552361396),idade2=c(72.3778234086242,45.305954825462,69.4866529774127,40.7063655030801,58.217659137577,37.0112168225708,35.7782340862423,45.305954825462,32.8542094455852,30.4558521560575,32.8213552361396,62.9486652977413,39.2279260780287,61.4045174537988,73.1334702258727,57.3634496919918,57.3634496919918,45.305954825462,38.0123203285421,51.0225872689938)),.Names=c("parcela","dap1","dap2","idade1","idade2"),row.names=c(NA,20L),class="data.frame")

mod1.1 = lme(dap2 ~ dap1+idade1+idade2,
            control=lmeControl(nIterEM=15200, msMaxIter=15200, msVerbose=T),
			data = data1,
            random=~1 | parcela,
            na.action='na.exclude')

			
data1.2=structure(list(parcela=c(14653,15450,11526,14678,12436,12436,14688,14692,9926,11566,15450,15450,15468,12437,9926,12351,14637,14637,15450,15450),dap1=c(5.2,5.2,5.2,5.2,5.2,5.2,5.2,5.2,5.3,5.3,5.3,5.3,5.3,5.3,5.4,5.4,5.4,5.4,5.4,5.4),dap2=c(5.2,5.6,5.8,6.9,7.5,8.8,5.3,5.7,5.4,5.4,5.4,5.4,5.4,6.4,5.5,7.4,9,8.4,5.4,5.6),idade1=c(23.5564681724846,53.5523613963039,58.217659137577,21.5852156057495,24.7,24.7,61.4045174537988,26.3819301848049,73.1334702258727,57.3634496919918,27.95509924709,27.95509924709,25.380704616322,24.7,84.435318275154,59.6960985626283,20.2381930184805,20.2381930184805,40.7063655030801,53.5523613963039),idade2=c(33.4784394250513,65.9712525667351,69.0595482546201,31.2443531827515,37.0924024640657,37.0924024640657,74.3490759753593,37.0112168225708,84.435318275154,69.4866529774127,40.7063655030801,40.7063655030801,40.6735112936345,37.0924024640657,98.0041067761807,70.570841889117,30.4558521560575,30.4558521560575,53.5523613963039,65.9712525667351)),.Names=c("parcela","dap1","dap2","idade1","idade2"),row.names=21:40,class="data.frame")

#Works fine! Why predict with nlme dont work?
data1.2$dap2est = predict(mod1.1, newdata=data1.2)			


#Others values of 'parcela' estimated by level=0
data1.2$dap2estN = predict(mod1.1, newdata=data1.2, level=0)			

#Replace the NA in first predict by estimated with level=0, mantain values estimated for existing 'parcela' values.
data1.2$dap2est = ifelse(is.na(data1.2$dap2est), 
       data1.2$dap2estN,
       data1.2$dap2est) 



data1.2$dap2estN = NULL
Comment 5 Martin Maechler 2015-12-02 19:24:46 UTC
(In reply to Clayton from comment #4)
> Good afternoon,
> 
> I believe that you reported is not the problem because the same situation
> occurred in this example works when treated by mixed linear models adjusted
> for function 'lme'. For these cases it works, see the CMR below.
> 
> Necessary for the function of the behavior to predict nlme be the same as
> the lme: predicts for the existing 'parcela' values and makes NA for others
> 'parcela' values not present in data1, for in them staying NA will use to
> predict function with "level = 0".
> 
> So what happens today with new data to predict with models fitted with nlme
> still a problem.


Good point: Such level=1  prediction works with  predict( lme(..), ...) 
in the way you show --- or more easily even with the simple
   example(predict.lme)
example.

I guess this will not have ever worked for nlme() and that part of 'predict()' is quite different for nlme than for lme,
 but I agree that would be desirable, even a bit more than just "wishlist".

There is another "fixed sigma" patch under evaluation for the 'nlme' package.
After that, I hope to get back to see how to achieve your wish.
Comment 6 Clayton 2015-12-02 19:35:25 UTC
Hi,

To provide a little more arguments and emphasize that my request is not just a "wish", I note that the external behavior of any module in this case, function, should not be different when such a function has the same name. That is, the semantic should be maintained, even if the entry (if the model type) switch.

Thank you very much, I look very anxious.
Comment 7 Clayton 2015-12-06 17:29:20 UTC
Hi,

Have any deadline for fix problem?

I have had a lot of extra work to make the prediction of nonlinear models using new databases
Comment 8 Clayton 2016-03-18 15:55:54 UTC
Hi Martin,

Any plans to fix this bug?

Best regards
Comment 9 Martin Maechler 2016-03-21 16:34:08 UTC
(In reply to Clayton from comment #8)
> Hi Martin,
> 
> Any plans to fix this bug?

Well, as always, I'm busy .. but your poking helped:

I will soon commit changes to the development version of nlme,
  https://svn.r-project.org/R-packages/trunk/nlme/
which will implement the new feature (;-)  you have so persistently been requiring.
Indeed, I've found that a relatively simple change to the auxiliary function
getParsNlme()  does indeed solve the problem.
Comment 10 Clayton 2016-03-21 16:39:15 UTC
Thanks!
Comment 11 Clayton 2016-04-17 14:17:30 UTC
I updated the package to version 3.1-127 and the "predict" works without errors. One suggestion is to give the option for the user to tell if if some level of mixed effect is not present, the estimate is made only with fixed effects, rather than always returning "NA". So he can choose to get "NA" or an estimate using only fixed effects.