Bugzilla – Full Text Bug Listing

Summary: | Incorrect prediction with original data in loess | ||
---|---|---|---|

Product: | R | Reporter: | Peter Saunders <petersaunders1> |

Component: | Models | Assignee: | R-core <R-core> |

Status: | CLOSED FIXED | ||

Severity: | normal | CC: | maechler |

Priority: | P4 | ||

Version: | R-devel (trunk) | ||

Hardware: | All | ||

OS: | All |

Description
Peter Saunders
2015-11-03 12:41:57 UTC
After a bit of testing we think we have figured out what the issue is. The issue is with the scaling of the 'robust' variable in the simpleLoess and predLoess functions. On line 133 of loess.R, in the function simpleLoess you find robust is scaled by the weights: robust <- weights * robust This is then stored into the loess$pars list towards the end of the function so that the robust variable which is stored in the loess object is scaled by the weights. On lines 289 and 310 of loess.R, in the function predLoess, the weights vector is multiplied by the robust vector in the call to C_loess_dfitse and C_loess_dfit respectively. Since the 'robust' value has already been multiplied by the weights we are calculating the new prediction passing the weights variable as the weights squared, which is wrong. Here is an example to demonstrate this: # ----- Example Code ----- # # Loess with weights set.seed(123) wt = runif(nrow(cars)) cars.wt = loess(dist ~ speed, cars, weights = wt, control = loess.control(surface="direct")) carsPred = predict(cars.wt) carsPredNew = predict(cars.wt, newdata=cars) # Divide robust by weights to remove re-scaling effect cars.wt.robustFixed = cars.wt cars.wt.robustFixed$pars$robust = cars.wt$pars$robust / cars.wt$weight carsPredNewFixed = predict(cars.wt.robustFixed, newdata=cars) all.equal(carsPred, carsPredNew, check.attributes = FALSE) # [1] "Mean relative difference: 0.02900143" # This is not right all.equal(carsPred, carsPredNewFixed, check.attributes = FALSE) # [1] TRUE # This is now good # ----- End of Example ----- # I think, for family="gaussian", that the weights argument (the 4th argument) in the calls to C_loess_dfitse and C_loess_dfit should not be multiplied by robust. The family = "symmetric" case is even more complicated, as the iterations can be > 1 and I am less sure if this is correct. I see that this is clearly a bug. You are right the robust case (family = "symmetric") needs special care... and so I will look at that particularly. (In reply to Martin Maechler from comment #2) > I see that this is clearly a bug. > You are right the robust case (family = "symmetric") needs special care... > and so I will look at that particularly. After lots of looking, I've also added a new feature: `iterTrace = TRUE` now allows to get some robustness iteration monitoring... and I also have added (incomplete) checks for the surface x family x statistics x trace.hat combinations, and had to look at the C code to see where and when 'robust' and 'weights' are used there. {{and there may be more simplification possible,..}} Commited to R-devel (svn rev ) -- to be ported to R '3.2.2 patched' subsequently. |