When using loess with surface = "direct" and weights, the result of predict() is not the same as the fitted values. This appears similar to Bug 611, which was fixed, but here the discrepancy arises only when weights are supplied. It occurs only for surface = "direct", but for both families ("symmetric" and "gaussian"), both statistics settings ("approximate" and "exact"), and both trace.hat settings ("exact" and "approximate").

# ----- Example Code ----- #

# Loess
cars.lo = loess(dist ~ speed, cars, control = loess.control(surface = "direct"))
carsPred = predict(cars.lo)
carsPredNew = predict(cars.lo, newdata = cars)
all.equal(carsPred, carsPredNew, check.attributes = FALSE)
#[1] TRUE
# This is fine, equivalent to the fix of bug 611

# Loess with weights
set.seed(123)
wt = runif(nrow(cars))
cars.wt = loess(dist ~ speed, cars, weights = wt,
                control = loess.control(surface = "direct"))
carsPred = predict(cars.wt)
carsPredNew = predict(cars.wt, newdata = cars)
all.equal(carsPred, carsPredNew, check.attributes = FALSE)
#[1] "Mean relative difference: 0.02900143"
# This is not right

# Look at the mean squared error
weighted.mean((cars$dist - predict(cars.wt))^2, wt)
weighted.mean((cars$dist - predict(cars.wt, newdata = cars))^2, wt)
#[1] 184.4413
#[1] 186.9075
# The error is (always) worse for the re-predict method,
# which suggests this is where the problem is

# ----- End of Example ----- #
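The claim above -- that the mismatch appears for every family x statistics x trace.hat combination under surface = "direct" -- can be checked systematically. A hypothetical sketch (not from the original report) that loops over the combinations and reports whether fitted and re-predicted values agree:

```r
# Sketch: compare predict(fit) with predict(fit, newdata = cars) across
# all family x statistics x trace.hat combinations, surface = "direct".
set.seed(123)
wt <- runif(nrow(cars))
combos <- expand.grid(family     = c("gaussian", "symmetric"),
                      statistics = c("approximate", "exact"),
                      trace.hat  = c("exact", "approximate"),
                      stringsAsFactors = FALSE)
res <- character(nrow(combos))
for (i in seq_len(nrow(combos))) {
  fit <- loess(dist ~ speed, cars, weights = wt,
               family  = combos$family[i],
               control = loess.control(surface    = "direct",
                                       statistics = combos$statistics[i],
                                       trace.hat  = combos$trace.hat[i]))
  ok <- isTRUE(all.equal(predict(fit), predict(fit, newdata = cars),
                         check.attributes = FALSE))
  res[i] <- sprintf("%-9s %-11s %-11s : agree = %s",
                    combos$family[i], combos$statistics[i],
                    combos$trace.hat[i], ok)
}
cat(res, sep = "\n")
```

On an R version with the bug, every row prints `agree = FALSE`; after the fix, all rows print `agree = TRUE`.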

After a bit of testing we think we have figured out what the issue is: the scaling of the 'robust' variable in the simpleLoess() and predLoess() functions. On line 133 of loess.R, in simpleLoess(), 'robust' is scaled by the weights:

    robust <- weights * robust

This is then stored into the loess$pars list towards the end of the function, so the 'robust' vector kept in the loess object has already been scaled by the weights. On lines 289 and 310 of loess.R, in predLoess(), the weights vector is multiplied by the robust vector in the calls to C_loess_dfitse and C_loess_dfit respectively. Since 'robust' has already been multiplied by the weights, the new prediction is effectively computed with the weights squared, which is wrong. Here is an example to demonstrate this:

# ----- Example Code ----- #

# Loess with weights
set.seed(123)
wt = runif(nrow(cars))
cars.wt = loess(dist ~ speed, cars, weights = wt,
                control = loess.control(surface = "direct"))
carsPred = predict(cars.wt)
carsPredNew = predict(cars.wt, newdata = cars)

# Divide robust by the weights to undo the re-scaling
cars.wt.robustFixed = cars.wt
cars.wt.robustFixed$pars$robust = cars.wt$pars$robust / cars.wt$weights
carsPredNewFixed = predict(cars.wt.robustFixed, newdata = cars)

all.equal(carsPred, carsPredNew, check.attributes = FALSE)
# [1] "Mean relative difference: 0.02900143"
# This is not right
all.equal(carsPred, carsPredNewFixed, check.attributes = FALSE)
# [1] TRUE
# This is now good

# ----- End of Example ----- #

I think, for family = "gaussian", that the weights argument (the 4th argument) in the calls to C_loess_dfitse and C_loess_dfit should not be multiplied by robust. The family = "symmetric" case is even more complicated, as the number of iterations can be > 1, and I am less sure the fix is correct there.
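The double scaling described above can be shown with a few lines of arithmetic. This is an illustrative sketch only (the vectors here are made up, not taken from loess internals):

```r
# Illustration of the double scaling: simpleLoess() stores
#   robust <- weights * robust
# in fit$pars, and predLoess() then passes weights * robust to the C
# routines, so the effective weights are weights^2 * (raw robustness wts).
w  <- c(0.5, 1, 2)   # prior weights
rw <- c(1, 1, 1)     # raw robustness weights (gaussian case: all 1)

stored   <- w * rw      # what ends up in fit$pars$robust
passed   <- w * stored  # what predLoess() hands to the C code
intended <- w * rw      # what should have been passed

isTRUE(all.equal(passed, intended))   # FALSE: off by a factor of w
isTRUE(all.equal(passed, w^2 * rw))   # TRUE: the weights enter squared
```

Dividing fit$pars$robust by the weights, as in the workaround above, undoes exactly this extra factor of w.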

I see that this is clearly a bug. You are right that the robust case (family = "symmetric") needs special care, and so I will look at that particularly.

(In reply to Martin Maechler from comment #2)
> I see that this is clearly a bug. You are right that the robust case
> (family = "symmetric") needs special care, and so I will look at that
> particularly.

After lots of looking, I've also added a new feature: `iterTrace = TRUE` now makes it possible to monitor the robustness iterations. I have also added (incomplete) checks for the surface x family x statistics x trace.hat combinations, and had to look at the C code to see where and when 'robust' and 'weights' are used there. {{and there may be more simplification possible,..}}

Committed to R-devel (svn rev ) -- to be ported to R '3.2.2 patched' subsequently.
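For readers wanting to try the new monitoring switch, a minimal sketch, assuming `iterTrace` landed as an argument of loess.control() (as it did in later R releases):

```r
# Sketch: with family = "symmetric", iterTrace = TRUE prints per-iteration
# information about the robustness iterations while the model is fitted.
fit <- loess(dist ~ speed, cars, family = "symmetric",
             control = loess.control(surface = "direct", iterTrace = TRUE))
```

With family = "gaussian" there are no robustness iterations, so the option only matters for the "symmetric" family.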