Bug 16665

Summary: dummy.coef fails when transformations are included in formula
Product: R
Reporter: Werner A. Stahel <stahel>
Component: Analyses
Assignee: R-core <R-core>
Status: ASSIGNED
Severity: enhancement
CC: maechler
Priority: P5
Version: R 3.2.3   
Hardware: Other   
OS: Linux   
Attachments: dummy.coef.fix.R

Description Werner A. Stahel 2016-01-11 12:32:10 UTC
Created attachment 1999
dummy.coef.fix.R

The function  dummy.coef.lm  fails in more complex cases, notably when terms
include variables that are transformed in the  formula  of the model.

r.lm <- lm(Fertility ~ cut(Agriculture, breaks=4) + Infant.Mortality,
             data=swiss)
dummy.coef(r.lm)

Error in model.frame.default(Terms, dummy, na.action = function(x) x,  : 
  factor cut(Agriculture, breaks = 4) has new level (0.9995,1]

The problem is that it works with  all.vars , which returns untransformed
variables. This is fixed by using  model.frame  instead -- which is needed
later in the function anyway.
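
The mismatch can be seen directly on the example model. This is only a minimal sketch: the names  Terms  and  mf  are chosen here for illustration and need not match the internals of  dummy.coef.lm .

```r
# Same model as in the report; 'swiss' ships with R (package datasets).
r.lm <- lm(Fertility ~ cut(Agriculture, breaks = 4) + Infant.Mortality,
           data = swiss)

# all.vars() only sees the raw variable names in the formula.
Terms <- delete.response(terms(r.lm))
all.vars(Terms)

# The model frame instead holds the transformed column, a factor.
mf <- model.frame(r.lm)
names(mf)
levels(mf[["cut(Agriculture, breaks = 4)"]])

# The factor levels stored with the fit are keyed by the transformed name.
names(r.lm$xlevels)
```

all.vars  reports  Agriculture , whereas the model frame and  xlevels  refer to  cut(Agriculture, breaks = 4) , which is the name that appears in the error message.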

The function  dummy.coef.fix  does this.

dummy.coef.fix(r.lm)

Thus,  dummy.coef.lm  should be replaced by  dummt.coef.fix .

In the function, there is a warning
warning("some terms will have NAs due to the limits of the method")
I wonder why this is a "limit" (-> "limitation") of the method.
If some interaction coefficients are undetermined because the respective
combination of levels is not available, NA is the appropriate result.
Are there other cases?
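
A small sketch of the case described above, using made-up data in which one combination of levels never occurs. The data frame  d  and its columns are hypothetical, and whether such a fit also triggers the warning in  dummy.coef  is not checked here.

```r
# Hypothetical toy data: two factors, with the cell (a2, b3) left empty.
d <- expand.grid(a = c("a1", "a2"), b = c("b1", "b2", "b3"), rep = 1:3,
                 stringsAsFactors = TRUE)
d <- subset(d, !(a == "a2" & b == "b3"))

set.seed(1)              # arbitrary seed; the response is pure noise
d$y <- rnorm(nrow(d))

fit <- lm(y ~ a * b, data = d)

# The interaction coefficient for the empty cell cannot be estimated,
# so lm() reports it as NA.
coef(fit)
is.na(coef(fit)["aa2:bb3"])
```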

I have extended the function to include confidence intervals and t-tests
and call the extended function  allcoef .
The latter are what summary.lm shows, except that summary.lm omits the (dummy)
variable that is eliminated by the  contrasts . For treatment contrasts,
the added information is trivial (0 with 0 standard error), but for
sum (or weighted sum) contrasts, it is not, and for other contrasts, it may
still recover more useful information. 
The function would need some polishing to work in general contexts.
Let me know if you are interested.
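
To illustrate the point about sum contrasts, here is a rough sketch of how the estimate, standard error, t value and confidence interval for the eliminated level can be recovered by hand. This is not the code of  allcoef ; the  warpbreaks  data and all variable names are chosen here only for illustration.

```r
# One-factor model with sum contrasts; 'warpbreaks' ships with R.
fit <- lm(breaks ~ tension, data = warpbreaks,
          contrasts = list(tension = "contr.sum"))

nm <- c("tension1", "tension2")   # coefficients that summary(fit) shows
b  <- coef(fit)[nm]
V  <- vcov(fit)[nm, nm]

# Under sum contrasts the effect of the last level (H) is minus the sum
# of the other effects, i.e. the linear combination w'b with w = (-1, -1).
w        <- c(-1, -1)
est_last <- sum(w * b)
se_last  <- sqrt(drop(t(w) %*% V %*% w))
t_last   <- est_last / se_last
ci_last  <- est_last + c(-1, 1) * qt(0.975, df.residual(fit)) * se_last

# All three effects; they sum to zero by construction.
eff <- c(L = b[[1]], M = b[[2]], H = est_last)
eff
dummy.coef(fit)$tension   # should agree with 'eff'
```

Unlike the treatment-contrast case, the standard error for level H is not zero here, so the test and the interval carry real information.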

Werner Stahel, Jan 4, 2016

Comment 1 Martin Maechler 2016-01-22 07:45:43 UTC
Thank you, Werner.

I can confirm that your version works for the example where the current `stats` package one fails.
Your version also fixes the similar problem reported to R-help 
  "bug in dummy.coef?"
  https://stat.ethz.ch/pipermail/r-help/2013-October/362106.html

I've spent a bit of time on it because your version had quite a few changes that were not necessary (you renamed three of the internal variables). Your version must also have come from simply "print()"ing the function definition in an older version of R, so your code lacks the comments from the source code and, e.g., the newer use of  anyNA() .
Note that the most current source (of "R-devel") for this function is always at
 https://svn.r-project.org/R/trunk/src/library/base/R/dummy.coef.R
(To find this file, it is easiest to get a source "tarball" from one of the places linked from https://www.r-project.org/sources.html  -- note the daily versions provided by "SfS"! Or, if you prefer the web, you can use the 'site:svn.r-project.org/R' trick:
https://www.google.ch/search?q=site:svn.r-project.org/R++%27dummy.coef%27&ie=utf-8&oe=utf-8&gws_rd=cr&ei=1t2hVqqGDoXxUt_spugL )

Your question about the warning:  I also find it a bit "strange".
One could replace  "due to the limits of the method" 
by "due to the design" (meaning the linear model design matrix),
but I think you are suggesting that *no* warning should be given there, right?

I did not easily find a case that triggers the warning.  Do you have one?


Best regards,
Martin