Bug 14630 - xtabs excludes NAs, even when exclude=NULL,na.action=na.pass
Status: REOPENED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses
Version: R 2.13.1 patched
Hardware: All All
Importance: P5 normal
Assignee: R-core
Reported: 2011-07-16 13:40 UTC by Timothy Bates
Modified: 2017-01-24 11:15 UTC
Description Timothy Bates 2011-07-16 13:40:51 UTC
As this complete example shows, xtabs removes NAs from factors, even when they are explicitly requested to be shown. This could lead to errors in analysis and reporting.

Expected output:

test = data.frame(a=as.factor(c(1,NA)))
xtabs(~a,exclude=NULL,na.action=na.pass, data=test)
# a
#    1 <NA> 
#    1    1 

Actual output:

> test = data.frame(a=as.factor(c(1,NA)))
> xtabs(~a,exclude=NULL,na.action=na.pass, data=test)
# a
# 1 
# 1
Comment 1 Peter Dalgaard 2011-07-28 14:22:11 UTC
exclude= is used "when forming the levels of the classifying factors". This refers to cases where (e.g.) a numerical vector is converted. If the classifier is a factor already, this conversion is not done and the argument is ignored. To change the way a factor handles NA, see addNA().
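
A minimal sketch of the addNA() route mentioned above, using the data from the report. The column name a_na is made up for illustration. The table() call is included only for comparison and is not part of the original comment.

```r
# Data from the original report: a factor with one missing value.
test <- data.frame(a = as.factor(c(1, NA)))

# addNA() turns the missing value into an explicit level of the factor.
# 'a_na' is a hypothetical column name used only for this illustration.
test$a_na <- addNA(test$a)
levels(test$a_na)

# The factor now carries an NA level, so xtabs() has a cell to count it in.
tab <- xtabs(~ a_na, data = test)
tab

# For comparison, table() has its own switch for showing NAs.
table(test$a, useNA = "ifany")
```

Because the NA is now an ordinary level, neither exclude= nor na.action= needs to be set in the xtabs() call.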
Comment 2 Milan Bouchet-Valat 2017-01-18 15:28:19 UTC
Even if it's (very implicitly) documented, this behavior is really counter-intuitive. This can be seen from the multiple posts about this on the Web. Why should factors behave differently from character vectors in that regard?

FWIW, addNA is only a partial replacement since one needs to repeat it for every variable.
Comment 3 Martin Maechler 2017-01-19 10:34:07 UTC
(In reply to Milan Bouchet-Valat from comment #2)
> Even if it's (very implicitly) documented, this behavior is really
> counter-intuitive. This can be seen from the multiple posts about this on
> the Web. 

"The web" is a bit too large for a reference. Large parts of it are full of crap, so we do like precise references.

> Why should factors behave differently from character vectors in that regard?

Because factors can have missing values, and <NA> levels, and "NA" levels... and yes, that is not easy for users. But the idea is that the factor is / has been built with care, and functions dealing with factors should use the same distinguishing care.

 
> FWIW, addNA is only a partial replacement since one needs to repeat it for
> every variable.

For beginners this may be cumbersome, indeed.  But you will know that for a data.frame 'd'

  d[] <- lapply(d, addNA, ifany=TRUE)    

is not such a large job.
However, I agree that for several reasons it may make sense to add another optional argument to 'xtabs' to help users and make the task more self-contained.
There are several possibilities and caveats.  Can you start a discussion on the R-devel mailing list, please?
If you agree, I can reopen this as 'Wishlist' item.
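
The lapply() idiom above can be sketched on a small data frame. The data frame below is a made-up example, and the sketch assumes every column of 'd' is meant to be a classifier, since addNA() coerces non-factor columns to factors.

```r
# Hypothetical toy data: two factor columns, each with one missing value.
d <- data.frame(
  a = factor(c("x", NA, "y")),
  b = factor(c("u", "v", NA))
)

# Give every column an explicit NA level, but only where NAs occur.
# Caveat: addNA() converts non-factor columns to factors as well.
d[] <- lapply(d, addNA, ifany = TRUE)

# Both classifiers now contribute an NA row/column to the cross-tabulation.
tab <- xtabs(~ a + b, data = d)
dim(tab)
sum(tab)
```

Assigning into d[] keeps 'd' a data.frame rather than replacing it with a plain list.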

Bonnes salutations,
Martin
Comment 4 Milan Bouchet-Valat 2017-01-19 13:00:54 UTC
Sure. Here it is:
https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

(The Web is full of crap, of course. I was referring in particular to threads on R-help and stack overflow, showing people struggling with xtabs() and NAs.)
Comment 5 Martin Maechler 2017-01-24 11:15:16 UTC
In the meantime, I have found a real bug with such NAs and 'sparse = TRUE'.
Also, we are treating this as a wishlist item, and tend to agree that the current code, which contains a 'x[is.na(x)] <- 0', is clearly suboptimal.