Bug 16985 - median(integer or logical) return various types of results, while quantile does not
Status: CLOSED Works as documented
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: R 3.3.*
Hardware: Other Other
Importance: P5 minor
Assignee: R-core
Reported: 2016-07-08 17:57 UTC by Bill Dunlap
Modified: 2018-08-15 07:00 UTC
Description Bill Dunlap 2016-07-08 17:57:35 UTC
If you give median() a vector of integers or logicals the result is sometimes numeric and sometimes the type of the input data, depending on whether interpolation was needed to compute the median.

  > str(median(c(TRUE,TRUE,TRUE)))
   logi TRUE
  > str(median(c(1:3)))
   int 2
  > str(median(c(TRUE,TRUE,TRUE,TRUE)))
   num 1
  > str(median(1:4))
   num 2.5

quantile(x, 0.5) seems to always return a numeric vector when given logicals or doubles.

This could affect the growing number of people who like to call R functions from C++ code and don't check the type of the result.
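
For callers that need a predictable type, one possible workaround is to coerce the result explicitly. The sketch below assumes plain logical, integer, or double input; `median_dbl` is a hypothetical helper name, not part of R.

```r
# Hypothetical helper (not part of R): always return a double.
# as.double() drops attributes, so this is meant for plain vectors only.
median_dbl <- function(x, ...) as.double(median(x, ...))

inputs <- list(c(TRUE, TRUE, TRUE), 1:3, c(TRUE, TRUE, TRUE, TRUE), 1:4)

# Result types of plain median(): they vary with input type and length.
raw_types <- vapply(inputs, function(x) typeof(median(x)), character(1))

# Result types after coercion: "double" in every case.
dbl_types <- vapply(inputs, function(x) typeof(median_dbl(x)), character(1))
```

Here `raw_types` mirrors the four str() results above, while `dbl_types` is uniform.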
Comment 1 Stephen Milborrow 2018-08-14 20:10:11 UTC
--- Some more details and a proposed fix ---

When given a non-numeric argument x, median.default incorrectly
returns class numeric if the length of x is even.  Examples:

> median(c(TRUE))
[1] TRUE       # correct

> median(c(TRUE, TRUE))
[1] 1          # wrong

> median(c(TRUE, TRUE, TRUE))
[1] TRUE       # correct

> median(c(TRUE, TRUE, TRUE, TRUE))
[1] 1          # wrong

This is because median.default invokes mean when length(x) is even.
The offending lines in stats::median.default are (R version 3.5.1):

  if(n %% 2L == 1L) sort(x, partial = half)[half]
  else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
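
The type change in the even-length branch comes from mean() itself, which returns a double for logical and integer input. A quick check (the variable names are only for illustration):

```r
pair <- c(TRUE, TRUE)

input_type    <- typeof(pair)               # "logical"
mean_lgl_type <- typeof(mean(pair))         # "double"
mean_int_type <- typeof(mean(c(2L, 3L)))    # "double"
```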

Adopting the principle that median should return the same class
it is given, a proposed fix replaces those lines with:

  if((is.double(x) || is.complex(x)) && (n %% 2L == 0L))
      mean(sort(x, partial = half + 0L:1L)[half + 0L:1L]) # interpolate
  else
      sort(x, partial = half)[half]

In order to return the same class as it is given, the fixed code
interpolates only if x is continuous. Notably, it doesn't interpolate
for integer x.  (It does interpolate for Date objects, since is.double()
returns TRUE for such objects.)

There is an implementation dependency: when x is not continuous and
length(x) is even, the function returns the _lower_ of the two "middle"
values. (An alternative implementation would return the upper of these
two values.)  The old implementation returned the mean of these two
values.
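
The lower/upper choice described above can be tried without patching stats. This is a minimal standalone sketch: the function name `median_keep_type` and its `middle` argument are hypothetical, and NA handling is left out for brevity.

```r
# Hypothetical standalone version of the proposed logic (not stats::median).
# Assumes x contains no NA values.
median_keep_type <- function(x, middle = c("lower", "upper")) {
    middle <- match.arg(middle)
    n <- length(x)
    if (n == 0L) return(x[NA_integer_])
    lower <- (n + 1L) %/% 2L   # lower middle index; the middle itself if n is odd
    if ((is.double(x) || is.complex(x)) && n %% 2L == 0L) {
        # Continuous types of even length: interpolate the two middle values.
        mean(sort(x, partial = lower + 0L:1L)[lower + 0L:1L])
    } else {
        # Other types, or odd length: return one existing element.
        pick <- if (middle == "upper") n %/% 2L + 1L else lower
        sort(x, partial = pick)[pick]
    }
}

lower_int <- median_keep_type(c(4L, 3L, 4L, 3L))                    # 3L
upper_int <- median_keep_type(c(4L, 3L, 4L, 3L), middle = "upper")  # 4L
interp    <- median_keep_type(c(4, 3, 4, 3))                        # 3.5
```

For odd lengths both settings of `middle` select the same element, so the choice only matters for non-continuous input of even length.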

Tests for the above fix are:

  stopifnot(identical(median(c(3)), 3))
  stopifnot(identical(median(c(3, 3)), 3))
  stopifnot(identical(median(c(3, 3, 4)), 3))
  stopifnot(identical(median(c(4, 4, 3)), 4))
  stopifnot(identical(median(c(4, 4, 4, 3)), 4))
  stopifnot(identical(median(c(4, 3, 4, 3)), 3.5)) # interpolates

  stopifnot(identical(median(c(3+5i)), 3+5i))
  stopifnot(identical(median(c(3+5i, 3+5i)), 3+5i))
  stopifnot(identical(median(c(3+5i, 3+5i, 4+6i)), 3+5i))
  stopifnot(identical(median(c(4+6i, 4+6i, 4+6i, 3+5i)), 4+6i))
  stopifnot(identical(median(c(4+6i, 3+5i, 4+6i, 3+5i)), 3.5+5.5i)) # interpolates

  stopifnot(identical(median(c(3L)), 3L))
  stopifnot(identical(median(c(3L, 3L)), 3L))
  stopifnot(identical(median(c(3L, 3L, 4L)), 3L))
  stopifnot(identical(median(c(4L, 4L, 4L, 3L)), 4L))
  stopifnot(identical(median(c(4L, 3L, 4L, 3L)), 3L)) # does not interpolate, implementation dependent

  stopifnot(identical(median(c(TRUE)), TRUE))
  stopifnot(identical(median(c(TRUE, TRUE)), TRUE))
  stopifnot(identical(median(c(TRUE, TRUE, FALSE)), TRUE))
  stopifnot(identical(median(c(FALSE, FALSE, FALSE, TRUE)), FALSE))
  stopifnot(identical(median(c(FALSE, FALSE, TRUE, TRUE)), FALSE)) # implementation dependent

  stopifnot(identical(median(c("a")), "a"))
  stopifnot(identical(median(c("a", "a")), "a"))
  stopifnot(identical(median(c("a", "a", "b")), "a"))
  stopifnot(identical(median(c("b", "b", "b", "a")), "b"))
  stopifnot(identical(median(c("b", "b", "a", "a")), "a")) # implementation dependent
Comment 2 Stephen Milborrow 2018-08-14 20:48:44 UTC
Note also in the current implementation:

> median(c(FALSE, FALSE, TRUE, TRUE))
[1] 0.5    # wrong
Comment 3 Martin Maechler 2018-08-15 07:00:48 UTC
Thank you, Bill and Stephen, for raising the issue and for the suggestions.

My TL;DR: this is not a bug, and the behavior is well documented.

Returning 0.5 for, e.g., an equal number of TRUE and FALSE values is entirely in line with R's behavior elsewhere: logicals are treated as {0, 1} in arithmetic/numeric contexts.
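
A short illustration of that {0, 1} treatment, using only base R (the variable names are only for illustration):

```r
two   <- TRUE + TRUE                          # integer 2: each TRUE counts as 1
count <- sum(c(FALSE, FALSE, TRUE, TRUE))     # integer 2: the number of TRUE values
avg   <- mean(c(FALSE, FALSE, TRUE, TRUE))    # 0.5, the same value median() gives here
```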

I'm almost sure Bill Dunlap suggested that the result should always be double, since he mentioned the calls from C / C++.
To that I'd say such callers need to wrap the result in asReal(.) [the C function in R's API],
and in general should be careful, as median() and the mean(.) used inside may dispatch!

Stephen Milborrow's suggestion goes in the other direction, i.e. to preserve logical even in the "even case".
No, I don't think we want to go there.  
The help page even explicitly mentions those cases:

Value:

     The default method returns a length-one object of the same type as
     ‘x’, except when ‘x’ is logical or integer of even length, when
     the result will be double.
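
The documented rule can be checked directly. In this sketch `documented_type` is a hypothetical helper that encodes the help-page wording for non-empty vectors without NA values; it is not an R function.

```r
# Hypothetical helper: the result type the help page describes for the
# default method, for non-empty vectors without NA values.
documented_type <- function(x) {
    if ((is.logical(x) || is.integer(x)) && length(x) %% 2L == 0L) "double"
    else typeof(x)
}

inputs <- list(TRUE, c(TRUE, FALSE), 1:3, 1:4, c(1.5, 2.5), c(1.5, 2.5, 3.5))

observed <- vapply(inputs, function(x) typeof(median(x)), character(1))
expected <- vapply(inputs, documented_type, character(1))

# Stops with an error if median() ever deviates from the documented rule.
stopifnot(identical(observed, expected))
```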