Summary: hist.Date uses breaks at end of years, showing misleading x-axis
Product: R
Reporter: Mike Toews <mwtoews>
Description Mike Toews 2016-01-22 05:36:44 UTC
Plotting a histogram of years from dates shows a figure with a misleadingly offset x-axis. Consider a few simple dates in the years 1970 and 1971 (near the Unix epoch):

> dates <- as.Date(c("1970-01-01", "1971-08-16", "1971-12-31"))
> d <- hist(dates, "years", freq=TRUE)
> d$breaks
# -1 364 729

With R version 3.2.3, the histogram shows x-axis labels 1969, 1970 and 1971, which incorrectly implies that the dates are in the years 1969 and 1970. Showing more formatting information reveals why:

> hist(dates, "years", freq=TRUE, format="%Y-%m-%d")

Now the x-axis labels are 1969-12-31, 1970-12-31 and 1971-12-31. These breakpoints are at the end of the year.

Shouldn't the breakpoints be set at the beginning of the year, at 1970-01-01, 1971-01-01 and 1972-01-01 (or 0, 365, 730)? By establishing breakpoints at the beginning of each year, the default formatted breakpoints along the x-axis would be 1970, 1971 and 1972, which is the correct interpretation.
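The day numbers quoted above can be checked directly. The following is a small sketch, not part of the original report; the variable name `starts` is introduced here only for illustration.

```r
# Proposed start-of-year breakpoints (illustrative variable name)
starts <- as.Date(c("1970-01-01", "1971-01-01", "1972-01-01"))

# Days since the Unix epoch for the start-of-year breaks
as.numeric(starts)        # 0 365 730

# The breaks reported in d$breaks are each one day earlier
as.numeric(starts - 1)    # -1 364 729

# Formatting those end-of-year breaks with "%Y" gives the misleading labels
format(starts - 1, "%Y")  # "1969" "1970" "1971"
```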
Comment 1 Duncan Murdoch 2016-02-07 13:28:40 UTC
If you look at the source to graphics:::hist.Date, you'll see what's happening. It works out the breaks as years, then subtracts 1, so that the breaks are on Dec 31 of the preceding year. With the default "right = TRUE", this is the right thing to do, but it does have the unfortunate side effect you noticed.

One solution would be to change the default for "right" so the subtraction is unnecessary. Another would be to just change the labels, but then they'll be wrong if someone chooses a format like "%Y-%m-%d". A third would be to change the default format for the labels so at least they aren't misleading.

A version of the same problem will occur for months, quarters and years, which all back up the breaks by a day.

My inclination is to make the format change. Changing "right" will cause small changes in existing histograms. Changing the labels has the problem mentioned above. I'd like to add a warning about the choice of breaks to ?hist.Date, and change the default format for the labels, so at least they aren't misleading.

Comments?
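To illustrate why backing the breaks up by a day matters with right-closed bins, here is a minimal sketch using cut() on the underlying day numbers. It is not the hist.Date source; the dates and variable names are chosen here for illustration, and include.lowest = TRUE is assumed to mirror the default of hist().

```r
# Illustrative dates: the first day of each year plus the last day of 1971
x <- as.numeric(as.Date(c("1970-01-01", "1971-01-01", "1971-12-31")))

jan1  <- as.numeric(as.Date(c("1970-01-01", "1971-01-01", "1972-01-01")))
dec31 <- jan1 - 1  # breaks backed up by one day

# Right-closed bins with Jan 1 breaks: 1971-01-01 lands in the 1970 bin
idx_jan1_right  <- as.integer(cut(x, jan1,  right = TRUE,  include.lowest = TRUE))

# Right-closed bins with Dec 31 breaks: each date lands in its own year
idx_dec31_right <- as.integer(cut(x, dec31, right = TRUE,  include.lowest = TRUE))

# Left-closed bins with Jan 1 breaks: same assignment, no subtraction needed
idx_jan1_left   <- as.integer(cut(x, jan1,  right = FALSE, include.lowest = TRUE))

idx_jan1_right   # 1 1 2
idx_dec31_right  # 1 2 2
idx_jan1_left    # 1 2 2
```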
Comment 2 Duncan Murdoch 2016-02-08 12:26:44 UTC
I've fixed this now, with two changes.

- The default format is %Y-%m-%d, so the label won't be misleading.
- The break is still set to Dec 31 by default, but is set to Jan 1 if right=FALSE.

Similar changes apply to monthly or quarterly breaks, and hist.POSIXt.
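A quick way to inspect the breaks without drawing anything is to pass plot = FALSE. This is a sketch assuming an R version that includes the fix described above; on such a version the breaks with right = FALSE should fall on Jan 1, but the exact values are not asserted here.

```r
dates <- as.Date(c("1970-01-01", "1971-08-16", "1971-12-31"))

# Left-closed yearly bins; plot = FALSE returns the histogram object only
res <- hist(dates, "years", right = FALSE, plot = FALSE)

# Show the breaks as dates rather than day numbers
as.Date(res$breaks, origin = "1970-01-01")

# All three dates should be counted
sum(res$counts)
```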