Bug 16761 - Not quite right pretty(dt, min.n=*) for Date or POSIXct object dt
Summary: Not quite right pretty(dt, min.n=*) for Date or POSIXct object dt
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Misc (show other bugs)
Version: R-devel (trunk)
Hardware: All All
: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-03-13 07:49 UTC by Suharto Anggono
Modified: 2016-05-20 16:31 UTC (History)
1 user (show)

See Also:


Attachments
Against R devel r70421, a fix to new code only (2.33 KB, patch)
2016-04-05 16:26 UTC, Suharto Anggono
Details | Diff
Against R devel r70532 (2.83 KB, patch)
2016-04-23 01:35 UTC, Suharto Anggono
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Suharto Anggono 2016-03-13 07:49:42 UTC
I have reported this on R-devel mailing list, https://stat.ethz.ch/pipermail/r-devel/2016-February/072353.html.

Looking at function 'prettyDate' in https://svn.r-project.org/R/trunk/src/library/grDevices/R/prettyDate.R, the early return part (if(isDate && D <= n * 24*3600)) is not quite right:
- The result doesn't have attribute "labels".
Help on 'pretty.Date', "Value":
A vector (of the suitable class) of locations, with attribute '"labels"' giving corresponding formatted character labels.
- Argument 'min.n' is not respected.

Example, adapted from examples for 'pretty.Date':
x <- as.Date("2002-02-02")
at <- pretty(seq(x, by = "1 day", length = 2), n = 5, min.n = 5)
attr(at, "labels")  # should be non-null
length(at)  # should be at least 6

Output:

R> x <- as.Date("2002-02-02")
R> at <- pretty(seq(x, by = "1 day", length = 2), n = 5, min.n = 5)
R> attr(at, "labels")  # should be non-null
NULL
R> length(at)  # should be at least 6
[1] 2
R> sessionInfo()
R Under development (unstable) (2016-02-25 r70222)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.0


Regarding 'min.n', I think min.n > n should be an error, as in 'pretty.default'.

Another issue that I have not reported on R-devel mailing list: In the case of of only one unique value in date-time object D, pretty(D, n = n, min.n = n) fails. It succeeds in R 3.2.3.

Example:

R> print.default(pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5))
Error in prettyDate(x = x, n = n, min.n = min.n, sep = sep, ...) :
  range too small for 'min.n'
R> sessionInfo()
R Under development (unstable) (2016-02-25 r70222)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.0


With R 3.2.3:

R> print.default(pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5))
 [1] 1012590120 1012590122 1012590124 1012590126 1012590128 1012590130
 [7] 1012590132 1012590134 1012590136 1012590138 1012590140 1012590142
[13] 1012590144 1012590146 1012590148 1012590150 1012590152 1012590154
[19] 1012590156 1012590158 1012590160 1012590162 1012590164 1012590166
[25] 1012590168 1012590170 1012590172 1012590174 1012590176 1012590178
[31] 1012590180
attr(,"class")
[1] "POSIXct" "POSIXt"
attr(,"tzone")
[1] ""
attr(,"labels")
 [1] "00" "02" "04" "06" "08" "10" "12" "14" "16" "18" "20" "22" "24" "26" "28"
[16] "30" "32" "34" "36" "38" "40" "42" "44" "46" "48" "50" "52" "54" "56" "58"
[31] "00"
R> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 1 Martin Maechler 2016-03-17 17:31:42 UTC
I will look into this, to be amended for R 3.3.0 (which has become "alpha" earlier today) and newer.
Comment 2 Martin Maechler 2016-03-19 20:34:30 UTC
(In reply to Martin Maechler from comment #1)
> I will look into this, to be amended for R 3.3.0 (which has become "alpha"
> earlier today) and newer.

I've dealt with both of your examples, and also followed the proposal that
min.n > n  should give an error.

The fix is committed to R-devel and  R 3.4.0 alpha -- svn r 70354,5.

Thank you for the report!
Comment 3 Suharto Anggono 2016-03-22 16:26:20 UTC
At function 'prettyDate' in r70361, for the early return part (if(isDate && D <= n * DAY)), the attribute "labels" of the result is
paste("%b", "%d", sep = sep) .
With default 'sep', it is a single string "%b %d", literally, not something like "Feb 02".
Comment 4 Martin Maechler 2016-03-23 11:21:39 UTC
(In reply to Suharto Anggono from comment #3)
> At function 'prettyDate' in r70361, for the early return part (if(isDate &&
> D <= n * DAY)), the attribute "labels" of the result is
> paste("%b", "%d", sep = sep) .
> With default 'sep', it is a single string "%b %d", literally, not something
> like "Feb 02".

Embarrassingly for me, you are 100% right. Thank you!

Fixed by r70367 for R-devel and 70368 for R 3.3.0 alpha.
Comment 5 Suharto Anggono 2016-03-25 02:25:39 UTC
Looking again, I see that, for the "POSIXct" case (my last example), the length of the result is not quite right after fixing.

For 'pretty', 'n' is number of intervals. Number of intervals of n corresponds to number of points of (n+1).
min.n = 5 means at least 5 intervals. It means that number of points, which is length of the 'pretty' result, is at least 6.

at <- pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5)
print.default(at)
length(at)  # should be at least 6

Output:

R> at <- pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5)
R> print.default(at)
[1] 1012590118 1012590119 1012590120 1012590121 1012590122
attr(,"class")
[1] "POSIXct" "POSIXt"
attr(,"tzone")
[1] ""
attr(,"labels")
[1] "58" "59" "00" "01" "02"
R> length(at)  # should be at least 6
[1] 5
R> sessionInfo()
R Under development (unstable) (2016-03-23 r70368)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


I prefer keep failing (no fix) to giving result that is not as requested.
Comment 6 Martin Maechler 2016-03-30 07:23:15 UTC
(In reply to Suharto Anggono from comment #5)
> Looking again, I see that, for the "POSIXct" case (my last example), the
> length of the result is not quite right after fixing.
> 
> For 'pretty', 'n' is number of intervals. Number of intervals of n
> corresponds to number of points of (n+1).

You are right, Suharto, thank you: Part of the new code assumed 'n' was the number of "ticks" instead of the number of intervals.

Looking at this and other examples, I found that *actually*  prettyDate() is partly "wrong" as not working according to the specifications of pretty() more generally:  help(pretty) says that the resulting interval range should *cover*
the range of the input, but that is not what happens now, mostly for the POSIX?t case.  So there will be more to be done, with this.
Comment 7 Suharto Anggono 2016-04-05 16:26:05 UTC
Created attachment 2055 [details]
Against R devel r70421, a fix to new code only
Comment 8 Martin Maechler 2016-04-06 10:02:24 UTC
Thank you Suharto,  I will use part of your modification to the tests script,
but my version of prettyDate() will be very much different.
Comment 9 Suharto Anggono 2016-04-10 07:49:13 UTC
Examples in the documentation on 'pretty.Date' can be test case for covering the range of the input.

x <- as.POSIXct("2002-02-02 02:02")
attr(pretty(seq(x, by = s, length = 2), n = 5), "labels")

Result that I expect that covers the range of the input in English locale, looking at variable 'steps' in code of function 'prettyDate':
- For s = "10 secs":
c("00", "02", "04", "06", "08", "10")
- For s = "1 min":
c("00", "15", "30", "45", "00") or
c("00", "10", "20", "30", "40", "50", "00")
- For s = "5 mins":
c("02:02", "02:03", "02:04", "02:05", "02:06", "02:07")
- For s = "30 mins":
c("02:00", "02:10", "02:20", "02:30", "02:40") or
c("02:00", "02:05", "02:10", "02:15", "02:20", "02:25", "02:30", "02:35")
- For s = "6 hours":
c("00:00", "03:00", "06:00", "09:00") or
c("02:00", "03:00", "04:00", "05:00", "06:00", "07:00", "08:00", "09:00")
- For s = "12 hours":
c("00:00", "03:00", "06:00", "09:00", "12:00", "15:00")
- For s = "1 DSTday":
c("Feb 02 00:00", "Feb 02 06:00", "Feb 02 12:00", "Feb 02 18:00", "Feb 03 00:00", "Feb 03 06:00")
- For s = "2 weeks":
c("Jan 28", "Feb 04", "Feb 11", "Feb 18") or
c("Feb 02", "Feb 04", "Feb 06", "Feb 08", "Feb 10", "Feb 12", "Feb 14", "Feb 16", "Feb 18")
- For s = "1 months":
c("Jan 28", "Feb 04", "Feb 11", "Feb 18", "Feb 25", "Mar 04")
- For s = "6 months":
c("Jan", "Apr", "Jul", "Oct") or
c("Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep")
- For s = "1 year":
c("Jan", "Apr", "Jul", "Oct", "Jan", "Apr")
- For s = "10 years":
c("2000", "2005", "2010", "2015") or
c("2002", "2004", "2006", "2008", "2010", "2012", "2014")
- For s = "50 years":
c("2000", "2020", "2040", "2060") or
c("2000", "2010", "2020", "2030", "2040", "2050", "2060")
- For s = "1000 years":
c("2000", "2500", "3000", "3500") or
c("2000", "2200", "2400", "2600", "2800", "3000", "3200")
Comment 10 Martin Maechler 2016-04-12 10:00:24 UTC
(In reply to Suharto Anggono from comment #9)
> Examples in the documentation on 'pretty.Date' can be test case for covering
> the range of the input.

 ...........

You are right... the never ending story will continue
(the code becomes longer and longer but maybe we will converge ..).

I have tested better working code.... will commit later today or tomorrow
Comment 11 Suharto Anggono 2016-04-16 03:36:01 UTC
In code of function 'prettyDate' in R devel r70486, there is
nat <- seqDtime(init.at[1], by = paste("-1",st.i$spec), length=2)[2]

It is in if(init.i == 1L). When the branch is reached, st.i is steps[[1]]. So, st.i$spec is "1 sec". So, paste("-1",st.i$spec) is "-1 1 sec". When it is used as 'by' argument in call to 'seq', error occurs.
An example that triggers the error is with range(x) of 1 second and 'min.n' of 3.

R> at <- pretty(as.POSIXct("2002-02-02 02:02") + 0:1, n = 5, min.n = 3)
Error in seq.POSIXt(beg, end, by = by, length.out = length) :
  invalid 'by' string
R> traceback()
6: stop("invalid 'by' string")
5: seq.POSIXt(beg, end, by = by, length.out = length)
4: seq(beg, end, by = by, length.out = length)
3: seqDtime(init.at[1], by = paste("-1", st.i$spec), length = 2)
2: pretty.POSIXt(as.POSIXct("2002-02-02 02:02") + 0:1, n = 5, min.n = 3)
1: pretty(as.POSIXct("2002-02-02 02:02") + 0:1, n = 5, min.n = 3)

Similar instance inside function 'calcSteps' should never be executed. It is reached when lim[1] < at[1]. Because 'at' is seqDtime(startTime, end = lim[2], by = s$spec), at[1] is startTime. Because 'startTime' is trunc_POSIXt(lim[1], units = s$start), it should be that startTime <= lim[1].


Another issue is that, in some cases, "tzone" attribute is lost in the result.

Example:
x <- as.POSIXct("2002-02-02 02:02", tz = "UTC")
print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5))
x <- as.POSIXct("2002-02-02 02:02", tz = "EST5EDT")
print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5))

Output on my system:
R> x <- as.POSIXct("2002-02-02 02:02", tz = "UTC")
R> print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5))
[1] 1012615200 1012615800 1012616400 1012617000 1012617600
attr(,"class")
[1] "POSIXct" "POSIXt"
attr(,"labels")
[1] "09:00" "09:10" "09:20" "09:30" "09:40"
R> x <- as.POSIXct("2002-02-02 02:02", tz = "EST5EDT")
R> print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5))
[1] 1012633200 1012633800 1012634400 1012635000 1012635600
attr(,"class")
[1] "POSIXct" "POSIXt"
attr(,"labels")
[1] "14:00" "14:10" "14:20" "14:30" "14:40"

After stepping, I found that, for the example, "tzone" attribute is lost after
at <- c(at, nat)
in "add point at right" inside 'calcSteps'.


R> sessionInfo()
R Under development (unstable) (2016-04-14 r70486)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.4.0
R> Sys.timezone()
[1] "Asia/Bangkok"
Comment 12 Suharto Anggono 2016-04-23 01:35:49 UTC
Created attachment 2067 [details]
Against R devel r70532
Comment 13 Martin Maechler 2016-04-23 12:55:48 UTC
(In reply to Suharto Anggono from comment #12)
> Created attachment 2067 [details]
> Against R devel r70532

Thank you very much, Suharto !!

Your patch is excellent ... I have done some testing now, added to the regression test and committed rev 70534.

One could still improve the current code for large 'n'...
but then pretty() also the numeric version has never been designed for large n; 
n = 1000 already being "large".

For this reason, I'm now optimistic and closing the PR.
Martin
Comment 14 Suharto Anggono 2016-04-24 03:33:57 UTC
I now understand that the result of 'seq.POSIXt' can have NA for time that does not exist.

An example that fails because of my previous patch:
pretty(as.POSIXct(c("2015-10-13", "2015-10-19"), tz = "America/Sao_Paulo"))

For the example, time change is on 2015-10-18 00:00.

A possible fix:
Insert
if (anyNA(at)) at <- at[!is.na(at)]
after
at <- seqDtime(startTime, end = lim[2], by = s$spec)
inside function 'calcSteps'.

Another option, allowing NA in 'calcSteps' output:
Inside function 'calcSteps', change constructions of 'r1' and 'r2' to the following.
r1 <- which(at <= lim[1])
r1 <- if (length(r1)) r1[length(r1)] else 1L
r2 <- which(at >= lim[2])
r2 <- if (length(r2)) r2[1L] else if (is.na(at[length(at)])) length(at) else length(at) + 1
Comment 15 Suharto Anggono 2016-04-24 07:06:42 UTC
(In reply to Suharto Anggono from comment #14)
> A possible fix:
> Insert
> if (anyNA(at)) at <- at[!is.na(at)]
> after
> at <- seqDtime(startTime, end = lim[2], by = s$spec)
> inside function 'calcSteps'.

This is a little safer.
if (anyNA(at)) { at <- at[!is.na(at)]; if (!length(at)) return(at) }
Comment 16 Martin Maechler 2016-04-24 16:53:26 UTC
(In reply to Suharto Anggono from comment #15)
> (In reply to Suharto Anggono from comment #14)
> > A possible fix:
> > Insert
> > if (anyNA(at)) at <- at[!is.na(at)]
> > after
> > at <- seqDtime(startTime, end = lim[2], by = s$spec)
> > inside function 'calcSteps'.
> 
> This is a little safer.

> if (anyNA(at)) { at <- at[!is.na(at)]; if (!length(at)) return(at) }

looks safer and efficient indeed.

However I cannot trigger any wrong behavior (on my Linux platform)
with the example you gave,

> (prD <- pretty(as.POSIXct(c("2015-10-13", "2015-10-19"), tz = "America/Sao_Paulo")))
[1] "2015-10-13 00:00:00 BRT"  "2015-10-14 00:00:00 BRT" 
[3] "2015-10-15 00:00:00 BRT"  "2015-10-16 00:00:00 BRT" 
[5] "2015-10-17 00:00:00 BRT"  "2015-10-18 01:00:00 BRST"
[7] "2015-10-19 00:00:00 BRST"
> diff(prD)
Time differences in hours
[1] 24 24 24 24 24 23
> 

and I'd be really happy to *see* an example where such NA's appear also on Linux.
Comment 17 Suharto Anggono 2016-04-25 16:44:46 UTC
(In reply to Martin Maechler from comment #16)
> However I cannot trigger any wrong behavior (on my Linux platform)
> with the example you gave,

I tried in http://www.tutorialspoint.com/execute_r_online.php, even the example mentioned in ?as.POSIXct doesn't give NA. The help text does mention "What happens in such cases is OS-specific".

> as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"),                                                               
+ tz = "Europe/London")                                                                                                          
[1] "2011-03-27 02:30:00 BST"                                                                                                    
> sessionInfo()                                                                                                                  
R version 3.2.3 (2015-12-10)                                                                                                     
Platform: x86_64-redhat-linux-gnu (64-bit)                                                                                       
Running under: Fedora 23 (Twenty Three)                                                                                          
                                                                                                                                 
locale:                                                                                                                          
[1] C                                                                                                                            
                                                                                                                                 
attached base packages:                                                                                                          
[1] stats     graphics  grDevices utils     datasets  methods   base
Comment 18 Suharto Anggono 2016-04-25 23:40:10 UTC
(In reply to Suharto Anggono from comment #17)
> > as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"),                                                               
> + tz = "Europe/London")                                                     
> 
> [1] "2011-03-27 02:30:00 BST"                                               
> 

I should have used the following.

> as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S",                                                                
+ tz = "Europe/London"))                                                                                                         
[1] "2011-03-27 00:30:00 GMT"
Comment 19 Martin Maechler 2016-05-20 16:31:50 UTC
(In reply to Martin Maechler from comment #16)
> (In reply to Suharto Anggono from comment #15)
> > (In reply to Suharto Anggono from comment #14)
> > > A possible fix:
> > > Insert
> > > if (anyNA(at)) at <- at[!is.na(at)]
> > > after
> > > at <- seqDtime(startTime, end = lim[2], by = s$spec)
> > > inside function 'calcSteps'.
> > 
> > This is a little safer.
> 
> > if (anyNA(at)) { at <- at[!is.na(at)]; if (!length(at)) return(at) }
> 
> looks safer and efficient indeed.
> 
> However I cannot trigger any wrong behavior (on my Linux platform)
> with the example you gave,
> 
> > (prD <- pretty(as.POSIXct(c("2015-10-13", "2015-10-19"), tz = "America/Sao_Paulo")))
> [1] "2015-10-13 00:00:00 BRT"  "2015-10-14 00:00:00 BRT" 
> [3] "2015-10-15 00:00:00 BRT"  "2015-10-16 00:00:00 BRT" 
> [5] "2015-10-17 00:00:00 BRT"  "2015-10-18 01:00:00 BRST"
> [7] "2015-10-19 00:00:00 BRST"
> > diff(prD)
> Time differences in hours
> [1] 24 24 24 24 24 23
> > 
> 
> and I'd be really happy to *see* an example where such NA's appear also on
> Linux.

I finally saw that I can get such cases when I build R configured with
   --enable-no-ldouble
and now have added the extra "safety" line you had proposed.