I have reported this on R-devel mailing list, https://stat.ethz.ch/pipermail/r-devel/2016-February/072353.html. Looking at function 'prettyDate' in https://svn.r-project.org/R/trunk/src/library/grDevices/R/prettyDate.R, the early return part (if(isDate && D <= n * 24*3600)) is not quite right: - The result doesn't have attribute "labels". Help on 'pretty.Date', "Value": A vector (of the suitable class) of locations, with attribute '"labels"' giving corresponding formatted character labels. - Argument 'min.n' is not respected. Example, adapted from examples for 'pretty.Date': x <- as.Date("2002-02-02") at <- pretty(seq(x, by = "1 day", length = 2), n = 5, min.n = 5) attr(at, "labels") # should be non-null length(at) # should be at least 6 Output: R> x <- as.Date("2002-02-02") R> at <- pretty(seq(x, by = "1 day", length = 2), n = 5, min.n = 5) R> attr(at, "labels") # should be non-null NULL R> length(at) # should be at least 6 [1] 2 R> sessionInfo() R Under development (unstable) (2016-02-25 r70222) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows XP (build 2600) Service Pack 2 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.3.0 Regarding 'min.n', I think min.n > n should be an error, as in 'pretty.default'. Another issue that I have not reported on R-devel mailing list: In the case of of only one unique value in date-time object D, pretty(D, n = n, min.n = n) fails. It succeeds in R 3.2.3. Example: R> print.default(pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5)) Error in prettyDate(x = x, n = n, min.n = min.n, sep = sep, ...) : range too small for 'min.n' R> sessionInfo() R Under development (unstable) (2016-02-25 r70222) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows XP (build 2600) Service Pack 2 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.3.0 With R 3.2.3: R> print.default(pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5)) [1] 1012590120 1012590122 1012590124 1012590126 1012590128 1012590130 [7] 1012590132 1012590134 1012590136 1012590138 1012590140 1012590142 [13] 1012590144 1012590146 1012590148 1012590150 1012590152 1012590154 [19] 1012590156 1012590158 1012590160 1012590162 1012590164 1012590166 [25] 1012590168 1012590170 1012590172 1012590174 1012590176 1012590178 [31] 1012590180 attr(,"class") [1] "POSIXct" "POSIXt" attr(,"tzone") [1] "" attr(,"labels") [1] "00" "02" "04" "06" "08" "10" "12" "14" "16" "18" "20" "22" "24" "26" "28" [16] "30" "32" "34" "36" "38" "40" "42" "44" "46" "48" "50" "52" "54" "56" "58" [31] "00" R> sessionInfo() R version 3.2.3 (2015-12-10) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows XP (build 2600) Service Pack 2 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base
I will look into this, to be amended for R 3.3.0 (which has become "alpha" earlier today) and newer.
(In reply to Martin Maechler from comment #1) > I will look into this, to be amended for R 3.3.0 (which has become "alpha" > earlier today) and newer. I've dealt with both of your examples, and also followed the proposal that min.n > n should give an error. The fix is committed to R-devel and R 3.4.0 alpha -- svn r 70354,5. Thank you for the report!
At function 'prettyDate' in r70361, for the early return part (if(isDate && D <= n * DAY)), the attribute "labels" of the result is paste("%b", "%d", sep = sep) . With default 'sep', it is a single string "%b %d", literally, not something like "Feb 02".
(In reply to Suharto Anggono from comment #3) > At function 'prettyDate' in r70361, for the early return part (if(isDate && > D <= n * DAY)), the attribute "labels" of the result is > paste("%b", "%d", sep = sep) . > With default 'sep', it is a single string "%b %d", literally, not something > like "Feb 02". Embarrassingly for me, you are 100% right. Thank you! Fixed by r70367 for R-devel and 70368 for R 3.3.0 alpha.
Looking again, I see that, for the "POSIXct" case (my last example), the length of the result is not quite right after fixing. For 'pretty', 'n' is number of intervals. Number of intervals of n corresponds to number of points of (n+1). min.n = 5 means at least 5 intervals. It means that number of points, which is length of the 'pretty' result, is at least 6. at <- pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5) print.default(at) length(at) # should be at least 6 Output: R> at <- pretty(as.POSIXct("2002-02-02 02:02"), n = 5, min.n = 5) R> print.default(at) [1] 1012590118 1012590119 1012590120 1012590121 1012590122 attr(,"class") [1] "POSIXct" "POSIXt" attr(,"tzone") [1] "" attr(,"labels") [1] "58" "59" "00" "01" "02" R> length(at) # should be at least 6 [1] 5 R> sessionInfo() R Under development (unstable) (2016-03-23 r70368) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows XP (build 2600) Service Pack 2 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base I prefer keep failing (no fix) to giving result that is not as requested.
(In reply to Suharto Anggono from comment #5) > Looking again, I see that, for the "POSIXct" case (my last example), the > length of the result is not quite right after fixing. > > For 'pretty', 'n' is number of intervals. Number of intervals of n > corresponds to number of points of (n+1). You are right, Suharto, thank you: Part of the new code assumed 'n' was the number of "ticks" instead of the number of intervals. Looking at this and other examples, I found that *actually* prettyDate() is partly "wrong" as not working according to the specifications of pretty() more generally: help(pretty) says that the resulting interval range should *cover* the range of the input, but that is not what happens now, mostly for the POSIX?t case. So there will be more to be done, with this.
Created attachment 2055 [details] Against R devel r70421, a fix to new code only
Thank you Suharto, I will use part of your modification to the tests script, but my version of prettyDate() will be very much different.
Examples in the documentation on 'pretty.Date' can be test case for covering the range of the input. x <- as.POSIXct("2002-02-02 02:02") attr(pretty(seq(x, by = s, length = 2), n = 5), "labels") Result that I expect that covers the range of the input in English locale, looking at variable 'steps' in code of function 'prettyDate': - For s = "10 secs": c("00", "02", "04", "06", "08", "10") - For s = "1 min": c("00", "15", "30", "45", "00") or c("00", "10", "20", "30", "40", "50", "00") - For s = "5 mins": c("02:02", "02:03", "02:04", "02:05", "02:06", "02:07") - For s = "30 mins": c("02:00", "02:10", "02:20", "02:30", "02:40") or c("02:00", "02:05", "02:10", "02:15", "02:20", "02:25", "02:30", "02:35") - For s = "6 hours": c("00:00", "03:00", "06:00", "09:00") or c("02:00", "03:00", "04:00", "05:00", "06:00", "07:00", "08:00", "09:00") - For s = "12 hours": c("00:00", "03:00", "06:00", "09:00", "12:00", "15:00") - For s = "1 DSTday": c("Feb 02 00:00", "Feb 02 06:00", "Feb 02 12:00", "Feb 02 18:00", "Feb 03 00:00", "Feb 03 06:00") - For s = "2 weeks": c("Jan 28", "Feb 04", "Feb 11", "Feb 18") or c("Feb 02", "Feb 04", "Feb 06", "Feb 08", "Feb 10", "Feb 12", "Feb 14", "Feb 16", "Feb 18") - For s = "1 months": c("Jan 28", "Feb 04", "Feb 11", "Feb 18", "Feb 25", "Mar 04") - For s = "6 months": c("Jan", "Apr", "Jul", "Oct") or c("Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep") - For s = "1 year": c("Jan", "Apr", "Jul", "Oct", "Jan", "Apr") - For s = "10 years": c("2000", "2005", "2010", "2015") or c("2002", "2004", "2006", "2008", "2010", "2012", "2014") - For s = "50 years": c("2000", "2020", "2040", "2060") or c("2000", "2010", "2020", "2030", "2040", "2050", "2060") - For s = "1000 years": c("2000", "2500", "3000", "3500") or c("2000", "2200", "2400", "2600", "2800", "3000", "3200")
(In reply to Suharto Anggono from comment #9) > Examples in the documentation on 'pretty.Date' can be test case for covering > the range of the input. ........... You are right... the never ending story will continue (the code becomes longer and longer but maybe we will converge ..). I have tested better working code.... will commit later today or tomorrow
In code of function 'prettyDate' in R devel r70486, there is nat <- seqDtime(init.at[1], by = paste("-1",st.i$spec), length=2)[2] It is in if(init.i == 1L). When the branch is reached, st.i is steps[[1]]. So, st.i$spec is "1 sec". So, paste("-1",st.i$spec) is "-1 1 sec". When it is used as 'by' argument in call to 'seq', error occurs. An example that triggers the error is with range(x) of 1 second and 'min.n' of 3. R> at <- pretty(as.POSIXct("2002-02-02 02:02") + 0:1, n = 5, min.n = 3) Error in seq.POSIXt(beg, end, by = by, length.out = length) : invalid 'by' string R> traceback() 6: stop("invalid 'by' string") 5: seq.POSIXt(beg, end, by = by, length.out = length) 4: seq(beg, end, by = by, length.out = length) 3: seqDtime(init.at[1], by = paste("-1", st.i$spec), length = 2) 2: pretty.POSIXt(as.POSIXct("2002-02-02 02:02") + 0:1, n = 5, min.n = 3) 1: pretty(as.POSIXct("2002-02-02 02:02") + 0:1, n = 5, min.n = 3) Similar instance inside function 'calcSteps' should never be executed. It is reached when lim[1] < at[1]. Because 'at' is seqDtime(startTime, end = lim[2], by = s$spec), at[1] is startTime. Because 'startTime' is trunc_POSIXt(lim[1], units = s$start), it should be that startTime <= lim[1]. Another issue is that, in some cases, "tzone" attribute is lost in the result. Example: x <- as.POSIXct("2002-02-02 02:02", tz = "UTC") print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5)) x <- as.POSIXct("2002-02-02 02:02", tz = "EST5EDT") print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5)) Output on my system: R> x <- as.POSIXct("2002-02-02 02:02", tz = "UTC") R> print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5)) [1] 1012615200 1012615800 1012616400 1012617000 1012617600 attr(,"class") [1] "POSIXct" "POSIXt" attr(,"labels") [1] "09:00" "09:10" "09:20" "09:30" "09:40" R> x <- as.POSIXct("2002-02-02 02:02", tz = "EST5EDT") R> print.default(pretty(seq(x, by = "30 mins", length = 2), n = 5)) [1] 1012633200 1012633800 1012634400 1012635000 1012635600 attr(,"class") [1] "POSIXct" "POSIXt" attr(,"labels") [1] "14:00" "14:10" "14:20" "14:30" "14:40" After stepping, I found that, for the example, "tzone" attribute is lost after at <- c(at, nat) in "add point at right" inside 'calcSteps'. R> sessionInfo() R Under development (unstable) (2016-04-14 r70486) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows XP (build 2600) Service Pack 2 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.4.0 R> Sys.timezone() [1] "Asia/Bangkok"
Created attachment 2067 [details] Against R devel r70532
(In reply to Suharto Anggono from comment #12) > Created attachment 2067 [details] > Against R devel r70532 Thank you very much, Suharto !! Your patch is excellent ... I have done some testing now, added to the regression test and committed rev 70534. One could still improve the current code for large 'n'... but then pretty() also the numeric version has never been designed for large n; n = 1000 already being "large". For this reason, I'm now optimistic and closing the PR. Martin
I now understand that the result of 'seq.POSIXt' can have NA for time that does not exist. An example that fails because of my previous patch: pretty(as.POSIXct(c("2015-10-13", "2015-10-19"), tz = "America/Sao_Paulo")) For the example, time change is on 2015-10-18 00:00. A possible fix: Insert if (anyNA(at)) at <- at[!is.na(at)] after at <- seqDtime(startTime, end = lim[2], by = s$spec) inside function 'calcSteps'. Another option, allowing NA in 'calcSteps' output: Inside function 'calcSteps', change constructions of 'r1' and 'r2' to the following. r1 <- which(at <= lim[1]) r1 <- if (length(r1)) r1[length(r1)] else 1L r2 <- which(at >= lim[2]) r2 <- if (length(r2)) r2[1L] else if (is.na(at[length(at)])) length(at) else length(at) + 1
(In reply to Suharto Anggono from comment #14) > A possible fix: > Insert > if (anyNA(at)) at <- at[!is.na(at)] > after > at <- seqDtime(startTime, end = lim[2], by = s$spec) > inside function 'calcSteps'. This is a little safer. if (anyNA(at)) { at <- at[!is.na(at)]; if (!length(at)) return(at) }
(In reply to Suharto Anggono from comment #15) > (In reply to Suharto Anggono from comment #14) > > A possible fix: > > Insert > > if (anyNA(at)) at <- at[!is.na(at)] > > after > > at <- seqDtime(startTime, end = lim[2], by = s$spec) > > inside function 'calcSteps'. > > This is a little safer. > if (anyNA(at)) { at <- at[!is.na(at)]; if (!length(at)) return(at) } looks safer and efficient indeed. However I cannot trigger any wrong behavior (on my Linux platform) with the example you gave, > (prD <- pretty(as.POSIXct(c("2015-10-13", "2015-10-19"), tz = "America/Sao_Paulo"))) [1] "2015-10-13 00:00:00 BRT" "2015-10-14 00:00:00 BRT" [3] "2015-10-15 00:00:00 BRT" "2015-10-16 00:00:00 BRT" [5] "2015-10-17 00:00:00 BRT" "2015-10-18 01:00:00 BRST" [7] "2015-10-19 00:00:00 BRST" > diff(prD) Time differences in hours [1] 24 24 24 24 24 23 > and I'd be really happy to *see* an example where such NA's appear also on Linux.
(In reply to Martin Maechler from comment #16) > However I cannot trigger any wrong behavior (on my Linux platform) > with the example you gave, I tried in http://www.tutorialspoint.com/execute_r_online.php, even the example mentioned in ?as.POSIXct doesn't give NA. The help text does mention "What happens in such cases is OS-specific". > as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"), + tz = "Europe/London") [1] "2011-03-27 02:30:00 BST" > sessionInfo() R version 3.2.3 (2015-12-10) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Fedora 23 (Twenty Three) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base
(In reply to Suharto Anggono from comment #17) > > as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"), > + tz = "Europe/London") > > [1] "2011-03-27 02:30:00 BST" > I should have used the following. > as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S", + tz = "Europe/London")) [1] "2011-03-27 00:30:00 GMT"
(In reply to Martin Maechler from comment #16) > (In reply to Suharto Anggono from comment #15) > > (In reply to Suharto Anggono from comment #14) > > > A possible fix: > > > Insert > > > if (anyNA(at)) at <- at[!is.na(at)] > > > after > > > at <- seqDtime(startTime, end = lim[2], by = s$spec) > > > inside function 'calcSteps'. > > > > This is a little safer. > > > if (anyNA(at)) { at <- at[!is.na(at)]; if (!length(at)) return(at) } > > looks safer and efficient indeed. > > However I cannot trigger any wrong behavior (on my Linux platform) > with the example you gave, > > > (prD <- pretty(as.POSIXct(c("2015-10-13", "2015-10-19"), tz = "America/Sao_Paulo"))) > [1] "2015-10-13 00:00:00 BRT" "2015-10-14 00:00:00 BRT" > [3] "2015-10-15 00:00:00 BRT" "2015-10-16 00:00:00 BRT" > [5] "2015-10-17 00:00:00 BRT" "2015-10-18 01:00:00 BRST" > [7] "2015-10-19 00:00:00 BRST" > > diff(prD) > Time differences in hours > [1] 24 24 24 24 24 23 > > > > and I'd be really happy to *see* an example where such NA's appear also on > Linux. I finally saw that I can get such cases when I build R configured with --enable-no-ldouble and now have added the extra "safety" line you had proposed.