Bug 17324 - Suggested addition packageDate()
Summary: Suggested addition packageDate()
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: R-devel (trunk)
Hardware: All Linux
Importance: P5 enhancement
Assignee: R-core
 
Reported: 2017-08-08 17:33 UTC by Dirk Eddelbuettel
Modified: 2017-12-19 17:33 UTC


Attachments
Diff to package utils implementing packageDate (plus NAMESPACE and Rd) (2.50 KB, patch)
2017-08-08 17:33 UTC, Dirk Eddelbuettel

Description Dirk Eddelbuettel 2017-08-08 17:33:38 UTC
Created attachment 2288
Diff to package utils implementing packageDate (plus NAMESPACE and Rd)

Attached is a small (tested) diff against current SVN which adds a function 'packageDate()' -- I find myself using 'packageVersion(somePkg)' a lot and
sometimes wish we had 'packageDate()'.  If you consider it to be too trivial
I can of course stick it into a local helper package.

As an aside, CRAN lets the Date be free-format which sadly prevents us from
doing (easy) date arithmetic.  So here I just return the character string, and
not a Date object (as we would in symmetry with packageVersion()). Illustration
using the wonderful CRAN_package_db() follows:

R> db <- tools::CRAN_package_db()
R> summary(as.Date(db[,"Date"]))
        Min.      1st Qu.       Median         Mean 
   "4-12-20" "2014-10-08" "2016-04-16" "2014-07-17" 
     3rd Qu.         Max.         NA's 
"2017-03-08" "2017-12-03"       "2698" 
R> summary(anytime::anydate(db[,"Date"]))
        Min.      1st Qu.       Median         Mean 
"2004-01-03" "2014-09-16" "2016-04-08" "2015-09-10" 
     3rd Qu.         Max.         NA's 
"2017-03-06" "2017-12-03"       "2768" 
R> 


Regards,  Dirk
Comment 1 Dirk Eddelbuettel 2017-09-01 14:33:03 UTC
I take it there is no interest in this?
Comment 2 Martin Maechler 2017-09-03 14:53:58 UTC
(In reply to Dirk Eddelbuettel from comment #1)
> I take it there is no interest in this?

There's some, from me.
I think it would make sense _if_ we additionally returned a "Date" object
(possibly NA), and if the function also gets an option (i.e. an optional argument which, when flipped, takes the 'Built:' date if that's available).

Maybe some simple (R only, no C(++)) heuristics from anydate() could be used
for the  NN/NN/NN  and  NN/NN/NNNN dates?
Comment 3 Dirk Eddelbuettel 2017-09-03 15:46:02 UTC
I agree on Date being preferable, but see the analysis I included in the initial post: too few packages "do it right".

Now, over time CRAN could enforce this.

Heuristics are fine--what anytime and anydate do internally can be done in R as well. It "simply" tries a bunch of formats.
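
A minimal sketch of that "try a bunch of formats" idea in plain R. The helper name try_date_formats() and the particular list of formats are assumptions made for illustration; this is not what anytime does internally. It relies only on as.Date() returning NA, rather than signalling an error, when an explicit format does not match.

```r
# Hypothetical helper (name and format list are illustrative assumptions):
# return the first successful parse of a single string as a Date, else NA.
try_date_formats <- function(x,
                             formats = c("%Y-%m-%d", "%Y/%m/%d",
                                         "%m/%d/%Y", "%m/%d/%y")) {
    if (length(x) != 1L || is.na(x)) return(as.Date(NA))
    for (fmt in formats) {
        # With an explicit format, as.Date() yields NA on a mismatch
        res <- as.Date(x, format = fmt)
        if (!is.na(res)) return(res)
    }
    as.Date(NA)   # nothing matched
}

try_date_formats("2017/08/08")
try_date_formats("not a date")
```

The order of the formats matters: the parser is lenient about digit counts and trailing text, so an early format can "succeed" with an implausible year on input meant for a later one.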

BTW Gabor Csardi has a package that ported the (much more powerful) Date parser from Linus himself (IIRC).  I can look that up.

But how do we return "either a Date or a character" ?
Comment 4 Martin Maechler 2017-09-04 16:08:49 UTC
(In reply to Dirk Eddelbuettel from comment #3)
> I agree on Date being preferable, but see the analysis I included in the
> initial post: too few packages "do it right".

I know... and the new function could entice *some* to do better.
> 
> Now, over time CRAN could enforce this.
> 
> Heuristics are fine--what anytime and anydate do internally can be done in R
> as well. It "simply" tries a bunch of formats.

good.  
What I should have said in 'Comment 2' was that the 'Packaged: ' field should be looked at if the regular 'Date:' does not give a valid result; then maybe
a 'Date/Publication: ' as CRAN adds and only then a possible  'Built: ' field.

> 
> BTW Gabor Csardi has a package that ported the (much more powerful) Date
> parser from Linus himself (IIRC).  I can look that up.

and you'd propose that full parser to be added to R?

> 
> But how do we return "either a Date or a character" ?

I don't understand,
where did you cite the "............................" from?
Comment 5 Dirk Eddelbuettel 2017-09-04 16:49:03 UTC
Good comments, and good suggestion re alternate fields for fallback.  My last comment was mostly because I did not understand how you suggested to not return a character fallback if no Date was found.


But now, with your suggestion, a quick test:


packageDate <- function(pkg, lib.loc = NULL) {
    res <- suppressWarnings(packageDescription(pkg, lib.loc=lib.loc,
                                               fields = "Date"))

    if (is.na(res))
        stop(gettextf("package %s not found", sQuote(pkg)), domain = NA)

    res <- as.Date(res)
    if (!is.na(res)) return(res)

    for (fld in c("Date/Publication", "Built", "Packaged")) {
        res <- suppressWarnings(packageDescription(pkg, 
                                                   lib.loc=lib.loc, 
                                                   fields = fld))
        res <- as.Date(res)
        if (!is.na(res)) return(res)
    }

    res                                 # default NA value
}


Happy to update the formal diff with something like this.

The aforementioned package by Gabor is 'parsedate' but it also contains C code, and is on CRAN.   May be overkill here.
Comment 6 Martin Maechler 2017-09-05 06:45:00 UTC
(In reply to Dirk Eddelbuettel from comment #5)

and then you, DE, entered a better version into a completely unrelated bug
report (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16496)

Here it is, modified in the order of fields to be tried, according to my
own proposal above:

packageDate <- function(pkg, lib.loc = NULL) {
    for (fld in c("Date", "Packaged", "Date/Publication", "Built")) {
        res <- suppressWarnings(packageDescription(pkg, 
                                                   lib.loc=lib.loc, 
                                                   fields = fld))
        res <- as.Date(res)
        if (!is.na(res)) return(res)
    }
    res # default NA value
}

This looks quite good now in my view.
I don't like the use of suppressWarnings() and may want to add an argument
to packageDescription() instead.
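
One caveat with the loop above: as.Date() called without a format signals an error, rather than returning NA, for a string that is not in a standard format, and a 'Built:' field typically starts with the R version. A hedged sketch of one way to guard against that follows; safe_as_date() and packageDateSketch() are hypothetical names, and this is not the code that was eventually committed.

```r
# Hypothetical wrapper: turn the "not in a standard unambiguous format"
# error from as.Date() into an NA Date so the loop can move on.
safe_as_date <- function(x) {
    tryCatch(as.Date(x), error = function(e) as.Date(NA))
}

# Same field order as in the comment above, using the guarded conversion.
packageDateSketch <- function(pkg, lib.loc = NULL) {
    for (fld in c("Date", "Packaged", "Date/Publication", "Built")) {
        res <- suppressWarnings(utils::packageDescription(pkg,
                                                          lib.loc = lib.loc,
                                                          fields = fld))
        res <- safe_as_date(res)
        if (!is.na(res)) return(res)
    }
    res # NA of class "Date" if no field could be parsed
}

safe_as_date("R 3.4.1; ; 2017-08-08 12:00:00 UTC; unix")
```

With this guard an unparseable field is simply skipped. Pulling the date out of 'Built:' would still need the string to be split first.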
Comment 7 Dirk Eddelbuettel 2017-09-05 12:17:30 UTC
Yes, sorry.  For me (Chrome, Linux) bugzilla jumps to a _different_ bug report of mine after I save an update.  

I think the suppressWarnings() use was copied over from packageVersion.  In r-devel right now:

packageVersion <- function(pkg, lib.loc = NULL)
{
    res <- suppressWarnings(packageDescription(pkg, lib.loc=lib.loc,
                                               fields = "Version"))
    if (!is.na(res)) package_version(res) else
    stop(gettextf("package %s not found", sQuote(pkg)), domain = NA)
}

I am fine either way and happy to update the suggested patch (including the help
page mentioning that we now try multiple fields).
Comment 8 Martin Maechler 2017-12-19 17:33:50 UTC
I have now committed a "first proposal" of a new  packageDate()  to R-devel
(svn rev 73925).

It does start with trying the "Date" field and trying *some* date formats,
but these are both arguments to the function, which could still get different defaults before release.

After applying it to the result of  installed.packages()  for a huge library of "all" (almost) of CRAN and 100s of more packages (mostly Bioconductor),
I think it could become smarter:  If the formats it tried for "Date" give a date with a year before 2000 (say, or '1950') it should drop it and try other formats or other fields.  Notably "Packaged" would be quite reliable ... but has not always been present in my huge list of packages.
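
A rough sketch of that "drop implausible years" idea, written against a plain named list of DESCRIPTION fields so it can be tried without any installed package. The names plausible_date() and pick_package_date(), the format list, and the cutoff of 2000 are all assumptions for illustration, not the committed packageDate().

```r
# Hypothetical helper: a date counts as plausible if it parsed at all and
# its year is not before the (assumed) cutoff.
plausible_date <- function(d, min_year = 2000L) {
    !is.na(d) && (as.POSIXlt(d)$year + 1900L) >= min_year
}

# `desc` is a named list of DESCRIPTION fields, each a single string.
pick_package_date <- function(desc,
                              fields = c("Date", "Packaged",
                                         "Date/Publication", "Built"),
                              formats = c("%Y-%m-%d", "%Y/%m/%d",
                                          "%m/%d/%Y", "%m/%d/%y"),
                              min_year = 2000L) {
    for (fld in fields) {
        val <- desc[[fld]]
        if (length(val) != 1L || is.na(val)) next   # field missing or empty
        for (fmt in formats) {
            d <- as.Date(val, format = fmt)
            # Skip both failed parses and parses with an implausible year
            if (plausible_date(d, min_year)) return(d)
        }
    }
    as.Date(NA)   # no field gave a plausible date
}

pick_package_date(list(Date = "08/09/2017",
                       Packaged = "2017-08-10 12:00:00 UTC; someone"))
```

A "Date" value that only matches a format with an implausible year is passed over in favour of a later format or, failing that, a later field such as "Packaged".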

Let's do some experiments with this definition and get proposals for improvements, ideally before release next spring.

As "PR" I now close this... which does not preclude us adding feature requests etc here.