Bug 17398

Summary: Printing issues with recursive data structures
Product: R Reporter: Lionel Henry <lionel.hry>
Component: LanguageAssignee: R-core <R-core>
Status: ASSIGNED ---    
Severity: normal CC: maechler
Priority: P5    
Version: R 3.4.3   
Hardware: All   
OS: All   
Attachments: Refactor print dispatch to reduce duplication and fix bugs
Forward user-supplied parameters when recursing into print()
Print functions consistently
Evaluate print() in calling environment
Use stack-local state in print routines

Description Lionel Henry 2018-03-20 18:33:01 UTC
Created attachment 2325 [details]
Refactor print dispatch to reduce duplication and fix bugs

Overview
========

This set of patches fixes many print() bugs and should make print.c more
maintainable. In particular printing recursive data structures (lists,
pairlists, attributes) is made more reliable and consistent. The main
fixes concern the callbacks to the R function print() when printing
recursive data containing S3 or S4 objects:

- Calls with a S3 class (e.g. `structure(quote(mycall()), class = "foo")`)
  are no longer evaluated when printed in recursive data.

- Arguments supplied to print() are now forwarded when print() is
  called back on an object.

- print() is now called back in the right environment. This makes
  dispatch to local S3 methods consistent with auto-printing and the
  first print call.

- show() is preferred for S4 objects.

- Indexing tags are always preserved after a print() callback.


Other bugfixes:

- Functions are now printed with sourcerefs even when printed from
  recursive data.

- The global state in `R_print` is restored each time it might be
  reset or modified, for instance when deparsing or calling the R
  print() on an object.


For maintainability:

- Code duplication between do_printdefault() and the routines for
  printing lists, pairlists and attributes has been reduced. This way
  all these routines share the same code path and bug fixes.

- The routines in print.c now use stack-local data rather than the
  global state in `R_print`.



First patch
===========

The intent of this patch is to reduce code duplication and fix several
bugs by reusing the same routines for dispatching on objects in
PrintGenericVector(), printList() and PrintValueEnv() (and also
printAttributes() in the second patch).

- PrintObjectS4() takes care to print S4 objects with show() rather
  than print(). Previously only auto-printed objects or objects inside
  attributes were printed with show(), now objects within lists and
  pairlists are as well.

  It uses code from [ripley 41099] and [luke 68702] to pluck the
  method from the namespace.

- PrintObjectS3() uses the logic in [luke 67993] to avoid evaluating
  symbolic objects. It is now possible to print S3 calls within lists,
  pairlist and attributes without side effects.

- PrintObject() calls the S3 or S4 routines. It saves and restores
  `tagbuf` when dispatching because print.default() will reset
  it. This fixes a bunch of issues where the indexing tag would be
  reset on print-dispatch.

Our main concern is to stop print() from evaluating calls with S3
classes because this prevents us from introducing this kind of data
structures in our packages. Since we support old versions of R in our
packages (currently R 3.1 and older), it would help if we could get
this fixed in the patched releases as well. Would you consider
applying smaller and targetted fixes in the next patched releases to
that effect?


Second patch
============

Depends on first patch. Replaces embryo logic from [ripley 22922] for
forwarding user-supplied arguments when recursing with R-side
print(). This logic was partial and only used for printing objects in
attributes.

- Allows reusing PrintObject() in printAttributes() as well.

- Handling of user-supplied arguments has been moved to PrintObject()
  so that other routines benefit from it. Following this change, user
  can now do this:

  obj <- structure(list(), class = "foo")
  print(list(obj, obj), digits = 2, width = 30)

  And the two parameters will be forwarded when recursing into the
  list elements.

- The whole `R_print` structure is saved and restored before calling
  into print() since print.default() will reset all paramaters.
  Following this change the parameters are preserved even after
  dispatch. In tho following snippet `x` is printed up to 2 digits
  before and after printing `obj`:

  x <- 1.23456789
  print(list(x, obj, x), digits = 2)

- In addition the R_print structure is now saved and restored when
  calling into deparsing routines as these were calling
  PrintDefaults(). This fixes many issues where user-supplied
  parameters would be reset.


Third patch
===========

With this patch functions are printed in a consistent way. The
sourcerefs are now also used when printing elements of recursive data
structures.

- Refactor PrintLanguage() and PrintClosure() so that
  PrintLanguageEtc() is no longer needed.

- print.function() just calls print.default() instead of
  do_printfunction(). The latter did not set R_print parameters
  correctly.

  Correspondingly, do_printdefault() no longer calls print.function().
  This was creating an exception where printing would be internally
  generic except for functions.


Fourth patch
============

Currently dispatch to print() methods occurs in the execution
environment of print.default() rather than in the lexical context of
the caller. With this patch the parent frame is passed to
do_printdefault(). This way locally defined methods are found even
when printing objects within lists, pairlists, or attributes.

In the future it is probably a good idea to add an `envir` argument to
print() since it is often called in print methods and those methods
should be able to propagate the calling environment without do.call()
tricks.


Fifth patch
===========

This patch does not modify any behaviour, it just replaces in print.c
all uses of the global `R_print` state by stack-local state. Since the
structure is now part of function signatures, it is renamed to
`R_PrintData` for style consistency with other parts of the R source. A
define keeps `R_print_par_t` as an alias for backward compatibility.
Comment 1 Lionel Henry 2018-03-20 18:33:49 UTC
Created attachment 2326 [details]
Forward user-supplied parameters when recursing into print()
Comment 2 Lionel Henry 2018-03-20 18:34:24 UTC
Created attachment 2327 [details]
Print functions consistently
Comment 3 Lionel Henry 2018-03-20 18:34:54 UTC
Created attachment 2328 [details]
Evaluate print() in calling environment
Comment 4 Lionel Henry 2018-03-20 18:35:23 UTC
Created attachment 2329 [details]
Use stack-local state in print routines
Comment 5 Martin Maechler 2018-03-21 11:02:26 UTC
Whoa!   this is quite impressive  and relatively extensive.
I agree that the issues you address are desirable to be fixed, at least those I have looked at.

First note that we do never patch old releases.
The last release of the 3.4.x series,  3.4.4 has just appeared a few days ago, 
and unless there's a catastrophic security bug detected within the next weeks,
the only R releases to happen in the future are >= 3.5.0.
useRs should either have a completely frozen old version or then work with the latest one.

The release of 3.5.0 was just announced on R-devel with a link to the detailed release schedule.
Grand feature freeze GFF(= switch to R 3.5.0 alpha) is in 5 days,
and   Feature Freeze (FF = switch to R 3.5.0 beta) is 2 weeks later.

If these patches affect  CRAN or bioconductor packages in a way that they start failing their 'R CMD check's  ((which they will if they look at the output "too carefully"))  we will have to alert all the maintainers and ask them to update their checks (and possibly rarely) their code.... this is something that can take time, notably after the notification dealing with answers, exceptions, proposed schedulres, ...

I'm in a busy semester (teaching and more) and cannot devote too much time to this, and so doubt a bit if it will be possible to even do this before FF...
In other words, your timing is "optimally bad" ;-) 
I think these patches can only make 3.5.0 if more R Core members have time for them --- notably checking their effect in CRAN & Bioconductor --- even before committing them to R-devel [[[well.. or if RStudio staff could do that for us ?!? ]]
Comment 6 Lionel Henry 2018-03-21 14:11:14 UTC
> I think these patches can only make 3.5.0 if more R Core members
> have time for them --- notably checking their effect in CRAN &
> Bioconductor --- even before committing them to R-devel [[[well.. or
> if RStudio staff could do that for us?!? ]]

Unfortunately we don't currently have the infrastructure to check all
of CRAN. However if that's helpful I could check a random sample of
packages on my machine to get an estimation of the problems these
patches might cause.
Comment 7 Martin Maechler 2018-04-18 10:18:08 UTC
I have now committed a (new, different) patch from Lionel -- svn c74614
to one of the issues,
namely printing of *nested* S3-classed  calls:

In R <= 3.5.0, we see this 

 > print(obj <- structure(quote(stop("OOPS!")), class = "foo"))
stop("OOPS!")
attr(,"class")
[1] "foo"

> list(obj)
[[1]]
Error in print(stop("OOPS!")) : OOPS!
> 

.. which *is* quite undesirable and has been fixed now.
Comment 8 Martin Maechler 2018-05-23 10:57:01 UTC
Further, I had ported the fix svn c74614  to R 3.5.0 patched (74666), about four weeks ago.

As I found that *was* basically the first of the patches above.

Now I tried applying the other patches, but they (already patch #2) don't apply cleanly anymore, probably because of too much time passed.

I explicitly reverted 74614 and then applied  patch #1 (which applied cleanly), but then patch #2 gave too many rejects in src/main/print.c

There's more however: patch #2 has a nice simple part, just preserving the R_print global when deparse1 is called.

However adding the full 'userArgs' list (in R code) 
as 'callArgs' (in C code) in addition to the single arguments, seems against a DRY principle ... and I think it should be 'callArgs' in both C and R.

Also, as you add an entry to the global R_print struc, in Print.h it is noted that a change of the struc needs also an update in "R.app" -- the MacOS-only program which very very unfortunately has its sources outside the R source tree,
and so cannot be patched simultaneously.

That's not your fault, of course, but looks like a flaw in the whole R source setup. and should be raised here.

Still: Is there a chance you can replace patches #2 -- #5  with patches that apply cleanly to today's source?  {{*not* doing whitespace changes which your patch #1 did -- it replaced tabs by spaces}}.
Comment 9 Lionel Henry 2018-05-23 11:09:51 UTC
Sure I will look into rebasing the patches, addressing your comments, and creating another patch for R.app.
Comment 10 Lionel Henry 2018-05-25 18:19:47 UTC
Hi Martin,

I have now addressed your comments:

- All changes are tabified
- Unit tests are organised in paragraphs
- `userArgs` is renamed to `callArgs` in print.default()
- The arguments are passed only once.

The last item is implemented in a sixth patch because to keep things
DRY I had to use an advancer function that might not be to everyone's
taste, so it seemed better to make it separate. All things considered
I find it rather elegant.

About R.app, I believe no change is necessary over there because the
signature of PrintDefaults() was not changed.

I have uploaded all patches as separate PRs in my github mirror of
r-source. I thought it would make it easier for you to review those
and then iterate if there are any more comments. All patches assume
that r74614 is reverted and I rebased the first patch as well just in
case.

You will find in each PR a link to the relevant diff that you can
apply and commit locally. If any more changes are required, the
diff URLs will not change after I have rebased/amended the commits.

- PATCH1: https://github.com/lionel-/r-source/pull/7
- PATCH2: https://github.com/lionel-/r-source/pull/8
- PATCH3: https://github.com/lionel-/r-source/pull/9
- PATCH4: https://github.com/lionel-/r-source/pull/10
- PATCH5: https://github.com/lionel-/r-source/pull/11
- PATCH6: https://github.com/lionel-/r-source/pull/12

Please let me know if you prefer not to use this mode of patch
reviewing in the future or if I missed anything.

Best,
Lionel
Comment 12 Martin Maechler 2018-05-31 19:36:10 UTC
I have now applied patches 1 and 2 --  with very minor changes.

Note that for `doc/NEWS.Rd`  you had added to bug fixes of 'R 3.5.0 patched' but these really must go to the bug fixes for R-devel for now and well possibly for good.

Also, I have very slightly shortened the `tests/print-tests.{R,Rout.save}` and added a bit more comments about what changed from "previous" to "now".  No problem there.

--------------

Patch 3  --- applied locally but nothing committed:

There's a problem in a very simple case:  The contrary happens of what your patch wants to achieve in more delicate cases:  A function is *NOT* printed with its comment -- after the patch -- where it printed fine before:

This is the new wrong behavior:

> op <- options(keep.source = TRUE)
> fn <- function(x) {
+ x + 1 # A comment, kept as part of the source
+ }
> fn
function (x) 
{
    x + 1
}
Comment 13 Lionel Henry 2018-06-01 09:48:49 UTC
Thanks,

I have now rebased patches 3-6 on top of trunk.

I have rebuilt with patch 3 and I can't reproduce the issue with
the srcrefs. This is also covered by tests and `make check` passes.
Comment 14 Martin Maechler 2018-06-01 10:20:28 UTC
(In reply to Lionel Henry from comment #13)
> Thanks,
> 
> I have now rebased patches 3-6 on top of trunk.

Thank you!


> I have rebuilt with patch 3 and I can't reproduce the issue with
> the srcrefs. This is also covered by tests and `make check` passes.

Yes, indeed.  You were faster than I ... when I just found 30 minutes ago, that after correctly recompiling I could *not* reproduce the issue either.
... I do apologize for having wasted not only my own but also your time..