Bug 15124 - Remove trailing whitespace from printed htest objects
Status: CLOSED FIXED
Product: R
Classification: Unclassified
Component: I/O
Version: R 2.15.1
Hardware: Other Linux
Importance: P5 enhancement
Assigned To: R-core
Depends on:
Blocks:
Reported: 2012-12-04 05:48 UTC by gwern0
Modified: 2012-12-26 01:37 UTC
CC List: 1 user

See Also:


Attachments
Screenshot of Emacs demonstrating trailing whitespace in R terminal output (34.48 KB, image/png)
2012-12-04 05:48 UTC, gwern0
Output of the described grep commands, illustrating that appending a newline is an extremely common idiom in the R codebase and that a better cat is not an unreasonable suggestion (24.13 KB, text/plain)
2012-12-04 05:49 UTC, gwern0

Description gwern0 2012-12-04 05:48:33 UTC
Created attachment 1387
Screenshot of Emacs demonstrating trailing whitespace in R terminal output

When using the R terminal REPL, the printed output of tests such as t.test has a trailing space at the end of a line, after the printed text and before the newline.

The space is unnecessary, but more importantly for me, it means that I cannot simply copy-paste output, indent it 4 spaces, and insert it into a Markdown file, comment, post, or other communication, because trailing spaces are interpreted by a number of Markdown dialects as indicating that the next line is to be wrapped onto the previous line. This causes the copied R output to be formatted incorrectly, unpasteable into a terminal, and unreadable.

I downloaded the Subversion copy of R and looked at src/library/stats/R/t.test.R (no apparent reason for the trailing whitespace there) and then at htest.R. The trailing whitespace appears to come from calls to 'cat' that specify the newline manually: cat's default behavior of interspersing spaces between its arguments means that a space is also put between the data and the newline, leaving trailing whitespace. An example:

    cat("true", names(x$null.value), "is", alt.char,
        x$null.value, "\n")

This would print something like "true name is alt.char null.value \n", with a space before the newline.
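A minimal sketch of the behavior described above, assuming made-up stand-in values (the variable names null_name, alt_char and null_value are hypothetical, not fields of a real htest object). capture.output() collects what cat() prints as one string per line, which makes the trailing space visible:

```r
# Hypothetical stand-ins for names(x$null.value), alt.char and x$null.value
null_name  <- "difference in means"
alt_char   <- "not equal to"
null_value <- 0

# cat()'s default sep = " " is also inserted between null_value and "\n"
line <- capture.output(
  cat("true", null_name, "is", alt_char, null_value, "\n")
)
print(line)
#> [1] "true difference in means is not equal to 0 "
```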

I think these uses of cat can be fixed by adding the spaces manually and setting sep="", for example:

   cat("true ", names(x$null.value), " is ", alt.char, " ", x$null.value, "\n", sep="")
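As a sketch with the same hypothetical stand-in values (not the actual print.htest code), the manually spaced call with sep = "" produces the same text without the trailing space:

```r
# Hypothetical stand-ins for names(x$null.value), alt.char and x$null.value
null_name  <- "difference in means"
alt_char   <- "not equal to"
null_value <- 0

# Spaces are written explicitly; sep = "" adds nothing between items
fixed <- capture.output(
  cat("true ", null_name, " is ", alt_char, " ", null_value, "\n", sep = "")
)
print(fixed)
#> [1] "true difference in means is not equal to 0"
```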

Alternatively, there may be a sane version of 'cat' that keeps the convenience of default spaces as separators but also adds the newline for you, which would make the fix much easier.

I don't know if this hypothetical version of 'cat' exists, but it seems to me like there *really* ought to be such a function:

1. A quick `grep "cat(" *.R | fgrep '\n")'` suggests that the stats directory alone contains 159 uses of the idiom.
2. If I run a similar `find . -name "*.R" -exec fgrep "cat(" {} \; | fgrep '\n")' | wc` in the root of the repository, there are 431 matches (!).

(Some of the instances make me want to weep. Like the *48* invocations of `cat("\n")`.)
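The newline-appending cat wished for above could be sketched roughly as follows. catln() is a hypothetical name, not a base R function: it keeps cat()'s space separators between items and writes the newline in a separate cat() call, so no separator lands before it. This sketch only writes to standard output.

```r
# catln() is hypothetical: cat() with space separators plus a final newline
catln <- function(..., sep = " ") {
  cat(..., sep = sep)  # sep is placed between items, not after the last one
  cat("\n")            # newline written on its own, with no separator before it
  invisible(NULL)
}

out <- capture.output(catln("true", "difference in means", "is", "not equal to", 0))
print(out)
#> [1] "true difference in means is not equal to 0"
```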
Comment 1 gwern0 2012-12-04 05:49:57 UTC
Created attachment 1388
Output of the described grep commands, illustrating that appending a newline is an extremely common idiom in the R codebase and that a better cat is not an unreasonable suggestion
Comment 2 Brian Ripley 2012-12-25 11:08:11 UTC
cat(sep = "") is what you are looking for, and it is not very common.  After many years we introduced paste0 in R 2.15.0: it is far more common.

The print.htest instance has been changed.
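A short sketch of the idioms mentioned in this reply, using made-up values: build the line with paste() (which keeps space separators) or paste0() (which uses none, and needs R 2.15.0 or later), then print it with cat(sep = "") so nothing is inserted before the newline.

```r
alt_char <- "not equal to"

# paste() joins with single spaces; cat(sep = "") adds nothing before "\n"
msg1 <- paste("true difference in means is", alt_char, 0)
out1 <- capture.output(cat(msg1, "\n", sep = ""))

# paste0() joins with no separator, so any spaces are written explicitly
msg2 <- paste0("alternative hypothesis: ", "two.sided")
out2 <- capture.output(cat(msg2, "\n", sep = ""))

print(out1)
print(out2)
```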
Comment 3 gwern0 2012-12-25 22:38:03 UTC
Thank you.
Comment 4 Peter Dalgaard 2012-12-26 01:37:14 UTC
Notice though that getting rid of all instances of cat(x, "\n") is a lost cause. There are just too many instances. And trying to fix it automagically, by removing spaces that have been printed before the newline looks like a road to madness.

In general, you're likely stuck with fixing things in Emacs, say, by query-replace-regexp of " *$" with "".
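The same cleanup can also be sketched inside R before pasting: capture the printed output as lines and strip trailing spaces with sub(), mirroring the " *$" regular expression above. The sleep dataset and this particular t.test() call are only an example; with a version of R that already includes the print.htest change, there may be nothing left to strip.

```r
# Example test result; any printed object can be handled the same way
res <- t.test(extra ~ group, data = sleep)

# One string per printed line
lines <- capture.output(print(res))

# Remove any run of spaces at the end of each line
clean <- sub(" +$", "", lines)

writeLines(clean)
```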

And, BTW, cat(x);cat("\n") does not generate the trailing space. No need to weep over that.
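A quick check of that last point, using an arbitrary vector: cat(x, "\n") treats the newline as one more item and puts a separator in front of it, while cat(x) followed by a separate cat("\n") does not.

```r
x <- c(1, 2, 3)

# Newline passed as one more item: sep = " " goes in front of it
with_space <- capture.output(cat(x, "\n"))

# Newline printed by a separate call: no separator before it
no_space <- capture.output(cat(x), cat("\n"))

print(with_space)
#> [1] "1 2 3 "
print(no_space)
#> [1] "1 2 3"
```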