Bug 15261 - cat() with more than one argument and more than one separator cycles separators improperly
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.0.0
Hardware: All All
Importance: P5 minor
Assignee: R-core
Depends on:
Reported: 2013-04-08 10:14 UTC by Suharto Anggono
Modified: 2013-04-09 05:50 UTC
2 users

See Also:


Description Suharto Anggono 2013-04-08 10:14:43 UTC
This is part of the "Details" section of the documentation for the function 'cat' in R 3.0.0.

     'cat' is useful for producing output in user-defined functions.
     It converts its arguments to character vectors, concatenates them
     to a single character vector, appends the given 'sep = ' string(s)
     to each element and then outputs them.

If that is true, passing several character vectors to 'cat' should give the same result as passing a single character vector that is their concatenation. In reality, the two cases can give different results when 'sep' contains more than one element. Here is an example.

> cat("a", "b", "c", "d", sep=c("-", "+", "x")); cat("|\n")
a-b-c-d|
> cat(c("a", "b", "c", "d"), sep=c("-", "+", "x")); cat("|\n")
a-b+cxd|
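Read literally, the Details paragraph implies a simple rule: concatenate all arguments into one character vector, then recycle 'sep' over the gaps between consecutive elements. The R sketch below models that rule. `cat_as_documented` is a hypothetical helper, not part of base R, and it returns the string instead of printing it. It converts arguments with as.character(), which is an assumption: 'cat' formats numbers itself, so the model is only meant to match for simple values like the ones in this report.

```r
# Hypothetical helper (not part of base R): returns the text 'cat' would print
# if it followed the Details paragraph literally, i.e. first concatenated all
# arguments into one character vector and then recycled 'sep' over the gaps.
cat_as_documented <- function(..., sep = " ") {
  # Convert each argument to character and concatenate the results
  # (as.character() is an assumption; 'cat' formats numbers on its own).
  x <- unlist(lapply(list(...), as.character), use.names = FALSE)
  n <- length(x)
  if (n == 0L) return("")
  # Gap i (between x[i] and x[i + 1]) uses sep[i], recycling 'sep' as needed;
  # the last element gets an empty suffix, so there is no trailing separator.
  gaps <- c(rep_len(sep, n - 1L), "")
  paste(paste0(x, gaps), collapse = "")
}

cat_as_documented("a", "b", "c", "d", sep = c("-", "+", "x"))     # returns "a-b+cxd"
cat_as_documented(c("a", "b", "c", "d"), sep = c("-", "+", "x"))  # returns "a-b+cxd"
```

Under this reading both calls give the same string, which is what the second command above prints; the first command differs only because of how 'cat' picks separators between arguments.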

Here is a test case of the behavior of 'cat'.

> cat(c("a", "b", "c"), c(1, 2, 3),
+ sep=c("-", "+", "x", "?", "@")); cat("|\n")
a-b+c-1?2@3|

In the output above, the element "x" of 'sep' is not used; "-" is used in its place, at the boundary between the two objects given to 'cat'. So it seems that
(1) between adjacent objects, the first element of 'sep' is always used;
(2) between adjacent elements within each object, the elements of 'sep' are used as if the objects had been concatenated after being converted to character vectors.
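Rules (1) and (2) can be turned into a small model to check that they reproduce the outputs reported here. `cat_r300_model` is a hypothetical function written for illustration; it is not the C code of 'cat', only a sketch of the separator selection as inferred from these observations (and from the running index described later in Comment 3). It returns the string rather than printing it, and converts values with as.character() as an assumption.

```r
# Hypothetical model (not base R) of how R 3.0.0 appears to choose separators:
# one running index walks through 'sep' over all elements of all arguments,
# but the gap between two arguments always uses sep[1].
cat_r300_model <- function(..., sep = " ") {
  args <- list(...)
  out <- character(0)
  ntot <- 0L  # running element count, used as the index into 'sep'
  for (iobj in seq_along(args)) {
    s <- as.character(args[[iobj]])
    # Rule (1): between arguments, the first separator is always used.
    if (iobj > 1L && !is.null(args[[iobj]])) out <- c(out, sep[1L])
    for (i in seq_along(s)) {
      out <- c(out, s[i])
      # Rule (2): within an argument, the running index selects the separator.
      if (i < length(s)) out <- c(out, sep[ntot %% length(sep) + 1L])
      ntot <- ntot + 1L  # advanced even after the argument's last element
    }
  }
  paste(out, collapse = "")
}

cat_r300_model(c("a", "b", "c"), c(1, 2, 3), sep = c("-", "+", "x", "?", "@"))
# returns "a-b+c-1?2@3"
cat_r300_model("a", "b", "c", "d", sep = c("-", "+", "x"))
# returns "a-b-c-d"
```

Because the index is still advanced after "c", the second argument continues with "?" and "@", while the gap between the arguments falls back to "-"; with four one-element arguments, only "-" ever appears.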

I also have an issue with the "Note" section:

     If any element of 'sep' contains a newline character, it is
     treated as a vector of terminators rather than separators, an
     element being output after every vector element _and_ a newline
     after the last.  Entries are recycled as needed.

I think 'sep' is always treated as a vector of separators. In the special case where any element of 'sep' contains a newline character, a final newline is added to the output. Here is an example.

> cat(c("a", "b", "c"), sep=c("-", "+\n")); cat("|\n")
a-b+
c
|

In the output above, 'c' is directly followed by the newline; there is no other character in between.
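The two readings of the Note can be told apart with a small model. `cat_newline_model` below is hypothetical (not base R) and assumes the behavior described here: separators are recycled between elements as usual, and one final newline is appended when any element of 'sep' contains a newline. Under the literal "terminators" reading of the Note, 'c' would instead be followed by a recycled "-" before the final newline.

```r
# Hypothetical helper (not base R): separators are recycled between the
# elements of the concatenated arguments, and a single final newline is
# appended if any element of 'sep' contains "\n" (the behavior reported here).
cat_newline_model <- function(..., sep = " ") {
  x <- unlist(lapply(list(...), as.character), use.names = FALSE)
  n <- length(x)
  out <- if (n == 0L) "" else paste(paste0(x, c(rep_len(sep, n - 1L), "")), collapse = "")
  # Final newline only; no separator is written after the last element.
  if (any(grepl("\n", sep, fixed = TRUE))) out <- paste0(out, "\n")
  out
}

cat_newline_model(c("a", "b", "c"), sep = c("-", "+\n"))  # returns "a-b+\nc\n"
cat_newline_model(c("a", "b", "c"), sep = c("-", "+"))    # returns "a-b+c"
```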

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.0

For comparison, here is the description of the argument 'sep' at http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/cat.html, which I believe is the help page for 'cat' in S-PLUS 3.4.

    vector of character strings to insert between successive data items of each object. This argument is used cyclically and if it contains a newline, the output will contain a final newline.
Comment 1 Duncan Murdoch 2013-04-08 12:28:18 UTC
Recycling of arguments is standard in R functions.
Comment 2 Suharto Anggono 2013-04-09 04:34:39 UTC
(In reply to comment #1)
> Recycling of arguments is standard in R functions.

OK, let me state this more clearly. I am not complaining about the recycling of arguments.

Take a look at
cat(c("a", "b", "c"), c(1, 2, 3), sep=c("-", "+", "x", "?", "@"))
I would expect the output to be
a-b+cx1?2@3
But the output is
a-b+c-1?2@3
Notice that, between c and 1, the separator is -, not x.

In the extreme case, the output of
cat("a", "b", "c", "d", sep=c("-", "+", "x"))
is
a-b-c-d
Notice that only - (the first element of 'sep') is used.

This behavior is not stated in the help page, so I hope the help page will state explicitly that, between objects, the _first_ element of 'sep' is always used.
Comment 3 Simon Urbanek 2013-04-09 05:50:48 UTC
There are two bugs: a) the sep index (ntot) is advanced even after the last element of each argument (that's why "x" doesn't appear), and b) between arguments, index 0 is always used. I have fixed both, so that cat(x, y, sep=z) and cat(c(x, y), sep=z) have the same effect (as suggested by the documentation). The only remaining exception is a zero-length vector [other than NULL], which behaves like "" (this would be easy to change, but looking at the code it seems this was intentional, and it is intuitive to force a separator between arguments in all cases).
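The equivalence claimed in this comment can be checked directly from R. The sketch below assumes an R version that includes this fix; `same_cat_output` is a hypothetical helper that captures what 'cat' prints for both forms with capture.output() and compares the results. It uses the vectors from this report, since 'cat' formats numbers itself and c() may convert some numbers to text differently.

```r
# Hypothetical helper: TRUE when cat(x, y, sep = sep) and cat(c(x, y), sep = sep)
# print exactly the same text. Assumes an R version that includes the fix above.
same_cat_output <- function(x, y, sep) {
  identical(capture.output(cat(x, y, sep = sep)),
            capture.output(cat(c(x, y), sep = sep)))
}

same_cat_output(c("a", "b", "c"), c(1, 2, 3), sep = c("-", "+", "x", "?", "@"))
capture.output(cat(c("a", "b", "c"), c(1, 2, 3), sep = c("-", "+", "x", "?", "@")))
```

With the fix in place, the first call should return TRUE and the second should show a-b+cx1?2@3, the output the reporter expected in Comment 2.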