Bug 16097 - input textConnection performance quadratic as well
Summary: input textConnection performance quadratic as well
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: R 3.1.2
Hardware: All All
Importance: P5 normal
Assignee: R-core
Depends on:
Reported: 2014-12-04 12:02 UTC by talhayon1
Modified: 2015-12-28 07:42 UTC

Test case (194 bytes, text/plain)
2015-12-24 17:53 UTC, Jonathan Fry
Suggested Patch (1.42 KB, patch)
2015-12-24 17:57 UTC, Jonathan Fry

Description talhayon1 2014-12-04 12:02:50 UTC
Creating an input textConnection takes time quadratic in the size of its input:

In text_init in src/main/connections.c:

for(i = 0; i < nlines; i++) {
	strcat(this->data,
	       type == 1 ? translateChar(STRING_ELT(text, i))
	       : ((type == 3) ? translateCharUTF8(STRING_ELT(text, i))
		  : CHAR(STRING_ELT(text, i))) );
	strcat(this->data, "\n");
}

This loop is quadratic: each strcat call scans this->data from the beginning to find its end, so the total work grows with the square of the input size. The slowdown is very noticeable when textConnection is given a long vector of character strings.
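
To illustrate the difference outside of R, here is a minimal sketch on plain C strings. The names join_strcat and join_cursor are hypothetical and are not taken from connections.c or from any patch. Both functions assume the caller supplies a buffer large enough for every line plus one newline each and a final NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Quadratic pattern: every strcat() rescans dst from its start to find
 * the terminating NUL before appending. */
static void join_strcat(char *dst, const char *const *lines, size_t n)
{
    assert(dst != NULL && lines != NULL);
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        strcat(dst, lines[i]);
        strcat(dst, "\n");
    }
}

/* Linear pattern: keep a cursor at the current end of the output and
 * copy each line there directly, so no byte of dst is rescanned. */
static void join_cursor(char *dst, const char *const *lines, size_t n)
{
    assert(dst != NULL && lines != NULL);
    char *p = dst;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(lines[i]);
        memcpy(p, lines[i], len);
        p += len;
        *p++ = '\n';
    }
    *p = '\0';
}
```

Both functions produce the same buffer contents; only the amount of scanning differs.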

Please note that this is not a duplicate of "14053 textConnection performance quadratic" - the solution for that one was a note in the description for an output text connection.
Comment 1 Jonathan Fry 2015-12-24 17:53:10 UTC
Created attachment 1951
Test case

The R script plots the time required to create a text connection against the number of lines in its source vector. On my (very slow) system, it takes about a minute altogether.
Comment 2 Jonathan Fry 2015-12-24 17:57:59 UTC
Created attachment 1953
Suggested Patch

The attached file contains a suggested complete replacement for the function text_init in src/main/connections.c.
Comment 3 talhayon1 2015-12-28 07:42:41 UTC
It might be more readable and reusable to extract this logic into a separate utility function, something like Linux's stpcpy (I found no equivalent alternative for strcat).
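
A rough sketch of what such a helper could look like, for illustration only. The name copy_end is hypothetical and is not part of R or of the attached patch. It follows the same contract as POSIX stpcpy (copy the string, return a pointer to the NUL that was written), but uses only standard C so it does not depend on stpcpy being available. The caller is assumed to guarantee that dst has enough room.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src (including its NUL) to dst and return a
 * pointer to the terminating NUL in dst, so the next copy can start
 * there without rescanning the buffer. */
static char *copy_end(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    memcpy(dst, src, len + 1);  /* len + 1 copies the NUL as well */
    return dst + len;
}
```

A loop would then chain the calls, for example p = copy_end(p, line); p = copy_end(p, "\n");, starting with p at the beginning of the buffer.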