Bug 17138 - latexToUtf8 hangs for certain unrecognized LaTeX macros
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Analyses
Version: R 3.3.*
Hardware: Other Linux
Importance: P5 normal
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-09-02 07:37 UTC by Matt
Modified: 2017-03-28 23:20 UTC

Description Matt 2016-09-02 07:37:18 UTC
For example:
tools::latexToUtf8(tools::parseLatex("{\\a'\\i}"))

or
tools::latexToUtf8(tools::parseLatex("{\\'\\i}"))

The hang occurs because, when latexToUtf8 is processing a "MACRO" tag, the switch statement at line 100 of https://svn.r-project.org/R/branches/R-3-3-branch/src/library/tools/R/parseLatex.R does not include a case for nexttag being "MACRO".
Comment 1 Peter Dalgaard 2016-09-02 14:29:46 UTC
I can reproduce this.

Running with debug(tools::latexToUtf8) I see something that looks like internal code damage:

debug: k <- 1L
Browse[3]> 
debug: while (k <= numargs) {
    if (getNext) {
        j <- j + 1L
        if (j > length(x)) {
            warning("argument for ", c(a), " not found", domain = NA)
            nextobj <- latex_tag("", "TEXT")
            nexttag <- "TEXT"
            nextchars <- ""
...
Browse[3]> 
debug: k <- k + 1L
Browse[3]> 
debug: (while) k <= numargs
Browse[3]> 
debug: if (getNext) {
    j <- j + 1L
    if (j > length(x)) {
        warning("argument for ", c(a), " not found", domain = NA)
        nextobj <- latex_tag("", "TEXT")
        nexttag <- "TEXT"
        nextchars <- ""
    }
    else {
        nextobj <- x[[j]]
.....


Notice the garbled while() construct. This may be only cosmetic; at any rate the diagnosis is correct that the switch() 

switch(nexttag, TEXT = {
    args[[k]] <- latex_tag(nextchars[1L], "TEXT")
    nextchars <- nextchars[-1L]
    if (!length(nextchars)) getNext <- TRUE
    if (args[[k]] %in% whitespace) next
    k <- k + 1L
}, COMMENT = getNext <- TRUE, BLOCK = , ENVIRONMENT = , MATH = {
    args[[k]] <- latexToUtf8(nextobj)
    k <- k + 1L
    getNext <- TRUE
}, `NULL` = stop("Internal error:  NULL tag", domain = NA))

encounters nexttag=="MACRO" and does nothing (in particular, does not increase k) so the loop does not terminate.
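
The mechanism can be seen in isolation with a toy loop. This is only a sketch, not the tools code: toy_loop and max_iter are hypothetical names, and the iteration cap exists purely so that the demonstration terminates. It relies on the documented behaviour that switch() with no matching case and no default returns NULL invisibly rather than signalling an error.

```r
# switch() with no matching case and no default returns NULL invisibly;
# no error or warning is raised.
res <- switch("MACRO", TEXT = "text", BLOCK = "block")
is.null(res)

# Toy loop (hypothetical, NOT the code in parseLatex.R): k only advances
# inside a matched branch, so an unmatched tag leaves k unchanged.
toy_loop <- function(nexttag, max_iter = 10L) {
  k <- 1L
  iter <- 0L
  while (k <= 1L) {
    iter <- iter + 1L
    # Safety cap so this demonstration cannot hang.
    if (iter > max_iter) return("gave up: k never advanced")
    switch(nexttag,
           TEXT = ,                 # falls through to BLOCK
           BLOCK = k <- k + 1L)
  }
  "terminated"
}

toy_loop("TEXT")
toy_loop("MACRO")
```

Without the cap, the second call would spin in the same way as the report describes; an explicit default branch in the switch() that either advances k or calls stop() would avoid that.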

This was with R 3.3.0 on an aged iMac running Mavericks.
Comment 3 Achim Zeileis 2017-03-13 21:34:11 UTC
I just came across the same problem. In principle, it should be possible to set up a vector 'index' for latexTable[[index]] because

tools:::latexTable[[c("\\'", "\\i")]]

provides the required UTF-8 character. However, the handling of the other variables in the loop might need some care.
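
To illustrate the indexing idea on its own: `[[` applied to a nested list with a character vector of length two indexes recursively, so the vector acts as a path. The sketch below uses a hypothetical miniature table (toy_table) and a hypothetical helper (lookup); it makes no claim about the full contents of tools:::latexTable or about how the eventual fix was written.

```r
# Hypothetical miniature stand-in for tools:::latexTable:
# outer names are accent macros, inner names are the accented argument.
toy_table <- list(
  "\\'" = list("\\i" = "\u00ed", "a" = "\u00e1")
)

# A character vector index is applied recursively, so this is the same as
# toy_table[["\\'"]][["\\i"]].
toy_table[[c("\\'", "\\i")]]

# Hypothetical helper: return NULL for any path that is not in the table,
# whether the lookup yields NULL or signals a subscript error.
lookup <- function(table, keys) {
  tryCatch(table[[keys]], error = function(e) NULL)
}

lookup(toy_table, c("\\'", "\\i"))
lookup(toy_table, c("\\'", "\\alpha"))
```

A NULL result gives the caller a clear signal that the macro pair has no entry, which is the case that needs separate handling in the loop.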

Note that this problem generally applies to MACRO+MACRO situations and in 
tools:::latexTable there are at least:

\'\i
\`\i
\^\i
\"\i
\c\

But also nonsensical constructs like

tools::latexToUtf8(tools::parseLatex("\\'\\alpha"))
tools::latexToUtf8(tools::parseLatex("\\'\\LaTeX"))

run into the same infinite loop.
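
For anyone who wants to try these inputs on an affected version without losing the session, one option is to wrap the call in a time limit. This is a sketch under assumptions: with_time_limit is a hypothetical helper built on base R's setTimeLimit(), the two-second default is arbitrary, and it relies on the loop being R-level code so that the limit is actually checked.

```r
# Hypothetical helper: evaluate expr, but give up after `seconds` of
# elapsed time. Returns the value of expr, or a sentinel string if the
# evaluation errored or hit the time limit.
with_time_limit <- function(expr, seconds = 2) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Always clear the limit again, even if expr fails.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  tryCatch(expr, error = function(e) "timed out or failed")
}

# On an affected version this should come back with the sentinel after
# about two seconds; on a fixed version it returns the conversion result.
out <- with_time_limit(
  tools::latexToUtf8(tools::parseLatex("\\'\\alpha"))
)
```

The sentinel does not distinguish a timeout from an ordinary error; inspecting conditionMessage(e) in the handler would be one way to tell them apart.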
Comment 4 Duncan Murdoch 2017-03-28 23:20:04 UTC
Fixed in R-devel; will fix in 3.4.0.