Bug 16541 - R's httpd lowercases boundary string in attr(reqbody, "content-type") for multipart forms
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Misc
Version: R 3.2.1
Hardware: Other Other
Importance: P5 minor
Assignee: R-core
Reported: 2015-09-22 02:22 UTC by Bill Dunlap
Modified: 2015-12-14 13:45 UTC
Description Bill Dunlap 2015-09-22 02:22:01 UTC
When R gets a POST request with a multipart/form-data payload, it appears to convert the "boundary" string in the Content-Type field to lower case. The Chrome browser typically sends a mixed-case boundary string, which is used to separate parts of the payload. (It is chosen so that it does not appear in any part of the raw payload.) Since the payload itself contains the mixed-case boundary string, you have to use ignore.case=TRUE in grepRaw to find it in the payload, which seems ugly to me.

One way to reproduce this is to install the webutils package from CRAN and edit the demo_rhttpd function and save the 'reqbody' argument in the POST case:
        else {
            assign(envir=globalenv(), "REQBODY", reqbody) # add this line
            message("Received HTTP POST request.")
Then run
   demo_rhttpd()
(It will put up a form in a browser, which you will have to fill in and submit.)
Then look at the saved REQBODY object.  When using Chrome on Windows I get
> attr(REQBODY, "content-type")
[1] "multipart/form-data; boundary=----webkitformboundaryiavjlkprmmpawt9t"
> rawToChar(REQBODY) # note case difference
[1] "------WebKitFormBoundaryIAVjLkPRMmpawT9t\r\nContent-Disposition: form-data; name=\"username\"\r\n\r\nx\r\n------WebKitFormBoundaryIAVjLkPRMmpawT9t\r\nContent-Disposition: form-data; name=\"email_address\"\r\n\r\nx@y.z\r\n------WebKitFormBoundaryIAVjLkPRMmpawT9t\r\nContent-Disposition: form-data; name=\"picture\"; filename=\"junk.R\"\r\nContent-Type: application/octet-stream\r\n\r\nhdr <- \"Host: localhost:27382\\nConnection: keep-alive\\nContent-Length: 781321\\nCache-Control: max-age=0\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\\nOrigin: http://localhost:27382\\nUpgrade-Insecure-Requests: 1\\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36\\nContent-Type: multipart/form-data; boundary=----WebKitFormBoundary7cNfwkVnmPZQpZDX\\nReferer: http://localhost:27382/custom/test\\nAccept-Encoding: gzip, deflate\\nAccept-Language: en-US,en;q=0.8\\n\"\n\r\n------WebKitFormBoundaryIAVjLkPRMmpawT9t\r\nContent-Disposition: form-data; name=\"food\"\r\n\r\nsushi\r\n------WebKitFormBoundaryIAVjLkPRMmpawT9t--\r\n"
> grepRaw(sub("^.*boundary=", "", attr(REQBODY, "content-type")), REQBODY, all=TRUE)
integer(0)
> grepRaw(sub("^.*boundary=", "", attr(REQBODY, "content-type")), REQBODY, ignore.case=TRUE, all=TRUE)
[1]    3   99  204  926 1022
Comment 1 Bill Dunlap 2015-09-22 16:23:06 UTC
Here is a more self-contained way to reproduce the problem. Run the following code, wait for the browser to show a page with 2 forms (regular and multipart) in it, fill in the multipart form and submit it. Then print the value of httpRequests_qwerty and see that the boundary string in the body's content-type attribute is all lower case, even though it is not lower case in the Content-Type header or in the body of the POST. (Internet Explorer seems to use only lowercase hex digits and Firefox only decimal digits, but Chrome tends to generate mixed-case ASCII boundary strings.)

makeHttpRequestSaver <- function(type = "test", storageName = paste0("httpRequests_", type), envir = parent.frame())
{
    port <- tools:::httpdPort()
    if (port == 0) {
        tools::startDynamicHelp()
        port <- tools:::httpdPort()
    }
    assign(envir=envir, storageName, list())
    assign(envir=tools:::.httpd.handlers.env, type,
        function (reqpath, reqquery, reqbody, reqheaders)
        {
            request <- structure(
                class="httpRequest",
                list(path=reqpath, query=reqquery, body=reqbody, headers=reqheaders))
            tmp <- get(envir=envir, storageName)
            tmp[[length(tmp)+1]] <- request
            assign(envir=envir, storageName, tmp)
            message("Stored message #", length(tmp), " in " , storageName)
            list(
                paste(sep="",
                   "<html>", "\n",
                   "<head><title>Reply from ", type, "</title></head>", "\n",
                   "<body>",
                   "<h1>", type, "</h1>", "Sorry, I have no answer for the query ", paste(names(reqquery), reqquery, sep=":", collapse=", "), "\n",

                   "<h1>", "A regular form", "</h1>",
                   "<form action=\"", type, "\" method=\"POST\">", "\n",
                   "First name <input type=\"text\" name=\"firstname\" size=40>\n",
                   "Last name <input type=\"text\" name=\"lastname\" size=40>\n",
                   "Optional", "<input type=\"text\" size=10>", "\n", # no name for input
                   "<p> <input type=\"submit\" value=\"Submit Regular Form\">", "\n",
                   "</form>" , "\n",

                   "<h1>", "A multipart form", "</h1>", "\n",
                   "<form method=\"post\" enctype=\"multipart/form-data\">", "\n",
                   "<div class=\"form-group\">", "\n",
                   "<label for=\"firstname\">First name</label>", "\n",
                   "<input type=\"text\" class=\"form-control\" name=\"firstname\" placeholder=\"First name\" required>", "\n",
                   "</div>", "\n",
                   "<div class=\"form-group\">", "\n",
                   "<label for=\"lastname\">Last name</label>", "\n",
                   "<input type=\"text\" class=\"form-control\" name=\"lastname\" placeholder=\"Last name\" required>", "\n",
                   "</div>", "\n",
                   "<button type=\"submit\" class=\"btn btn-default\">Submit Multipart Form</button>", "\n",
                   "</form>", "\n",

                   "</body>", "\n",
                   "</html>", "\n"),
                "text/html",
                NULL,
                200)
        })
    list(port=port,
         type=type,
         storageName=storageName,
         url=paste0("http://localhost:", port, "/custom/", type, "/who?room=bedroom&weapon=lead%20pipe"))
}
print.httpRequest <- function(x, ...)
{
    cat("Path:", x$path, "\n")
    cat("Query:", paste(collapse=", ", formatDL(names(x$query), x$query, style="list")), "\n")
    cat("Headers:", strsplit(rawToChar(x$headers), "\n")[[1]], sep="\n    ")
    cat("Body:\n")
    writeLines(paste0("    ", capture.output(
        if (is.raw(x$body) && !any(x$body==0)) {
            if (length(at <- attributes(x$body)) > 0) {
                print(attributes(x$body))
            }
            cat(rawToChar(x$body))
        } else {
            x$body
        })))
    invisible(x)
}

# Try it out
z <- makeHttpRequestSaver("qwerty")
browseURL(z$url)

With Chrome on Windows I get:
[[3]]
Path: /custom/qwerty/qwerty 
Query:  
Headers:
    Request-Method: POST
    Host: localhost:10669
    Connection: keep-alive
    Content-Length: 248
    Cache-Control: max-age=0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Origin: http://localhost:10669
    Upgrade-Insecure-Requests: 1
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36
    Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryvO8ChOD9rSiUNN7t
    Referer: http://localhost:10669/custom/qwerty/qwerty
    Accept-Encoding: gzip, deflate
    Accept-Language: en-US,en;q=0.8
Body:
    $`content-type`
    [1] "multipart/form-data; boundary=----webkitformboundaryvo8chod9rsiunn7t"
    
    ------WebKitFormBoundaryvO8ChOD9rSiUNN7t

    Content-Disposition: form-data; name="firstname"

    

    Multipart

    ------WebKitFormBoundaryvO8ChOD9rSiUNN7t

    Content-Disposition: form-data; name="lastname"

    

    Form

    ------WebKitFormBoundaryvO8ChOD9rSiUNN7t--
Comment 2 Brian Ripley 2015-09-23 08:07:11 UTC
This appears to be deliberate: src/modules/internet/Rhttpd.c has, at around line 959,

    while (*l) { if (*l >= 'A' && *l <= 'Z') *l |= 0x20; l++; };

My reading is that media types are case-insensitive, but parameters may or may not be.

It is not entirely clear to me why ASCII lower-casing is done (Simon may be able to elucidate), but it would seem better if this stopped at ';'.
Comment 3 Simon Urbanek 2015-09-23 14:53:09 UTC
Yes, converting MIME types in the content type to lowercase is deliberate, since they are case-insensitive. However, as Brian noted, parameters should not be converted.
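
The suggestion in the two comments above (lowercase the media type, but stop at ';' so parameters such as boundary keep their case) could be sketched roughly as follows. This is only an illustration, not the change actually committed to Rhttpd.c: the function name lowercase_media_type is hypothetical, and it assumes the Content-Type value is available as a writable NUL-terminated string, as the quoted loop suggests.

```c
/* Hypothetical helper (not taken from Rhttpd.c): lowercase only the
   media type part of a Content-Type value, in place. */
void lowercase_media_type(char *l)
{
    /* Stop at the end of the string or at the first ';', which
       separates the media type from its parameters. */
    while (*l && *l != ';') {
        /* ASCII-only lowercasing, same bit trick as the quoted line. */
        if (*l >= 'A' && *l <= 'Z') *l |= 0x20;
        l++;
    }
}
```

With a value such as "Multipart/Form-Data; boundary=----WebKitFormBoundaryIAVjLkPRMmpawT9t", this would leave the boundary parameter byte-for-byte identical to the string used in the payload, so a case-sensitive grepRaw on the body would be expected to find it.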