Bug 16805 - wininet method: downloaded length != reported length
Summary: wininet method: downloaded length != reported length
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific (show other bugs)
Version: R 3.3.0
Hardware: Other Other
: P5 enhancement
Assignee: R-core
URL:
Depends on:
Blocks:
 
Reported: 2016-04-08 00:16 UTC by Jeroen Ooms
Modified: 2016-04-09 10:24 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jeroen Ooms 2016-04-08 00:16:36 UTC
When downloading a file on windows using the `wininet` method, we sometimes see warnings like this:

	> download.file("https://httpbin.org/stream/10", tempfile(), method = "wininet")
	trying URL 'https://httpbin.org/stream/10'
	Content type 'application/json' length 200 bytes
	downloaded 1870 bytes

	Warning message:
	In download.file("https://httpbin.org/stream/10", tempfile(), method = "wininet") :
	  downloaded length 1870 != reported length 200

The problem appears only when the server is not setting the `Content-Length` http response header (which is not required and often omitted for streams). It seems like the wininet method in this case interprets the HTTP response code (200) as the total length which is obviously incorrect.
Comment 1 Martin Morgan 2016-04-08 01:11:59 UTC
I don't have a windows setup to compile R, but it looks like in in_R_HTTPOpen2 'status' is being used first to hold the HTTP response code, then the content length. It is not reset prior to the second call, and appears to remain unchanged if the length is missing. So the '200' code is interpreted as length. Maybe the following is a simple fix? 


Index: src/modules/internet/internet.c
===================================================================
--- src/modules/internet/internet.c	(revision 70445)
+++ src/modules/internet/internet.c	(working copy)
@@ -950,6 +950,7 @@
 		  HTTP_QUERY_CONTENT_TYPE, &buf, &d3, &d2);
     d2 = 0;
     // NB: this can only retrieve in a DWORD, so up to 2GB or 4GB?
+    status = 0;
     HttpQueryInfo(wictxt->session,
 		  HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
 		  &status, &d1, &d2);

Alternatively, from https://msdn.microsoft.com/en-us/library/windows/desktop/aa384238(v=vs.85).aspx HttpQueryInfo() returns FALSE on error, and it seems that there is an ERROR_HTTP_HEADER_NOT_FOUND, so

Index: src/modules/internet/internet.c
===================================================================
--- src/modules/internet/internet.c	(revision 70445)
+++ src/modules/internet/internet.c	(working copy)
@@ -950,9 +950,12 @@
 		  HTTP_QUERY_CONTENT_TYPE, &buf, &d3, &d2);
     d2 = 0;
     // NB: this can only retrieve in a DWORD, so up to 2GB or 4GB?
-    HttpQueryInfo(wictxt->session,
-		  HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
-		  &status, &d1, &d2);
+    if (!HttpQueryInfo(wictxt->session,
+                       HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
+                       &status, &d1, &d2)) {
+        if (GetLastError() == ERROR_HTTP_HEADER_NOT_FOUND)
+            status = 0;         /* what about other errors? */
+    }
     wictxt->length = status;
     wictxt->type = strdup(buf);
     if(!IDquiet) {
Comment 2 Jeroen Ooms 2016-04-08 19:11:26 UTC
For sake of readability, use a different variable for length than for status? This is very confusing right now.