Bug 16805

Summary: wininet method: downloaded length != reported length
Product: R Reporter: Jeroen Ooms <jeroen>
Component: Windows GUI / Window specificAssignee: R-core <R-core>
Status: CLOSED FIXED    
Severity: enhancement CC: morgan
Priority: P5    
Version: R 3.3.*   
Hardware: Other   
OS: Other   

Description Jeroen Ooms 2016-04-08 00:16:36 UTC
When downloading a file on windows using the `wininet` method, we sometimes see warnings like this:

	> download.file("https://httpbin.org/stream/10", tempfile(), method = "wininet")
	trying URL 'https://httpbin.org/stream/10'
	Content type 'application/json' length 200 bytes
	downloaded 1870 bytes

	Warning message:
	In download.file("https://httpbin.org/stream/10", tempfile(), method = "wininet") :
	  downloaded length 1870 != reported length 200

The problem appears only when the server is not setting the `Content-Length` http response header (which is not required and often omitted for streams). It seems like the wininet method in this case interprets the HTTP response code (200) as the total length which is obviously incorrect.
Comment 1 Martin Morgan 2016-04-08 01:11:59 UTC
I don't have a windows setup to compile R, but it looks like in in_R_HTTPOpen2 'status' is being used first to hold the HTTP response code, then the content length. It is not reset prior to the second call, and appears to remain unchanged if the length is missing. So the '200' code is interpreted as length. Maybe the following is a simple fix? 


Index: src/modules/internet/internet.c
===================================================================
--- src/modules/internet/internet.c	(revision 70445)
+++ src/modules/internet/internet.c	(working copy)
@@ -950,6 +950,7 @@
 		  HTTP_QUERY_CONTENT_TYPE, &buf, &d3, &d2);
     d2 = 0;
     // NB: this can only retrieve in a DWORD, so up to 2GB or 4GB?
+    status = 0;
     HttpQueryInfo(wictxt->session,
 		  HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
 		  &status, &d1, &d2);

Alternatively, from https://msdn.microsoft.com/en-us/library/windows/desktop/aa384238(v=vs.85).aspx HttpQueryInfo() returns FALSE on error, and it seems that there is an ERROR_HTTP_HEADER_NOT_FOUND, so

Index: src/modules/internet/internet.c
===================================================================
--- src/modules/internet/internet.c	(revision 70445)
+++ src/modules/internet/internet.c	(working copy)
@@ -950,9 +950,12 @@
 		  HTTP_QUERY_CONTENT_TYPE, &buf, &d3, &d2);
     d2 = 0;
     // NB: this can only retrieve in a DWORD, so up to 2GB or 4GB?
-    HttpQueryInfo(wictxt->session,
-		  HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
-		  &status, &d1, &d2);
+    if (!HttpQueryInfo(wictxt->session,
+                       HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
+                       &status, &d1, &d2)) {
+        if (GetLastError() == ERROR_HTTP_HEADER_NOT_FOUND)
+            status = 0;         /* what about other errors? */
+    }
     wictxt->length = status;
     wictxt->type = strdup(buf);
     if(!IDquiet) {
Comment 2 Jeroen Ooms 2016-04-08 19:11:26 UTC
For sake of readability, use a different variable for length than for status? This is very confusing right now.