Bug 15307 - Extra Lines in .EMF plots with PDF Export in MS Word
Status: CLOSED WISHLIST
Alias: None
Product: R
Classification: Unclassified
Component: Windows GUI / Window specific
Version: R 3.0.0
Hardware: ix86 (32-bit) Windows 32-bit
Importance: P5 critical
Assignee: R-core
 
Reported: 2013-05-07 14:41 UTC by catch.all.foo
Modified: 2013-05-11 07:34 UTC
CC List: 3 users


Attachments
PDF file with demonstration of problem. (207.74 KB, application/pdf)
2013-05-07 14:41 UTC, catch.all.foo

Description catch.all.foo 2013-05-07 14:41:02 UTC
Created attachment 1447
PDF file with demonstration of problem.

Thanks for making R! 

There's a bug I hope you can fix as it affects many of my plots.

I'm using MS Word 2007 SP3 on MS Windows 7 Home Premium SP1.

The bug can be reproduced by running this code in R:

# Windows only: open an EMF graphics device
win.metafile("Test.emf")
x <- rlnorm(1e4)
# One 10,000-point line on a log-scaled y-axis
plot(cumprod(x), type = "l", log = "y", panel.first = grid())
dev.off()

This creates a Test.emf file, which you then insert as a picture in an MS Word document. It looks fine when viewed in MS Word, but when I use the PDF export function built into MS Word, the plot in the PDF file contains extra lines. These extra lines are connected to the plotted line and seem to originate from outside the upper-left corner of the plot. One such line seems to be added for every 1000 points along the x-axis. See attachment.

My guess is that there's some sort of variable that gets reset by R for every 1000 iterations and this is confusing MS Word's PDF exporter.

I have seen this bug mentioned elsewhere on the internet. It is highly unlikely that Microsoft is going to fix it, so I'm hoping the R developers can. I had a similar problem with gnuplot, which its developers fixed last year; it works fine now.

I would circumvent this problem by simply using another PDF converter but I can't find one that also converts the document's internal links to equations, sections, etc.

Any help would be greatly appreciated!
Comment 1 Duncan Murdoch 2013-05-07 14:47:11 UTC
From the sound of it, this is a bug in the PDF exporter in your six-year-old version of Word.  Does Microsoft even still support that version?  What do they say about this?
Comment 2 Peter Dalgaard 2013-05-07 15:08:32 UTC
As Duncan indicates, it is unlikely that we can move forward on this. One chance might be if you can dig out the analysis of the problem with gnuplot and the associated fix. I seem to recall similar issues getting fixed by emitting polylines in smaller chunks. 

You can't just insert the graphics as PDF in the Word document and then export as PDF?
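
The idea of emitting polylines in smaller chunks can be tried from user code without touching the device itself. The following is only a sketch under stated assumptions: `chunk_indices()` and `lines_chunked()` are hypothetical helper names, the chunk size of 500 is arbitrary, and nothing in this thread confirms that chunking avoids the Word export problem.

```r
# Hypothetical helpers (not part of R, and not a fix applied to win.metafile).
# Split the indices 1..n into chunks that overlap by one point, so that
# consecutive pieces share an endpoint and the drawn line has no gaps.
chunk_indices <- function(n, chunk_size = 500L) {
  stopifnot(n >= 2L, chunk_size >= 2L)
  starts <- seq(1L, n - 1L, by = chunk_size - 1L)
  lapply(starts, function(s) s:min(s + chunk_size - 1L, n))
}

# Draw y against its index as one short polyline per chunk.
lines_chunked <- function(y, chunk_size = 500L, ...) {
  for (idx in chunk_indices(length(y), chunk_size)) {
    lines(idx, y[idx], ...)
  }
  invisible(NULL)
}

set.seed(1)
x <- cumprod(rlnorm(1e4))

# win.metafile() exists only on Windows; elsewhere fall back to a null PDF
# device so the sketch still runs.
if (.Platform$OS.type == "windows") win.metafile("Test.emf") else pdf(NULL)
plot(x, type = "n", log = "y", panel.first = grid())  # axes and grid only
lines_chunked(x)                                      # the line, in pieces
dev.off()
```

Setting up the plot with `type = "n"` and adding the line afterwards keeps the axes and grid identical to the original reproduction code; only the way the line is emitted changes.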
Comment 3 catch.all.foo 2013-05-08 09:31:16 UTC
Thanks for the quick response.

This might be the bug I experienced in gnuplot last year:

http://sourceforge.net/p/gnuplot/bugs/1163/

With gnuplot the EMF plot would also show fine in the EMF file and in MS Word 2007 but when the Word document was exported to a PDF file the plot would be corrupted.

About the problem I'm having with R:

Inserting the R plot as a PDF file in the MS Word document doesn't show the extra lines, but the plot gets pixelated and smeared, so it's difficult to read. It might be OK if printed, but who prints anymore? And there are more than 300 plots in the document, so they need to be of good quality.

I have also tried inserting the plot as an EPS file but that doesn't render very well either.

Inserting over 300 plots as high-resolution bitmap files is not possible.

I have seen this problem with R / MS Word / EMF described elsewhere and it appears to be present with later versions of MS Word as well:

http://r.789695.n4.nabble.com/Problems-of-metafile-plots-when-converting-word-to-pdf-file-td3809975.html

It is suggested to use another PDF converter but the problem is that they don't create the internal links to equations, sections, etc. which I need.

Here's another thread with suggestions for workarounds:

http://stackoverflow.com/questions/9555889/producing-a-vector-graphics-image-i-e-metafile-in-r-suitable-for-printing-in

The last post mentions the devEMF package for R which I've just tried. It renders the plot fine in both MS Word and the exported PDF - however, the EMF file is somehow corrupted as it causes MS Paint to leak gigabytes of memory. I've contacted the author of devEMF to hear if he can fix that bug and possibly also knows what might be causing the problem with win.metafile().
Comment 4 Duncan Murdoch 2013-05-08 11:10:16 UTC
If MS Word doesn't display R or gnuplot figures correctly, and makes a mess of PDF and EPS plots, then aren't you talking to the wrong people?  Shouldn't you be complaining to MS?  (Or switching to some software that works.)
Comment 5 Duncan Murdoch 2013-05-10 09:24:23 UTC
I'm closing this, as it appears to be a request for a workaround to a Microsoft bug.  Microsoft should fix it.
Comment 6 Philip Johnson 2013-05-10 15:59:41 UTC
(I realize this is closed but in case anyone stumbles across this by searching)

> The last post mentions the devEMF package for R which I've just tried. It
> renders the plot fine in both MS Word and the exported PDF - however, the EMF
> file is somehow corrupted as it causes MS Paint to leak gigabytes of memory.
> I've contacted the author of devEMF to hear if he can fix that bug and possibly
> also knows what might be causing the problem with win.metafile().

I am the author of devEMF.  There are two key differences between the EMF files produced by devEMF and by win.metafile:

  1) The former has a single POLYLINE record while the latter has thousands of MOVETO and LINETO records.  My guess is that either the larger file size or the many MOVETO/LINETO records trigger the observed bug in Word 2007.  (That said, I know little about win.metafile / the Windows GDI; I wrote devEMF to generate EMF on non-Windows systems)

  2) The former specifies a much larger device coordinate system.  This allows for more precise placement of vector objects on the page, but it seems that this exposes a flaw in catch.all.foo's MS Paint filter.  My best guess is that it blindly allocates a bitmap image of size equal to the device coordinate space.

To conclude: I agree with Duncan - these are bugs in programs other than R and do not affect enough R users to warrant a workaround.
Comment 7 catch.all.foo 2013-05-11 07:34:42 UTC
After several tests in cooperation with Philip Johnson (author of devEMF) I can conclude the following:

- win.metafile() creates EMF files which may confuse the PDF export in MS Word causing it to add extra lines, as described above. devEMF does not have this problem. Johnson describes the likely cause above.

- win.metafile() creates EMF files which can be shown by MS Paint using some unknown 3rd party filter that we couldn't identify. The files created by devEMF cause MS Paint to allocate gigabytes of memory, which stalls the computer. Johnson believes the reason is that MS Paint allocates a pixel canvas the size of the coordinate system (17780 x 17780 used by devEMF). Forcing devEMF to output a smaller coordinate system instead allows its files to be loaded in MS Paint as well. So the problem is likely in the MS Paint filter, as Johnson suspects.
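
As a rough check on the "gigabytes of memory" observation, the size of a bitmap covering the whole 17780 x 17780 coordinate system can be estimated. This is a back-of-the-envelope sketch only: the 4 bytes per pixel is an assumption, since the pixel format used by the MS Paint filter is unknown.

```r
# Coordinate-system size reported for devEMF in this thread.
coord_px <- 17780

# Assumption: a 32-bit bitmap (4 bytes per pixel); the filter's real
# pixel format was never identified.
bytes_per_pixel <- 4

bitmap_bytes <- coord_px^2 * bytes_per_pixel
bitmap_gib <- bitmap_bytes / 2^30

cat(sprintf("%.0f bytes (about %.2f GiB)\n", bitmap_bytes, bitmap_gib))
# prints: 1264513600 bytes (about 1.18 GiB)
```

A single canvas of that size is already more than a gigabyte, so a filter that allocates one or more such buffers would plausibly account for the behaviour described.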

So people experiencing problems exporting PDF files in MS Word with plots made in R using win.metafile() (EMF/WMF) should try using devEMF instead.
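
For readers who want to try that switch, a minimal sketch follows. It assumes the devEMF package is installed from CRAN and uses its `emf()` device with only the `file`, `width` and `height` arguments; `plot_to_emf()` is a hypothetical wrapper name, and the 7 x 7 inch size is arbitrary.

```r
# Hypothetical wrapper: draw the thread's test plot through devEMF's emf()
# device instead of win.metafile(). Returns FALSE if devEMF is unavailable.
plot_to_emf <- function(path, y) {
  if (!requireNamespace("devEMF", quietly = TRUE)) {
    message("devEMF is not installed; try install.packages(\"devEMF\")")
    return(invisible(FALSE))
  }
  devEMF::emf(file = path, width = 7, height = 7)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(y, type = "l", log = "y", panel.first = grid())
  invisible(TRUE)
}

set.seed(1)
emf_path <- file.path(tempdir(), "Test.emf")
ok <- plot_to_emf(emf_path, cumprod(rlnorm(1e4)))
```

Unlike `win.metafile()`, this device is not tied to Windows, so the same script can produce the EMF file on other platforms before it is inserted into the Word document.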

@Duncan Murdoch, your attitude is not helpful. I described the problem well and showed that other people and other software had had similar problems, and that it wasn't clear whether R or MS Word was causing the problem (in the gnuplot case described above it was actually a bug in gnuplot and not MS Word; the gnuplot folks were very helpful and polite and fixed the bug). Johnson and I have now spent time localizing the cause of this problem so this thread can serve as a reference for people who might experience similar problems in the future. This is constructive.