Friday, 15 October 2010

Converting Microsoft Office (Word, Excel) documents to PDFs in Java

This question was asked a lot on the JDC, and I'm sure it will continue to be asked now that the JDC has been merged with the OTN.

The stock answer is a link to Apache POI and, if you are lucky, some complaints about not googling before asking. However, Apache POI is focused on getting data out of documents, not on rendering that data.
A few other Java options exist for reading and writing Office documents, but we want to be able to render the data as close to Microsoft Office as we can.

Products that I know of that can render Office documents:

yeokm1/docs-to-pdf-converter
Irregularly maintained, Pure Java, Open Source
Ties together a number of libraries to perform the conversion.

xdocreport
Actively developed, Pure Java, Open Source
It is a Java API that merges an XML document created with MS Office (docx), OpenOffice (odt) or LibreOffice (odt) with a Java model to generate a report and, if you need to, convert it to another format (PDF, XHTML...).

Snowbound Imaging SDK
Closed Source, Pure Java
Snowbound appears to be a 100% Java solution and costs over $2,500. It contains samples describing how to convert documents in the evaluation download.

Qoppa jWordConvert
Closed Source, Pure Java
Starting at $600.00, it claims to be able to convert Word documents to PDF documents.

OpenOffice API
Open Source, Not Pure Java - Requires OpenOffice installed
OpenOffice is a native Office suite which supports a Java API, reading Office documents and writing PDF documents. The SDK contains an example of document conversion (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs you need to pass the "writer_pdf_Export" filter name rather than the "MS Word 97" one.
Or you can use the wrapper API JODConverter.
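
If you would rather not touch the UNO API at all, the same "writer_pdf_Export" filter can be requested from the office suite's command line. The following is only a rough sketch, not the SDK example mentioned above. It assumes LibreOffice's soffice launcher is on the PATH (I have not confirmed that Apache OpenOffice accepts the same --convert-to switch), and the class name SofficePdfConverter is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class SofficePdfConverter {

    // Assumption: LibreOffice's "soffice" launcher is reachable on the PATH.
    static final String SOFFICE = "soffice";

    /** Builds the headless conversion command. Nothing is executed here. */
    static List<String> buildCommand(Path input, Path outputDir) {
        return Arrays.asList(
                SOFFICE,
                "--headless",
                // Target extension plus the export filter name mentioned above.
                "--convert-to", "pdf:writer_pdf_Export",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    /** Runs the conversion and reports whether soffice exited with code 0. */
    static boolean convert(Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO() // show soffice's own messages on our console
                .start();
        return process.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: SofficePdfConverter <input> <output-dir>");
            return;
        }
        boolean ok = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "converted" : "conversion failed");
    }
}
```

The PDF should end up in the output directory under the input's base name. This is still not pure Java: it needs the office suite installed, exactly as the API route does.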

Microsoft Office
Closed Source, Not Pure Java - Requires Microsoft Office installed
It is the only software out there which will give you a 100% perfect conversion, but it requires leaving Java behind for this feature.
Word Doc to PDF Conversion, Command line using VBScript and automation by Michael Suodenjoki is a great write-up on how to do this. However, Microsoft has some notes to read if you go down this path; the takeaway is "Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment."
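
Given the deadlock warning, if you do hand the work to a script from Java it seems sensible to put a time limit on it. Below is a minimal sketch of that idea. It assumes Windows' cscript host is on the PATH and that you have a conversion script which takes the Word file as its only argument; the script path, its argument convention and the class name WordScriptRunner are hypothetical and not taken from the write-up above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class WordScriptRunner {

    /**
     * Builds the script host command. The script and its single-argument
     * convention are hypothetical; adapt them to the script you actually use.
     */
    static List<String> buildCommand(Path script, Path wordDocument) {
        return Arrays.asList(
                "cscript", "//NoLogo",
                script.toString(),
                wordDocument.toString());
    }

    /** Runs the command; returns false on a non-zero exit code or a timeout. */
    static boolean run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Kills the script host only. A stuck Word process may well
            // survive this and need separate cleanup.
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: WordScriptRunner <script.vbs> <document>");
            return;
        }
        List<String> command = buildCommand(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(run(command, 60) ? "converted" : "failed or timed out");
    }
}
```

A timeout does not make unattended automation supported; it only stops a hung conversion from blocking the calling thread forever.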

Muhimbi PDF converter
.NET based Web Service
Continuing to leave Java behind, Muhimbi have a service which supports Office to PDF conversion, and a blog post with instructions on how to use it from Java.

JDocToPdf
Dead, Pure Java, Open Source
Uses Apache POI to read the Word document and iText to write the PDF. Completely free and 100% Java, but it has some limitations.


If you have experimented with one of these, or used something I've missed, please let me know in the comments.

I'm running a very short survey on Word-to-PDF as a service. I would be very grateful if you would fill it in. Don't worry, I'm not asking for your email address to spam you.

Updates: 
2013-01-07: Scripting Microsoft Word
2013-04-03: Tidy up opening paragraph. 
2013-06-13: Added Muhimbi
2016-02-11: Added yeokm1/docs-to-pdf-converter & xdocreport, reordered, one-liners added
2017-01-23: Survey 

5 comments:

  1. I have found this online converter
    which allow users to convert MS office documents without installing MS office on their PC to any other formats using Java language and in other languages also.

  2. Does anyone have any experience with the .NET library Spire.DOC? http://www.e-iceblue.com/Introduce/word-for-net-introduce.html#.WCwF7vkrJaR

  3. Excellent! The perfect list of converting documents.
