[XeTeX] Python Project: PDF Optimization

Fri Jun 4 21:14:02 CEST 2010

Dear XeTeX Mailing List,

By way of a quick introduction, my name is Rob Oakes.  I mostly tend to be a lurker here, but I greatly appreciate the many individuals who patiently answer questions.  I've learned a great deal from many of you.

With that said, I am writing regarding a project idea that I am interested in pursuing.  Though it is only tangentially related to XeTeX, I thought this might be an appropriate forum to present it and solicit feedback/collaborators.

A couple of weeks ago, I was putting together an article about different utilities available for working with PDF documents on Linux (http://blog.oak-tree.us/index.php/2010/05/26/pdf-linux).  While doing so, I looked high and low for something that would make it easy to optimize a PDF for web distribution.  (Specifically, I wanted a tool to downsample images, convert between different color spaces, and streamline the PDF for web viewing.)

I came up (mostly) empty handed.

PDF optimization seems to be a major hurdle to a complete set of PDF-related GUI tools on Linux, and it is one of the holes and annoyances (referred to as "paper cuts" within the Ubuntu project) that I'm particularly sensitive to.  (I've been working on a book that claims open source tools are superior to proprietary ones for writing, and in the process, my eyes have been opened to all kinds of shortcomings.)  I've noticed it is a complaint of others as well.  (PDF optimization, which Acrobat provides through its "PDF Optimizer" feature, seems to come up regularly on several of the mailing lists that I am a member of.)  So, while researching my article, I looked into what it would require to put together such a tool.

As it turns out, slapping together a functional (and useful) prototype probably wouldn't be too hard.

The GUI and framework already exist in the form of PDF-Shuffler (http://sourceforge.net/projects/pdfshuffler/), which is written in Python and relies on python-pdf and pyGTK.  The image manipulation and conversions could be done using any one of four or five image processing frameworks for Python.  The only missing pieces appear to be backend code that can integrate the two, and some GUI code to provide users with options.
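
To give a sense of the kind of backend logic involved, here is a minimal sketch in Python of one small piece: deciding how far to downsample an image based on how large it is drawn on the page.  This is only an illustration under stated assumptions, not code from PDF-Shuffler or from the spec.  The function names and the 150 dpi default are hypothetical, and the actual resampling would be left to whichever imaging library is chosen.

```python
from typing import Tuple

# Default PDF user space: 1 point = 1/72 inch.
POINTS_PER_INCH = 72.0


def effective_dpi(pixels: int, display_points: float) -> float:
    """Resolution along one axis, given how large the image is drawn on the page."""
    return pixels / (display_points / POINTS_PER_INCH)


def downsampled_size(
    width_px: int,
    height_px: int,
    display_w_pt: float,
    display_h_pt: float,
    target_dpi: float = 150.0,  # hypothetical default for web viewing
) -> Tuple[int, int]:
    """Return the pixel size an image should be reduced to.

    Images already at or below the target resolution are left unchanged,
    so this never upsamples.
    """
    dpi = max(
        effective_dpi(width_px, display_w_pt),
        effective_dpi(height_px, display_h_pt),
    )
    if dpi <= target_dpi:
        return width_px, height_px
    scale = target_dpi / dpi
    # Keep at least one pixel per axis after rounding.
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))
```

For example, a 3000 x 2000 pixel photo drawn at 360 x 240 points (5 inches wide) has an effective resolution of about 600 dpi, so downsampled_size(3000, 2000, 360, 240) would ask for a 750 x 500 pixel image, while a 600 x 400 pixel image at the same size would be left alone.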

I am writing to see if there are any students, budding Python programmers, or others who might be interested in collaborating on this.  I've already created a GUI layout and a pretty detailed spec that could serve as a starting point.  Unfortunately, between work, a book I'm writing, and an outliner extension for LyX that would provide it with some Scrivener-like organization abilities, I can't take on primary responsibility for yet another project (though I would be happy to contribute both code and experience).  From what I've put together so far, I estimate that it would take about 25 to 30 hours of programming time to implement these features as an extension to PDF-Shuffler.

Such a program would plug a *really* big hole in the world of Linux-based writing and publishing and would be an enormous aid to many people.  It would also be a great project for people who wish to learn more about document manipulation or image processing.

Is anyone interested in helping to tackle this project?

Cheers,

Rob Oakes

PS, as per the GPL, any code contributions would be sent upstream to the maintainer of PDF-Shuffler.