[XeTeX] xe(la)tex to epub?

Khaled Hosny khaledhosny at eglug.org
Wed Aug 18 03:19:41 CEST 2010

On Wed, Aug 18, 2010 at 10:48:59AM +1000, Ross Moore wrote:
> Hi Michiel,
> On 18/08/2010, at 10:28 AM, Michiel Kamermans wrote:
> > Khaled, Ross,
> >> This can be useful, however, if one has existing TeX material that needs
> >> to be processed into other output formats, though one can still argue that
> >> converting it once to some sort of XML is a much better long-term plan.
> >> 
> >> Don't get me wrong, I like TeX syntax and find it easier to author
> >> with than many other markups, but I accept that it does not fit every
> >> need.
> >>   
> > 
> > I write textbooks. I write these in TeX, because that allows me to easily modify very large, structured documents. I have used DocBook in the past, and the best way I can summarise it, is "easy to write initially, migraine inducingly insane to update or revise". It is really easy to mark up a document as DocBook, and it is then very hard to modify the structure without getting so frustrated with the utterly inadequate DocBook editors on the market that you resort to completely wiping the document's markup, moving everything around, and then reapplying all the markup.
> > 
> > TeX, on the other hand, is "steep learning curve for the initial document, child's play to revise". It's why I gave up DocBook in favour of TeX. So that's my situation. My texts are in TeX, and we take it from there: I would like to generate not just pdf, but also epub from these sources, without having to write a completely different book using a completely different toolchain that gives me two completely different documents with the same words in it... I can't even begin to imagine the potential for errors and inconsistencies that introduces =)
> This is exactly what I expected.
> Use of computer software for books, documentation, etc.
> should be about what is convenient for the author to write
> and maintain, not what is best suited to machine transfer.

True, but then you have to accept its limitations. I type very slowly,
and it is far more convenient for me to write on paper than to type on a
keyboard, but I know I can't get nicely printed documents from my
handwriting, so I have to trade my personal convenience for the ease of
getting print-ready documents.

> > It's not really about a favourite "syntax", it's about having a tool that already produces a device independent document format that then gets converted to a specific device readable format. Can that independent format also be converted to epub? If it can't, that's unfortunate, and perhaps someone will end up writing a dvi to epub driver (limited in its functionality by what epub offers for document layout).
> Agreed. Getting onto other devices is about having well-written
> translators between formats. It certainly should not require
> the author to re-write large chunks of manuscript.

I wish it were easier to "translate" TeX output, but unless you
restrict your TeX input to a manageable subset, it is nearly impossible
to translate it with anything but TeX itself. However, starting from TeX
output, whether DVI or PDF, you lose a lot of important information: not
only the structure of your document, but the actual textual material. It
is almost impossible to retrieve the original Arabic text from a PDF,
for example, unless every word has been tagged with /ActualText or the
original text has been saved alongside the visual output in some other
form. The same goes for any complex textual material, such as
mathematical formulas.
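
To make the /ActualText point concrete, here is a rough Python sketch of
what such tagging looks like inside a PDF content stream. The function
name and the glyph operators passed to it are made up for illustration;
only the /Span ... BDC ... EMC wrapping and the UTF-16BE hex string with
its FE FF byte order mark follow the PDF conventions as I understand
them.

```python
def actual_text_span(text, glyph_ops):
    """Wrap content-stream operators in a marked-content span with /ActualText.

    `glyph_ops` is a placeholder for whatever operators draw the shaped
    glyphs; it is passed through unchanged.
    """
    # PDF text strings may be UTF-16BE prefixed with the byte order mark
    # FE FF. Writing the bytes as a hex string sidesteps escaping issues.
    payload = ("\ufeff" + text).encode("utf-16-be").hex().upper()
    return "/Span << /ActualText <%s> >> BDC\n%s\nEMC" % (payload, glyph_ops)


if __name__ == "__main__":
    # The glyph IDs below are invented; a real driver would emit the IDs
    # chosen by the shaping engine for the font in use.
    print(actual_text_span("\u0633\u0644\u0627\u0645", "<00A1004F0012> Tj"))
```

Without something like this per word, a reader of the PDF sees only
glyph IDs in visual order, which is why the original text is so hard to
get back.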

> Computers are meant to help people do a better job.
> They should not be mandating requirements that force authors 
> to do more work, of a repetitious nature that adds little 
> extra value to work already done.
> Of course someone needs to write those translators, and make
> them sufficiently flexible as well as being robust.
> But that's what computer scientists and programmers are
> paid for, surely.   :-)

Good luck parsing TeX macros :) (there are certainly reasons why there
are no "sufficiently flexible as well as being robust" (La)TeX-to-anything
translators out there.)

I suggest trying PlasTeX[1], though. Last time I checked, they had
everything implemented in Python, doing all the TeX translation on
their own and generating clean HTML, which I think should not be hard
to repackage as EPUB.
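
As a rough idea of what the repackaging step could involve, here is a
minimal Python sketch that zips already-generated XHTML pages into an
EPUB 2 style container. It assumes the pages are well-formed XHTML
(PlasTeX output may need cleaning up first), the function name
write_epub and the (filename, label, xhtml) tuples are my own invention,
and I have not run the result through an EPUB validator.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:identifier id="bookid">{uid}</dc:identifier>
    <dc:language>{lang}</dc:language>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
{items}
  </manifest>
  <spine toc="ncx">
{refs}
  </spine>
</package>
"""

NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content={uid}/></head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
{points}
  </navMap>
</ncx>
"""


def write_epub(path, title, uid, pages, lang="en"):
    """Write `pages`, a list of (filename, label, xhtml) tuples, to `path`."""
    items, refs, points = [], [], []
    for i, (name, label, _) in enumerate(pages, start=1):
        items.append('    <item id="p{0}" href={1} '
                     'media-type="application/xhtml+xml"/>'.format(i, quoteattr(name)))
        refs.append('    <itemref idref="p{0}"/>'.format(i))
        points.append('    <navPoint id="n{0}" playOrder="{0}">'
                      '<navLabel><text>{1}</text></navLabel>'
                      '<content src={2}/></navPoint>'.format(
                          i, escape(label), quoteattr(name)))
    opf = OPF.format(title=escape(title), uid=escape(uid), lang=escape(lang),
                     items="\n".join(items), refs="\n".join(refs))
    ncx = NCX.format(uid=quoteattr(uid), title=escape(title),
                     points="\n".join(points))
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry has to come first and be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER)
        z.writestr("OEBPS/content.opf", opf)
        z.writestr("OEBPS/toc.ncx", ncx)
        for name, _, xhtml in pages:
            z.writestr("OEBPS/" + name, xhtml)


if __name__ == "__main__":
    page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>One</title>'
            '</head><body><p>Hello</p></body></html>')
    write_epub("book.epub", "Sample", "urn:uuid:example-id",
               [("ch1.xhtml", "Chapter 1", page)])
```

Images, stylesheets and a real table of contents would each need their
own manifest entries, so treat this as a starting point only.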


 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer
