[XeTeX] xe(la)tex to epub?

Ross Moore ross.moore at mq.edu.au
Wed Aug 18 00:11:06 CEST 2010

Hi Khaled and Michiel,

On 18/08/2010, at 6:58 AM, Khaled Hosny wrote:

> On Tue, Aug 17, 2010 at 01:16:02PM -0700, Michiel Kamermans wrote:
>> Khaled,
>>> AFAIK, epub is just a subset of xhtml with a subset of css2, so IMO
>>> not a kind of output format that is very well suited for TeX (well,
>>> I hardly consider html an output format at all, the output is what
>>> the browser renders out of it).

>> For print media the epub format is, of course, nonsense. Hence the
>> desire for parallel format generation.
> I understand the benefits of EPUB, what I don't understand is the need
> for TeX at all.

To me the problem is not about using TeX for formatting,
it is about obtaining different output formats from
the same (La)TeX sources --- especially when math formulas,
and other 2-dimensional layouts, are involved.

Since ePub, and similar, are XML- or XHTML-based, you want the
detailed structure of the tagging to be produced automatically,
without having to make edits on each output result, to "get it right".
You want to enter your information in just one place, in a language
that the author already understands and can use effectively.
Software should then do the rest, modulo possible minor tweaking 
at the end.

This is not simply a matter of redefining macros, because the
structure rules for the markup can be quite different for different
output formats. So any translation software needs some knowledge of
what the macros are being used for, and of what kinds of things will
follow after them.
Since LaTeX, with PDF as its major form of output, is the input
format that authors are comfortable with, it is the desirable way to
encode the author's work, though some may say it ought to be in XML.
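
As a rough illustration of why a plain redefinition falls short,
consider list items: LaTeX's \item has no closing counterpart, whereas
XHTML needs a closing </li>, which can only be emitted once the next
item or the end of the list is seen. The sketch below tracks that
state with a flag. The names xlist and \xitem are invented for this
example, and the tags are simply sent to the log with \typeout rather
than to a real output file.

```latex
\documentclass{article}

% Hypothetical sketch: xlist and \xitem are invented names.
% Tags go to the terminal/log; the item text is typeset as usual.
\newif\ifxitemopen

\newcommand{\xitem}{%
  \ifxitemopen \typeout{</li>}\fi  % close the previous item, if any
  \typeout{<li>}%
  \xitemopentrue}

\newenvironment{xlist}
  {\typeout{<ul>}\xitemopenfalse}
  {\ifxitemopen \typeout{</li>}\fi  % close the last item
   \typeout{</ul>}}

\begin{document}
\begin{xlist}
\xitem first
\xitem second
\end{xlist}
\end{document}
```

Even in this tiny case the macro has to remember what came before it,
which is the kind of knowledge a translator needs for every construct.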

And since TeX already understands the expansion of macros and their 
arguments, it is attractive to want to use it as a starting point
for generating other formats; but certainly it cannot be the 
whole shebang.

For instance, in my work for Tagged PDF, it will be possible to
export an XML version (using Adobe Acrobat Pro) from the complete PDF.
Mathematics will be fully tagged as MathML in this view.
Some PDF readers may show only the rendered pages, but others may
be able to use the tagging to extract an alternative view suited
to their own display screen.

> (X)HTML is dynamic by nature, you should be able to
> resize or change text size and the layout will re-flow, forcing a rigid,
> box based layout that is a direct translation of TeX output just does
> not make much sense to me.

I agree that it is not the TeX *output* that needs to be further 
processed, but the input source --- or something intermediate 
that can be generated and written to a file as a by-product 
of LaTeX processing, with extra packages loaded to achieve this.
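
A minimal sketch of such a by-product file, assuming nothing beyond
standard LaTeX on an e-TeX-capable engine, might look like this. The
file extension .struct and the macro \logstruct are made up for
illustration, and only the plain form of \section (no star, no
optional argument) is handled.

```latex
\documentclass{article}

% Hypothetical sketch: record document structure in \jobname.struct
% while the normal typeset output is produced as usual.
\newwrite\structfile
\immediate\openout\structfile=\jobname.struct

% \logstruct{<kind>}{<text>} writes one line per structural element;
% \unexpanded keeps macros in the title from being expanded.
\newcommand{\logstruct}[2]{%
  \immediate\write\structfile{#1: \unexpanded{#2}}}

% Wrap \section so each use is also recorded in the side file.
% Limitation: \section* and \section[short]{long} are not supported.
\let\origsection\section
\renewcommand{\section}[1]{%
  \logstruct{section}{#1}%
  \origsection{#1}}

\begin{document}
\section{Introduction}
Some text.
\section{Results}
More text.
\immediate\closeout\structfile
\end{document}
```

A separate tool can then read the side file, one line per section,
without ever looking at the typeset pages.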

TeX4ht works by putting extra information into the .dvi file,
to encode the required tagging. An extra post-processor is required
to extract this information, producing HTML or XML or whatever.
That is very similar to what I do for Tagged PDF, where the
extra post-processor is Acrobat Pro. The Tagged PDF route is even
more flexible than TeX4ht, since Acrobat can export into a range of
formats, whereas TeX4ht only produces the format that was specified
when the .dvi was being created.
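
The general mechanism for carrying such information in the .dvi file
is TeX's \special primitive. The sketch below only shows the idea:
the "mytag:" prefix and the macros \tagopen and \tagclose are invented
here and are not TeX4ht's own syntax. A driver that does not recognise
the prefix would normally ignore these specials, perhaps with a
warning.

```latex
\documentclass{article}

% Hypothetical sketch: "mytag:" is an invented prefix, not the
% syntax that TeX4ht itself uses.
\newcommand{\tagopen}[1]{\special{mytag:open #1}}
\newcommand{\tagclose}[1]{\special{mytag:close #1}}

\begin{document}
% The specials travel with the typeset text in the .dvi file, so a
% post-processor can recover which words lie between them.
Some \tagopen{em}important\tagclose{em} words.
\end{document}
```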

> I've the feeling that you are looking for the
> wrong solution to the problem. One of the strengths of TeX that I miss
> in almost all HTML renderers is decent line breaking and hyphenation
> algorithms. While I don't know of any HTML engines, especially
> browsers, that have given much attention to this, there are JavaScript
> implementations of TeX's line breaking and hyphenation algorithms,
> assuming EPUB readers can execute JavaScript, I think this is a good
> compromise. See [1] for example (some interesting links near the end,
> too).
> [1] http://typophile.com/node/71247
> Regards,
> Khaled

Hope this helps,


Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-419      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
