[tex4ht] [bug #274] tex4ht features vs. lwarp vs. ...

Michal Hoftich michal.h21 at gmail.com
Tue Mar 29 17:26:09 CEST 2016

On Sat, Mar 26, 2016 at 5:44 AM, Radhakrishnan CV <cvr at river-valley.org> wrote:
> On Tue, Mar 22, 2016 at 9:51 PM, Michal Hoftich <michal.h21 at gmail.com>
> wrote:
>> Follow-up Comment #1, bug #274 (project tex4ht):
>> That thread has a lot of flaming potential, which I don't want to fire up :)
>> But we probably should clarify some misunderstandings about tex4ht.
> It is more of an impractical approach to tex4ht. The \Configure mechanism in
> tex4ht relies on seeding of configure hooks in the original macros of a
> given package used in the document. Here the question is who will do the
> seeding. Eitan had done it extensively for some 400-odd packages
> popular during his time. Every day, several packages get updated, newer
> packages get released, people do use many of these at their will and
> freedom. Hence, in the absence of *.4ht's that have hooks-seeded functions
> of newly released/revised packages, it is obvious that tex4ht will break
> down. I have seen documents that use packages like xstring, breqn,
> stackengine, tabstackengine, siunitx, acronym, etc. at work and we are
> expected to generate XML out of these documents.  Many packages like
> siunitx, acronym, breqn are now written in expl3 which is another
> challenge.

expl3 shouldn't be a big challenge (other than its somewhat unusual
syntax). On the contrary, it encourages separation of code and design,
so it should be easier to insert tex4ht hooks, at least in theory. For
example, the xtemplate package seems to have ideas heading in quite a
good direction.
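
To make the hook seeding concrete, a .4ht file might look roughly like
the sketch below. The package name mypkg and its \myemph command are
hypothetical; \NewConfigure, \Configure, \HCode and \Hinput are regular
tex4ht commands. The sketch assumes that the colon has letter catcode,
as it normally has inside .4ht files, and it deliberately drops whatever
visual formatting the original macro did.

```latex
% mypkg.4ht: hypothetical driver for an imaginary package "mypkg",
% which is assumed to define \myemph{<text>}.

% Declare a configuration named "myemph" with two parameters.
% tex4ht makes them available as \a:myemph and \b:myemph.
\NewConfigure{myemph}{2}

% Seed the hooks: redefine the package macro so that the hooks
% surround its argument.
\renewcommand\myemph[1]{\a:myemph #1\b:myemph}

% Default configuration: emit an HTML element around the text.
\Configure{myemph}{\HCode{<em class="myemph">}}{\HCode{</em>}}

% Conventional ending of a .4ht file.
\Hinput{mypkg}
\endinput
```

A user could then override the default from a private .cfg file with
another \Configure{myemph}{...}{...} call, without touching the package.
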
> I would consider tex4ht as a backend. Many packages sadly lack the driver
> (*.4ht) for this backend. The best people to write this backend are the
> authors of these packages since others need more time and in-depth knowledge
> of these packages to write backend drivers.

Sure, the authors are the best people for that. There are also
packages which cause tex4ht compilation to fail as soon as they are
loaded (most notoriously fontspec).

> This being the reality, personally I have chosen to redefine macros from
> different packages, in an add-on configuration to be used for XML/HTML
> generation. I agree that this is not the right way or preferred way, but
> practically that is the only solution when one is expected to handle
> hundreds of documents every day with several funny packages and functions
> profusely used. This way, tex4ht works wonderfully well with minimal effort
> for me and I am sure, tex4ht is the best engine among all that would
> generate another markup from TeX documents.

Sometimes it is the only option, because users are really inventive
when writing custom macros. I've tried to convert some mathematical
books from Project Gutenberg to EPUB 3 and there were macros which
abused \section commands just to print bold, large text. While this
produced the desired output in PDF, the HTML was obviously a total mess.

> In view of the above, I suggest that we would request authors to provide the
> backend driver of their packages for tex4ht. This is the only practically
> feasible solution. An example would be hyperref, the primary author of this
> package (Sebastian Rahtz) wrote the tex4ht driver also since Sebastian was a
> big user and admirer of tex4ht.

I agree :)

> The tex4ht team might come up with necessary documentation of how to write
> .4ht for a package that would largely help authors. If each author spends a
> few more minutes to do their bit, usage of tex4ht will be a pleasure then.
> Since HTML is gaining more popularity/usage owing to support of smart
> devices and for its re-flowing ability without losing format features to
> suit the dimensions of device screens (a severe handicap of PDF), authors
> shall invest a bit more energy to provide tex4ht support which is as
> essential as the one provided for outputs like PDF.

I agree as well. The question is how to create that documentation.
I've tried to write a tex4ht tutorial as a wiki on GitHub and I've found
Markdown too limited (I really don't understand why it is so
popular nowadays. It is indeed easy to write some basic formatting
with it, which is fine for Stack Exchange answers or a personal note
archive, but it is a real pain as soon as one needs some more advanced
feature). It also doesn't make much sense to use anything other than
TeX for tex4ht documentation :). So the question is where to host the
source code and the generated documentation. Here on Puszcza? Or on
GitHub, which has built-in support for page hosting and makes
collaboration easier?

> Secondly, the permissive nature of TeX/LaTeX. $a_{\bf n}$ will create the
> correct output in pdf, but will not generate the right kind of output in
> MathML, which should look like this:
>       <math>
>         <msub>
>           <mrow>
>             <mi>a</mi>
>           </mrow>
>           <mrow>
>             <mstyle mathvariant="bold">
>               <mi>n</mi>
>             </mstyle>
>           </mrow>
>         </msub>
>       </math>
> The user needs to tag math as $a_{\mathbf{n}}$ for perfect MathML output
> without intervention.  Another common example found in documents is $(a
> ....$), this would be passed by TeX, but not MathML since the closing
> parenthesis is outside math. Prof William Hammond has been campaigning for
> profiled LaTeX for several years now, but many users are hardly bothered
> since they expect other systems to adapt to their non-standard tagging
> methods. This can only result in a frustrating experience with tex4ht
> unfortunately.

We can educate users who actively want to convert their documents;
they really need to understand the nature of HTML and MathML in order
to produce valid output. The flexibility of TeX is generally a good
thing and a feature; only the users who abuse it are a problem :)
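
As a small illustration of the tagging discipline discussed above, a
test file could contrast the two styles. This is only a sketch: the
file name sample.tex is an assumption, and the file would be compiled
with something like htlatex sample.tex "xhtml,mathml".

```latex
\documentclass{article}
\begin{document}
% Preferred: a font command with an explicit argument, and
% delimiters balanced inside the same formula.
$a_{\mathbf{n}}$ and $(a + b)$

% Accepted by TeX, but problematic for MathML conversion:
%   $a_{\bf n}$   old font switch without an argument
%   $(a + b$)     closing parenthesis outside math mode
\end{document}
```

Uncommenting the problematic lines one at a time makes it easy to
compare the generated markup with the clean version.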

>> Regarding bug reports, I've tried to compile the LWARP documentation. It
>> needed some fixes; there was a problem with \label commands used inside
>> \caption, which totally explodes tex4ht. I am not sure whether
>> \caption{caption text\label{some label}} is a legal construct in LaTeX,
>> but it should compile without errors.
> This is legal tagging and works OK for me.

Yes, it works with other documents; it must be some conflicting macro
redefinition in one of the packages included by Lwarp.
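
A minimal test file along these lines could be used to track the
conflict down; the label name is arbitrary, and the idea (an assumption
about how one might proceed, not a confirmed diagnosis) is to add the
packages loaded by Lwarp one at a time until the error reappears.

```latex
\documentclass{article}
% Add the suspected packages here, one at a time.
\begin{document}
\begin{figure}
  \centering
  \fbox{placeholder figure}
  % \label placed inside the \caption argument, as in the report
  \caption{Caption text\label{fig:test}}
\end{figure}
See figure~\ref{fig:test}.
\end{document}
```
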
>> There is also missing cleveref support, which I thought I reported last
>> summer, but apparently didn't. I have a basic cleveref.4ht file, which
>> works except for links.
>> \fbox contents often overflow the border; it is probably just a CSS issue.
>> SVGs produced by TikZ are often invalid, especially when font formatting
>> commands are used in diagrams. I personally use TikZ externalization
>> instead of the built-in tex4ht support, which doesn't work correctly in
>> most cases.
> Entirely agree with you. Maybe we can provide an extra package for tikz
> (owing to its extensive usage) to write out tikz picture sources to an
> external TeX file to make it easier enough to process separately, generate
> pdf and convert to desired graphic format. The package will also flag the
> figures automatically in HTML output.  All can be done in one go if the user
> dares to invoke shell-escape.

This is possible with TikZ's externalization mechanism. I should add
a configuration for this to the helpers4ht bundle (which I should
publish on CTAN, but I still struggle with the documentation, as is
usual with my packages), so the user would only need to require a
package in the .cfg file.
>> [...]
>> PS: we really need some more collaborators to write some documentation,
>> more
>> configurations, automated regression tests etc. or at least some feedback.
>> it
>> isn't really motivating that there is almost no activity on tex4ht mailing
>> list and issue tracker, and then you find elsewhere that something doesn't
>> work, even when it is described how to get it to work in already existing
>> documentation.
> There are thousands of packages in CTAN that are in popular usage. Nobody
> can write *.4ht for all these packages. Unless we get support from authors,
> these packages will not be used if the users want HTML output of their
> documents using tex4ht.  Authors of packages might take note that when PDF
> dies out of the scene, which I hope will happen sooner than we imagine,
> their packages will not be as useful as they would have expected unless
> other output formats are supported.

I totally agree :)

Best regards,
