[tex4ht] Getting correct MathML for multi-character symbols

Michal Hoftich michal.h21 at gmail.com
Fri Feb 2 20:52:17 CET 2018


Hi Bill,

On Thu, Feb 1, 2018 at 10:04 PM, William F Hammond <gellmu at gmail.com> wrote:
> The discussion about correct MathML output for numbers brings this old issue
> back to me.  Take, for example, the common math (though not engineering)
> symbol "Hom".  Let's assume we want it to become <mi> -- although some may
> wish for <mo>.  What LaTeX markup for Hom should be given to tex4ht (that
> also works for regular LaTeX)?
>

In such cases I think it is best to use a custom command, like \Hom,
which can be configured in tex4ht to produce the desired output.

> There is no way that I think is fully correct unless it involves something
> like \DeclareMathOperator, which should be handled, but is too heavy for
> casual use.
>
> For casual use \mathrm{Hom} will generate <mi>Hom</mi> with tex4ht and, if I
> might add, also with latexml.  The problem with using \mathrm this way is
> that its LaTeX purpose is a zone font operation and its argument for regular
> LaTeX may contain math expressions, not just a single symbol name.  A
> translation to MathML can only safely set \mathrm as <mi> when its content
> is free of operations, otherwise it needs to become, I suppose, <mstyle>.
>

We just found that tex4ht contains an undocumented post-processing
script for XHTML+MathML output, which tries to fix some issues that
are hard to fix at the TeX level, like the <mn> issue. Another thing it
does is rewrite `<mi><mstyle mathvariant="bold">div</mstyle></mi>` as
`<mi mathvariant="bold">div</mi>`.
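
To make that rewrite concrete, here is a minimal sketch in Python. It is
neither the existing tex4ht script nor a make4ht filter, only an
illustration of the transformation. The function name `collapse_mstyle`
is made up for this example, and the sketch assumes the MathML elements
carry no XML namespace prefix.

```python
import xml.etree.ElementTree as ET


def collapse_mstyle(root):
    """Turn <mi><mstyle mathvariant="X">text</mstyle></mi> into <mi mathvariant="X">text</mi>."""
    # Take a snapshot of the <mi> elements first, because the tree is modified below.
    for mi in list(root.iter("mi")):
        children = list(mi)
        # Only the simple case is handled: <mi> wraps exactly one <mstyle>.
        if len(children) != 1 or children[0].tag != "mstyle":
            continue
        mstyle = children[0]
        # Skip anything with nested elements or stray text around the <mstyle>.
        if len(mstyle) > 0 or (mi.text or "").strip() or (mstyle.tail or "").strip():
            continue
        variant = mstyle.get("mathvariant")
        if variant is None:
            continue
        # Move the attribute and the text up to <mi> and drop the wrapper.
        mi.remove(mstyle)
        mi.set("mathvariant", variant)
        mi.text = mstyle.text
    return root


# Assumption: un-namespaced input, as in the example quoted above.
source = '<math><mi><mstyle mathvariant="bold">div</mstyle></mi></math>'
result = ET.tostring(collapse_mstyle(ET.fromstring(source)), encoding="unicode")
print(result)
```

Real tex4ht output normally declares the MathML namespace, so an actual
filter would have to match the namespaced element names instead.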

Unfortunately, it produces invalid HTML. I think it will be easier for
me to recreate it using make4ht Lua filters than to try to understand
how these old tools work. Do you have more examples of problematic
output that should be taken into account?

> I've always thought that \mbox{Hom} should be the way to go.  In regular
> LaTeX one cannot set a math expression in an mbox inside math without
> explicitly returning to math mode.  However, for \mbox{Hom} tex4ht uses
> <mtext> (as does also latexml).  But isn't <mtext> supposed to be a
> semantic escape from math expression parsing that a computer algebra system
> can ignore?  In fact, amsmath introduced \text{} for semantic escaping from
> math expression parsing.
>
> By the way, as for \mbox{} found in LaTeX outside of math, I think its
> content should be passed through to html as if the \mbox{} were not present.
> For \mbox{} inside math, internal math content should throw an error for
> translation to MathML.  If the content has no internal white space, it
> should be <mi>, but <mtext> if there is internal white space.
>

I honestly don't know the answers to these questions :/
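
Just to pin down the rule proposed above for \mbox{} inside math, it
could be written as a small function. This is only a sketch of the
proposal, not of what tex4ht or latexml do. The name `mbox_to_mathml`
is hypothetical, and detecting "internal math content" by looking for
`$`, `\(` or `\[` is my own simplifying assumption.

```python
import re
from xml.sax.saxutils import escape


def mbox_to_mathml(content):
    """Apply the proposed rule to the text of an \\mbox{} found inside math."""
    # Assumption: "$", "\(" or "\[" signal a return to math mode inside the box.
    if "$" in content or r"\(" in content or r"\[" in content:
        raise ValueError("math content inside \\mbox is not translated")
    # Internal white space means running text; otherwise treat it as one symbol name.
    tag = "mtext" if re.search(r"\s", content.strip()) else "mi"
    return "<{0}>{1}</{0}>".format(tag, escape(content))


print(mbox_to_mathml("Hom"))
print(mbox_to_mathml("if and only if"))
```

Under this rule "Hom" becomes an <mi> and "if and only if" becomes an
<mtext>, while something like \mbox{$x$} is rejected.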

Best regards,
Michal

