[XeTeX] Math class initialization in Unicode-aware engine

Ross Moore ross.moore at mq.edu.au
Thu Nov 28 01:16:44 CET 2019

Hi Joe, Doug

On 28 Nov 2019, at 10:27 am, Joseph Wright <joseph.wright at morningstar2.co.uk> wrote:

> # N - Normal - includes all digits and symbols requiring only one form

> # D - Diacritic

> # F - Fence - unpaired delimiter (often used as opening or closing)

> # G - Glyph_Part - piece of large operator

> # S - Space
> # U - Unary - operators that are only unary

> # X - Special - characters not covered by other classes

> Unfortunately, the documentation/comments don't say what happens to entries having these other Unicode math codes (N, D, F, G, S, U, and X). Are they completely ignored, or are they mapped to one of the other eight codes that matches what TeX is interested in or only capable of handling?
> I can imagine that the space character, given Unicode math class 'S' in MathClass.txt, is ignored during this parse. But what happens to the '¬' character (U+00AC) ("NOT SIGN"), which is assigned 'U' (Unary Operator)? Surely the logical not sign is not being ignored during initialization of a Unicode-aware engine, yet the comments in load-unicode-math-classes.tex don't say one way or the other, and it appears to me that the parsing code is ignoring it.

> The other Unicode math classes don't really map directly to TeX ones, so
> they are currently ignored. Suggestions for improvements here are of
> course welcome.

If by ignoring you mean removing the character entirely, then that is surely not the best approach.

Most N class (Normal) characters would simply be of the default \mathord class.
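
As a rough sketch of that default, assuming XeTeX's \Umathcode primitive: class 0 is \mathord, while the family number 0 and the choice of U+221E (INFINITY) as an N-class entry are only assumptions for illustration, not what load-unicode-math-classes.tex actually does.

```latex
% Sketch for XeTeX: \Umathcode <char> = <class> <family> <slot>
% Class 0 is \mathord. Family 0 and the choice of U+221E are
% assumptions made only for this illustration.
\Umathcode"221E = 0 0 "221E
```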

I’d expect the others to be mapped instead to a macro that corresponds to something that TeX does support.
For example, the space characters (thin space, 2-em space, etc.) in U+2000 – U+200A
can expand into things like:   \, \; \> \quad \qquad  etc.  (or even into constructions like  \mskip1mu ).

After all, this is essentially what happens when pdfTeX reads raw Unicode input.
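
The space mapping suggested above might look something like the following in XeTeX or LuaTeX. This is only a sketch: making the characters active is one possible mechanism, and the particular targets chosen for U+2009 and U+2003 are my assumptions for the example.

```latex
% Sketch for XeTeX or LuaTeX: make two Unicode spaces active and
% expand them to familiar spacing. The targets are illustrative only.
\catcode"2009=\active  % U+2009 THIN SPACE
\catcode"2003=\active  % U+2003 EM SPACE
% Thin space: a math thin skip in math mode, a small kern in text.
\def^^^^2009{\ifmmode\mskip\thinmuskip\else\kern.16667em\fi}
% Em space: \quad is usable in both text and math.
\def^^^^2003{\quad}
```

The catcode changes have to come before the definitions, so that ^^^^2009 and ^^^^2003 are read as active characters when the \def lines are tokenized.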

The G class (Glyph_Part) is a lot harder, as those glyph parts don’t correspond to any single
TeX macro. Think about a very large opening brace spanning 3+ ordinary line widths, say,
as may be generated by  \left\{ ... \right\}  surrounding some (inner-) displayed math alignment.
On input, the whole grouping would need to be identified and mapped to appropriate TeX coding.
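
To make the difficulty concrete, here is a hypothetical three-row example. The layout of the glyph-part input is invented for illustration; the point is only that no single piece maps to a TeX macro, whereas the group as a whole maps to one \left ... \right construction.

```latex
% Hypothetical input using glyph parts, one piece per line:
%   U+23A7 (upper hook)     a = b
%   U+23A8 (middle piece)   c = d
%   U+23A9 (lower hook)     e = f
% The TeX coding that the whole group would have to become:
\[
  \left\{
    \begin{array}{l}
      a = b \\
      c = d \\
      e = f
    \end{array}
  \right.
\]
```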

Basically, there is a lot here that needs to be looked at more or less individually.

I’ve been through this kind of exercise, in reverse, to decide what to specify as /Alt and /ActualText
replacements (for accessibility) for what TeX produces with various math constructions.
I don’t have definitive answers for everything, but I have tried some possibilities for many things.


Hope this helps.


Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au
