[XeTeX] Math class initialization in Unicode-aware engine

Ross Moore ross.moore at mq.edu.au
Thu Nov 28 21:11:26 CET 2019

Hi Joseph.

On 28 Nov 2019, at 6:29 pm, Joseph Wright <joseph.wright at morningstar2.co.uk> wrote:

On 28/11/2019 00:16, Ross Moore wrote:
If by ignoring you mean removing the character entirely, then that is surely not best at all.
Most N Class (Normal) characters would simply be of the default \mathord class.

That is already the case: it's where IniTeX starts off, chars are mathord. So 'nothing to do here'. Also note that some of this information is already set from the main Unicode file: it tells us which chars are letters.
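
For readers unfamiliar with the primitive tables being discussed, here is a minimal sketch of what a math-class entry looks like in XeTeX. The characters, classes, and the use of family 0 are chosen purely for illustration and are not taken from the loader itself.

```tex
% Syntax: \Umathcode <char> = <class> <family> <slot>
% Classes: 0 ord, 1 op, 2 bin, 3 rel, 4 open, 5 close, 6 punct, 7 variable family
% Family 0 is an assumption made for this example only.
\Umathcode"2212 = 2 0 "2212   % U+2212 MINUS SIGN as a binary operator
\Umathcode"2264 = 3 0 "2264   % U+2264 LESS-THAN OR EQUAL TO as a relation
```

Characters that should stay \mathord need no such line, which is the 'nothing to do here' case.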

OK. That’s what I’d expect.

I’d expect others to be mapped instead into a macro that corresponds to something that TeX does support.
For example, the space characters (thin space, 2-em space, etc.) in U+2000 – U+200A
could expand into things like: \, \; \> \quad \qquad etc. (or even into constructions like \mskip1mu)

That's not a generic IniTeX thing, I'm afraid.

Yeah, well there are so many of these extra space characters.
I really don’t know where they are all used in practice by other (non-TeX) apps.

The Unicode data loaders are explicitly about setting up the basic data in Unicode TeX engines that's held in (primitive) tables.

Creating macros is the job of the 'rest' of the format. Here, presumably you are thinking of making chars math-active: that's well out-of-scope for the loader.
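
To make the distinction concrete, the following is a rough sketch of what making a character math-active could look like at the format level, outside the loader. The choice of U+2009 THIN SPACE and of \, as its meaning are assumptions for illustration only; no format is claimed to do exactly this.

```tex
% Give U+2009 a definition as an active character without changing its
% catcode, using the usual \lccode/\lowercase idiom.
\begingroup
  \lccode`\~="2009
  \lowercase{\endgroup\def~}{\,}%
% Mark U+2009 as math-active: in math mode only, the character is now
% treated as if it were the active character defined above.
\mathcode"2009="8000
```

The first part creates a macro and the second merely points math mode at it, so neither step is a plain table entry of the kind the loader sets.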

Fair enough; especially if this is all happening before processing any textual input intended for the typeset page.

After all, this is essentially what happens when pdfTeX reads raw Unicode input.

pdfTeX reads bytes, so there's not really much comparison. In IniTeX mode, there is not much happening with UTF-8 and pdfTeX: perhaps you are thinking of LaTeX?

Yes, sure I’m thinking of LaTeX; at least now that UTF-8 input has become the default.
Previously there would be (inputenc) package and  .def  file loading.
But, as you say above, this comes later.
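
As a small illustration of that later, LaTeX-level handling under pdfLaTeX, a document or package can attach a meaning to an individual code point. The mapping of U+2009 to \, below is an assumed example, not something taken from the kernel's own declarations.

```tex
\documentclass{article}
% \usepackage[utf8]{inputenc} % only needed with LaTeX releases before 2018-04-01
\DeclareUnicodeCharacter{2009}{\,} % U+2009 THIN SPACE now expands to \,
\begin{document}
$a b$ % the character between a and b is a literal U+2009
\end{document}
```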

One has to wonder, then, how much of the Unicode range needs to be (or can be) handled earlier;
e.g., when there is only one sensible interpretation for the use of specific characters?
Conversely, how much can, or should, be left until later, when there may be a better idea of which
(classes of) characters are present within the input source?

I suppose that is the kind of question you are dealing with; so I’ll now butt out of this conversation,
but I’ll keep watching in case it continues.




Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au
