[XeTeX] Whitespace in input

Keith J. Schultz keithjschultz at web.de
Fri Nov 18 09:00:54 CET 2011

Hi Philip,

Throughout my programming life and experience I have learned
that internal structure means nothing, as long as the result is correct
when it comes out.

As you rightly point out, the problem lies in how TeX handles space
characters when it adds them to its internal structure.

The fact is that TeX was not originally designed to handle modern typesetting
well, and parts of (Xe)TeX's internals are quite outdated. It is possible to handle
all these "new" types of spaces in (Xe)TeX, yet it is quite awkward and you have to be
a TeXnician to do it properly.
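
A minimal sketch of such a workaround in plain XeTeX, to show how awkward it
gets. The two mappings are illustrative assumptions, not an established
convention: U+00A0 is made to behave like plain TeX's ~ (penalty plus interword
glue) and U+2009 like \thinspace (a fixed kern), so neither is set as a glyph
from the font.

```tex
% Plain XeTeX sketch (run with `xetex`); the mappings below are assumptions.
\catcode"00A0=\active      % U+00A0 NO-BREAK SPACE becomes an active character
\catcode"2009=\active      % U+2009 THIN SPACE becomes an active character
\def^^^^00a0{\nobreak\ }   % like plain TeX's ~ : no break, then interword glue
\def^^^^2009{\thinspace}   % plain TeX's \thinspace: a kern of 1/6 em, no glyph

% ^^^^xxxx is XeTeX's notation for a code point given as four hex digits.
Fig.^^^^00a01 shows 10^^^^2009000 samples.
\bye
```

Every further Unicode space would need its own catcode change and definition,
and the active characters may misbehave wherever a plain character is expected
(inside \csname, in file names, when written to auxiliary files).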

My personal opinion is that TeX et al. have to be revamped completely. Ideally, it should get
a natural language parser as a front end and the typesetting module as its back-end for its

Yes, I know this would not be TeX any more and would require a completely different
structure for the TeX ecosystem: language modules and the like. If you care to discuss
this we can back-channel, as it would be too OT here.


On 17.11.2011 at 20:56, Philip TAYLOR wrote:

> Ross, I do not dispute your arguments : I was answering
> Keith's question in an honest way.  I (personally) do not
> think of a space in TeX output as a character at all,
> because I am steeped in TeX philosophy; but I am quite
> willing to accept that /if/ the objective is not to
> produce output for the sake of output, but output for
> subsequent processing as input by another program, then
> there /may/ be an argument for outputting a space as a
> variable-width glyph.
> However, I do think that what appears in the output stream
> is a secondary consideration; far more important (IMHO) is
> how we represent that space /within XeTeX/.  There is, I am
> sure, not a suggestion on the table that we start to treat
> a conventional space in XeTeX other than as TeX has traditionally
> treated it, and therefore the real question is (to my mind),
> "do we adopt an extension of this traditional TeX treatment
> for non-breaking space, thin-space, and any of the other
> not-quite-standard spaces that Unicode encompasses, or do
> we look for an alternative model which /might/ be glyph-
> or character-based ?".

