[XeTeX] Whitespace in input

Karljurgen Feuerherm kfeuerherm at wlu.ca
Wed Nov 16 00:15:22 CET 2011

I was going to make the following point earlier--maybe in light of
Phil's conclusion I should do it now.

There seems to be a tendency not to distinguish between a(n original)
character in the sense of character of a writing system, and a computer
character.

The former are visible symbols on a background medium. The latter are
an entirely different set of symbols which to some extent parallel the
former, and to some extent do not. Space, control codes, etc. don't exist
in the former, but exist in the latter because they were a convenient way
to encode certain functions one wished to apply to the other encoded
characters--the ones that correspond more or less to original writing
system characters.

These encoding sets have developed over time, and have consequently
inherited all sorts of legacy issues, not all of which need supporting.
Unicode provides tools. No one says one has to use them all.

Specifically, the purpose of XeTeX and other such engines is to allow for
the nice typographical formatting of visual representations of script
characters against some other defined background. From that point of
view, once it does that, it has achieved its goal.

Transparency for all sorts of other things, such as providing input via
PDF to other software, isn't and shouldn't be a *primary* goal.

That being said, no doubt it might be helpful to some to have this or
that control character passed along. But that's not the essence of the
exercise, and should only be done if it can be done cheaply, i.e.
without a lot of risk to the primary objective.

I guess the real question is that latter part.
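
To make the cheap option concrete, here is a minimal sketch of what Phil
suggests in the message quoted below, i.e. having U+00A0 behave like the
normal expansion of "~". It assumes plain XeTeX with the default plain
catcodes, and it is only one possible way to do this, not something the
engine does for you:

```tex
% Sketch only, assuming plain XeTeX with the standard plain.tex catcodes.
\catcode"00A0=\active   % make U+00A0 an active character
\def^^a0{~}             % ^^a0 is TeX notation for character code "A0
A~B and A^^a0B          % both forms now produce an unbreakable interword space
\bye
```

Note that this gives only the visual effect: no U+00A0 character ends up
in the PDF, which is exactly Ross's objection.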


>>> On Tue, Nov 15, 2011 at  4:45 PM, in message
<4EC2DD63.3040302 at Rhul.Ac.Uk>,
Philip TAYLOR <P.Taylor at Rhul.Ac.Uk> wrote:

> Ross Moore wrote:
>> On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
>>> Given that TeX (and XeTeX too) deal with a non-breakable space
>>> already (where we usually use the ~ to represent that space) it
>>> seems to me that should treat that the same way.
>> No, I disagree completely.
>> What if you really want the U+00A0 character to be in the PDF?
>> That is, when you copy/paste from the PDF, you want that character
>> to come along for the ride.
> I'm not sure I entirely go along with this argument, Ross.
> "What if you really want the \ character to be in the PDF",
> or the "^" character, or the "$" character, or any character
> that TeX currently treats specially ?  Whilst I can agree
> that there is considerable merit in extending XeTeX such
> that it treats all of these "new", "special" characters
> specially (by creating new catcodes, new node types and so
> on), in the short term I can see no fundamental problem with
> treating U+00A0 in such a way that it behaves indistinguishably
> from the normal expansion of "~".
>> In TeX ~ *simulates* a non-breaking space visually, but there is
>> no actual character inserted.
> And I don't agree that a space is a character, non-breaking or not !
> ** Phil.
