[XeTeX] Whitespace in input

Zdenek Wagner zdenek.wagner at gmail.com
Mon Nov 14 18:07:04 CET 2011


2011/11/14 Philip TAYLOR <P.Taylor at rhul.ac.uk>:
>
>
> mskala at ansuz.sooke.bc.ca wrote:
>
> <various points with which I have no reason to disagree at this time,
> followed by>
>
>> 2. Inevitably, people will include invalid characters in TeX input; and
>> U+00A0 is an invalid character for TeX input.
>
> Firstly (as is clear from the list on which we are discussing
> this), we are not discussing TeX but XeTeX.  Secondly, even
> if we were discussing TeX, on what basis do you claim that
> U+00A0 is invalid ?  And if you assert that it is, /a priori/,
> invalid for TeX, and if your reasons for that assertion are
> sound, do they also support the assertion that it is, /a priori/,
> invalid for XeTeX ?
>
> Remainder snipped, so that we can debate one point at a time.
>
I agree with Phil: nothing in TeX makes a character invalid a priori.
A character becomes invalid only when \catcode assigns it category 15.
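
To illustrate the point about \catcode, here is a minimal sketch for
plain XeTeX. The two settings are only examples, not something proposed
in this thread; U+00A0 is addressed as "A0 and typed as ^^a0.

```tex
% Sketch only: two ways a document could treat U+00A0 under XeTeX.

% (a) Declare it invalid. Category 15 makes TeX report
%     "Text line contains an invalid character" whenever it is read.
\catcode"A0=15

% (b) Alternatively, make it active and give it the meaning of ~,
%     so that it acts as a nonbreakable space. This overrides (a).
\catcode"A0=\active
\let^^a0=~
```

Either way, the status of the character is a decision taken by the
format or the document, not a property of the character itself.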

There are two aspects:

A. We are preparing a document to be typeset by TeX. Why on earth
should we use only U+00A0 and not ~, which is clearly visible in any
editor and has been used for a nonbreakable space for years? Why do we
use & in \halign or \begin{tabular} and not U+0009?

B. TeX is used to typeset data extracted from a database (or a similar
source) that was not TeX-aware in the first place. Such data can
contain not only U+00A0 but also text such as "Tweedledum & Tweedledee",
"12 $", "15 %", "#1", or whatever. In such a case we must be aware that
the input may contain arbitrary characters, even those that play special
roles in TeX, and we have to handle them properly.
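
One possible way to handle case B, sketched for plain XeTeX: read the
extracted text inside a group in which the characters quoted above are
ordinary characters. The macro name \inputrawdata and the file name
records.txt are hypothetical, and backslash, braces and U+00A0 are
deliberately left alone here; real data would need them handled too.

```tex
% Sketch only: \inputrawdata and records.txt are hypothetical names.
\def\inputrawdata#1{%
  \begingroup
    % Category 12 ("other") removes the special role of these characters,
    % so "Tweedledum & Tweedledee", "12 $", "15 %" and "#1" are read as text.
    \catcode`\&=12 \catcode`\$=12 \catcode`\%=12 \catcode`\#=12
    \input #1
  \endgroup}

% Usage: the catcode changes are local and end with the group.
\inputrawdata{records.txt}
```

The alternative is to escape the data before TeX sees it; changing the
category codes has the advantage that the data file stays untouched.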

> Philip Taylor

-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz


