[XeTeX] Whitespace in input

Zdenek Wagner zdenek.wagner at gmail.com
Wed Nov 16 00:08:29 CET 2011


2011/11/15 Ross Moore <ross.moore at mq.edu.au>:
> Hi Phil,
>
> On 16/11/2011, at 8:45 AM, Philip TAYLOR wrote:
>
>> Ross Moore wrote:
>>>
>>> On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
>>>
>>>> Given that TeX (and XeTeX too) deals with a non-breaking space already (where we usually use the ~ to represent that space), it seems to me that XeTeX should treat that the same way.
>>>
>>> No, I disagree completely.
>>>
>>> What if you really want the U+00A0 character to be in the PDF?
>>> That is, when you copy/paste from the PDF, you want that character
>>> to come along for the ride.
>>
>> I'm not sure I entirely go along with this argument, Ross.
>> "What if you really want the \ character to be in the PDF",
>> or the "^" character, or the "$" character, or any character
>> that TeX currently treats specially ?
>
> TeX already provides \$ \_ \# etc. for (most of) the other special
> characters it uses, but does not for ^^A0 --- but it does not
> need to if you can generate it yourself on the keyboard.
>
^^^^00a0
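For readers less familiar with the caret notation, here is a simplified sketch of how TeX reads `^^`, with XeTeX's `^^^^` extension for four hex digits. It assumes `^` has its usual catcode 7. It ignores the six-digit `^^^^^^` form and TeX's re-scanning of the converted character. The function name `decode_carets` is hypothetical.

```python
HEX = "0123456789abcdef"  # TeX only accepts lowercase hex digits after ^^


def _is_hex(chunk: str, length: int) -> bool:
    """True if chunk is exactly `length` lowercase hex digits."""
    return len(chunk) == length and all(ch in HEX for ch in chunk)


def decode_carets(s: str) -> str:
    """Simplified model of TeX/XeTeX ^^ input conversion (no re-scanning)."""
    out: list[str] = []
    i, n = 0, len(s)
    while i < n:
        if s.startswith("^^^^", i) and _is_hex(s[i + 4:i + 8], 4):
            # XeTeX extension: ^^^^XXXX gives the Unicode character U+XXXX
            out.append(chr(int(s[i + 4:i + 8], 16)))
            i += 8
        elif s.startswith("^^", i) and _is_hex(s[i + 2:i + 4], 2):
            # Classic TeX: ^^xx with two lowercase hex digits gives char code xx
            out.append(chr(int(s[i + 2:i + 4], 16)))
            i += 4
        elif s.startswith("^^", i) and i + 2 < n and ord(s[i + 2]) < 128:
            # Classic TeX: ^^c gives the character whose code is ord(c) +/- 64
            c = ord(s[i + 2])
            out.append(chr(c + 64 if c < 64 else c - 64))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

Because only lowercase hex digits qualify, `decode_carets("^^A0")` yields character 1 followed by `0`, while `^^a0` and `^^^^00a0` both yield U+00A0.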
>
>> Whilst I can agree
>> that there is considerable merit in extending XeTeX such
>> that it treats all of these "new", "special" characters
>> specially (by creating new catcodes, new node types and so
>> on), in the short term I can see no fundamental problem with
>> treating U+00A0 in such a way that it behaves indistinguishably
>> from the normal expansion of "~".
>
> How do you explain to somebody the need to do something really,
> really special to get a character that they can type, or copy/paste?
>
> There is no special role for this character in other vital aspects
> of how TeX works, such as there is for $ _ # etc.
>
>
>>>
>>> In TeX ~ *simulates* a non-breaking space visually, but there is
>>> no actual character inserted.
>>
>> And I don't agree that a space is a character, non-breaking or not !
>
> In this view you are against most of the rest of the world.
>
TeX NEVER outputs a space as a glyph. Text extraction tools usually
interpret horizontal spaces of sufficient size as U+0020.

(The exception to the above mentioned "never" is the verbatim mode.)
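The gap heuristic used by text extraction tools can be sketched as follows. This is a minimal illustration, not the algorithm of any particular tool. It assumes the glyphs of one line arrive in reading order with known left edges and advance widths in points. The names `Glyph`, `extract_line` and the default `threshold` of 0.25 em are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the line, in points
    width: float  # advance width of the glyph, in points


def extract_line(glyphs: list[Glyph], font_size: float,
                 threshold: float = 0.25) -> str:
    """Rebuild one line of text, emitting U+0020 wherever the horizontal gap
    between consecutive glyphs exceeds threshold * font_size.
    Assumes glyphs are already in reading order (no bidi, no columns)."""
    out: list[str] = []
    prev_end = None
    for g in glyphs:
        if prev_end is not None and g.x - prev_end > threshold * font_size:
            out.append(" ")  # a gap, not a space glyph, becomes a space
        out.append(g.char)
        prev_end = g.x + g.width
    return "".join(out)
```

The threshold has to sit above typical kerning but below the narrowest interword space in a justified line, which is why such guesses can fail and why explicit word boundaries are attractive.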

> If the output is intended to be PDF, as it really has to be with
> XeTeX, then the specifications for the modern variants of PDF
> need to be consulted.
>
> With PDF/A and PDF/UA and anything based on ISO-32000 (PDF 1.7)
> there is a requirement that the included content should explicitly
> provide word boundaries. Having a space character inserted is by
> far the most natural way to meet this specification.

A space character is a fixed-width glyph. If you insist on it, you
will never be able to typeset justified paragraphs; you will move back
to the era of mechanical typewriters.
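The reason is that TeX's interword space is glue with a natural width plus stretch and shrink, not a fixed glyph. Below is a minimal sketch of the glue-set computation for one line with a single kind of interword glue. The defaults approximate cmr10's space (3.33333pt plus 1.66666pt minus 1.11111pt). The sketch ignores infinite glue, the space factor and badness, and the function name is hypothetical.

```python
def justified_space_width(word_widths: list[float], line_width: float,
                          space: float = 3.33333, stretch: float = 1.66666,
                          shrink: float = 1.11111) -> float:
    """Return the width (pt) each interword space gets when the words are
    set to exactly line_width, following TeX's glue-set ratio idea."""
    gaps = len(word_widths) - 1
    if gaps < 1:
        raise ValueError("need at least two words to justify a line")
    natural = sum(word_widths) + gaps * space
    if natural < line_width:
        # Stretch: every space grows in proportion to its stretchability
        ratio = (line_width - natural) / (gaps * stretch)
        return space + ratio * stretch
    # Shrink: TeX never shrinks glue beyond its shrink component,
    # so the ratio is capped at 1 (the line is then overfull)
    ratio = min((natural - line_width) / (gaps * shrink), 1.0)
    return space - ratio * shrink
```

For example, words of 50, 40 and 60pt on a 170pt line get spaces of 10pt each, three times the natural width; a fixed-width space glyph could not do that.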

> (This does not mean that having such a character in the output
> need affect TeX's view of typesetting.)
>
> Before replying to anything in the above paragraph, please
> watch the video of my recent talk at TUG-2011.
>
>  http://river-valley.tv/further-advances-toward-tagged-pdf-for-mathematics/
>
> or similar from earlier years where I also talk a bit about such things.
>
>>
>> ** Phil.
>
>
> Hope this helps,
>
>        Ross
>
> ------------------------------------------------------------------------
> Ross Moore                                       ross.moore at mq.edu.au
> Mathematics Department                           office: E7A-419
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz


