[XeTeX] Whitespace in input

Ross Moore ross.moore at mq.edu.au
Tue Nov 15 23:47:16 CET 2011

Hi Phil,

On 16/11/2011, at 8:45 AM, Philip TAYLOR wrote:

> Ross Moore wrote:
>> On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
>>> Given that TeX (and XeTeX too) deals with a non-breakable space already (where we usually use the ~ to represent that space) it seems to me that XeTeX should treat that the same way.
>> No, I disagree completely.
>> What if you really want the U+00A0 character to be in the PDF?
>> That is, when you copy/paste from the PDF, you want that character
>> to come along for the ride.
> I'm not sure I entirely go along with this argument, Ross.
> "What if you really want the \ character to be in the PDF",
> or the "^" character, or the "$" character, or any character
> that TeX currently treats specially ?  

TeX already provides \$ \_ \# etc. for (most of) the other special
characters it uses, but not for ^^a0. It does not need to, though,
if you can generate the character yourself on the keyboard.

> Whilst I can agree
> that there is considerable merit in extending XeTeX such
> that it treats all of these "new", "special" characters
> specially (by creating new catcodes, new node types and so
> on), in the short term I can see no fundamental problem with
> treating U+00A0 in such a way that it behaves indistinguishably
> from the normal expansion of "~".

How do you explain to somebody the need to do something really,
really special to get a character that they can type, or copy/paste?

There is no special role for this character in other vital aspects 
of how TeX works, such as there is for $ _ # etc.
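
For concreteness, the short-term treatment Phil describes could be
sketched roughly as below. This is only an illustration, assuming
XeTeX with the plain format loaded; it is not something XeTeX does
by default, and the test line "Figure^^a01" is made up for the example.

```tex
% Sketch only: make U+00A0 an active character that behaves like plain TeX's ~.
% Assumes XeTeX (or another Unicode-aware engine) running the plain format.
\catcode"00A0=\active      % "00A0 is the hexadecimal code of U+00A0
\def^^a0{\penalty10000\ }% forbid a line break, then insert an interword space
% ^^a0 is TeX's notation for the character with code "A0
Figure^^a01
\bye
```

With such a definition the input character is replaced by a penalty
and glue, so no U+00A0 character reaches the PDF, which is the
objection raised above.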

>> In TeX ~ *simulates* a non-breaking space visually, but there is
>> no actual character inserted.
> And I don't agree that a space is a character, non-breaking or not !

In this view you are against most of the rest of the world.

If the output is intended to be PDF, as it really has to be with 
XeTeX, then the specifications for the modern variants of PDF 
need to be consulted.

With PDF/A and PDF/UA and anything based on ISO-32000 (PDF 1.7)
there is a requirement that the included content should explicitly
provide word boundaries. Having a space character inserted is by
far the most natural way to meet this specification.
(This does not mean that having such a character in the output
need affect TeX's view of typesetting.)

Before replying to anything in the above paragraph, please
watch the video of my recent talk at TUG-2011.


or similar from earlier years where I also talk a bit about such things.

> ** Phil.

Hope this helps,


Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-419      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
