[XeTeX]   in XeTeX

Zdenek Wagner zdenek.wagner at gmail.com
Tue Nov 15 12:07:48 CET 2011

2011/11/14 Mike Maxwell <maxwell at umiacs.umd.edu>:
> On 11/14/2011 4:56 PM, Zdenek Wagner wrote:
>> 2011/11/14 Mike Maxwell<maxwell at umiacs.umd.edu>:
>>> We are not (at least I am not) suggesting that everyone must use
>>> the Unicode non-breaking space character, or etc.  What we *are*
>>> suggesting is that in Xe(La)Tex, we be *allowed* to use those
>>> characters, and that they have their
>> You are allowed to use them, nothing prevents you.
> At least one participant in this thread (or actually the related thread
> "Whitespace in input"--the person in question is mskala at ansuz.sooke.bc.ca)
> has said:
>> U+00A0 is an invalid character for TeX input
> That sounds pretty much like prevention (although maybe you don't agree with
> him).
I strongly disagree. From TeX's point of view a character is invalid
if its \catcode is equal to 15, which is not the case for U+00A0. If an
invalid character is found in the input, an error message appears in
the log. That does not happen with U+00A0 because its \catcode is 12,
which means "other character". When talking about \catcode I have in
mind the value defined in the format. Even if a character is declared
invalid in the format, a user can assign another \catcode if the
character can be rendered.
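
If you want to check this on your own installation, something like the
following minimal sketch should do. It assumes plain XeTeX (run it with
"xetex"); the number printed is whatever the loaded format assigns, so
it may differ from format to format.

```latex
% Plain XeTeX sketch: write the current \catcode of U+00A0 to the
% terminal and the log. 15 would mean "invalid", 12 means "other".
\message{catcode of U+00A0 = \the\catcode"00A0}
\bye
```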

> But in fact, the last time I tried this, the NBSP character was interpreted
> in the same way as an ASCII space, which is not what I want.  What I want
> (repeating myself again) is for such characters to--

NBSP's \catcode is 12, so it is just a glyph in the font; it is not
treated specially by XeTeX. A line can be broken at glue if it does not
follow another discardable element, at a penalty, or at a
\discretionary, but not at a glyph. That is why this space is
nonbreakable in XeTeX's eyes. Since it is a glyph, its width is fixed.
You can do a few things with it:

Change its \catcode to 10, then it will be a normal
stretchable/shrinkable space but will not be nonbreakable

Change its \catcode to 13 and define it as \nobreak\space. In such a
case it will have the same meaning as ~
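
A rough sketch of the two options for XeLaTeX follows. It assumes the
^^a0 notation is used to write U+00A0 visibly in the source; the two
options are alternatives, so enable only one of them. The sample
sentence is just an illustration.

```latex
% Compile with xelatex. Pick ONE of the two options below.
\documentclass{article}

% Option 1: treat U+00A0 like an ordinary space
% (stretchable/shrinkable, but breakable).
%\catcode"00A0=10

% Option 2: make U+00A0 an active character that behaves like ~.
% The \catcode must be changed first so that ^^a0 in the \def
% is read as an active character.
\catcode"00A0=\active
\def^^a0{\nobreak\space}

\begin{document}
% ^^a0 stands for U+00A0; a literal NBSP typed in the editor
% would be handled the same way.
See Figure^^a01 for details.
\end{document}
```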

>>> have their Unicode-defined semantics, to the extent that
>>> makes sense in XeTeX.
> --just the same as I would expect XeTeX (or xdvipdfmx) to correctly handle
> the visual re-ordering behavior of U+09C7 through U+09CC, or U+093F
> (Devanagari vowel sign I).
OpenOffice has some intelligence and recognizes the Devanagari script
automatically. This is not the case with XeTeX. When loading a
Devanagari font you have to switch the script to Devanagari too. Then
XeTeX properly handles U+093F and U+094D (other characters are handled
properly even without setting the script). Similarly, you have to set
the Arabic script in order to connect the characters properly; without
setting the script, only isolated forms will be typeset. Everything is
done in XeTeX, xdvipdfmx just renders the properly reordered and
composed glyphs into PDF. The Velthuis Devanagari package even contains
samples for XeLaTeX, and some support files have recently been moved to
the xetex-devanagari package.
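
As an illustration of switching the script, here is a minimal sketch
using the fontspec package. The font name is only a placeholder
(an assumption, not something from the packages mentioned above);
replace it with any OpenType Devanagari font installed on your system.

```latex
% Compile with xelatex. "Noto Serif Devanagari" is a placeholder
% font name; substitute a Devanagari font you actually have.
\documentclass{article}
\usepackage{fontspec}
% Script=Devanagari selects the script, not just the font.
\newfontfamily\devanagarifont[Script=Devanagari]{Noto Serif Devanagari}
\begin{document}
% The word below contains both U+093F and U+094D.
{\devanagarifont हिन्दी}
\end{document}
```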

>> However, I would not like to think, why I have
>> overful/underful boxes and opening hex editor to see what kind of
>> space is written between words.
> A number of alternatives to a hex editor have been pointed out:
> 1) color coding
> 2) using a font that has a representation of these code points
> 3) using any text editor that allows you to see the Unicode code point of a
> character (I use jEdit this way, I'm sure many other editors offer this
> support)
> Again, this is not about _forcing_ anyone to use NBSP etc., it is about
> _allowing_ their use *with the expected Unicode behavior.*
> --
>        Mike Maxwell
>        maxwell at umiacs.umd.edu
>        "My definition of an interesting universe is
>        one that has the capacity to study itself."
>        --Stephen Eastmond

Zdeněk Wagner
