# [XeTeX] &nbsp; in XeTeX

Zdenek Wagner zdenek.wagner at gmail.com
Sat Nov 12 17:24:48 CET 2011

2011/11/12 Ulrike Fischer <news3 at nililand.de>:
> Am Fri, 11 Nov 2011 16:33:20 +0100 schrieb Zdenek Wagner:
>
>> I still do not understand the internal mechanism. I know how
>> punctuation is handled in French, the category of a few characters is
>> set to 13 and defined as some macros. But how can XeTeX recognize
>> whether the space token with category 10 has to be converted to a
>> nonbreakable space?
>
> There was once a discussion about spaces on the xetex list starting
> here:
>
> http://tug.org/mailman/htdig/xetex/2009-March/012480.html
>
> I don't know if the code discussed there led to a package or found
> its way somehow in the format.
>
> from Jonathan:
>
>>>> %% U+00A0 NO-BREAK SPACE;
>>>> %%   Unicode char for ~.
>>>> \catcode^^^^00a0=\active
>>>> \def^^^^00a0{\nobreakspace}
>
>> Are the definitions necessary? That is, how does XeTeX normally
>> handle e.g. U+00A0 NO-BREAK SPACE? Can there be a line break
>> before or after this input?
>
> XeTeX has no special built-in knowledge about U+00A0 or the various
> other Unicode space-like characters; it will simply "print" them in
> the current font. Which would be fine, except that some fonts fail
> to support them, in which case you'll get a .notdef glyph. :(
>
> Defining these in a font-independent way using TeX seems like a good
> idea in general; however, care may be needed to make them work
> correctly in all contexts, particularly when they occur in text that
> ends up going to the LaTeX .aux file, etc., or into PDF bookmarks. I
> haven't really looked into this, not being a serious LaTeX user,
> just wondering...
>
Thank you for this answer. For LaTeX the definition of the
non-breakable space has to be more complex. It is necessary to test
the current definition of \protect, which makes it possible to
recognize whether we are typesetting the text or whether the text is
being written out somewhere (e.g. to the .aux file). I cannot explain
it precisely; I would have to look up the detailed description of
\protect. Next we have to test \catcode`\ (the category code of the
space character), which tells whether we are in normal or verbatim
mode.

The definitions of ZWJ and ZWNJ are wrong in many cases. These
characters are used in Indic scripts to control ligature formation.
If they are expanded to \nobreak\hskip, a word boundary is created in
the middle of a word and hyphenation will not work. Moreover, the
glyphs preceding and following them may need kerning, which will be
lost. These definitions are usable only as a workaround for deficient
fonts that do not contain these characters. They should therefore not
appear in the format or in a package for general use; they are just
emergency definitions that may sometimes help.
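
The \protect and \catcode tests mentioned above could be sketched
roughly as follows. This is only an illustration under assumptions,
not a tested or complete definition. It assumes XeLaTeX, it assumes
that comparing \protect with the kernel's \@typeset@protect is an
adequate way to detect typesetting, and the kern used in the verbatim
branch is an arbitrary choice made for the example. The behaviour in
PDF bookmarks is not addressed.

```latex
% Compile with XeLaTeX. Illustrative sketch only, not a tested package.
\documentclass{article}

\makeatletter
\catcode"00A0=\active      % make U+00A0 an active character
\def^^^^00a0{%
  \ifx\protect\@typeset@protect
    % \protect has its typesetting meaning: we are setting text.
    \ifnum\catcode`\ =10
      \nobreakspace        % normal mode: the usual LaTeX tie
    \else
      % The space is not catcode 10 (as in verbatim): use a kern of
      % the font's interword space. This choice is an assumption.
      \leavevmode\kern\fontdimen2\font
    \fi
  \else
    % The text is being written to a file or expanded in a moving
    % argument: keep the character itself.
    \string^^^^00a0%
  \fi}
\makeatother

\begin{document}
\tableofcontents
\section{Figure^^^^00a01}
See Figure^^^^00a01.
\end{document}
```

With a definition of this shape the section title should reach the
.aux and .toc files as the literal U+00A0 character rather than as
expanded glue, and it becomes a tie again when those files are read
back, provided the character is still active at that point.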
>
> --
> Ulrike Fischer

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz