# [XeTeX] Unicode space characters

maxwell maxwell at umiacs.umd.edu
Wed Feb 3 21:22:40 CET 2010

I have been inserting zero width space characters (U+200B) into text at
punctuation marks where I wanted an optional break.  (Examples: "foo/bar"
or "preposition+pronoun"--and no, the author does not want a real ASCII
space character there.)  To my surprise, this did not make any difference
in the rendering in XeLaTeX; I still get an "overfull box."

It turns out there was a thread here just under a year ago with the same
subject line ("Unicode space characters"), where Jonathan Kew clarified
that "XeTeX has no special built-in knowledge about U+00A0 or the various
other Unicode space-like characters; it will simply 'print' them in
the current font."  Which explains my problem.  Tomáš Janoušek was going
to make a package to handle those appropriately.  AFAIK, the package does
not yet exist (I'd be happy to find out I'm wrong).

In the meantime, is there a snippet of XeTeX code that I can insert into
my preamble that makes U+200B act as an optional line break?  (I guess
U+0082 would also work, but I'm already using ZWSP.)  The thread I
mentioned may say (see in particular the msg at
http://www.tug.org/pipermail/xetex/2009-March/012480.html), but if it does
I'm afraid I'm not enough of a Techie to understand.  The original 'z@skip'
in the thread got interpreted as an email address and turned into 'z at
skip'.  That I fixed (no guarantees this won't get munged as well!), giving
----------
\catcode^^^^200b=\active
\def^^^^200b{\hskip\z@skip}
----------
which I copied into my preamble.  But xelatex still complains:
-----------
! Undefined control sequence.
​->\hskip \z
@skip
l.10664 ...nd \urdu{تو} /tū/; these pronoun+​
postposition
-----------
so apparently I'm still doing something wrong.
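
One possible reading of that error, offered as a sketch rather than a confirmed fix: the log splits `\z` from `@skip`, which is what TeX does when `@` is not a letter, as is normally the case in a document preamble.  Under that assumption, either of the two definitions below might make U+200B act as a zero-width break point.  Only one of them is needed.  The `\catcode"200B` form and the use of `\protected` are choices made for this illustration, not taken from the thread, and none of this has been tried against dblatex output.

```latex
% Sketch only: assumes the failure is caused by the catcode of @ in the preamble.

% Make U+200B (ZERO WIDTH SPACE) an active character.
\catcode"200B=\active

% Option 1: avoid internal LaTeX names, so no @ is involved.
% \hskip 0pt inserts zero-width glue, which is a legal line-break point;
% \relax stops TeX from scanning ahead for "plus" or "minus".
\protected\def^^^^200b{\hskip 0pt\relax}

% Option 2: keep \z@skip, but make @ a letter while defining the macro.
\makeatletter
\protected\def^^^^200b{\hskip\z@skip}
\makeatother
```

With either definition the ZWSP is replaced by glue and never reaches the font, so no glyph for U+200B should be required.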

BTW, I can't insert \discretionary in my input text in place of the
U+200B, because my texts are actually in XML, and get converted to xetex
using dblatex (and I don't want to put LaTeX commands in my XML, since
that's only one possible way of rendering the XML).  I guess I could
convert ZWSP to \discretionary during the conversion from XML, but it seems
like defining the char in the preamble would be cleaner.  Whatever way I do
this, I guess the ZWSP itself should not be preserved, since a given font
may not have a (null) glyph assigned to it.

Mike Maxwell