[XeTeX] hyphenation problem
Jonathan Kew
jonathan at jfkew.plus.com
Fri Sep 26 10:06:02 CEST 2008
On 26 Sep 2008, at 7:07 AM, Mehdi Omidali wrote:
> Hi all,
> In Farsi we have words like
> میبینم
> and there is no space between ی and ب. In a standard Farsi keyboard
> layout, we use Shift+B to put a zero-width space between these two
> characters (otherwise we get میبینم).
The character you are using is not ZERO WIDTH SPACE (U+200B in
Unicode), but ZERO WIDTH NON-JOINER (U+200C). This is an important
distinction, because they have different properties with respect to
line breaking.
(ZWNJ is the appropriate character to use for this purpose in Persian,
so your text is fine as far as that is concerned; just wanted to be
clear what character we're dealing with.)
>
> I can simply do it in the body of my file and XeTeX shows it correctly,
> but when I use
> \hyphenation{می-بینم}
>
> (where after ی there is a Shift+B) then I got an error that says
> میب not a character.
To be more accurate, it says "Not a letter", and the first line of the
context stops after "\hyphenation{می"; it does not include the
"ب". It's generally helpful if you copy/paste precise error messages
rather than typing roughly what you think you remember.
This occurs because words given in the \hyphenation command must
consist of what TeX considers to be "letters", which in this context
means that they must have a non-zero \lccode value. (See The TeXbook
for full details.) U+200C is not initialized as a "letter" in XeTeX
because it does not have "letter" properties defined in the Unicode
standard, and therefore it is not permitted in \hyphenation.
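One way to confirm this is to ask XeTeX for the value it currently holds. The following is a minimal sketch, assuming plain XeTeX; the number printed depends on the format and version in use, so no expected output is given here.

```tex
% Print the current \lccode of U+200C (ZWNJ) to the terminal and log.
% A value of 0 means TeX does not treat the character as a letter,
% so it would be rejected inside \hyphenation.
\message{lccode of U+200C = \the\lccode"200C}
\bye
```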
To fix this, you can set its \lccode before trying to give
\hyphenation entries that require this character:
\lccode"200C="200C
\hyphenation{می-بینم}
I'm a little surprised that you'd want this, actually; most Persian
users I have known insist on keeping these kinds of prefixes and
suffixes attached to the root word, not allowing a line break. But if
you do want hyphenation here, it should work once you set the \lccode
of the ZWNJ character appropriately.
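Put together, the fix might be sketched as follows. This assumes plain XeTeX, and it writes the ZWNJ in XeTeX's ^^^^200c notation only so that the otherwise invisible character can be seen in the source; typing the character directly with Shift+B should be equivalent. Font selection and right-to-left setup are left out.

```tex
% Make ZWNJ (U+200C) count as a letter by mapping it to itself.
\lccode"200C="200C
% ^^^^200c stands for the ZWNJ typed after ی; the hyphen marks the
% permitted break point.
\hyphenation{می^^^^200c-بینم}
```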
HTH, JK