[XeTeX] hyphenation problem

Jonathan Kew jonathan at jfkew.plus.com
Fri Sep 26 10:06:02 CEST 2008


On 26 Sep 2008, at 7:07 AM, Mehdi Omidali wrote:

> Hi all,
> In Farsi we have words like
> می‌بینم
> and there is no space between ی and ب. In a standard Farsi keyboard
> layout, we use Shift+B to put a zero-width space between these two
> characters (otherwise we get میبینم).

The character you are using is not ZERO WIDTH SPACE (U+200B in  
Unicode), but ZERO WIDTH NON-JOINER (U+200C). This is an important  
distinction, because they have different properties with respect to  
line breaking.

(ZWNJ is the appropriate character to use for this purpose in Persian,  
so your text is fine as far as that is concerned; just wanted to be  
clear what character we're dealing with.)

>
> I can simply do it in the body of my file and XeTeX shows it correctly,
> but when I use
> \hyphenation{می‌-بینم}
>
> (where after ی there is a Shift+B) then I get an error that says
> می‌ب not a character.

To be more accurate, it says "Not a letter", and the first line of the  
context stops after "\hyphenation{می‌"; it does not include the  
"ب". It's generally helpful if you copy/paste precise error messages  
rather than typing roughly what you think you remember.

This occurs because words given in the \hyphenation command must  
consist of what TeX considers to be "letters", which in this context  
means that they must have a non-zero \lccode value. (See The TeXbook  
for full details.) U+200C is not initialized as a "letter" in XeTeX  
because it does not have "letter" properties defined in the Unicode  
standard, and therefore it is not permitted in \hyphenation.
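
A quick way to check this is to ask XeTeX for the current values. This is
only a minimal sketch for plain XeTeX; the choice of U+0628 (ب) as a
comparison character is an assumption made for illustration, and the
values reported may differ depending on the format and its version.

```tex
% Write the current \lccode values to the terminal and the log file.
% A value of 0 means TeX does not treat the character as a "letter"
% for the purposes of \hyphenation.
\message{lccode of ZWNJ: \the\lccode"200C}
\message{lccode of BEH: \the\lccode"0628}
\bye
```

If the first value is reported as 0, that is what triggers the "Not a
letter" error.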

To fix this, you can set its \lccode before trying to give  
\hyphenation entries that require this character:

   \lccode"200C="200C
   \hyphenation{می‌-بینم}

I'm a little surprised that you'd want this, actually; most Persian  
users I have known insist on keeping these kinds of prefixes and  
suffixes attached to the root word, not allowing a line break. But if  
you do want hyphenation here, it should work once you set the \lccode  
of the ZWNJ character appropriately.
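
Since ZWNJ is invisible in most editors, it can be hard to tell whether
it is really present in the source. As a sketch of one alternative
(assuming ^ has its usual plain TeX catcode), the ^^^^ notation with four
lowercase hex digits lets you spell the character out explicitly:

```tex
% ^^^^200c is read as the single character U+200C (ZWNJ);
% the hex digits must be lowercase.
\lccode"200C="200C
\hyphenation{می^^^^200c-بینم}
```

This should be equivalent to typing the character directly; it just makes
the ZWNJ visible in the file.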

HTH,   JK
