[XeTeX] hyphenation in Ethiopian languages

Jonathan Kew jfkthame at googlemail.com
Thu May 12 12:48:20 CEST 2011

On 11 May 2011, at 23:46, Arthur Reutenauer wrote:

>> That doesn't surprise me; I'd expect you to get the font's .notdef glyph (which might be a blank space, as in this example, or a box, or some other symbol).
> Thanks for the explanation, that makes sense.
>> What you want is a character that has a zero-width, invisible glyph; if the font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or CGJ, etc., that ought to work.
>  Yes, that's what I thought too, but it doesn't provide a font-independent solution.
>> Or character 13 (CR) is a likely bet, too.
>  Note that Mojca remarked that using character 10 (LF) produced the desired result in that particular font (Abyssinica SIL).  Is there any reason why one would prefer the former over the latter, or why either of these characters would be a safer bet in general?  I would have thought that both of them, being control characters (sort of), would precisely have no glyph in most fonts; after all, who would want to set a glyph for a character that's supposed to indicate the end of a line of text?

Hmm, looking at Microsoft's recommendations[1], it sounds like you should be aiming for glyph 1, and character codes that should map to that glyph include U+0000 (null), U+0008 (backspace) and U+001D (group separator). They say that U+000D (CR) should have a positive advance width (which is not what you want), although I think I recall seeing somewhat different recommendations in the past, perhaps from Apple.

With U+000A (LF), there's a greater risk that it will map to .notdef and show up as a box, I think. This certainly used to be fairly common in TrueType fonts, and showed up as boxes at the start of each line when a DOS-originated text file with <CRLF> line-ends was loaded into a classic MacOS application that treated <CR> alone as the line ending, and didn't filter out the <LF> characters.

So to sum up, I think U+0000 "ought" to work if fonts carefully follow the MS recommendations; if it doesn't, other control-char codes are worth a try, but there's no guarantee that you'll find a universal, font-independent solution.
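
To make the "try U+0000 first, then fall back" idea concrete, here is a rough Python sketch. It is only an illustration: the function name, the candidate order, and the two dictionaries (code point to glyph index, glyph index to advance width) are assumptions made up for the example, standing in for what you would read from a real font's cmap and hmtx tables. It also treats "zero advance width" as a proxy for "invisible", which is not a guarantee that the glyph has no outline.

```python
# Glyph 0 is .notdef in TrueType/OpenType fonts.
NOTDEF = 0

# Candidate code points mentioned in this thread, in a guessed order of
# preference (the order is an assumption, not a recommendation from any spec).
CANDIDATES = [
    0x0000,  # null
    0x0008,  # backspace
    0x001D,  # group separator
    0x200C,  # ZWNJ
    0x2060,  # WJ
    0xFEFF,  # ZWNBSP
    0x034F,  # CGJ
    0x000D,  # CR
    0x000A,  # LF
]


def find_invisible_char(cmap, advances, candidates=CANDIDATES):
    """Return the first candidate code point that maps to a glyph other
    than .notdef whose advance width is zero, or None if there is none.

    cmap:     dict mapping code point -> glyph index (hypothetical data)
    advances: dict mapping glyph index -> advance width (hypothetical data)
    """
    for cp in candidates:
        gid = cmap.get(cp, NOTDEF)  # unmapped characters fall back to .notdef
        if gid != NOTDEF and advances.get(gid) == 0:
            return cp
    return None


# Made-up data for a font that follows the MS recommendations:
# U+0000 maps to glyph 1 (zero width), CR maps to a glyph with positive width.
EXAMPLE_CMAP = {0x0000: 1, 0x0008: 1, 0x001D: 1, 0x000D: 2}
EXAMPLE_ADVANCES = {0: 600, 1: 0, 2: 250}
```

With the example data above, the function picks U+0000; for a font that maps none of the candidates to a zero-width glyph it returns None, which is the "no font-independent solution" case.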


[1] http://www.microsoft.com/typography/otspec/recom.htm
