[XeTeX] hyphenation in Ethiopian languages

Arthur Reutenauer arthur.reutenauer at normalesup.org
Thu May 12 19:00:29 CEST 2011


> Hmm, looking at Microsoft's recommendations[1], it sounds like you should be aiming for glyph 1, and character codes that should map to that glyph include U+0000 (null), U+0008 (backspace) and U+001D (group separator).

  Thanks Jonathan, that's most useful.  Sadly, all of these characters
seem to map to .notdef in Abyssinica, like all of the Unicode characters
you mentioned earlier, apart from ZWNJ and ZWJ.  (Useless piece of
trivia: did you know that as of Unicode 6.1, only four characters have a
name starting with "ZERO WIDTH"?  They've all been mentioned in this
thread.)  Carriage return and line feed both have a zero-width glyph, as
does tabulation (U+0009), which again goes against the recommendation
that its glyph should have the same width as the one for space.  That's
most disconcerting.

> With U+000A (LF), there's a greater risk that it will map to .notdef and show up as a box, I think. This certainly used to be fairly common in TrueType fonts, and showed up as boxes at the start of each line when a DOS-originated text file with <CRLF> line-ends was loaded into a classic MacOS application that treated <CR> alone as the line ending, and didn't filter out the <LF> characters.

  Amusing :-)

> So to sum up, I think U+0000 "ought" to work if fonts carefully follow the MS recommendations; if it doesn't, other control-char codes are worth a try, but there's no guarantee that you'll find a universal, font-independent solution.

  Indeed not.  In fact, what you've just said proves that it's probably
hopeless to expect font designers to follow the recommendation in that
particular area.  Better to poke around by trying out a list of possible
characters that could have zero width.
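
  As a rough illustration of that "poke around" approach, here is a
minimal Python sketch.  It assumes the font's character-to-glyph map and
its advance widths have already been extracted into two plain
dictionaries (how that is done, for instance with a font-inspection
library, is left out).  The function name, the candidate order, the
glyph names and the toy data are all made up for the example and do not
describe any real font.

```python
# Candidate code points to try, in order of preference.  The order is an
# assumption for this sketch: the control characters from the
# recommendation first, then the "ZERO WIDTH" characters, then CR/LF/tab.
CANDIDATES = [
    0x0000, 0x0008, 0x001D,          # null, backspace, group separator
    0x200B, 0x200C, 0x200D, 0xFEFF,  # the four ZERO WIDTH characters
    0x000D, 0x000A, 0x0009,          # carriage return, line feed, tab
]


def find_zero_width_char(cmap, advances, candidates=CANDIDATES,
                         notdef=".notdef"):
    """Return the first candidate that maps to a real zero-width glyph.

    cmap     -- dict: code point -> glyph name (hypothetical input format)
    advances -- dict: glyph name -> advance width in font units
    Returns the code point, or None if no candidate qualifies.
    """
    for code_point in candidates:
        glyph = cmap.get(code_point)
        # Unmapped characters, or ones mapped to .notdef, would show up
        # as a box, so skip them.
        if glyph is None or glyph == notdef:
            continue
        if advances.get(glyph) == 0:
            return code_point
    return None


# Toy data, invented for illustration only.
TOY_CMAP = {
    0x0000: ".notdef",
    0x0009: "tab",
    0x000A: "lf",
    0x000D: "cr",
    0x0020: "space",
    0x200C: "zwnj",
}
TOY_ADVANCES = {
    ".notdef": 600, "tab": 0, "lf": 0, "cr": 0, "space": 250, "zwnj": 0,
}

found = find_zero_width_char(TOY_CMAP, TOY_ADVANCES)
print("no usable character" if found is None else f"U+{found:04X}")
```

  Returning None rather than falling back silently seems the safer
choice here, since there is no guarantee that any given font has a
usable zero-width character at all.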

	Arthur

