[XeTeX] [tex-hyphen] Hyphenation of polytonic Greek (expressed in Unicode)

Mike Maxwell maxwell at umiacs.umd.edu
Fri Sep 13 01:20:30 CEST 2013


On 9/12/2013 6:17 PM, Khaled Hosny wrote:
> Some writing systems do not use spaces to separate words, so TeX’s
> normal line breaking algorithm will fail. \XeTeXlinebreaklocale
> instructs XeTeX to break the lines based on the rule of those writing
> systems.
>
> ‹Locale ID› should be the ISO code of the language in question,

Hmm, wouldn't this be insufficient information? Some languages are 
written in multiple scripts, and I would not be surprised if word breaks 
are signaled differently in those different scripts.  Japanese, for example?

> documentation is a bit vague, but it seems to calculate the line
> breaking position based on the Unicode character properties and the
> locale value is simply ignored).

That also seems insufficient, since multiple languages may use the same
script and have different word (and therefore line) breaking
characteristics. It is perhaps closer, though, given that scripts that
don't use spaces are *perhaps* more specific to a particular language,
or to a small set of similar languages (e.g. Chinese script, to the
extent that Cantonese and Mandarin are similar in their word break
characteristics). But here I'm *really* ignorant.

In general, word breaking in scripts that don't indicate word boundaries
is a partly unsolved research problem in computational linguistics, and
from what I've heard, native speakers often disagree about where the
boundaries fall. (If you think that's odd, you might consider 'doghouse'
vs. 'dog house' in English...) So I suppose it's not surprising if this
doesn't work as well in XeTeX as one might hope.
-- 
    Mike Maxwell
    "The biggest danger is not ignorance,
    but the illusion of knowledge."
    --Stephen Hawking

