# [XeTeX] Re: Greek hyphenation

Jonathan Kew jonathan_kew at sil.org
Mon Jan 16 23:08:37 CET 2006

On 16 Jan 2006, at 5:01 pm, Yves Codet wrote:
>
> On 16 Jan 06, at 14:12, Jonathan Kew wrote:
>
>> I've fixed one issue in the code for 0.991, at least; I'm not
>> completely sure if this is the only problem, but it seems to be
>> working for me. I hope to find time to finish and release 0.991
>> fairly soon, but email me if you'd like to hand-install and try an
>> interim version to see if it solves the Greek issue for you.
>
> I've just installed 0.991. I guess my gwTeX has to be updated since
> neither the system nor kpsewhich found fmtutil-sys, so I used
> iInstaller to configure.
>
> In the decomposed version of "Daphnis", most of the time there is
> no hyphenation after a diacritic, even if it is two or three
> syllables before the hyphenation point, for instance in a word
> which should have been hyphenated like this: ἁπαλώτε-ρα. But there
> were a few cases of hyphenations after diacritics: γενέ-σει,
> πλού-σιος. As far as I could see, when there is no diacritic before
> the hyphenation point, hyphenation always occurs.
>

Aha, I've finally realized what's going on here, and why you're
seeing problems in LaTeX that I can't reproduce in plain XeTeX.

When a hyphenation file is loaded by LaTeX, the whole operation
happens inside a group. So the \lccode assignments for the diacritics
need to be \global, or else they need to be repeated when you
actually select the Greek language in your document.

That's why diacritics in the decomposed text are interfering with
hyphenation: if they have \lccode=0, they end the potentially
hyphenatable word as far as TeX is concerned. So prefix those
assignments with \global. (I'm thinking that perhaps XeTeX should
have these set by default, via unicode-letters.tex.)
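
As a rough sketch of that fix: the hyphenation file itself isn't quoted in this thread, so this assumes it gives each combining diacritic its own code point as its \lccode. The code points below are only an illustrative subset of the polytonic Greek combining marks, not the actual list from the file.

```tex
% Sketch only: illustrative subset of combining diacritics (an assumption,
% not copied from the real Greek hyphenation file).
% \global makes each assignment survive the group that LaTeX wraps
% around the loading of the hyphenation file.
\global\lccode"0300="0300 % COMBINING GRAVE ACCENT
\global\lccode"0301="0301 % COMBINING ACUTE ACCENT
\global\lccode"0308="0308 % COMBINING DIAERESIS
\global\lccode"0313="0313 % COMBINING COMMA ABOVE (smooth breathing)
\global\lccode"0314="0314 % COMBINING REVERSED COMMA ABOVE (rough breathing)
\global\lccode"0342="0342 % COMBINING GREEK PERISPOMENI
\global\lccode"0345="0345 % COMBINING GREEK YPOGEGRAMMENI
```

Any nonzero \lccode keeps the character inside the word as far as the hyphenation routine is concerned. Without \global, the assignments are undone when the group ends, and the diacritics revert to \lccode=0.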

In addition, the patterns file itself is not supposed to use
\newlanguage\greek and \language=\greek; this is supposed to be the
responsibility of the code that loads the hyphenation files. That
doesn't have anything to do with the problems you've seen, but it
does mean that Babel's language management may be a little confused.
If you look at the end of the log file when you make the format file,
you can see there's a problem because a language code gets skipped
(has no patterns).

However, if you do eliminate the \newlanguage\greek and
\language\greek from the hyphenation file, then you can't simply use
\language\greek in your document, either. As far as I can tell,
you'll need to use Babel's
\selectlanguage{greek}
instead. But then that will change the font encoding, and so you'll
need to reset that with
\def\encodingdefault{U}
\fontencoding{U}\selectfont
to get back to the font you asked for with fontspec.

(Maybe there's a better way to deal with this.... any Babel experts
listening?)

> In the precomposed version all hyphenations occur, except in one
> case: ἐθραύοντο· (with the punctuation), though there
> are rules "1τ" and "2ντ" in hyphenation patterns, supposed to
> allow hyphenation between the consonants but not before the first
> one, which works in other cases.
>
> Incidentally, in the precomposed (original) text the dot above the
> line (equivalent of ";") looks fine, but in the decomposed text it
> looks like a small black disk in the PDF output. Maybe an error in
> TECkit?

In the original text, the character is U+0387 GREEK ANO TELEIA.
However, this character has a singleton canonical decomposition to
U+00B7 MIDDLE DOT. So TECkit is correct to change it during
normalization. Unfortunately, the Galatia SIL font apparently
displays U+00B7 with a glyph that would be more appropriate for
U+2022 BULLET. I think there's been some confusion between "middle
dot" and "bullet" in various encodings in the past, so this is not
totally surprising. But it's an error in the font, I'd say.

JK