[XeTeX] Re: Greek hyphenation

Yves Codet ycodet at club-internet.fr
Tue Jan 17 09:07:56 CET 2006


On 16 Jan 2006, at 23:08, Jonathan Kew wrote:

>> In the decomposed version of "Daphnis", most of the time there is no 
>> hyphenation after a diacritic, even if it is two or three syllables 
>> before the hyphenation point, for instance in a word which should 
>> have been hyphenated like this: ἁπαλώτε-ρα. But there were a few 
>> cases of hyphenations after diacritics: γενέ-σει, πλού-σιος. As far 
>> as I could see, when there is no diacritic before the hyphenation 
>> point, hyphenation always occurs.
>>
>
> Aha, I've finally realized what's going on here, and why you're seeing 
> problems in LaTeX that I can't reproduce in plain XeTeX.
>
> When a hyphenation file is loaded by LaTeX, the whole operation 
> happens inside a group. So the \lccode assignments for the diacritics 
> need to be \global, or else they need to be repeated when you actually 
> select the Greek language in your document.

Many thanks, it works well now: no hyphenation is missing in the 
decomposed version of my test file. So here is the new version of the 
hyphenation file if some members of the list want to test it:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: uni-grhyph.tex
Type: application/x-tex
Size: 5115 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20060117/45a975b1/uni-grhyph.tex
-------------- next part --------------


>
> That's why diacritics in the decomposed text are interfering with 
> hyphenation: if they have \lccode=0, they end the 
> potentially-hyphenatable word as far as TeX is concerned. So prefix 
> those assignments with \global. (I'm thinking that perhaps XeTeX 
> should have these set by default, via unicode-letters.tex.)

It might be a good thing since the same issue will occur for many other 
languages.
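
For anyone who wants to see the shape of the fix described above, here is a minimal sketch. The list of combining marks is chosen for illustration (the ones commonly needed for decomposed polytonic Greek) and is not copied from uni-grhyph.tex:

```tex
% Sketch only: this set of combining marks is an assumption for illustration.
% A nonzero \lccode lets TeX keep the mark inside the word it is hyphenating.
% \global makes the assignment survive the group in which LaTeX reads the file.
\global\lccode"0300="0300 % COMBINING GRAVE ACCENT
\global\lccode"0301="0301 % COMBINING ACUTE ACCENT
\global\lccode"0308="0308 % COMBINING DIAERESIS
\global\lccode"0313="0313 % COMBINING COMMA ABOVE (psili)
\global\lccode"0314="0314 % COMBINING REVERSED COMMA ABOVE (dasia)
\global\lccode"0342="0342 % COMBINING GREEK PERISPOMENI
\global\lccode"0345="0345 % COMBINING GREEK YPOGEGRAMMENI
```

Without \global, the same assignments would have to be repeated in the document after Greek is selected, as the quoted message says.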

> In addition, the patterns file itself is not supposed to use 
> \newlanguage\greek and \language=\greek; this is supposed to be the 
> responsibility of the code that loads the hyphenation files. That 
> doesn't have anything to do with the problems you've seen, but it does 
> mean that Babel's language management may be a little confused. If you 
> look at the end of the log file when you make the format file, you can 
> see there's a problem because a language code gets skipped (has no 
> patterns).

Yes, I had noticed that when we worked on Sanskrit hyphenation.
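
A sketch of what leaving the allocation to the loader could look like: the hyphenation file keeps only its \lccode settings and \patterns, and it is registered in language.dat before the format is rebuilt. The language name "greek" is an assumption here, chosen because it is presumably the name that greek.ldf looks for:

```tex
% language.dat entry (sketch): <language name> <patterns file>
% The name "greek" is an assumption; the file name is the attachment above.
greek uni-grhyph.tex
```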

>
> However, if you do eliminate the \newlanguage\greek and 
> \language\greek from the hyphenation file, then you can't simply use 
> \language\greek in your document, either. As far as I can tell, you'll 
> need to use Babel's
> 	\selectlanguage{greek}
> instead. But then that will change the font encoding, and so you'll 
> need to reset that with
> 	\def\encodingdefault{U}
> 	\fontencoding{U}\selectfont
> to get back to the font you asked for with fontspec.
>
> (Maybe there's a better way to deal with this.... any Babel experts 
> listening?)

It would be nice to be able to load hyphenation patterns in LaTeX 
without Babel, i.e. without "language.dat". But I understand some 
members of this list want to use "greek.ldf". So the best would be to 
have a clean solution for both kinds of users. Pray, LaTeX गुरवः, do 
something for us!
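
In the meantime, the workaround quoted above can be put together as a small test document. This is only a sketch: the commands are the ones from the quoted message, while the font name and the sample word are assumptions, and the encoding name U is the one given in the message (other fontspec versions may use a different name):

```latex
% Sketch assembled from the commands quoted above.
% Font name and sample word are assumptions, not part of the original message.
\documentclass{article}
\usepackage[greek]{babel}
\usepackage{fontspec}
\setmainfont{Gentium}
\begin{document}
\selectlanguage{greek}% selects the Greek patterns, but also switches the font encoding
\def\encodingdefault{U}%
\fontencoding{U}\selectfont % back to the font requested through fontspec
ἁπαλώτερα
\end{document}
```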

> In the original text, the character is U+0387 GREEK ANO TELEIA. 
> However, this character has a singleton canonical decomposition to 
> U+00B7 MIDDLE DOT. So TECkit is correct to change it during 
> normalization. Unfortunately, the Galatia SIL font apparently displays 
> U+00B7 with a glyph that would be more appropriate for U+2022 BULLET. 
> I think there's been some confusion between "middle dot" and "bullet" 
> in various encodings in the past, so this is not totally surprising. 
> But it's an error in the font, I'd say.
>

It seems so: I've tried with Gentium and the "ano teleia" looks normal.
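
A quick way to compare the glyphs yourself, as a plain XeTeX sketch. The font names are assumptions and must match fonts installed on your system:

```tex
% Plain XeTeX sketch: typeset the three code points side by side in each font.
% Font names are assumptions; adjust them to what is installed.
\font\gentium="Gentium" at 12pt
\font\galatia="Galatia SIL" at 12pt
\gentium U+0387: \char"0387 \quad U+00B7: \char"00B7 \quad U+2022: \char"2022 \par
\galatia U+0387: \char"0387 \quad U+00B7: \char"00B7 \quad U+2022: \char"2022 \par
\bye
```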

Best wishes,

Yves


