[XeTeX] Greek Hyphenation (monotoniko)

Mon Jan 9 15:19:29 CET 2006

On 9 Jan 2006, at 1:33 pm, Yves Codet wrote:

>
> On 9 Jan 06, at 13:42, Jonathan Kew wrote:
>
> Hello.
>
> I am trying my luck with Jonathan's exercise but I have a few  
> questions about it.
>
>>
>> * use U+02BC MODIFIER LETTER APOSTROPHE for the apostrophe  
>> (elision), not ''
>
> Should U+02BC be used instead of U+2019 in Greek?

Probably, when it is functioning as an apostrophe/elision mark, which  
is logically part of a word, as opposed to a punctuation mark  
(closing quote).

However, in practice I wouldn't be surprised if people use U+2019;  
it's difficult to maintain such distinctions when the characters  
look the same. The fact that U+2019 behaves as a punctuation mark,  
while U+02BC behaves as a letter, will be ignored by most people most  
of the time!
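
A quick way to see the difference, sketched with Python's standard
unicodedata module (the helper name describe is made up for
illustration): the two characters carry different Unicode general
categories, which is what software relies on when it decides whether
something is part of a word.

```python
import unicodedata

def describe(ch):
    """Return (Unicode name, general category) for a single character."""
    return unicodedata.name(ch), unicodedata.category(ch)

for ch in ("\u02BC", "\u2019"):
    name, category = describe(ch)
    print(f"U+{ord(ch):04X} {name}: {category}")

# U+02BC MODIFIER LETTER APOSTROPHE: Lm   (a modifier letter)
# U+2019 RIGHT SINGLE QUOTATION MARK: Pf  (final-quote punctuation)
```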

>
> Is coronis the same character as smooth breathing?

Yes, I believe so.

>
>> * use U+2060 WORD JOINER as compound word mark, not the letter "v"
>
> I thought "v" was for digamma in Claudio Beccari's file :) What is  
> the use of a compound word mark in Greek?

Oh! I was going by the comments at the top of the file (as I don't  
really know anything about Greek). Also, would digamma typically be  
found in modern Greek? I thought it was an archaic letter, so I  
wouldn't expect it to be included at all in a hyphenation file  
intended for modern monotonic text.

>
>> One further issue to consider would be composed vs. decomposed  
>> text; this file uses precomposed letters for the vowels with tonos  
>> or dieresis, but these could also be encoded as sequences of vowel  
>> + diacritic. So additional rules should be included to recognize  
>> those forms as well. This is left as an exercise for the  
>> reader.... :-)
>
> It is probably handier to encode them like that, unless your  
> keyboard's width is two meters (for ancient Greek at least). Yes,  
> you could use dead keys but it would take a while to create a  
> layout. But my question is: if breathings, accents, diaeresis and  
> iota subscript are not declared as letters (and they should not be,  
> should they?), there is no need of rules prohibiting break before  
> them. Am I right?

They'd better be declared as "letters" from the point of view of  
TeX's hyphenation routine, which means they need to have catcode 11  
or 12, and a non-zero \lccode. Otherwise TeX will treat each diacritic  
as the end of a word, and the hyphenation patterns will never be  
matched against the proper complete sequences.

So I think the right thing to do is to ensure \lccode<char> = <char>  
for each of these diacritics, and include hyphenation rules for both  
the precomposed and decomposed representations. (Remember that  
regardless of which form you happen to use when you type, with the  
particular keyboard layout you like to use, you might also get text  
from other sources that uses a different encoding form. Or text that  
you originally typed using combining diacritics might go through some  
other process that applies NFC normalization.)
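
To illustrate the two encoding forms, here is a minimal sketch using
Python's standard unicodedata module (the helper name both_forms is
hypothetical, and the list of vowels is only a sample of the monotonic
letters, not taken from the hyphenation file). It shows that the same
letter can arrive either as one precomposed code point or as a base
vowel followed by combining marks, so patterns would need to cover both.

```python
import unicodedata

def both_forms(text):
    """Return the (NFC, NFD) normalizations of text."""
    return (unicodedata.normalize("NFC", text),
            unicodedata.normalize("NFD", text))

def codepoints(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

# Sample monotonic vowels: alpha with tonos, iota with dialytika,
# iota with dialytika and tonos (all precomposed here).
for ch in ("\u03AC", "\u03CA", "\u0390"):
    nfc, nfd = both_forms(ch)
    print(f"{codepoints(nfc)}  <->  {codepoints(nfd)}")

# U+03AC  <->  U+03B1 U+0301
# U+03CA  <->  U+03B9 U+0308
# U+0390  <->  U+03B9 U+0308 U+0301
```

Running NFC over the decomposed sequences gives back the precomposed
letters, which is the situation described above where text typed with
combining diacritics changes form on its way through another tool.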

JK