[tex-hyphen] Greek hyphenation patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Jun 16 16:56:48 CEST 2008


On Wed, May 21, 2008 at 6:06 PM, Filippou, Dimitrios (RTIT) wrote:
>
> Now, I have almost completed the work on UTF-8 patterns for ancient Greek (see also attached), but I have come across a small problem and I need your help: I noticed that words ending with a consonant (e.g., λ) and right guillemets (») get hyphenated before the consonant, which is wrong, despite the patterns that prohibit hyphenation before final consonants (e.g., 4λ.). To overcome the problem I added extra prohibitive patterns with the right guillemets after the final consonant (e.g., 4λ».). But I think that the problem is just the \lccode of the character ». What do you think?
>
> As soon as I fix that problem with the right guillemets, I will release the revised patterns on CTAN.
>
> Many thanks for your interest in my patterns and your help in debugging them.

Hello Dimitrios,

can you please check if the attached patch (by Jonathan) fixes the
issue you had with faulty hyphenations in XeTeX and then remove the
guillemets from patterns again?
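
A minimal sketch of why the character after the final consonant matters,
written in Python as a toy version of Liang-style pattern matching. It
assumes (as Dimitrios suspects above) that » ends up being treated as part
of the word, so the end-of-word marker "." no longer follows λ directly.
The word "καλ" and the pattern "1λ" are made up for illustration, and
\lefthyphenmin/\righthyphenmin are ignored.

```python
import re

def parse_pattern(pat):
    """Split a pattern such as '4λ.' into its letters and inter-letter digits."""
    letters = re.sub(r"\d", "", pat)
    digits = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            digits[pos] = int(ch)   # digit sits in the gap before letters[pos]
        else:
            pos += 1
    return letters, digits

def hyphenate(word, patterns):
    """Insert '-' at every gap whose highest pattern digit is odd."""
    dotted = "." + word + "."       # '.' marks the word boundaries
    levels = [0] * (len(dotted) + 1)
    for pat in patterns:
        letters, digits = parse_pattern(pat)
        start = dotted.find(letters)
        while start != -1:
            for i, d in enumerate(digits):
                levels[start + i] = max(levels[start + i], d)
            start = dotted.find(letters, start + 1)
    out = []
    for k, ch in enumerate(word):
        # levels[k + 1] is the gap just before word[k]
        if k > 0 and levels[k + 1] % 2 == 1:
            out.append("-")
        out.append(ch)
    return "".join(out)

toy = ["1λ", "4λ."]                          # hypothetical toy patterns
print(hyphenate("καλ", toy))                 # καλ    ('4λ.' matches)
print(hyphenate("καλ»", toy))                # κα-λ»  ('4λ.' no longer matches)
print(hyphenate("καλ»", toy + ["4λ»."]))     # καλ»   (the workaround pattern)
```

If » is not seen as a letter, the word ends at λ, "4λ." applies again, and
the extra "4λ»." patterns become unnecessary.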

You don't need to release them under a new version. The source of the
patterns that will be included in TeX Live 2008 will come from here:
     http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns
and we will submit the files to CTAN and to TeX Live.

It would be best if you could simply prepare versions of the patterns
called hyph-el-monoton.tex, hyph-el-polyton.tex and hyph-grc.tex with
the following constraints:
- no TeX macros in patterns except for \patterns and \hyphenation, in
particular we mean: no \message, no \lccode, no \begingroup, no
\endinput
- pure UTF-8
- if you need special characters, let us know (they will be put under
loadhyph-el-*.tex)
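
The constraints above could be checked mechanically. The following Python
sketch is only an illustration, not an existing tool from the repository:
the function name check_pattern_source is made up, and comment handling is
simplified (everything after the first % on a line is dropped).

```python
import re

# The only control sequences the constraints allow in a pattern file.
ALLOWED_MACROS = {"patterns", "hyphenation"}

def check_pattern_source(raw: bytes):
    """Return a list of problems found in the raw bytes of a pattern file."""
    try:
        text = raw.decode("utf-8")          # "pure UTF-8"
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        code = line.split("%", 1)[0]        # drop TeX comments (simplified)
        for macro in re.findall(r"\\([A-Za-z]+)", code):
            if macro not in ALLOWED_MACROS:
                problems.append(f"line {lineno}: \\{macro} is not allowed")
    return problems

sample = "\\message{x}\n\\patterns{4λ.}\n\\endinput\n".encode("utf-8")
for problem in check_pattern_source(sample):
    print(problem)
```

For the sample above this reports \message on line 1 and \endinput on
line 3; a file containing only \patterns{...} and \hyphenation{...}
passes without complaints.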

I assume that it should be rather easy to fix a few lines in your Perl code.

These UTF-8 patterns will then be considered upstream, and I will try
to convince authors of other patterns to switch to the new loading
mechanism (i.e.: update these new patterns).

Of course we can do the "conversion" as well, but it makes more sense
if you as the author can fix the patterns whenever you need to;
otherwise the changes may go unnoticed by us and it would be unnecessary
double work.

You may ask Karl for an SVN account to be able to access & modify
your files in the repository if you want. I have put a copy of your
script in the repository as well, but it would be really really really
great if you could take care of all the tools you need to generate the
patterns in this repository.

Thanks a lot,
    Mojca

PS: here are some radical comments by Hans. Please defend
yourself/explain when these are needed :) :) :)

>> MtxRun | checking language ??, file hyph-grc.tex
>> MtxRun | invalid character       (0x0009) in patterns of language ??,
>> file hyph-grc.tex, n=17
>> MtxRun | invalid character » (0x00BB) in patterns of language ??, file
>> hyph-grc.tex, n=19
>
> rightguillemot ... get rid of it
>
>> MtxRun | invalid character ᾿ (0x1FBF) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
>> MtxRun | invalid character ᾽ (0x1FBD) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
> GREEK PSILI (0x1FBF), GREEK KORONIS (0x1FBD) (category Sk)
>
> maybe ask Thomas S if that stuff makes sense
>
>> MtxRun | invalid character ʼ (0x02BC) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
> MODIFIER LETTER APOSTROPHE
>
>> MtxRun | invalid character ' (0x0027) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
> APOSTROPHE
>
>> MtxRun | invalid character ’ (0x2019) in patterns of language ??, file
>> hyph-grc.tex, n=78
>> MtxRun | there are errors that need to be fixed
>
> RIGHT SINGLE QUOTATION MARK
>
> maybe that's all related to some funny input encoding
>
>> Arthur, same question about 0x0009.
>> » needs to be removed from patterns (Jonathan will hopefully fix the ini
>> file).
>> I guess that the other 5 characters are needed. lccodes are set in
>> loadhyph for five characters.
>
> i think they should all go away
>
>> MtxRun | checking language ??, file hyph-cop.tex
>> MtxRun | invalid character ̀ (0x0300) in patterns of language ??, file
>> hyph-cop.tex, n=72
>> MtxRun | invalid character ̈ (0x0308) in patterns of language ??, file
>> hyph-cop.tex, n=5
>> MtxRun | there are errors that need to be fixed
>
> COMBINING GRAVE ACCENT
> COMBINING DIAERESIS
>
> kick out those lines
>
>> No idea. Done by Jonathan. Germans say: "Ich verstehe Bahnhof." (or:
>> "It's all Greek to me.") But these are combining characters as far as
>> I can see.
>>
>>
>> There are some other characters, mostly apostrophes. I suspect (don't
>> know, only suspect) that the functionality of patterns in such cases
>> changes if apostrophe is remapped to "single right quotation
>> character" with mapping=tex-text, and I suspect that people might be
>> using different characters in composed-words, but I might be wrong.
>
> wipe'm out ... either make patterns really clever (so, all kind of
> combinations) or not .. probably much of this dates from the 8 bit times
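
For reference, the names and general categories of the code points flagged
in the MtxRun output above can be looked up with Python's standard
unicodedata module. This is just a lookup sketch; the list FLAGGED is
copied by hand from the log lines, and describe is a made-up helper name.

```python
import unicodedata

# Code points reported as invalid for hyph-grc.tex and hyph-cop.tex above.
FLAGGED = [0x0009, 0x00BB, 0x1FBF, 0x1FBD, 0x02BC, 0x0027, 0x2019,
           0x0300, 0x0308]

def describe(codepoint):
    """Format one code point as 'U+XXXX <category> <name>'."""
    ch = chr(codepoint)
    # Control characters such as TAB have no name in the Unicode database.
    name = unicodedata.name(ch, "<unnamed control character>")
    return f"U+{codepoint:04X} {unicodedata.category(ch)} {name}"

for cp in FLAGGED:
    print(describe(cp))
```

The category column separates the cases discussed here: Sk for the Greek
psili and koronis, Lm for the modifier letter apostrophe, Po and Pf for the
plain apostrophe and the quotation marks, and Mn for the two combining
accents in the Coptic patterns.
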
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xelatex-ini-patch.txt
Url: http://tug.org/pipermail/tex-hyphen/attachments/20080616/eb34b3ee/attachment-0001.txt 

