[tex-hyphen] Fwd: Re: missing hyphen points in monotonic greek

Claudio Beccari claudio.beccari at gmail.com
Sun Jul 27 17:00:30 CEST 2014


I am registered with the group, but at the moment I do not have my 
subscription details with me.

Mojca Miklavec suggested that I forward to the group the message that 
follows. Here are her words:

First of all, I believe that it would be very helpful if Claudio would
send the initial long email (from 22nd of July) to the tex-hyphen
mailing list and it would be great if the discussion continued there.
It would be very helpful to have the discussion archived online. Could
this be done?

I am very gladly forwarding the message, and for my part I will try to continue the discussion on this mailing list.

Meanwhile it is worth noting that I am trying to do for the polytonic Greek version what I did for monotonic Greek, that is, to extend it to the LGR encoding. If I succeed, I will continue with the ancient Greek variety. But of course Dimitrios Filippou has to supervise the pattern files, and Günter Milde has to supervise the update to the babel-greek source version of the greek.ldf file.

Claudio Beccari




-------- Original Message --------
Subject: 	Re: missing hyphen points in monotonic greek
Date: 	Tue, 22 Jul 2014 22:54:21 +0200
From: 	Claudio Beccari <claudio.beccari at gmail.com>
To: 	Guenter Milde <milde at users.sf.net>
CC: 	Dimitrios Filippou <Dimitrios.Filippou at gmail.com>, Mojca Miklavec 
<mojca.miklavec.lists at gmail.com>



Dear friends,
I deleted the other messages about this thread, but this message is the 
first positive result in order to have correct hyphenation also when the 
LICR precomposed accented Greek characters are entered in the input file.

Let's recall the current situation:

The babel/polyglossia side of the Greek hyphenation contains three 
loaders and 6 pattern files; three with unicode greek characters and the 
corresponding three ASCII encoded ones. Every triplet contains the 
patterns for modern monotonic greek, modern polytonic greek and ancient 
polytonic greek.

The ones with unicode characters are used in the creation of formats for 
Unicode aware engines, such as xetex and luatex, and others.

The ASCII encoded pattern files are used for creating formats for 8-bit 
aware engines (such as pdftex and the like) but they miss some important 
functionality; they contain patterns that are strictly ASCII and work 
with accented characters only when they arrive at the hyphenation 
algorithm as un-ligated ligatures, that will be ligated after the 
hyphenation process thanks to a feature of the font tfm files of the 
CBfonts, and of the other fonts with LGR encoding. Remember that I 
created those CBfonts when Unicode did not exist and most western users 
dealing with ancient greek had available only old keyboards that used to 
work with an encoding similar, but not equal, to the ISO 8859-1 (ISO 
Latin). Even today in this part of the world, in spite of more modern 
operating systems and programs, that use Unicode characters, the Latin 
transliteration is very helpful for anyone who does not run a Greek PC.

The "original sin" of my Greek fonts is that they did work pretty well 
with the ligature mechanism; in the early days of my fonts I had 
uploaded to CTAN a grhyph.tex file; at that time I created it to 
be hopefully useful for the three incarnations of Greek. The first 
greek.ldf file just loaded it as *THE* only babel Greek language (at 
that time xetex and luatex did not exist); the only provision that 
Apostolos created was the use of the polutonikogreek language, that 
eventually gave rise to an attribute, that in turn can now be used as a 
modifier. With the advent of xetex and xelatex, Apostolos abandoned 
babel and pdflatex, and concentrated his activity on the best adaptation 
of the new engine to the benefit of the Greek users.

Meanwhile the Greek users decided that my patterns did an acceptable 
job, but even if hyphenation errors were very few, my patterns missed a 
lot of possible break points, in the effort of being equally 
(in)sufficient for the three Greek incarnations.

The current six Greek pattern files are the result of a synthesis 
performed by Dimitrios, but although they are loaded in the various 
format files, the babel-greek.ldf, unlike the gloss-greek.ldf, does not 
use them all, but uses only those for polytonic Greek.


Günter added a lot of good parts to the initial scheme of greek.ldf, and 
to the transformation of characters in a general way, the LICR method, 
so as to let the users keep using the transliteration or enter the Greek 
text with Greek glyphs; the latest edition of the greek.ldf file works 
also when babel is used with xelatex and lualatex, but he did not 
address the problem of changing hyphenation as it is done with polyglossia.

In order to have things working correctly two actions were considered 
necessary.
1) To let babel-greek.ldf use suitable attribute/modifiers in order to 
change the infix words, but also in order to change hyphenation schemes.
2) To modify the ASCII pattern files in order to take into account not 
only the ligatures but also the direct input of the precomposed LGR 
encoded accented characters.

Now I have done this draft work, contained in the attached zip file.

1a) I have added to the babel-greek.ldf  the ancient attribute/modifier.
1b) I have added the necessary mechanism to change hyphenation language, 
so as to accommodate the default language (monotonic) and the two 
variants: polytonic and ancient.
1c) I have added the lower case codes for all the characters for code 
points 127 < codepoint < 256.
1d) I have modified the documentation text to explain what I thought 
needed some sort of explanation for the user.
1e) The \ProvidesLanguage statement has been updated so that it is easy 
to distinguish it from previous versions.

Remember it is just the patch that Günter was asking for; it is pretty 
complete, but I tested it only with the default language (monotonic). It 
must be completed.

2a) I created the grmhyph6.tex pattern file for monotonic Greek that in 
my opinion should replace the grmhyph5.tex file; but I changed the name 
in order to avoid confusion.
2b) The patterns containing ligatures, and some other patterns with 
single vowels, have been duplicated, and in the duplicated version the 
ligature has been replaced by the double caret hexadecimal code point 
of the precomposed character. For ease of interpretation, not only does 
the file start with some comment lines where the double caret 
hexadecimal code points are spelt out in plain TeXnical English/Greek, 
but there is also a PDF file containing the table of all 
LGR encoded Greek glyphs.
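
The duplication described in 2b can be sketched with a short script. 
This is only a minimal illustration in Python, not the procedure 
actually used to produce grmhyph6.tex: the function names are 
hypothetical, and the sample slot value below is a placeholder; the real 
code points must be read from the table of LGR encoded glyphs. It also 
assumes that a ligature appears in a pattern as an unbroken run of 
characters, with no hyphenation digit between the accent and the vowel.

```python
import re


def tex_hex(code_point: int) -> str:
    """Return TeX's double-caret notation for an 8-bit code point.

    TeX accepts only lowercase hexadecimal digits after ^^.
    """
    if not 0 <= code_point <= 0xFF:
        raise ValueError("code point must fit in 8 bits")
    return f"^^{code_point:02x}"


def expand_patterns(patterns, ligature_slots):
    """Keep every pattern and add a twin for each one containing a ligature.

    ligature_slots maps a transliterated ligature (e.g. "'a") to the slot
    of the precomposed glyph. The mapping is supplied by the caller;
    nothing here knows the real LGR layout.
    """
    if not ligature_slots:
        return list(patterns)
    # Longest ligatures first, so a three-character ligature is not
    # mistaken for the two-character one it contains.
    ordered = sorted(ligature_slots, key=len, reverse=True)
    matcher = re.compile("|".join(re.escape(lig) for lig in ordered))
    result = []
    for pattern in patterns:
        result.append(pattern)
        # A single regex pass never rescans the inserted ^^xx text.
        twin = matcher.sub(lambda m: tex_hex(ligature_slots[m.group(0)]), pattern)
        if twin != pattern:
            result.append(twin)
    return result


if __name__ == "__main__":
    # Placeholder slot value, for illustration only.
    sample_slots = {"'a": 0x88}
    for line in expand_patterns(["a1", "'a1"], sample_slots):
        print(line)
```

Since ^^ is always followed by exactly two hexadecimal digits, a 
hyphenation digit that comes right after the replaced ligature is still 
read by TeX as a separate character.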

3a) I have set up a small testing file in order to test the correctness 
of the hyphenation patterns at least in a short sentence with my little 
package testhyphens.sty (the little package with which I discovered that 
things were not running as they should have done in the current 
situation, and that triggered the start of this thread).
3b) The output of the test file is also included, so you see that 
hyphenation of monotonic Greek is now working also with the initial 
input of Unicode Greek text, in spite of using pdflatex.

This set of tentative files is just a starting point, from which you 
can take off to finish the job.

If you want to test what I did, you have to install the new pattern file 
grmhyph6.tex. One way is to rename it to grmhyph5.tex, replace the 
original grmhyph5.tex, rebuild the pdflatex format file, and then run 
the TestTesthyphensInGreek.tex file, which you may obviously modify 
according to what you want to do. I discourage this procedure; I 
encourage instead creating a language-local.dat file in the 
texmf-local/ tree, as described in the tlmgr documentation under the 
command *tlmgr generate* (not an easy procedure, and error prone if you 
have never done these things before, but it is within a normal user's 
range; it does not require a "wizard" certification). This also implies 
creating a loadhyph-el-monoton6.tex loader for the grmhyph6.tex file. 
Read the documentation carefully, because the difficult point is saving 
the various files in the proper subdirectories of the texmf-local TeX 
tree. Then you have to create a new pdflatex.fmt format file with the 
tools of your distribution; you have to do it as a 
superuser/administrator/root by means of the fmtutil-sys executable 
(read the documentation if it is the first time you use it).

Conclusion: adding the double caret code points of the second half page 
of the LGR encoded CBgreek fonts solves the problems I initially 
reported. The approach is working and can be extended to the other two 
ASCII encoded, ligature based Greek hyphenation pattern files.

What remains to be done:

1) Revise the babel-greek.dtx and pdf files in order to correct what 
has just been inserted but has not been tested.

2) Revise the other two ASCII encoded hyphenation pattern files (for 
polytonic and ancient greek) in order to add the double caret 
hexadecimal codes of the precomposed glyphs of the LGR encoded fonts.

3) Revise the system wide distribution of pattern files and possibly 
upgrade the loadhyph....tex file according to the actual names of the 
new files.

I see some work for the three of you. ;-)

Günter has to revise the babel-greek.dtx and greek.ldf files.

Dimitrios has to add the missing double caret hexadecimal codes for the 
precomposed characters; Dimitrios, if you know any scripting language 
sufficiently intelligent to do the work, you need just the time that it 
takes to write the script and test it.

Mojca, when the above actions are completed you probably have to see if 
anything has to be "retouched" in order to render it compliant with the 
current mechanism that is used to create format files for 8-bit aware 
typesetting engines.

The path is open; now it is up to you.

Thank you for reading the message until this end :-)

Claudio




-------------- next part --------------
A non-text attachment was scrubbed...
Name: MonotonicGreekForBabel.zip
Type: application/zip
Size: 477073 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex-hyphen/attachments/20140727/42b999af/attachment-0001.zip>

