[tex-hyphen] Fwd: Re: missing hyphen points in monotonic greek
Claudio Beccari
claudio.beccari at gmail.com
Sun Jul 27 17:00:30 CEST 2014
I am subscribed to the list, but at the moment I do not have my login
details with me.
Mojca Miklavec suggested that I forward the following message to the
list. Here are her words:
First of all, I believe that it would be very helpful if Claudio would
send the initial long email (from 22nd of July) to the tex-hyphen
mailing list and it would be great if the discussion continued there.
It would be very helpful to have the discussion archived online. Could
this be done?
I am very glad to forward the message, and on my side I will try to continue the discussion on this mailing list.
Meanwhile, it is worth noting that I am trying to extend the polytonic Greek version to the LGR encoding in the same way I did for monotonic Greek. If I succeed, I will continue with the ancient Greek variety. Of course, Dimitrios Filippou has to supervise the pattern files, and Günter Milde has to supervise the update to the babel-greek source of the greek.ldf file.
Claudio Beccari
-------- Original Message --------
Subject: Re: missing hyphen points in monotonic greek
Date: Tue, 22 Jul 2014 22:54:21 +0200
From: Claudio Beccari <claudio.beccari at gmail.com>
To: Guenter Milde <milde at users.sf.net>
CC: Dimitrios Filippou <Dimitrios.Filippou at gmail.com>, Mojca Miklavec
<mojca.miklavec.lists at gmail.com>
Dear friends,
I deleted the other messages in this thread, but this message reports
the first positive result towards correct hyphenation also when the
precomposed accented Greek characters handled by the LICR are entered
in the input file.
Let's recall the current situation:
The babel/polyglossia side of Greek hyphenation contains three loaders
and six pattern files: three with Unicode Greek characters and the three
corresponding ASCII-encoded ones. Each triplet contains the patterns for
modern monotonic Greek, modern polytonic Greek and ancient polytonic
Greek.
The files with Unicode characters are used to create formats for
Unicode-aware engines, such as xetex and luatex, and others.
The ASCII-encoded pattern files are used to create formats for 8-bit
engines (such as pdftex and the like), but they lack some important
functionality: they contain strictly ASCII patterns, which work with
accented characters only when these reach the hyphenation algorithm as
not-yet-formed ligatures; the ligatures are formed after hyphenation
thanks to a feature of the tfm files of the CBfonts and of the other
LGR-encoded fonts. Remember that I created the CBfonts when Unicode did
not exist and most western users dealing with ancient Greek only had
old keyboards working with an encoding similar, but not equal, to
ISO 8859-1 (ISO Latin). Even today in this part of the world, in spite
of more modern operating systems and programs that use Unicode
characters, the Latin transliteration is very helpful for anyone who
does not run a Greek PC.
The "original sin" of my Greek fonts is that they worked pretty well
with the ligature mechanism. In the early days of my fonts I uploaded
to CTAN a grhyph.tex file, which I created hoping it would be useful
for all three incarnations of Greek. The first greek.ldf file simply
loaded it as *THE* only babel Greek language (at that time xetex and
luatex did not exist); the only provision Apostolos made was the
polutonikogreek language, which eventually gave rise to an attribute,
which in turn can now be used as a modifier. With the advent of xetex
and xelatex, Apostolos abandoned babel and pdflatex and concentrated on
adapting the new engine as well as possible to the needs of Greek users.
Meanwhile, Greek users judged that my patterns did an acceptable job:
hyphenation errors were very few, but the patterns missed many possible
break points, because they tried to be equally (in)sufficient for all
three Greek incarnations.
The current six Greek pattern files are the result of a synthesis
performed by Dimitrios. Although they are all loaded in the various
format files, babel-greek's greek.ldf, unlike gloss-greek.ldf, does not
use all of them: it uses only the patterns for polytonic Greek.
Günter added a lot of good parts to the initial scheme of greek.ldf and
to the general transformation of characters (the LICR method), so as to
let users keep using the transliteration or enter Greek text with Greek
glyphs; the latest edition of greek.ldf also works when babel is used
with xelatex and lualatex, but he did not address the problem of
switching hyphenation as polyglossia does.
To get things working correctly, two actions were considered necessary:
1) Let babel-greek.ldf use suitable attributes/modifiers, not only to
change the infix words but also to switch hyphenation schemes.
2) Modify the ASCII pattern files so that they take into account not
only the ligatures but also the direct input of the precomposed
LGR-encoded accented characters.
I have now done a draft of this work, contained in the attached zip file.
1a) I have added to the babel-greek.ldf the ancient attribute/modifier.
1b) I have added the mechanism needed to switch the hyphenation
language, so as to accommodate the default language (monotonic) and the
two variants: polytonic and ancient.
1c) I have added the lowercase codes for all characters with code
points 127 < codepoint < 256.
1d) I have modified the documentation text to explain what I thought
needed some explanation for the user.
1e) The \ProvidesLanguage statement has been updated so that it is easy
to distinguish this version from previous ones.
Remember that this is just the patch Günter was asking for; it is fairly
complete, but I tested it only with the default language (monotonic),
so it still has to be finished.
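As background for item 1c: in TeX a character whose \lccode is zero cannot appear in \patterns and is not treated as a letter when a word is hyphenated, so the 8-bit slots need non-zero lowercase codes before any pattern using them can work. The Python sketch below only generates such assignments for the range 0x80..0xFF; it assumes each slot is mapped to itself, which may differ from what the actual greek.ldf patch does (upper-case glyphs could instead be mapped to their lower-case partners).

```python
def lccode_lines(first: int = 0x80, last: int = 0xFF) -> list[str]:
    """Emit one TeX \\lccode assignment per 8-bit slot from first to last.

    Assumption: every slot is mapped to itself. TeX hexadecimal constants
    ("80, "FF) require upper-case hex digits, hence the :02X format.
    """
    if not 0 <= first <= last <= 0xFF:
        raise ValueError("expected an 8-bit range")
    return [f'\\lccode"{c:02X}="{c:02X}' for c in range(first, last + 1)]


if __name__ == "__main__":
    # Print the assignments for 127 < codepoint < 256, i.e. 0x80..0xFF.
    print("\n".join(lccode_lines()))
```

The hexadecimal form was chosen over the double-caret form here only because it is unambiguous inside a numeric assignment; either can be generated the same way.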
2a) I created the grmhyph6.tex pattern file for monotonic Greek, which
in my opinion should replace grmhyph5.tex; I changed the name to avoid
confusion.
2b) The patterns containing ligatures, and some other patterns with
single vowels, have been duplicated; in the duplicate, each ligature has
been replaced by the double-caret hexadecimal code point of the
corresponding precomposed character. For ease of interpretation, not
only does the file start with some comment lines where the double-caret
hexadecimal code points are spelt out in plain TeXnical English/Greek,
but there is also a PDF file containing the table of all LGR-encoded
Greek glyphs.
3a) I have set up a small test file to check the correctness of the
hyphenation patterns, at least on a short sentence, with my little
package testhyphens.sty (the package with which I discovered that
things were not working as they should in the current situation, and
which triggered the start of this thread).
3b) The output of the test file is also included, so you can see that
hyphenation of monotonic Greek now works also when the input is Unicode
Greek text, even with pdflatex.
This set of tentative files is just a starting point from which you can
take off to finish the job.
If you want to test what I did, you have to install the new pattern
file grmhyph6.tex. One way is to rename it grmhyph5.tex, replace the
original grmhyph5.tex, rebuild the pdflatex format file, and then run
the TestTesthyphensInGreek.tex file, which you may of course modify
according to what you want to do. I discourage this procedure; instead,
I encourage creating a language-local.dat file in the texmf-local/
tree, as described in the tlmgr documentation under the *tlmgr
generate* command (a procedure that is not easy and is error prone if
you have never done these things before, but it is within a normal
user's range and does not require a "wizard" certification). This also
implies creating a loadhyph-el-monoton6.tex loader for the grmhyph6.tex
file. Read the documentation carefully, because the difficult point is
saving the various files in the proper subdirectories of the
texmf-local TeX tree. Then you have to create a new pdflatex.fmt format
file with the tools of your distribution; you have to do it as
superuser/administrator/root by means of the fmtutil-sys executable
(read the documentation if it is the first time you use it).
Conclusion: adding the double-caret code points of the upper half of
the code table of the LGR-encoded CBgreek fonts solves the problems I
initially reported. The approach works and can be extended to the other
two ASCII-encoded, ligature-based Greek hyphenation pattern files.
What remains to be done:
1) Revise the babel-greek.dtx and pdf files in order to correct what has
just been inserted but not yet tested.
2) Revise the other two ASCII encoded hyphenation pattern files (for
polytonic and ancient greek) in order to add the double caret
hexadecimal codes of the precomposed glyphs of the LGR encoded fonts.
3) Revise the system-wide distribution of pattern files and possibly
upgrade the loadhyph....tex files according to the names of the new
files.
I see some work for the three of you. ;-)
Günter has to revise the babel-greek.dtx and greek.ldf files.
Dimitrios has to add the missing double-caret hexadecimal codes for the
precomposed characters; Dimitrios, if you know a sufficiently capable
scripting language, the job takes only the time needed to write and
test the script.
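The duplication described in 2b is mechanical enough to script. The Python sketch below shows one way to do it under explicit assumptions: the ligature spellings and target slots in LIGATURE_TO_SLOT are hypothetical placeholders (the real values must be read from the LGR glyph table), the helper names are invented for illustration, and ligatures are assumed to appear in the patterns without a hyphenation digit inside them.

```python
import re

# HYPOTHETICAL mapping: ligature spelling used in the ASCII pattern file ->
# 8-bit slot of the precomposed LGR glyph. These numbers are placeholders,
# NOT the real LGR code points; read those from the LGR glyph table.
LIGATURE_TO_SLOT = {
    "'a": 0x80,
    "'e": 0x81,
}


def caret(code: int) -> str:
    """TeX double-caret notation: ^^ followed by two lower-case hex digits."""
    if not 0 <= code <= 0xFF:
        raise ValueError("8-bit code expected")
    return "^^" + format(code, "02x")


def duplicate_patterns(patterns: list[str], mapping: dict[str, int]) -> list[str]:
    """Keep every pattern; after each one containing a ligature, add a copy
    in which each ligature is replaced by the ^^xx code of its glyph."""
    if not mapping:
        return list(patterns)
    # Longest keys first, so a longer ligature wins over a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    rx = re.compile("|".join(re.escape(k) for k in keys))
    result = []
    for p in patterns:
        result.append(p)
        if rx.search(p):
            result.append(rx.sub(lambda m: caret(mapping[m.group(0)]), p))
    return result


if __name__ == "__main__":
    for line in duplicate_patterns(["1'a2", "b1c"], LIGATURE_TO_SLOT):
        print(line)
```

TeX consumes exactly two lower-case hex digits after ^^, so a hyphenation digit that immediately follows a converted glyph (as in 1^^802) is still read as a digit. The lowercase codes of item 1c must be in place for these 8-bit characters to be accepted in \patterns.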
Mojca, when the above actions are completed, you will probably have to
check whether anything needs to be "retouched" to make it compliant
with the current mechanism used to create format files for 8-bit
typesetting engines.
The path is open; now it is up to you.
Thank you for reading the message until this end :-)
Claudio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MonotonicGreekForBabel.zip
Type: application/zip
Size: 477073 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex-hyphen/attachments/20140727/42b999af/attachment-0001.zip>