[tex-hyphen] Latin Hyphenation when using utf8
Mojca Miklavec
mojca.miklavec.lists at gmail.com
Tue Jun 22 14:51:23 CEST 2010
Dear Andrew,
Before answering on anything else, here are a few points that I would
like to make (sorry for top-posting):
1.) VERY VERY VERY important: use XeTeX or LuaTeX; with pdfTeX you
won't be able to handle macrons and hyphenation properly; even if
Claudio fixed the patterns, they wouldn't help you at all (I can
explain why, but the basic problem is that TeX uses 256 characters
and I suspect that no common encoding like T1, texnansi, qx, cs has
letters with macrons present; of course you may fix that yourself, I
could even send you some instructions on how to do it, but I can promise
you that it will never be supported in TeX Live or MiKTeX). Arthur, if
I'm wrong about the macrons, please correct me. I didn't really check
it.
2.) It is quite possible that once you start using XeTeX, most of the
problems you see now will disappear (but definitely not all of them).
3.) This is the answer that we got from Petr Sojka (for some unrelated problem):
> \hyphequiv table might be the right way of doing that (it was suggested
> some 13 years ago) to make patterns independent of font encodings.
> Anyway, for the purpose of unifying char positions just for hyphenation
> and not lowercasing, one can use etex's \savinghyphcodes macro:
> see sec. 3.10 of http://www.tug.org/teTeX/texmf-dist/doc/etex/base/etex-man.pdf
If you compose a table of all equivalent letters, for example saying
that "a is equal to amacron" etc., ... we can support that without
changing hyphenation patterns.
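To give a rough idea of what such a table could look like, here is a
minimal Python sketch (Python only because it is easy to run; it is not
part of any TeX distribution). It assumes that only the precomposed
Unicode vowels with macron or breve matter, and it maps each of them to
its plain base letter. The names `build_equivalents` and `strip_marks`
are made up for this illustration.

```python
import unicodedata

COMBINING_MACRON = "\u0304"
COMBINING_BREVE = "\u0306"


def build_equivalents(vowels="aeiouy"):
    """Map each precomposed macron/breve vowel to its unaccented base letter."""
    table = {}
    for base in vowels + vowels.upper():
        for mark in (COMBINING_MACRON, COMBINING_BREVE):
            # NFC composes base + mark into one code point when Unicode has one.
            composed = unicodedata.normalize("NFC", base + mark)
            if len(composed) == 1:
                table[composed] = base
    return table


def strip_marks(word, table):
    """Replace every marked vowel in `word` by its base letter."""
    return "".join(table.get(ch, ch) for ch in word)


EQUIV = build_equivalents()

if __name__ == "__main__":
    # Print the table as "marked = base" pairs, with the UTF-8 byte count
    # of the marked letter (each needs more than one byte).
    for marked, base in sorted(EQUIV.items()):
        print(marked, "=", base, len(marked.encode("utf-8")), "bytes")
    print(strip_marks("amīcus", EQUIV))
```

How such pairs would then be handed to TeX is a separate question; the
sketch only shows that the list of equivalent letters is short and can
be generated mechanically.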
Anyway ... please report whether you manage to use XeLaTeX instead of
pdfLaTeX (using a proper font; at least try Latin Modern or
Gentium or some other font on your system that supports those
characters) and how many problems still remain after that ... we can
then continue the discussion from that point on.
Mojca
On Tue, Jun 22, 2010 at 14:08, Claudio Beccari
<claudio.beccari at gmail.com> wrote:
> Dear Andrew,
> I am very impressed by your plan to write a book in/on Latin for your senior
> year high school students.
>
> The Latin patterns I made up by hand were based on the common rules used in
> Roman Catholic church printing practice, where the acute accent is used (see
> my ecclesiastic.sty package); sometimes the ligatures ae and oe are used,
> but macrons and breves are never used.
>
> My approach to writing hyphenation patterns was to ignore patterns
> containing diacritically marked letters; my Italian patterns do not contain
> any pattern with accented vowels; my Greek patterns for classical Greek (now
> obsolete) do not contain any pattern with accented letters. I had to cope
> with very few accented letters in the Coptic patterns in both the Sahidic and
> Bohairic dialects, because both dialects use accents also over some
> consonants.
>
> The idea behind this is that patterns may often be made up by using only
> consonants. They work well if the accented vowels are transformed into a
> single character code (8 bit character code) by some sort of ligature system
> or by smart accent macros such as those implied by the T1 encoding for Latin
> scripts or by the definitions contained in the greek.ld file for Greek (or
> similar actions defined in the Coptic packages).
>
> With utf8 of course you have to use the specific option to the inputenc
> package and, possibly, you have to load another helper sty file before
> invoking the inputenc one. But, even if this apparently works fine with
> western modern languages, it probably requires many additions to the Latin
> hyphenation pattern file in order to be sure that the five vowels marked
> with a macron or a breve are suitably taken care of. The point is that such
> vowels carrying a macron or a breve are not represented with a single byte
> character code, but require at least two bytes.
>
> Now, since in Italian and in the other modern western languages that I use
> (in spite of having a Mac with MacTeX and TeXShop, the latter initially
> preset to use Unicode [and reset to ISO Latin 1]) I don't need anything else
> but the ISO Latin 1 single byte input encoding, I have never examined the
> possibility of extending the Latin hyphenation patterns the way it might be
> useful for your purpose. Sorry.
>
> I might give it a try, but I am so ignorant about Unicode that I might not
> succeed. Give me some time, please, to do some experiments. I have to get
> acquainted with the new hyphenation system that splits the "old" hyphenation
> files in two different ones: the "loader" that contains the specific
> language definitions and shortcuts, and the real "pattern file" that
> contains only the patterns.
>
> I'll get in touch soon. Please, I need some time to test the ideas I already
> have in mind.
>
> Claudio
>
> Andrew Gollan wrote:
>
> I had sent this to claudio.beccari at gmail.it which bounced.
>
> Andrew Gollan
> "bis vincit qui se vincit"
> Latin - Henry Clay HS
>
>
> ---------- Forwarded message ----------
> From: Andrew Gollan <tharoth at hypalonia.com>
> Date: 21 June 2010 20:18:29 UTC-4
> Subject: Latin Hyphenation when using utf8
> To: babel <babel at braams.xs4all.nl>, "claudio.beccari"
> <claudio.beccari at gmail.it>
>
>
> Gentlemen,
>
> let me thank you in advance for your excellent attention to Latin in babel
> these many years. I am in the process of adding macrons to a book written
> entirely in Latin, in order to use it in my 3rd year Latin high
> school class. I am entering the letters directly on my Mac without recourse
> to the '=' trick. I find that the hyphenation breaks nastily when I do this,
> at least in some words. I did some research, and found that the way the
> hyphenation files are generated is beyond me. I upgraded to the latest
> MacTeX to see if that was better and it did not change anything that I could
> see.
>
> What I want to achieve is that words with macrons or breves are treated as
> identical to their unaccented equivalents in terms of hyphenation. Could you
> point me at the resources I would need to achieve this? I have a CS
> background, so I can probably come to understand it with the right
> information.
>
> Andrew Gollan
> "bis vincit qui se vincit"
> Latin - Henry Clay HS