[tex-hyphen] Latin Hyphenation when using utf8

Andrew Gollan tharoth at hypalonia.com
Tue Jun 22 23:16:32 CEST 2010

grātiās plūrimās vōbīs agō

This almost completely solved my problem (though I confess I haven't looked
through for nasty hyphenations yet). I ran xelatex on my input file with
only a couple of minor mods for font handling:
\usepackage{palatino} => \usepackage{fontspec}\setromanfont{Palatino

But xelatex seems to just happily proceed when it doesn't have a glyph. In
the old scheme I could put a macron on 'y' to make 'ȳ' and it came out in the
wash even though the font did not directly contain it. My new book had gaps
where the 'ȳ' should be. Any quick hints on how to understand/control glyph
substitution in the brave new (to me) world of XeTeX?

I have of course worked around the problem with '\=y' for now.
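
A minimal sketch of how such gaps might be made visible, assuming XeLaTeX
with fontspec's default font (not the Palatino setup above) and only the
standard TeX parameters \tracinglostchars and \tracingonline; the test
line is purely illustrative:

```latex
% Sketch only: compile with xelatex.
% No font is selected here, so fontspec falls back to its default;
% swap in your own \setmainfont line to test another font.
\documentclass{article}
\usepackage{fontspec}
\tracinglostchars=1 % write "Missing character" lines to the log
\tracingonline=1    % and repeat them on the terminal
\begin{document}
% First form: the precomposed letter typed directly.
% Second form: the accent-macro workaround.
ȳ versus \=y
\end{document}
```

With these settings a glyph the font lacks should be reported as a
"Missing character" line rather than passing silently.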

My first reaction would be just to say:
amacron = abreve = a
Amacron = Abreve = A

I don't think anyone ever considered differently hyphenating any of the
macron or breve letters from their base forms. Claudio might have more
knowledge of this.
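
Spelled out in TeX terms, that table might look like the sketch below. It
assumes XeLaTeX with UTF-8 input and relies on TeX mapping each letter
through \lccode before matching it against the patterns. Only the 'a' rows
are shown; the other vowels would follow the same scheme.

```latex
% Sketch only, for the preamble of a XeLaTeX document (UTF-8 input).
% Each assignment makes the marked letter look like plain "a" to the
% hyphenation patterns. Side effect: \lowercase will also turn these
% letters into "a", so this is unsuitable where \lowercase matters.
\lccode`ā=`a  % amacron
\lccode`ă=`a  % abreve
\lccode`Ā=`a  % Amacron
\lccode`Ă=`a  % Abreve
% ... the remaining vowels would be handled in the same way
```

Whether this takes effect depends on how the format loaded the patterns:
per the e-TeX manual, if \savinghyphcodes was positive at that time, the
codes saved with the patterns are used instead of the current \lccode
values.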

Andrew Gollan
"bis vincit qui se vincit"
Latin - Henry Clay HS

On 22 June 2010 08:51, Mojca Miklavec <mojca.miklavec.lists at gmail.com> wrote:

> Dear Andrew,
> Before answering on anything else, here are a few points that I would
> like to make (sorry for top-posting):
> 1.) VERY VERY VERY important: use XeTeX or LuaTeX; with pdfTeX you
> won't be able to handle macrons and hyphenation properly; even if
> Claudio fixed the patterns, they wouldn't help you at all (I can
> explain why, but the basic problem is that TeX uses 256 characters
> and I suspect that no common encoding like T1, texnansi, qx, cs has
> letters with macrons present; of course you may fix that yourself, I
> could even send you some instructions on how to do it, but I can promise
> you that it will never be supported in TeX Live or MiKTeX). Arthur, if
> I'm wrong about the macrons, please correct me. I didn't really check
> it.
> 2.) It is quite possible that once you start using XeTeX, most of the
> problems you see now will disappear (but definitely not all of them).
> 3.) This is the answer that we got from Petr Sojka (for some unrelated
> problem):
> > \hyphequiv table might be the right way of doing that (it was suggested
> > some 13 years ago) to make patterns independent of font encodings.
> > Anyway, for the purpose of unifying char positions just for hyphenation
> > and not lowercasing, one can use etex's \savinghyphcodes macro:
> > see sec. 3.10 of
> > http://www.tug.org/teTeX/texmf-dist/doc/etex/base/etex-man.pdf
> If you compose a table of all equivalent letters, for example saying
> that "a is equal to amacron" etc., ... we can support that without
> changing hyphenation patterns.
> Anyway ... please report whether you manage to use XeLaTeX instead of
> pdfLaTeX (using a proper font; at least try Latin Modern or Gentium or
> some other font on your system that supports those characters) and how
> many problems still remain after that ... we can then continue the
> discussion from that point on.
> Mojca
> On Tue, Jun 22, 2010 at 14:08, Claudio Beccari
> <claudio.beccari at gmail.com> wrote:
> > Dear Andrew,
> > I am very impressed by your plan to write a book in/on Latin for your
> > senior year high school students.
> >
> > The Latin patterns I made up by hand were based on the common rules used
> > in Roman Catholic church printing practice, where the acute accent is
> > used (see my ecclesiastic.sty package); sometimes the ligatures ae and oe
> > are used, but macrons and breves are never used.
> >
> > My approach to writing hyphenation patterns was to ignore patterns
> > containing diacritically marked letters; my Italian patterns do not
> > contain any pattern with accented vowels; my Greek patterns for classical
> > Greek (now obsolete) do not contain any pattern with accented letters. I
> > had to cope with very few accented letters in the Coptic patterns in both
> > the Sahidic and Bohairic dialects, because both dialects use accents also
> > over some consonants.
> >
> > The idea behind this is that patterns may often be made up by using only
> > consonants. They work well if the accented vowels are transformed into a
> > single character code (8 bit character code) by some sort of ligature
> > system or by smart accent macros such as those implied by the T1 encoding
> > for Latin scripts or by the definitions contained in the greek.ldf file
> > for Greek (or similar actions defined in the Coptic packages).
> >
> > With utf8 of course you have to use the specific option to the inputenc
> > package and possibly you have to load another helper sty file before
> > invoking inputenc. But even if this apparently works fine with modern
> > western languages, it probably requires many additions to the Latin
> > hyphenation pattern file in order to be sure that the five vowels marked
> > with a macron or breve are suitably taken care of. The point is that such
> > vowels carrying a macron or a breve are not represented with a
> > single-byte character code, but require at least two bytes.
> >
> > Now, since in Italian and in the other modern western languages that I
> > use (in spite of having a Mac with MacTeX and TeXShop, the latter
> > initially preset to use unicode [and reset to ISO Latin 1]) I don't need
> > anything else but the ISO Latin 1 single-byte input encoding, I have never
> > examined the possibility of extending the Latin hyphenation patterns the
> > way it might be useful for your purpose. Sorry.
> >
> > I might give it a try, but I am so ignorant about unicode that I might
> > not succeed. Give me some time, please, to do some experiments. I have to
> > get acquainted with the new hyphenation system that splits the "old"
> > hyphenation files in two different ones: the "loader" that contains the
> > specific language definitions and shortcuts, and the real "pattern file"
> > that contains only the patterns.
> >
> > I'll get in touch soon. Please, I need some time to test the ideas I
> > already have in mind.
> >
> > Claudio
> >
> > Andrew Gollan wrote:
> >
> > I had sent this to claudio.beccari at gmail.it which bounced.
> >
> > Andrew Gollan
> > "bis vincit qui se vincit"
> > Latin - Henry Clay HS
> >
> >
> > ---------- Forwarded message ----------
> > From: Andrew Gollan <tharoth at hypalonia.com>
> > Date: 21 June 2010 20:18:29 UTC-4
> > Subject: Latin Hyphenation when using utf8
> > To: babel <babel at braams.xs4all.nl>, "claudio.beccari"
> > <claudio.beccari at gmail.it>
> >
> >
> > Gentlemen,
> >
> > let me thank you in advance for your excellent attention to Latin in
> > babel these many years. I am in the process of adding macrons to a book
> > written entirely in Latin, in order to use it in my 3rd year Latin high
> > school class. I am entering the letters directly on my Mac without
> > recourse to the '=' trick. I find that the hyphenation breaks nastily when
> > I do this, at least in some words. I did some research, and found that the
> > way the hyphenation files are generated is beyond me. I upgraded to the
> > latest MacTeX to see if that was better and it did not change anything
> > that I could see.
> >
> > What I want to achieve is that words with macrons or breves are treated
> > as identical to their unaccented equivalents in terms of hyphenation.
> > Could you point me at the resources I would need to achieve this? I have a
> > CS background, so I can probably come to understand it with the right
> > information.
> >
> > Andrew Gollan
> > "bis vincit qui se vincit"
> > Latin - Henry Clay HS