[tex-hyphen] More on hyphenating Ancient Greek.

Claudio Beccari claudio.beccari at gmail.com
Thu Nov 13 11:24:50 CET 2014

The problem is not LaTeX, but the program used for transforming XML 
source into LaTeX code.
The additional characters and the scholarly emendations are dealt with 
by LaTeX by means of the teubner package. As far as I can tell, the teubner 
macros do not interfere with hyphenation any more or less than any other 
macro, in the sense that any macro may interfere with hyphenation when 
the text sent to the hyphenation algorithm still contains 
unexpandable tokens. With Latin script, for example, and the default OT1 
font encoding, writing \`a (or even à when using the suitable option to 
the inputenc package) remains as \accent18a when the text is sent to the 
hyphenation algorithm. This algorithm considers as a valid word only 
a string of character tokens with a positive \lccode: \accent, 
1, and 8 have a non-positive \lccode, so the "LaTeX word" stops 
before the accented letter, and the rest of the word string is discarded 
for hyphenation until a new valid word start is encountered.
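
A minimal test file to see this effect might look like the following 
sketch. It assumes pdfLaTeX with the default English patterns, and the 
Italian words are only an illustration. \showhyphens writes the 
hyphenation points that TeX finds to the log file. Uncommenting the 
fontenc line switches to T1, where the accented letter is a single glyph 
with its own \lccode, so the two runs can be compared.

    % Sketch only: compare the log output with and without T1.
    \documentclass{article}
    % OT1 is the default font encoding, so \`a is built with \accent.
    % \usepackage[T1]{fontenc} % uncomment for the second run
    \begin{document}
    % Same stem, once ending in an accented letter and once without accents.
    \showhyphens{universit\`a universitario}
    \end{document}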

This does not depend on which typesetting engine is used (pdftex, 
xetex, etc.); it depends on the hyphenation algorithm, explained in 
Appendix H of the TeXbook.

As far as Greek is concerned, your problem probably persists even if you use 
OpenType fonts instead of the LGR-encoded ones; with the latter 
the round and angle brackets are mapped to other characters and interfere with 
hyphenation. With OpenType fonts it is possible that, after assigning a 
positive \lccode to the round and angle brackets, hyphenation is still 
possible, but with unexpected results.


On 13/11/2014 10:39, Philip Taylor wrote:
> In the work in progress, various stretches of ancient Greek text have 
> additional characters interpolated into them to indicate (the nature
> of) scholarly emendations made.  For example, the XML input :
>     <Other_Notes>f.≈Br:<image status="active" source="L40.2-G5-[B1]" 
> callout="Other_Notes"></image> “<foreign language="Greek">Σωσον 
> Κ<expan>ύρι</expan>ε τῶν λα<expan>ον</expan> σου καὶ ευλογησον τὴν 
> κλ<supplied>η</supplied>ρονομια<supplied>ν</supplied> σου νίκας τῆς 
> βα<supplied>σιλεύσι</supplied></foreign>”; ... </Other_Notes>
> will, after TeX's macro expansion, yield (in part) :
>     f.(kern)Br: Σωσον Κ(ύρι)ε τῶν λα(ον) σου καὶ ευλογησον τὴν 
> κλ<η>ρονομια<ν> ...
> Empirically, it would seem that the presence of the interpolated round 
> and angle brackets affects TeX's ability to hyphenate such stretches 
> of text; could the hyphenation experts suggest a work-around, please ?
> ** Phil.
