[tex-hyphen] missing hyphen points in Greek

Guenter Milde milde at users.sf.net
Sun Jul 27 22:51:01 CEST 2014


Dear TeX hyphenators,


On 27.07.14, Mojca Miklavec wrote:

...

> > Is setting the lccode of a character to itself the "normal" way for small
> > letters?

> Yes. That always needs to be done. Usually you don't need to do it for
> Latin scripts since LaTeX (and probably also plain TeX) already does
> that for you, at least for the ASCII range. XeTeX also sets the codes
> for more or less the whole of Unicode, I think.

BTW: For Greek, polyglossia (and since version 1.5 also babel) sets
these codes via the file xgreek-fixes.def, an excerpt from Apostolos'
"xgreek" package.
However, this file is derived from an older version of xgreek.sty and
lacks some fixes made in

  Version 2.1 of package xgreek
  
  I have introduced some new \lccode-\uccode pairs that
  reflect current changes in Unicode 5.2 while I have corrected the
  values for an existing pair. 

It would be good if "polyglossia" could ship an updated xgreek-fixes.def.
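
To see which \lccode/\uccode pairs an updated file would have to cover,
one could generate a reference list from a current Unicode database. A
minimal sketch in Python follows. The function names are mine, the range
is just the "Greek and Coptic" block, and the result depends on the
Unicode version shipped with the Python interpreter, so it is only a
cross-check and not a replacement for the xgreek table.

```python
def greek_case_pairs(start=0x0370, end=0x03FF):
    """Yield (char, lower, upper) for cased letters in the Greek and Coptic block."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        lower, upper = ch.lower(), ch.upper()
        # Skip caseless or unassigned code points, and letters whose
        # case mapping expands to more than one character.
        if len(lower) != 1 or len(upper) != 1 or lower == upper:
            continue
        yield ch, lower, upper


def tex_code_lines():
    """Format the pairs as \\lccode/\\uccode assignments in hex notation."""
    lines = []
    for ch, lower, upper in greek_case_pairs():
        lines.append('\\lccode"%04X="%04X \\uccode"%04X="%04X'
                     % (ord(ch), ord(lower), ord(ch), ord(upper)))
    return lines


if __name__ == "__main__":
    print("\n".join(tex_code_lines()))
```

The output can then be compared line by line with the assignments in
xgreek-fixes.def.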

...


> > * The hyph-utf8 package has conversion rules for several 8-bit TeX font
> >  encodings. Currently not for LGR but this could/should be changed.

> I would be happy to accept patches. I'm not competent enough in TeX
> (as the Turing complete programming language) to write the conversion
> myself.

Would it be sufficient to provide a data file similar to
the ones in hyph-utf8/source/generic/hyph-utf8/data/encodings?

It should be relatively easy to produce a file data/encodings/lgr.dat
from the CB-Fonts' CB.enc.

Is the format of the *.dat files documented?

A problem might be that some pre-composed Unicode characters (accented
capital Greek letters) are represented by two characters in LGR.
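
The one-to-many problem can be illustrated with Unicode's own canonical
decomposition. The Python sketch below only shows that a pre-composed
accented capital splits into a base letter plus a combining accent. It
says nothing about the actual LGR slots, which would have to come from
CB.enc. The helper name is hypothetical.

```python
import unicodedata


def decompose(ch):
    """Return the names of the code points in the canonical decomposition (NFD) of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    # U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS
    print(decompose("\u0386"))
    # prints ['GREEK CAPITAL LETTER ALPHA', 'COMBINING ACUTE ACCENT']
```

A one-to-one table in a *.dat file cannot express such a mapping
directly, so these characters would need special handling.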


> > The hyph-utf8 package shows that an automatic transcoding of the
> > hyphenation pattern files is possible. I hope a cooperation between
> > Dimitrios and Mojca will be able to overcome obstacles.

> See above. I'm not saying it isn't possible, but I don't think it's
> worth the effort (and it's awfully ugly code, for whoever is willing
> to come up with it). In particular there's not much point in doing
> on-the-fly conversion because we ended up doing external conversion
> for the sake of pTeX anyway.

So, it may be better to provide a conversion script from Greek Unicode
hyphenation patterns to LGR-encoded ones in a modern scripting language.
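
A rough sketch of what such a script could look like in Python. The
mapping tables here are tiny placeholders based on the usual LGR Latin
transliteration (accent before the letter). A real table would be
generated from CB.enc and might have to emit 8-bit slot values instead.
All names are hypothetical.

```python
import unicodedata

# Placeholder tables, deliberately incomplete; to be generated from CB.enc.
BASE = {"α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e",
        "λ": "l", "ο": "o", "σ": "s", "ς": "c"}
ACCENTS = {"\u0301": "'"}  # combining acute (tonos)


def to_lgr(pattern, base=BASE, accents=ACCENTS):
    """Transliterate one Unicode hyphenation pattern using the given tables."""
    out = []
    for ch in pattern:
        # Split pre-composed letters into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        letter, marks = decomposed[0], decomposed[1:]
        if letter in base:
            # Raises KeyError for an accent that is missing from the table.
            out.append("".join(accents[m] for m in marks) + base[letter])
        elif not marks and (ch.isdigit() or ch in ".-"):
            # Pattern digits and word-boundary dots pass through unchanged.
            out.append(ch)
        else:
            raise ValueError("no mapping for %r in pattern %r" % (ch, pattern))
    return "".join(out)


if __name__ == "__main__":
    print(to_lgr("1λο"))  # prints 1lo
    print(to_lgr("ά1"))   # prints 'a1
```

Failing loudly on unmapped characters seems safer than passing them
through, since a silently wrong pattern is hard to spot later.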

Alternatively, Claudio is working on a "hand conversion" of the patterns.
Any thoughts?

Günter



