[tex-hyphen] tug.ctan.org upload: Hyphenation Patterns in UTF-8

Vladimir Volovich vvv at vsu.ru
Sun Jun 29 22:27:15 CEST 2008


"MM" == Mojca Miklavec writes:

 >> --- /src/TeX/texlive-svn/Master/tlpkg/tlpsrc/hyphen-russian.tlpsrc
[...]
 >> What about that? The package ruhyphen ships loads of files:
[...]
 >> Can we delete the whole package? Are these files still necessary?
 >> 
 >> I.e., are all these files *ONLY* for the patterns, and not for
 >> actual processing later on?

 MM> I'm not sure. I need to take a look. We do not depend on these
 MM> files, but maybe there are some useful macros inside, and I suspect
 MM> that the user is able to modify a file to specify which encoding to
 MM> use for patterns. There is one default, but we do not provide the same
 MM> mechanisms for switching the encoding. It's similar to Bulgarian
 MM> (which only ships with support for a single encoding by default),
 MM> but it might be that people need other files at runtime as
 MM> well. Arthur did the conversion, but ... I would leave the package
 MM> there. I'll add the dependency.

first of all, i'm happy to see this effort to clean up the
hyphenation patterns.

a few notes regarding the russian patterns, comparing what is currently
in the texlive repository in the hyph-utf8 and ruhyphen packages:

* the ruhyphen package provides 7 different variants of patterns, made
  by different people: ruhyph{al,as,ct,dv,mg,vl,zn}.tex
  the default, ruhyphal.tex, probably gives the highest quality of
  hyphenation (but some people may prefer other patterns, which is why
  all of them are included in the ruhyphen package).
  hyph-utf8 contains just ruhyphal; i don't know if/how it is possible
  to include several pattern variants for one language (letting the
  user select which variant to use).

* hyph-ru.tex includes just ruhyphal.tex re-encoded from koi8-r to
  utf-8, with some additional comments. but the patterns in the
  ruhyphen package include more than that - see ruhyphen.tex.
  namely, the important "missing bits" are:

  - additional patterns with the "cyrillic letter yo", contained in the
    file cyryoal.tex (they could be appended to hyph-ru.tex, i guess)

  - additional patterns present in ruhyphen.tex in two lines with
    \patterns macros (they could also be appended to hyph-ru.tex)

  - patterns generated by hypht2.tex, which is similar to hypht1.tex
    ("\input hypht2" is present in ruhyphen.tex). i don't know what the
    best way to include them is. if you want, i can provide a "flat"
    file (not macro-generated) with the patterns generated by
    hypht2.tex that are relevant to the russian language. it could then
    also be appended to hyph-ru.tex
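
to illustrate the two mechanical steps mentioned in the list above
(re-encoding from koi8-r to utf-8, and appending extra pattern text to
one file), here is a minimal python sketch. it is not the conversion
that was actually used for hyph-ru.tex; the function names
koi8r_to_utf8 and append_patterns are hypothetical, and the sketch
assumes the inputs really are plain koi8-r text:

```python
def koi8r_to_utf8(data: bytes) -> bytes:
    """Re-encode raw KOI8-R bytes as UTF-8 (assumes the input is KOI8-R)."""
    return data.decode("koi8_r").encode("utf-8")


def append_patterns(base: str, *extras: str) -> str:
    """Join pattern-file texts in order, ensuring each part ends with a newline."""
    parts = (base, *extras)
    return "".join(p if p.endswith("\n") else p + "\n" for p in parts)
```

the decoded contents of cyryoal.tex and a flat hypht2 file could then
be passed as the extra arguments. whether plain concatenation is
enough depends on how the \patterns blocks are delimited in each file,
which this sketch does not check.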

Best,
v.

