[tex-hyphen] tug.ctan.org upload: Hyphenation Patterns in UTF-8

Vladimir Volovich vvv at vsu.ru
Mon Jul 14 22:54:12 CEST 2008


"AR" == Arthur Reutenauer writes:

 AR> 	Hello Vladimir, I added it to the SVN repository but I didn't
 AR> run any test, apart for format creation.  Maybe you can make some
 AR> yourself?  Note that you will need to set trie_size higher if you
 AR> use TeX Live 2007 (it's 300k there; TeX Live 2008 has raised it to
 AR> 600k which is enough, and 500k went well for me).

i'm not sure if these extra patterns (from exhyph-ru.tex and
exhyph-uk.tex) are useful in utf-8 case (i.e. for xetex).

pre-requisites for these patterns are

* the font encoding should contain an additional hyphen character
  (like, for example slot 127 in T1 and T2A encodings, in addition to
  slot 45)

* let <h1> be the plain hyphen (slot 45), and <h2> an additional hyphen
  (e.g. slot 127 in T1 encoding), then there should be ligatures:
  <h1> <h1> => <endash> % usual one
  <endash> <h1> => <emdash> % usual one
  <h1> <h2> => <h2> % required for this mechanism to work properly!
  <endash> <h2> => <endash> % optional
  <emdash> <h2> => <emdash> % optional
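
to illustrate how the ligature rules above combine, here is a minimal python sketch. it is only a toy model, not how tex or xetex implement ligatures: the glyph names ("h1", "h2", "endash", "emdash") are just labels taken from the list above, the function name apply_ligatures is made up for this example, and the sketch assumes simple left-to-right replacement where the result of a ligature is tested again against the next glyph.

```python
# Toy model of the ligature table described above (illustrative labels only).
LIGATURES = {
    ("h1", "h1"): "endash",      # usual one
    ("endash", "h1"): "emdash",  # usual one
    ("h1", "h2"): "h2",          # required for the exhyph mechanism
    ("endash", "h2"): "endash",  # optional
    ("emdash", "h2"): "emdash",  # optional
}


def apply_ligatures(glyphs, table=LIGATURES):
    """Apply pairwise ligatures left to right.

    Assumption: the glyph produced by a ligature is immediately
    re-tested against the following glyph; there is no backtracking.
    """
    glyphs = list(glyphs)
    out = []
    i = 0
    while i < len(glyphs):
        current = glyphs[i]
        i += 1
        # Keep merging with the next glyph while a ligature rule exists.
        while i < len(glyphs) and (current, glyphs[i]) in table:
            current = table[(current, glyphs[i])]
            i += 1
        out.append(current)
    return out


if __name__ == "__main__":
    print(apply_ligatures(["a", "h1", "h2", "b"]))  # plain hyphen + extra hyphen
    print(apply_ligatures(["h1", "h1", "h1"]))      # three plain hyphens
```

in this model a plain hyphen followed by the additional hyphen collapses into the additional hyphen alone, which is the behaviour the third rule asks for.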

the T1 and T2* encodings have both of these features except the last 2
optional ligatures. in xetex, some fonts may have an additional hyphen
character, and xetex supports a mapping mechanism which essentially
allows adding ligatures on the fly. so it may support this as well.

but the reason i wrote the above is that this "exhyph" stuff was
primarily intended for the 8-bit case, as i didn't experiment with it in
xetex. i will try to do some tests. in any case these additional
patterns should not do any harm.

Best,
v.

