[XeTeX] XeTeX, ConTeXt, and utf-8 hyphenation patterns.

Peter Heslin pj at heslin.eclipse.co.uk
Tue Jun 13 00:11:07 CEST 2006

A little while ago, I said that I hoped to convert Dimitrios Filippou's
ancient Greek hyphenation patterns (the elhyphen package) to utf-8, in
order to use them with XeTeX.  Before starting this work,
I decided to look to see if anyone else had done it, and I came across
something interesting in ConTeXt, which is not a package I normally use.

There appears to be a whole subdirectory in the ConTeXt distribution
that is full of utf-8 hyphenation patterns, including Filippou's ancient
Greek ones, but also including German, French, etc.  They are in the
file: http://www.pragma-ade.com/context/current/cont-tmf.zip, in the
tex/context/patterns directory.

Can anyone who knows ConTeXt explain where these patterns come from
and how ConTeXt manages to use them?  (I thought that TeX engines
other than XeTeX could only use single-byte encoded patterns.)
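
As a small illustration of the single-byte concern (a sketch in Python,
not anything taken from ConTeXt or XeTeX): a Greek letter stored as
utf-8 occupies two or three bytes, so an engine that reads patterns one
byte at a time would not see it as a single character.  The helper name
utf8_length is made up for this example.

```python
def utf8_length(ch):
    """Return how many bytes one character occupies when encoded as utf-8."""
    return len(ch.encode("utf-8"))


# ASCII letter, plain Greek alpha, and alpha with smooth breathing.
samples = ["a", "\u03b1", "\u1f00"]

for ch in samples:
    print("U+%04X takes %d byte(s) in utf-8" % (ord(ch), utf8_length(ch)))
```

The polytonic letters needed for ancient Greek sit in the Greek Extended
block (U+1F00 and up), which is why they need three bytes each.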

If there is a script that was used to convert these from the source to
utf-8, is it available?  A quick glance at the ancient Greek patterns
(in the file lang-agr.pat) shows that there is a bug in the conversion
that I'd like to report and fix.
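
To make the question concrete, here is a minimal sketch of the kind of
conversion I have in mind.  It is only a guess at the general approach,
not the script ConTeXt actually used: the mapping table is hypothetical
and deliberately tiny, and it assumes the source patterns use an ASCII
transliteration in which ">a" stands for alpha with smooth breathing and
"<a" for alpha with rough breathing.  The names LEGACY_TO_UNICODE and
convert_pattern are invented for the example.

```python
# Hypothetical, incomplete mapping from an ASCII transliteration to
# Unicode.  This is NOT the real elhyphen table, just an illustration.
LEGACY_TO_UNICODE = {
    ">a": "\u1f00",  # alpha with smooth breathing
    "<a": "\u1f01",  # alpha with rough breathing
    "a": "\u03b1",   # alpha
    "b": "\u03b2",   # beta
    "g": "\u03b3",   # gamma
}


def convert_pattern(pattern, table=LEGACY_TO_UNICODE):
    """Convert one hyphenation pattern, trying the longest keys first.

    Anything not in the table (digits, the word-boundary dot) is
    copied through unchanged.
    """
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(pattern):
        for key in keys:
            if pattern.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matched at this position: keep the character as-is.
            out.append(pattern[i])
            i += 1
    return "".join(out)


print(convert_pattern(".>a2g"))
```

Trying the longest keys first matters here: otherwise the "a" in ">a"
would be converted on its own and the breathing mark left behind, which
is exactly the sort of conversion bug that is easy to introduce.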

On a more general level, if both ConTeXt and XeTeX are engaged in
converting legacy TeX hyphenation patterns to utf-8, should they be
coordinated in order to avoid duplication of effort?

Peter Heslin (http://www.dur.ac.uk/p.j.heslin)

