[XeTeX] XeTeX, ConTeXt, and utf-8 hyphenation patterns.

Hans Hagen pragma at wxs.nl
Tue Jun 13 09:25:16 CEST 2006

Peter Heslin wrote:
> A little while ago, I said that I hoped to convert Dimitrios Filippou's
> ancient Greek hyphenation patterns (the elhyphen package) to utf-8, in
> order to use them with xetex.  Before thinking about starting this work,
> I decided to look to see if anyone else had done it, and I came across
> something interesting in ConTeXt, which is not a package I normally use.
> There appears to be a whole subdirectory in the ConTeXt distribution
> that is full of utf-8 hyphenation patterns, including Filippou's ancient
> Greek ones, but also including German, French, etc.  They are in the
> file: http://www.pragma-ade.com/context/current/cont-tmf.zip, in the
> tex/context/patterns directory.
> Can anyone who knows about ConTeXt explain about where these patterns
> come from and how it is that context manages to use these patterns?  (I
> thought that non-xetex TeX could only use single-byte encoded patterns.)
some time ago i decided to ship patterns with context because

(1) there is no sound infrastructure in the tex world for managing patterns
(2) i need encoding neutral patterns [most patterns are ec only]
(3) i want control over what gets loaded in context
(4) i wanted to get rid of every year's disappearing, renamed, changed patterns
(5) apart from the fact that i wanted patterns that were not in a sense 
hard wired latex patterns
> If there is a script that was used to convert these from the source to
> utf-8, is it available?  A quick glance at the ancient greek patterns
> (in the file lang-agr.pat) shows that there is a bug in the conversion
> that I'd like to report and fix.
ctxtools --pat        [en nl agr ...]
ctxtools --pat --utf  [en nl agr ...]

the greek conversions were done with the help of greek language users 
on the context list, so in case of trouble i cc there; bugs need to 
be fixed indeed

in ctxtools.rb you can grep for 'agr' and see what conversions take 
place for greek
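
as a rough illustration of what such a conversion involves (this is not the code in ctxtools.rb, just a minimal sketch), the python fragment below rewrites patterns that use the ^^xx hex notation for single-byte characters into utf-8 text. the table EC_TO_UNICODE and the function pattern_to_utf8 are hypothetical names, and the table only holds a few ec-encoded slots for the sake of the example; a real conversion needs a complete table per source encoding.

```python
import re

# Illustrative subset of a single-byte (EC-encoded) to Unicode table.
# Assumption: this is NOT the table used by ctxtools; it only covers
# a handful of slots so the example stays short.
EC_TO_UNICODE = {
    0xE4: "ä",
    0xF6: "ö",
    0xFC: "ü",
    0xFF: "ß",
}

# Matches the TeX ^^xx notation (two lowercase hex digits).
HEX_NOTATION = re.compile(r"\^\^([0-9a-f]{2})")


def pattern_to_utf8(pattern: str) -> str:
    """Replace ^^xx sequences with the mapped Unicode character."""

    def replace(match: re.Match) -> str:
        code = int(match.group(1), 16)
        # Unknown codes are left untouched rather than guessed at.
        return EC_TO_UNICODE.get(code, match.group(0))

    return HEX_NOTATION.sub(replace, pattern)


if __name__ == "__main__":
    # Writing the result with encoding="utf-8" would give a utf-8 pattern file.
    print(pattern_to_utf8("1^^e4h"))
```

leaving unmapped codes as they are makes gaps in the table easy to spot in the output, which is one way a conversion bug like the one mentioned above could be tracked down.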

more info can be found in:


(also published in tugboat)

there is a file lang-all.xml in the context distribution

> On a more general level, if both ConTeXt and XeTeX are engaged in
> converting legacy TeX hyphenation patterns to utf-8, should they be
> coordinated in order to avoid duplication of effort?
anyone can use the patterns; of course bugs need to be sorted out, but 
given my experiences with pattern maintenance i will not drop them from 
context; too much has gone wrong in the past; but you can consider them 
to be generic so indeed we can avoid duplication of work.


