[tex-hyphen] hyphenation group

Hans Hagen pragma at wxs.nl
Mon Aug 30 17:48:09 CEST 2004

Kevin Patrick Scannell wrote:

>Hi everyone,
>    I'm glad to be participating on this list.  
>       One initial thought: it would be nice to solidify the CTAN
>   archive as the official/authoritative upstream source of
>   hyphenation patterns.   A lot of effort is going directly
>   into OpenOffice and the like and it would be a shame to have
>   the TeX patterns become out-of-date.
>   In part this will mean (1) clarifying the licenses
>   for all existing files (this has been a non-trivial issue 
>   e.g. with Apache FOP) (2) converting to the XML format used
>   by OpenOffice, etc. and offering those files in addition to the
>   xxhyph.tex files.
>   I have an ad hoc script for doing (2) that I used for Irish.
>   I'll try and generalize it and test on some of the patterns
>   in CTAN.

Before we jump into all kinds of actions, let's start by clarifying the reasons for this list.

- currently, patterns for TeX Live are taken from CTAN or from mails sent by pattern authors

- the names as well as the internals of the patterns are not standardized, and in some cases not generic 

- some pattern files end up in different places in the TeX tree

- pattern files got renamed, changed, etc. without regard for history, which may give unexpected typesetting results when users want to reprocess old documents

- there are occasional developments with regard to hyphenation patterns (i.e. algorithms) that never make it into the TeX distributions, which is a pity

Now, in order to get this mess sorted out, the following is on the agenda:

- collect patterns per language (dialect) and normalize them

- cook up a sort of standard (generic) file format for their TeX instances

- create a set of test files so that distributions can test whether the patterns are usable and work correctly

- set up a consistent naming scheme for patterns, exceptions and auxiliary files (some pattern files share files that set up lccodes)

- keep copies of old pattern collections under a well defined naming scheme 

- collect resources about patterns 
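To make the "normalize" and "test files" items a bit more concrete, here is a minimal sketch of what a checker could do: pull the patterns out of an xxhyph.tex-style file and apply them to a word, so the result can be compared with the hyphenation a test file expects. It assumes a plain-ASCII file whose patterns sit in a single \patterns{...} group with no nested braces (real files may use ^^ notation, accent macros or 8-bit characters, which this ignores). The helper names (extract_patterns, parse_pattern, load_patterns, hyphenate) are hypothetical and not part of any existing tool. The matching follows Liang's rule as TeX uses it: keep the highest digit at each inter-letter position, allow a break where it is odd, and respect plain TeX's defaults of \lefthyphenmin=2 and \righthyphenmin=3.

```python
import re

# Hypothetical helpers for illustration; not part of any existing TeX tool.


def extract_patterns(tex_source: str) -> list[str]:
    """Return the entries of the first \\patterns{...} group in a TeX file.

    Assumes plain ASCII, '%' comments, and no braces inside the group.
    """
    # Drop comments: an unescaped '%' runs to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", tex_source)
    match = re.search(r"\\patterns\s*\{([^}]*)\}", text)
    return match.group(1).split() if match else []


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a Liang pattern such as 'hy3ph' into 'hyph' and [0, 0, 3, 0, 0].

    values[k] is the digit in front of letter k; the last entry follows the final letter.
    """
    letters: list[str] = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def load_patterns(tex_source: str) -> dict[str, list[int]]:
    """Map each pattern's letter string to its inter-letter values."""
    return dict(parse_pattern(p) for p in extract_patterns(tex_source))


def hyphenate(word: str, patterns: dict[str, list[int]],
              left_min: int = 2, right_min: int = 3) -> list[str]:
    """Split word at the positions Liang's algorithm allows."""
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)  # points[m] = value in front of padded[m]
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            values = patterns.get(padded[i:j])
            if values:
                # Keep the highest digit seen at each inter-letter position.
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # A break before word[p] sits in front of padded[p + 1]; odd values allow it.
    for p in range(left_min, len(word) - right_min + 1):
        if points[p + 1] % 2 == 1:
            pieces.append(word[start:p])
            start = p
    pieces.append(word[start:])
    return pieces


SAMPLE = r"""
\patterns{ % the example patterns for "hyphenation" from The TeXbook, Appendix H
hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n
}
"""

print("-".join(hyphenate("hyphenation", load_patterns(SAMPLE))))  # hy-phen-ation
```

A test file could then simply list words with their expected breaks and flag any word where a normalized pattern file gives a different result.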

Concerning the licence, we should avoid opening a can of worms (for the moment, I'm happy with anything that suits the TeX Live team).

Now, along the lines Kevin mentions, we can look into

- alternative pattern collections
- providing pattern files for other applications, like OpenOffice

We can also collect word-lists etc. 

But I think that we should first get the TeX part right. Actually, this is not that complicated, as long as we obey some rules that we need to define, mainly:

- names 
- structure 

One of the rationales behind this 'project' is that instead of forcing pattern authors to deliver, we simply take their files, convert them, and provide the result to the TeX Live team. My guess is that most pattern authors will step in, but if some don't, well, we'll derive from their work. We should avoid politics.

About names and structure: 

- how to deal with changes in languages (e.g. spelling reforms)
- how to deal with dialects 

Maybe Steve Peter has ideas about how such things can be grouped? 


                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
