[tex-hyphen] Re: hyphenation group

Per Starback starback at ling.uu.se
Thu Mar 3 12:48:29 CET 2005

I recently joined the list (I see I was mentioned here recently
because of that -- nice that I could revive this list without even
having joined yet :-)
and have read through the list archive. There was Hans Hagen
(2004-08-30) clarifying that the reason for this list primarily is to
standardize names and internals of existing patterns. (I hope
other aspects of hyphenation in TeX also are welcome here, though.)

Some discussion ensued on what elements to have in the filenames.
Mentioned were

* language (according to ISO 639 or the Ethnologue codes)
* dialect
* orthography
* revision

In one of the later messages Hans Hagen wrote (2004-09-01):

> >  german-austrian-old-20040806
> >  
> >
> we will run out of old new newer latest etc so we should think of 
> another tag there, in dutch we use years for that, so it then could be
> dutch-.....-1996-20040816
> with the last date as version date

That recap is probably enough even though this was some time ago.

Here are my views:

* There can be several sets of hyphenation patterns for the same
  language, same dialect, written with the same orthography.
  There are cases when there are simply different opinions on how
  to hyphenate, and if there are hyphenation patterns corresponding to
  the different opinions all of them should be available to the user.
  (Of course it can be a hard choice which one to make the default for
  users who only specify what language their documents are written in,
  without indicating any preferences.)

* I don't think that distinction means there should be yet one more
  element in the filenames. On the contrary, I think there don't have
  to be many fixed elements in them. There is no need to have an
  obligatory "orthography" field just because it would be useful for
  some languages.

* As for language information I think the most natural would be to use
  Internet standard RFC 3066. Since any number of subtags can be used,
  and later subtags can mean anything, they can carry any additional
  information needed. So we could use


  for patterns for the new German orthography of 1996, and


  for a hypothetical set of patterns mimicking the hyphenation in
  American Heritage Dictionary, for example.

  I think following RFC 3066 makes a collection of hyphenation
  patterns most useful for various other applications implementing the
  TeX hyphenation for showing various texts, since that's how texts
  on the net tend to be tagged when they are tagged by language at all.

* I think this is better than having a fixed fine-grained division of
  the filenames with fixed positions where dialect, orthography etc.
  always must be present. When there are several patterns for the
  same language there may be several reasons. They may be for
  different countries, different orthographies, correspond to
  different authorities on hyphenation, be of different sizes (a
  larger set being better, but larger), etc.

* Revisions:

  "% The Plain TeX hyphenation tables [NOT TO BE CHANGED IN ANY WAY!]"

  I think maybe there has been too much emphasis on keeping
  old hyphenation rules around so it's possible to recreate old documents
  in exactly the same way.  Of course that is a valid concern, but it's
  not something very special for hyphenation patterns, but goes for all
  files you load.  For almost every package I use in a TeX document it's
  conceivable that a slight change in it might cause an unexpected
  change in my document. So maybe this problem should be addressed in a
  more general way than only for hyphenation patterns?

  There is at least one difference though. Very cautious users could
  make time-capsules to go with their documents where they have copies
  of all packages they use. What's special with hyphenation patterns
  is that since only INITEX can load them it's not that easy.
  When people talk of keeping old versions of hyphenation patterns
  around, how is it supposed to work in practice?
  Let's say we have de-DE-1996-20010507 and that is used for a
  user specifying language "de-DE-1996" or just "de-DE" or maybe
  even just "de". (Assuming the defaults are chosen that way.)
  (Or "german".)
  The user gets an informational message during TeXing that says
  something like
    to keep the exact same hyphenation as you get now, please change
    "de-DE" to "de-DE-1996-20010507" in your file!
  Is this how it's supposed to work? (Or maybe no message, because
  casual users shouldn't be bothered and advanced users who need this
  will know it?)
  Then comes a new version de-DE-1996-20051111, but if users made
  that change their documents won't be changed by this when they are
  TeXed again.

  But that assumes that the updated TeX installation will have TeX
  formats built with the old patterns as well as the new.  Will it
  really be so? Probably not really old versions?

  I have nothing against keeping old versions, I just wonder if it
  really will help much with the perceived problem. Most times it
  would probably be easier to change the document as needed than to
  build one's own TeX format to use an older version of a set of
  patterns.  I've had hyphenations in a document change because of
  changes in Babel, but if I had had to change the TeX document to
  make it come out the same my first thought wouldn't be to change it
  to load an older version of Babel.

* In an ideal world I think the restriction that only INITEX can load
  hyphenation patterns should be lifted. I don't know, but I guess
  that restriction is there because of performance problems that
  probably aren't an issue anymore.

  Then the cautious users could load their own patterns from their
  time capsules too.

  And then a better, more general way to address the problem of having
  several versions of packages etc. could be used for hyphenation
  patterns as well as other files.
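
The lookup discussed under "Revisions" above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not how any TeX distribution or Babel actually selects patterns: the
function names split_name and resolve are hypothetical, a trailing
subtag of exactly eight digits is assumed to be the revision date, and
"the newest matching set wins" is just one possible choice of default.
The shape check follows the RFC 3066 syntax (a primary subtag of 1-8
letters, then subtags of 1-8 letters or digits).

```python
import re

# Syntax of an RFC 3066 tag: a primary subtag of 1-8 letters, then any
# number of hyphen-separated subtags of 1-8 letters or digits.
RFC3066_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# Assumption of this sketch: a trailing subtag of exactly eight digits
# is the revision date (YYYYMMDD) of the pattern set.
REVISION = re.compile(r"\d{8}")


def split_name(name):
    """Return (tag_without_revision, revision_or_None) for a name."""
    if not RFC3066_SHAPE.fullmatch(name):
        raise ValueError(f"not shaped like an RFC 3066 tag: {name!r}")
    subtags = name.split("-")
    if len(subtags) > 1 and REVISION.fullmatch(subtags[-1]):
        return "-".join(subtags[:-1]), subtags[-1]
    return name, None


def resolve(requested, installed):
    """Pick the installed pattern set that a requested name refers to.

    A request that carries a revision date must match an installed name
    exactly.  Otherwise the newest installed set whose tag equals the
    request, or extends it with further subtags, is chosen (assumed
    default).  Returns None if nothing matches.
    """
    tag, revision = split_name(requested)
    if revision is not None:
        return requested if requested in installed else None
    wanted = tag.lower().split("-")
    candidates = []
    for name in installed:
        inst_tag, inst_rev = split_name(name)
        if inst_tag.lower().split("-")[: len(wanted)] == wanted:
            # Undated sets sort before dated ones; YYYYMMDD strings
            # sort chronologically.
            candidates.append((inst_rev or "", name))
    return max(candidates)[1] if candidates else None
```

With de-DE-1996-20010507 and de-DE-1996-20051111 both installed, a
request for "de", "de-DE" or "de-DE-1996" resolves to the 20051111 set,
while a document pinned to "de-DE-1996-20010507" keeps getting the older
one, but only as long as that older set is still installed.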

Per Starb\"ack <starback at ling.uu.se>
