[tex-hyphen] Re: hyphenation group
Per Starback
starback at ling.uu.se
Thu Mar 3 12:48:29 CET 2005
I recently joined the list (I see I was mentioned here recently
because of that -- nice that I could revive this list without even
having joined yet :-)
and have read through the list archive. There was Hans Hagen
(2004-08-30) clarifying that the reason for this list primarily is to
standardize names and internals of existing patterns. (I hope
other aspects of hyphenation in TeX also are welcome here, though.)
Some discussion ensued on what elements to have in the filenames.
Mentioned were
* language (according to ISO 639 or the Ethnologue codes)
* dialect
* orthography
* revision
In one of the later messages Hans Hagen wrote (2004-09-01):
> > german-austrian-old-20040806
> >
> >
> we will run out of old new newer latest etc so we should think of
> another tag there, in dutch we use years for that, so it then could be
>
> dutch-.....-1996-20040816
>
> with the last date as version date
That recap is probably enough even though this was some time ago.
Here are my views:
* There can be several sets of hyphenation patterns for the same
language, same dialect, written with the same orthography.
There are cases when there are simply different opinions on how
to hyphenate, and if there are hyphenation patterns corresponding to
the different opinions all of them should be available to the user.
(Of course it can be a hard choice which one to make the default for
users who only specify what language their documents are written in,
without indicating any preferences.)
* I don't think that distinction means there should be yet one more
element in the filenames. On the contrary, I think there don't have to
be many fixed elements in them. There is no need to have an
obligatory "orthography" field just because it would be useful for
some languages.
* As for language information I think the most natural would be to use
Internet standard RFC 3066. Since any number of subtags can be used,
and later subtags can mean anything, they can carry any additional
information needed. So we could use
de-DE-1996
for patterns for the new German orthography of 1996, and
en-US-AHD
for a hypothetical set of patterns mimicking the hyphenation in
American Heritage Dictionary, for example.
I think following RFC 3066 makes a collection of hyphenation
patterns most useful for various other applications implementing the
TeX hyphenation for showing various texts, since that's how texts
on the net tend to be tagged when they are tagged by language at
all.
* I think this is better than having a fixed fine-grained division of
the filenames with fixed positions where dialect, orthography etc.
always must be present. When there are several patterns for the
same language there may be several reasons. They may be for
different countries, different orthographies, correspond to
different authorities on hyphenation, be of different sizes (a
larger set being better, but larger), etc.
* Revisions:
"% The Plain TeX hyphenation tables [NOT TO BE CHANGED IN ANY WAY!]"
I think maybe there has been too much emphasis on keeping
old hyphenation rules around so it's possible to recreate old documents
in exactly the same way. Of course that is a valid concern, but it's
not something very special for hyphenation patterns, but goes for all
files you load. For almost every package I use in a TeX document it's
conceivable that a slight change in it might cause an unexpected
change in my document. So maybe this problem should be addressed in a
more general way than only for hyphenation patterns?
There is at least one difference though. Very cautious users could
make time-capsules to go with their documents where they have copies
of all packages they use. What's special with hyphenation patterns
is that since only INITEX can load them it's not that easy.
When people talk of keeping old versions of hyphenation patterns
around, how is it supposed to work in practice?
Let's say we have de-DE-1996-20010507 and that is used for a
user specifying language "de-DE-1996" or just "de-DE" or maybe
even just "de". (Assuming the defaults are chosen that way.)
(Or "german".)
The user gets an informational message during TeXing that says
something like
to keep the exact same hyphenation as you get now, please change
"de-DE" to "de-DE-1996-20010507" in your file, please!
Is this how it's supposed to work? (Or maybe no message, because
casual users shouldn't be bothered and advanced users who need this
will know it?)
Then comes a new version de-DE-1996-20051111, but if users made
that change, their documents won't be affected by it when they are
TeXed again.
But that assumes that the updated TeX installation will have TeX
formats built with the old patterns as well as the new. Will it
really be so? Probably not really old versions?
I have nothing against keeping old versions, I just wonder if it
really will help much with the perceived problem. Most times it
would probably be easier to change the document as needed than to
build one's own TeX format to use an older version of a set of
patterns. I've had hyphenations in a document change because of
changes in Babel, but if I had had to change the TeX document to
make it come out the same my first thought wouldn't be to change it
to load an older version of Babel.
* In an ideal world I think the restriction that only INITEX can load
hyphenation patterns should be lifted. I don't know, but I guess
that restriction is there because of performance problems that
probably aren't an issue anymore.
Then the cautious users could load their own patterns from their
time capsules too.
And then a better, more general way to address the problem of having
several versions of packages etc. could be used for hyphenation
patterns as well as other files.
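
To make the naming and fallback idea above concrete, here is a minimal
sketch in Python of how a request such as "de", "de-DE" or
"de-DE-1996-20010507" could be mapped to an installed pattern set. It is
only an illustration under stated assumptions: the function names
split_name and resolve are hypothetical, the revision is assumed to be a
trailing eight-digit date, and "newest revision wins" is just one possible
default, not something decided on this list.

```python
import re

# Pattern-set names taken from the examples in this message. Treating a
# trailing eight-digit subtag as a revision date is an assumption.
AVAILABLE = [
    "de-DE-1996-20010507",
    "de-DE-1996-20051111",
    "en-US-AHD",
]

REVISION = re.compile(r"^\d{8}$")


def split_name(name):
    """Split a name into (list of subtags, revision date or None)."""
    parts = name.split("-")
    if REVISION.match(parts[-1]):
        return parts[:-1], parts[-1]
    return parts, None


def resolve(request, available=AVAILABLE):
    """Pick a pattern set for a request, or return None if nothing fits.

    A set is a candidate when its subtags start with the requested
    subtags (compared case-insensitively). If the request carries a
    revision date, only sets with that exact revision qualify. Among the
    candidates the newest revision is chosen (an assumed default).
    """
    want_tags, want_rev = split_name(request)
    want_tags = [t.lower() for t in want_tags]
    candidates = []
    for name in available:
        tags, rev = split_name(name)
        tags = [t.lower() for t in tags]
        if tags[:len(want_tags)] != want_tags:
            continue
        if want_rev is not None and rev != want_rev:
            continue
        # Sets without a revision sort before any dated set.
        candidates.append((rev or "", name))
    if not candidates:
        return None
    return max(candidates)[1]
```

With these names, resolve("de") and resolve("de-DE") both give
"de-DE-1996-20051111", while a document that asks for
"de-DE-1996-20010507" keeps getting that set for as long as it is
installed. Whether the installation actually still has a format built
with the older set is exactly the open question raised above.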
--
Per Starb\"ack <starback at ling.uu.se>