[tex-hyphen] Names of files in OFFO
Arthur Reutenauer
arthur.reutenauer at normalesup.org
Sat Mar 12 21:51:54 CET 2016
On Sat, Mar 12, 2016 at 01:24:51PM +0100, Claudio Beccari wrote:
> Well, the linguistic differences are minimal as well as the differences
> between American and British are minimal
Hurrah! I got you exactly where I wanted. Maieutics works after all :-)
What you just replied to are of course the exact words of the email I sent on
Thursday evening, up to a "that", an "and", and a comma. You reacted quite
strongly to that, and I'm sorry that I offended you; that wasn't my intention.
I'm also sorry that I seemed to ignore many of the interesting comments you
made in the emails you sent in the meantime, but I had to stay focused and
drive my point home. I'll now come back to them.
On Thursday evening I was trying to point out that there were several layers
in the way Latin is currently supported in TeX, and you're absolutely right
that there is a similar situation for English, except that for English it's
even more complicated. I'll try to explain the situation in reply to Barbara's
email, but for the time being I have to stick to Latin; there is already a lot
to say.
By now we have very well established that the sets of hyphenation patterns
you created are best referred to as "phonetical" and "etymological" rather than
"modern or medieval" and "classical". It's also essential to point out that
they have been created with specific use cases in mind, the three I
described in my email from Friday 13:24 UTC.
The latter point is important because it defines the real-life scenarios that
are supported by the current setup. It would be insane to try and support all
possible combinations of the different options that are available.
With that in mind, what should we call the different options we have?
The naming scheme we use for hyphenation patterns is the IETF's BCP 47
standard (https://tools.ietf.org/html/bcp47), which is both strictly defined and
flexible enough to distinguish between all the different variants we need to
label: we can use codes for languages (according to ISO 639), writing systems,
also known as scripts (ISO 15924), and countries (ISO 3166). There are also a
number of specially defined subtags to distinguish other variants, for example
the different types of German spellings, or the polytonic and monotonic
orthographies of Modern Greek (the full list of all subtags is maintained at
http://www.iana.org/assignments/language-subtag-registry). Finally, it also
allows us to define private subtags (prefixed by 'x'), as we have done for
your newest Latin hyphenation patterns.
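
To make the structure of these tags concrete, here is a minimal Python sketch
that splits a tag into language, script, region, variants and private-use
subtags. It is a deliberate simplification, not a full BCP 47 parser: it
ignores extended language subtags, extensions and grandfathered tags, and it
does not check subtags against the IANA registry. The function name parse_tag
is hypothetical, and the tag "la-x-classic" is used purely as an illustration
of a private-use tag.

```python
def parse_tag(tag):
    """Split a language tag into its main parts (simplified sketch)."""
    subtags = tag.split("-")
    result = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
        "private": [],
    }
    rest = subtags[1:]

    # Everything after the singleton 'x' is private use.
    lowered = [s.lower() for s in rest]
    if "x" in lowered:
        i = lowered.index("x")
        result["private"] = lowered[i + 1:]
        rest = rest[:i]

    for s in rest:
        if len(s) == 4 and s.isalpha():
            # Four letters: a script code (ISO 15924), e.g. Cyrl.
            result["script"] = s.title()
        elif (len(s) == 2 and s.isalpha()) or (len(s) == 3 and s.isdigit()):
            # Two letters or three digits: a region code.
            result["region"] = s.upper()
        else:
            # Anything else is treated as a variant, e.g. 1996 or polyton.
            result["variants"].append(s.lower())
    return result


if __name__ == "__main__":
    for example in ["de-1996", "el-polyton", "sr-Cyrl-RS", "la-x-classic"]:
        print(example, parse_tag(example))
```

The sketch does not enforce the order of subtags, which the standard does; a
real implementation would have to.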
I think that the two sets of patterns we have should actually be tagged using
private subtags within a namespace, for example la-x-hyphtex-phonetic and
la-x-hyphtex-etymological. The shaping engine HarfBuzz uses the same approach.
Of course, we need to retain the simple tag "la" for the former pattern set, so
we should ideally have an aliasing system, which is not available at the
moment, so I'm not suggesting we make any change now.
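
As an illustration only, an aliasing system of this kind could be as simple as
a lookup table from short tags to fully qualified ones. The sketch below is
hypothetical: nothing like it exists in the current setup, and the names
ALIASES and resolve are invented for the example. The two long tags are the
ones proposed above.

```python
# Hypothetical alias table: short tag -> fully qualified tag.
ALIASES = {
    "la": "la-x-hyphtex-phonetic",
}


def resolve(tag, aliases=ALIASES):
    """Return the fully qualified tag for an alias, or the tag itself."""
    normalized = tag.lower()
    return aliases.get(normalized, normalized)


if __name__ == "__main__":
    print(resolve("la"))
    print(resolve("la-x-hyphtex-etymological"))
```

With such a table, existing documents that ask for "la" would keep loading the
phonetic patterns, while the longer tags would remain available to anyone who
needs to be explicit.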
The actual language variants (classical, medieval, modern) are another
problem. On the face of it, they're three successive stages of the evolution
of Latin, differing in pronunciation, vocabulary, morphology, syntax. There
is, however, a major complication because they're only known through their
written form and are hard to define. In practice, the only issues that matter
for typography are differences in spelling (u/v, i/j, etc.). It thus seems that
in order to name these different variants it would be best to actually stick to
the orthographical features that define them, rather than use chronological
qualifiers such as "classical" or "modern". I don't actually have a concrete
solution to this problem at the moment; however, the discussions I've had to
attempt to classify the variants of Latin for typesetting have led me to
believe that this is the best approach.
This of course doesn't change the fact that the end user should actually only
see simple, "top-level" options such as "classical" or "modern".
Best,
Arthur