[tex-hyphen] New language classiclatin – question about the language code

Mojca Miklavec mojca.miklavec.lists at gmail.com
Tue Jun 3 19:32:03 CEST 2014

On Tue, Jun 3, 2014 at 5:55 PM, Claudio Beccari wrote:
> On 03/06/2014 13:21, Mojca Miklavec wrote:
>> We need something from the standard, see
>>     http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
>> and I don't find anything suitable on the list.
> I examined the file at the link indicated above; I could not figure out where that code is needed in the hyphenation files; I suppose it is not needed there.

The code is not really *needed* for hyphenation, but we would like to
use a consistent naming scheme for the patterns.

In short: we can in principle pick any name, but I consider it bad
practice to rename the files later. To TeX users it shouldn't make a
difference. If nobody is willing to deal with the standard, we don't
strictly have to follow it at all.

The only visible difference is whether we put "classiclatin" or
something else into language.dat/language.def.

> Actually the file contains an entry:
> %%
> Type: language
> Subtag: la
> Description: Latin
> Added: 2005-10-16
> Suppress-Script: Latn
> which is good for Latin, and is probably associated with the existing loadhyph-la.tex file and its friends.

Yes. That subtag "la" gives the name to "hyph-la.tex", "loadhyph-la.tex" etc.

> Now classic Latin is not a different language; it uses the same alphabet, and I could not invent a new entry for it: the ISO regulations don't distinguish modern Latin from medieval Latin or from classical Latin.

Yes, I'm aware of that.

> The Wikipedia ISO 639 entries for Latin are:
> Language family  Language name  Native name            639-1  639-2T  639-2B  639-3  639-6  Notes
> Indo-European    Latin          latine, lingua latina  la     lat     lat     lat    lats   ancient

We don't need a new ISO 639 entry (and cannot register one either).
Just a subtag to specify the language variant. Similar to
"el-monoton"/"el-polyton" and "de-1901"/"de-1996".
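To make the naming scheme concrete: the tag (primary language subtag plus an optional variant subtag) is what ends up in the file names. The sketch below assumes that the "hyph-la.tex"/"loadhyph-la.tex" pattern mentioned above extends to tags with a variant, as in "de-1901". `pattern_file_names` is a hypothetical helper, and "classic" is only the candidate subtag under discussion, not a registered one.

```python
from typing import Optional, Tuple


def pattern_file_names(language: str, variant: Optional[str] = None) -> Tuple[str, str]:
    """Return (pattern file, loader file) names for a language tag.

    Assumption: files are named hyph-<tag>.tex and loadhyph-<tag>.tex,
    where <tag> is the primary language subtag, optionally followed by
    a variant subtag joined with a hyphen (as in "de-1901").
    """
    tag = language if variant is None else f"{language}-{variant}"
    return f"hyph-{tag}.tex", f"loadhyph-{tag}.tex"


if __name__ == "__main__":
    # "classic" is only the candidate subtag discussed in this thread;
    # it is not a registered variant.
    for lang, var in [("la", None), ("de", "1901"), ("el", "polyton"), ("la", "classic")]:
        print(pattern_file_names(lang, var))
```

Whatever subtag is eventually chosen only changes the string passed as `variant`; that is why renaming later would mean renaming every file built from the tag.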

> I know that it is necessary to name another language for a different hyphenation pattern set,
> just as it is done with Greek (even if it is not actually used, or is used in a funny way: the (babel-)greek.ldf never uses the ancient Greek pattern set or the modern monotonic Greek pattern set; it uses only the polytonic Greek pattern set).
> Polyglossia, on the other hand, has no problem using three Greek pattern sets: apparently they refer to the languages "monotonic", "polytonic", and "ancient". In the language-subtag-registry for Greek I find only:

You need to talk to Arthur about support for a new language variant of
Latin in Polyglossia, and probably prepare something for babel as well.

> %%
> Type: language
> Subtag: el
> Description: Modern Greek (1453-)
> Added: 2005-10-16
> Suppress-Script: Grek
> %%
> Type: language
> Subtag: grc
> Description: Ancient Greek (to 1453)
> Added: 2005-10-16
> %%
> Type: language
> Subtag: grk
> Description: Greek languages
> Added: 2009-07-29
> Scope: collection
> %%
> Type: variant
> Subtag: monoton
> Description: Monotonic Greek
> Added: 2006-12-11
> Prefix: el
> %%
> Type: variant
> Subtag: polyton
> Description: Polytonic Greek
> Added: 2006-12-11
> Prefix: el
> plus several "script" entries.
> By analogy with Greek, a new entry could be added as follows:
> %%
> Type: variant
> Subtag: classic
> Description: Classic Latin
> Added: 2014-06-03
> Prefix: la
> I think this is the least invasive addition to the subtag list.
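The registry entries quoted above share a simple record format: records are separated by "%%" lines, and each field is "Name: value". A minimal parser sketch follows. It shows how one could list the variants already tied to a given prefix before drafting a proposal. The function names are hypothetical, and the "classic" record in `SAMPLE` is the proposed entry, not a registered one.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Excerpt in the registry's record format. The "classic" record is the
# hypothetical entry proposed in this thread, NOT a registered subtag.
SAMPLE = """\
%%
Type: variant
Subtag: monoton
Description: Monotonic Greek
Added: 2006-12-11
Prefix: el
%%
Type: variant
Subtag: polyton
Description: Polytonic Greek
Added: 2006-12-11
Prefix: el
%%
Type: variant
Subtag: classic
Description: Classic Latin
Added: 2014-06-03
Prefix: la
"""


def parse_registry(text: str) -> List[Record]:
    """Split registry text into records mapping field name -> values.

    Records are separated by lines holding only "%%"; fields look like
    "Name: value". A line starting with whitespace continues the previous
    field. Fields such as Prefix and Description may repeat, so every
    field maps to a list of values.
    """
    records: List[Record] = []
    current: Record = {}
    last_field = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif not stripped:
            continue
        elif line[0].isspace() and last_field is not None:
            # Folded continuation of the previous field's last value.
            current[last_field][-1] += " " + stripped
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def variants_with_prefix(records: List[Record], prefix: str) -> List[str]:
    """Return the variant subtags whose Prefix fields include `prefix`."""
    return [
        r["Subtag"][0]
        for r in records
        if r.get("Type") == ["variant"] and prefix in r.get("Prefix", [])
    ]


if __name__ == "__main__":
    recs = parse_registry(SAMPLE)
    print(variants_with_prefix(recs, "el"))  # ['monoton', 'polyton']
    print(variants_with_prefix(recs, "la"))  # ['classic']
```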

Yes, something like that would make sense. But: does one need more
than one variant? It would be weird to request, say, "la-classic" now
and "la-archaic" + "la-modern" + "la-neo" a few years later when you
realize that even different patterns are needed for those. What set of
subtags would make sense? Would you be willing to draft a proposal and
submit a request for inclusion?


There is one potential problem with the above-mentioned subtag, though.
According to RFC 5646:

   Requests to add a 'Prefix' field to a variant subtag that imply a
   different semantic meaning SHOULD be rejected.  For example, a
   request to add the prefix "de" to the subtag '1994' so that the tag
   "de-1994" represented some German dialect or orthographic form would
   be rejected.  The '1994' subtag represents a particular Slovenian
   orthography, and the additional registration would change or blur the
   semantic meaning assigned to the subtag.  A separate subtag SHOULD be
   proposed instead.

This means that registering "classic" with the prefix "la" would
prevent anyone else from registering the same subtag with a different
meaning, as in "<anotherlanguagetag>-classic". There is a chance that
no other language would ever need that subtag anyway, but it's
something I wasn't really aware of until now.
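The effect of the Prefix field can be illustrated with a toy validity check: once a variant is tied to certain prefixes, a tag that puts it after some other language is inconsistent. This is a simplified sketch, assuming tags of the form language(-variant)* only. The el/de prefixes mirror registry entries, "classic" is the proposal, and `variants_consistent` is a hypothetical helper, not a real library function.

```python
from typing import Dict, List

# Hypothetical prefix table: variant subtag -> allowed prefixes. The
# el/de entries mirror registry records; "classic" is only the proposal.
PREFIXES: Dict[str, List[str]] = {
    "monoton": ["el"],
    "polyton": ["el"],
    "1901": ["de"],
    "1996": ["de"],
    "classic": ["la"],
}


def variants_consistent(tag: str, prefixes: Dict[str, List[str]]) -> bool:
    """Check every variant in `tag` against its registered prefixes.

    Simplified reading of RFC 5646: the tag is assumed to be
    language(-variant)*, and the subtags in front of each variant must
    equal one of that variant's Prefix values exactly. Real tags can also
    carry script/region subtags, which this sketch does not handle.
    """
    subtags = tag.lower().split("-")
    for i in range(1, len(subtags)):
        allowed = prefixes.get(subtags[i])
        if allowed is None:
            return False  # variant not in our (toy) table
        if "-".join(subtags[:i]) not in allowed:
            return False
    return True


if __name__ == "__main__":
    for t in ["la-classic", "el-polyton", "de-1901", "fr-classic", "de-polyton"]:
        print(t, variants_consistent(t, PREFIXES))
```

Under the quoted rule, making "fr-classic" valid with a meaning unrelated to the Latin one would require a separate subtag rather than an extra Prefix on "classic".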

I'm also a bit confused by what Wikipedia says:

    The word "Latin" is now taken by default as meaning "Classical
    Latin", so that, for example, modern Latin text books describe
    classical Latin. [1]

    Classicists use the term "Neo-Latin" to describe the use of the
    Latin language for any purpose, scientific or literary, after the
    Renaissance. [2]

[1] http://en.wikipedia.org/wiki/Classical_Latin
[2] http://en.wikipedia.org/wiki/New_Latin

So to me it's not even clear which of the two pattern sets should be
called "latin".

But just to make sure that I understand it properly: are two pattern
sets needed because exactly the same words would hyphenate differently
in classical and modern Latin? Or is it just that the vocabularies of
both are so different that it's very difficult or impossible to cover
both variants of the language?

(I'm asking because I would like to know if there is a trick to cover
both variants with the same set of patterns or if that's theoretically
impossible because the rules are too different.)
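Why the distinction matters can be shown with Liang's scheme, which TeX patterns follow: digits between letters mark break points, the maximum digit found for each position wins, and an odd value allows a break. Below is a minimal sketch of applying patterns, with two made-up pattern sets (not taken from any real Latin patterns) that disagree about the same word. A single pattern set maps each word to exactly one hyphenation, and naively merging the two sets just collects the breaks from both.

```python
from typing import List, Tuple


def parse_pattern(pattern: str) -> Tuple[str, List[int]]:
    """Split a Liang-style pattern like "g1n" into ("gn", [0, 1, 0]).

    values[k] is the digit in the gap before letter k (k == len(letters)
    is the gap after the last letter); a missing digit means 0.
    """
    letters: List[str] = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word: str, patterns: List[str], left: int = 2, right: int = 2) -> str:
    """Insert hyphens where the maximum pattern digit in a gap is odd."""
    w = "." + word.lower() + "."  # "." marks word boundaries, as in TeX patterns
    gaps = [0] * (len(w) + 1)
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = w.find(letters)
        while start != -1:
            for k, v in enumerate(values):
                gaps[start + k] = max(gaps[start + k], v)
            start = w.find(letters, start + 1)
    pieces, last = [], 0
    # The gap before word[i] is gaps[i + 1] because of the leading ".".
    for i in range(left, len(word) - right + 1):
        if gaps[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Toy pattern sets, NOT taken from any real Latin patterns.
SET_A = ["g1n"]  # break between g and n
SET_B = ["1gn"]  # break before the g-n cluster

if __name__ == "__main__":
    print(hyphenate("magnus", SET_A))          # mag-nus
    print(hyphenate("magnus", SET_B))          # ma-gnus
    print(hyphenate("magnus", SET_A + SET_B))  # ma-g-nus
```

A conflict like this can be resolved inside one set with a higher even (inhibiting) digit, but only by picking one side. If identical words really must break differently in the two variants, separate pattern sets seem unavoidable. If the variants merely differ in vocabulary, one set covering both may be feasible.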

Thank you,
