[tex-hyphen] New language classiclatin – question about the language code

Claudio Beccari claudio.beccari at gmail.com
Wed Jun 4 00:11:37 CEST 2014


Dear friends,
things are getting complicated. Yesterday I loaded the new version 3.0 
of the babel-latin support for the Latin language, which makes use of 
two different sets of hyphenation patterns.

I called these two sets of patterns hyph-la.tex (for modern and 
medieval Latin, and for consistency with the hyphenation patterns that 
have been in use for the past 15 years or so -- the name hyph-la.tex 
was created a few years ago when babel and polyglossia were 
reorganised) and hyph-lac.tex (for classic Latin).

I had to do so because the hyphenation rules used by modern scholars 
are different; the spelling of the three variants is different, but one 
pattern set can accommodate both modern and medieval Latin, while this 
was impossible with classic Latin; I had to work quite a lot 
to produce the classic Latin patterns, because the rules are so 
incompatible that a separate pattern set is required. To give just one 
example, the word transubstantialis is hyphenated as 
tran-sub-stan-tia-lis with the modern/medieval Latin patterns and as 
tran-subs-tan-ti-a-lis with the classic Latin patterns.

I considered that the tags la and lac were sufficient to distinguish 
la(modern&medieval) from la(classic). I was not aware of the existence 
of special tags and subtags for the various languages, as Mojca showed me 
through the link:

http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

The contents of that language-subtag-registry file are a little 
mysterious to me; nevertheless I think I made a reasonable suggestion 
by proposing the addition of

%%
Type: variant
Subtag: classic
Description: Classic Latin
Added: 2014-06-03
Prefix: la

but the overall correspondence on this topic shows that a subtag classic 
may not be acceptable with the prefix la.
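
For illustration only, here is a minimal Python sketch of how records in 
that format (entries separated by "%%" lines, one "Field: value" pair per 
line) can be read, and how a variant's Prefix and Subtag combine into a 
tag such as la-classic. The function names parse_records and variant_tags 
are hypothetical, and the "classic" record is only the proposal above, not 
a registered entry. The sketch assumes one Prefix per record and no folded 
continuation lines, which the real registry does not guarantee.

```python
def parse_records(text):
    """Split registry-style text into a list of dicts, one per '%%' record."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "%%":
            # A '%%' line closes the previous record, if there was one.
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def variant_tags(records):
    """Build 'prefix-subtag' tags from the variant records that carry a Prefix."""
    return [
        f"{r['Prefix']}-{r['Subtag']}"
        for r in records
        if r.get("Type") == "variant" and "Prefix" in r
    ]


# The existing 'la' entry plus the proposed (unregistered) variant.
SAMPLE = """\
%%
Type: language
Subtag: la
Description: Latin
Added: 2005-10-16
Suppress-Script: Latn
%%
Type: variant
Subtag: classic
Description: Classic Latin
Added: 2014-06-03
Prefix: la
"""

records = parse_records(SAMPLE)
print(variant_tags(records))
```

Run on the two records above, this lists the single tag la-classic; the 
hyphenation files themselves do not depend on any of this.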

OK. And what about the tags and subtags for Greek? They include some 
identical tags with different subtags (and I did not list all of them). 
Why is la-classic not acceptable while, say, el and el-polyton are 
acceptable?

Mojca shows that Wikipedia lists more than half a dozen variants of 
Latin; I am not going to add so many modifiers/attributes for all these 
Latin languages; I would like real linguists to take care of their 
needs with their professional competence, which I lack. But up to 
now I have written some 15 pattern files for different languages (not all 
of them are or have been on CTAN); I have gained some experience in 
creating such pattern files without using patgen (because it assumes the 
existence of a very large list of correctly hyphenated words), using just 
grammar rules. Now some difficulties arise because the 
language-subtag-registry file is incomplete.

Whoever is in charge of maintaining that file should provide the missing 
entries; I am not going to submit any request for updating that list. Is 
it really necessary for the good working of the TeX system, or is it just 
one of those constraints that are becoming so common in modern times? I 
added the classiclatin language to my personal language files and 
recreated the formats, and everything seems to work properly, so that I 
succeeded in testing and correcting as much as necessary. So I wonder what 
this fuss about subtags is for.

If and when a decision on tags and subtags is taken, I will change the 
contents of the babel-latin and gloss-latin language description files 
accordingly, and of course the hyph-*.tex files.

Please let me know the decisions taken on this matter.

Regards
Claudio



On 03/06/2014 19:32, Mojca Miklavec wrote:
> On Tue, Jun 3, 2014 at 5:55 PM, Claudio Beccari wrote:
>> On 03/06/2014 13:21, Mojca Miklavec wrote:
>>
>>> We need something from the standard, see
>>>      http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
>>> and I don't find anything suitable on the list.
>> I examined the file at the link indicated above; I could not figure out where that code is needed in the hyphenation files; I suppose it is not needed there.
> The code is not really *needed* for hyphenation, but we would like to
> use a consistent naming scheme for the patterns.
>
> In short: we can in principle pick any name, but I consider it a bad
> practice to rename the files later. To TeX users it shouldn't make a
> difference. If nobody is willing to deal with the standard, we don't
> really have to do that at all.
>
> The only visible difference is whether we put "classiclatin" or
> something else into language.dat/language.def.
>
>> Actually the file contains an entry:
>>
>> %%
>> Type: language
>> Subtag: la
>> Description: Latin
>> Added: 2005-10-16
>> Suppress-Script: Latn
>>
>> which is good for Latin, and is probably associated with the existing loadhyph-la.tex file and its friends.
> Yes. That subtag "la" gives the name to "hyph-la.tex", "loadhyph-la.tex" etc.
>
>> Now classic Latin is not a different language; it uses the same alphabet and I could not invent a new entry for it: the ISO regulations don't distinguish modern Latin from medieval Latin and from classical Latin.
> Yes, I'm aware of that.
>
>> The Wiki ISO 639 entries for Latin are:
>>
>> Language family  Language name  Native name            639-1  639-2/T  639-2/B  639-3  639-6  Notes
>>
>> Indo-European    Latin          latine, lingua latina  la     lat      lat      lat    lats   ancient
> We don't need a new ISO 639 entry (and cannot register one either).
> Just a subtag to specify the language variant. Similar to
> "el-monoton"/"el-polyton" and "de-1901"/"de-1996".
>
>> I know that it is necessary to name another language for a different hyphenation pattern set
>> the same as it is done with Greek (even if it is not actually used, or it is used in a funny way -- the (babel-)greek.ldf does not ever use the ancient greek pattern set nor the modern monotonic greek pattern set, it uses only the polytonic greek pattern set).
>>
>> Polyglossia, on the contrary, has no problem using three Greek pattern sets: apparently they refer to the languages "monotonic", "polytonic", and "ancient". In the language-subtag-registry for Greek I find only:
> You need to talk to Arthur for support of a new language variant for
> Latin in Polyglossia. And probably prepare something for babel.
>
>> %%
>> Type: language
>> Subtag: el
>> Description: Modern Greek (1453-)
>> Added: 2005-10-16
>> Suppress-Script: Grek
>>
>> %%
>> Type: language
>> Subtag: grc
>> Description: Ancient Greek (to 1453)
>> Added: 2005-10-16
>>
>> %%
>> Type: language
>> Subtag: grk
>> Description: Greek languages
>> Added: 2009-07-29
>> Scope: collection
>>
>> %%
>> Type: variant
>> Subtag: monoton
>> Description: Monotonic Greek
>> Added: 2006-12-11
>> Prefix: el
>>
>> %%
>> Type: variant
>> Subtag: polyton
>> Description: Polytonic Greek
>> Added: 2006-12-11
>> Prefix: el
>>
>> plus several "script" entries.
>>
>> In similarity with Greek a new entry could be added as such:
>>
>> %%
>> Type: variant
>> Subtag: classic
>> Description: Classic Latin
>> Added: 2014-06-03
>> Prefix: la
>>
>> I think this is the least invasive addition to the subtag list.
> Yes, something like that would make sense. But: does one need more
> than one variant? It would be weird to request, say, "la-classic" now
> and "la-archaic" + "la-modern" + "la-neo" a few years later when you
> realize that even different patterns are needed for those. What set of
> subtags would make sense? Would you be willing to draft a proposal and
> submit a request for inclusion?
>
> See
>     http://www.langtag.net/register-new-subtag.html
>
> There is one potential problem with the above mentioned subtag though.
> According to RFC 5646:
>
> Requests to add a 'Prefix' field to a variant subtag that imply a
>     different semantic meaning SHOULD be rejected.  For example, a
>     request to add the prefix "de" to the subtag '1994' so that the tag
>     "de-1994" represented some German dialect or orthographic form would
>     be rejected.  The '1994' subtag represents a particular Slovenian
>     orthography, and the additional registration would change or blur the
>     semantic meaning assigned to the subtag.  A separate subtag SHOULD be
>     proposed instead.
>
> This means that registering "la-classic" would prohibit anyone else
> from registering "<anotherlanguagetag>-classic". There is a chance
> that no other language would ever need that tag anyway, but it's
> something I wasn't really aware of until now.
>
> I'm also a bit confused by what Wikipedia says:
>      The word "Latin" is now taken by default as meaning "Classical
> Latin", so that, for example, modern Latin text books describe
> classical Latin. [1]
>      Classicists use the term "Neo-Latin" to describe the use of the
> Latin language for any purpose, scientific or literary, after the
> Renaissance. [2]
>
> [1] http://en.wikipedia.org/wiki/Classical_Latin
> [2] http://en.wikipedia.org/wiki/New_Latin
>
> So to me it's not even clear which of the two pattern sets should be
> called "latin".
>
> But just to make sure that I understand it properly: are two pattern
> sets needed because exactly the same words would hyphenate differently
> in classical and modern Latin? Or is it just that the vocabularies of
> both are so different that it's very difficult or impossible to cover
> both variants of the language?
>
> (I'm asking because I would like to know if there is a trick to cover
> both variants with the same set of patterns or if that's theoretically
> impossible because the rules are too different.)
>
> Thank you,
>      Mojca
