[tex-hyphen] Names of files in OFFO

Mojca Miklavec mojca.miklavec.lists at gmail.com
Fri Mar 11 02:03:24 CET 2016


Dear Claudio and others,

Just to make something clear: for hyphenation of TeX documents we are
not changing anything at all (except if you are willing to register a
new language tag, in which case we would remove the "-x-" from the name,
but would still fully support typesetting Latin documents without any
visible change to the user). Period. The whole discussion is about how
to best support derived projects.

From my understanding, OFFO is targeting hyphenation of web pages. This
means that we are basically discussing how to properly hyphenate
Classical Latin text on those (rare?) web servers where the admins
actually bothered installing and enabling OFFO.

I would dare to bet that the cross-section of servers with proper OFFO
setup and websites written in Classical Latin with all the proper
tagging that enables hyphenation of the text must be just about zero,
but let us forget this for a moment and assume that there are
potential users where *both* content providers and server admins
care enough to go through the trouble of setting everything up and
actually do it right. How do you actually tag the HTML to mark the
language of the text as Classical Latin? (Do you have any pointers to
websites that get it at least roughly right?)

I thought that perhaps tags like mul-Ethi or la-x-classic would not be
supported (they probably wouldn't be in a POSIX locale), but according
to the links below that's apparently not the case and *exactly* the
same standard is being used:

https://www.w3.org/International/questions/qa-html-language-declarations#langvalues
https://www.w3.org/International/articles/language-tags/

The problem is that la-x-classic means "private use" and nothing else.
So it is not realistic to expect anyone to follow a random private
use code.
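
To illustrate the point, here is a minimal Python sketch (my own
illustration, not code from OFFO or any TeX package) of how a tag splits
into its public part and its private-use part. The function name
split_private_use is made up, and the sketch does no validation beyond
looking for the "x" singleton that starts a private-use sequence.

```python
def split_private_use(tag):
    """Split a language tag into (public subtags, private-use subtags).

    Simplified: the first standalone "x" subtag is taken as the start of
    the private-use sequence; nothing else about the tag is validated.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        return subtags[:i], subtags[i + 1:]
    return subtags, []


print(split_private_use("la-x-classic"))  # (['la'], ['classic'])
print(split_private_use("mul-Ethi"))      # (['mul', 'Ethi'], [])
```

Everything to the left of "x" has a publicly defined meaning; whatever
follows it ("classic" here) means only what the two parties exchanging
the text have privately agreed on.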

I see multiple solutions. The easiest solution is to say that we will
not bother about proper hyphenation of Classical Latin text on
websites. Honestly, hardly anyone will be affected. The majority of
*English* websites don't use hyphenation at all and I seriously
doubt that those few experts of Classical Latin would actually notice
(lack of) hyphenation on the web. Usually nobody would even set any
language tag at all.

But if you *do* want to support Classical Latin properly, you really
should go through the registration process of a new language tag.

Claudio: a colleague of mine recently registered three language
subtags. You can check
    http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
You'll find entries like "bohoric", "dajnko", "metelko". Those tags
are way less important than Classical Latin and they were successfully
registered without any problems.
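
If you want to look through the registry programmatically, a rough
Python sketch follows. It assumes the registry's record layout ("Field:
value" lines, continuation lines starting with whitespace, records
separated by "%%" lines). The function names and the SAMPLE text are
made up for illustration; in particular "example1" is an invented
subtag, not a real registry entry.

```python
def parse_registry(text):
    """Parse registry-style text into a list of {field: [values]} dicts."""
    records, current, key = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store what we have and start a new record.
            if current:
                records.append(current)
            current, key = {}, None
        elif line[:1] in (" ", "\t") and key is not None:
            # Continuation line: append to the most recent field value.
            current[key][-1] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            current.setdefault(key, []).append(value.strip())
    if current:
        records.append(current)
    return records


def variant_subtags(records):
    """Return the subtags of all records whose Type is 'variant'."""
    return [r["Subtag"][0] for r in records if r.get("Type") == ["variant"]]


# Invented sample data in the registry layout (NOT copied from the registry).
SAMPLE = """File-Date: 2000-01-01
%%
Type: variant
Subtag: example1
Description: Made-up variant used only
  for this sketch
Prefix: la
%%
Type: language
Subtag: la
Description: Latin
"""

print(variant_subtags(parse_registry(SAMPLE)))  # ['example1']
```

Run against the real file, the same filter should list the registered
variants, including the three mentioned above.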

I think that it is unfair to expect the OFFO (or any other)
project to properly support hyphenation of Classical Latin if nobody
bothered to register a suitable tag to make their life easier.

On 10 March 2016 at 19:26, Claudio Beccari wrote:
>
> But if la-x-classic is not usable in OFFO, any other name can be chosen
> provided the pattern set is not changed.

No. No other name can be chosen. Either the subtag gets registered
with IANA or some patterns must be dropped and support for only one
set of Latin patterns is kept. (We are still talking about websites,
perhaps eBook readers and the like. Not about TeX.)

(I'm sorry if I was a bit [too] harsh. That wasn't my intention, but I
really wanted to encourage you to try to officially register a new
subtag or alternatively at least help us write the justification /
explanation / proposal. The next question might be whether we also
need a special ISO 3166 country code or subtag for the Roman Empire :)

Of course there's still a question for Simon & Luis as authors of OFFO
about the level of support for BCP 47 in OFFO. Their codebase would
still have to properly support "inexact matches", so that text tagged
as "la-IT-Latn" or "la-VA[-x]-classic" for example would still be
hyphenated as Latin / Classical Latin.
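
To make the idea of "inexact matches" concrete, here is a small Python
sketch of one possible matching rule. It is only an assumption about
how such a lookup could work, not a description of OFFO's code: the
AVAILABLE table, the pattern-set names and the pick_patterns function
are all hypothetical, and the tags are split naively without checking
that they are well-formed.

```python
# Hypothetical table of installed pattern sets (names are illustrative only).
AVAILABLE = {
    "la": "Latin",
    "la-x-classic": "Classical Latin",
}


def pick_patterns(tag, available=AVAILABLE):
    """Pick a pattern set for a language tag, tolerating extra subtags.

    Assumption: a "classic" subtag anywhere after the language selects
    the classical patterns, whether it appears as private use
    ("-x-classic") or as a possible future registered variant
    ("-classic"). Region and script subtags are simply ignored.
    """
    subtags = tag.lower().split("-")
    language = subtags[0]
    if "classic" in subtags[1:]:
        key = language + "-x-classic"
        if key in available:
            return available[key]
    # Fall back to the bare language; None if nothing is installed for it.
    return available.get(language)


print(pick_patterns("la-IT-Latn"))       # Latin
print(pick_patterns("la-VA-x-classic"))  # Classical Latin
print(pick_patterns("la-VA-classic"))    # Classical Latin
```

Note that plain truncation from the right, as in the RFC 4647 lookup
scheme, would reduce "la-VA-x-classic" to "la-VA" and then to "la", so
the "classic" part would be lost; that is why this sketch looks at the
trailing subtags explicitly.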

Mojca

(PS: I still wonder why we keep using all-lowercase names for a
standard where capitalization matters; just to help confuse other
users even more.)

