[tex-hyphen] Renaming plain pattern files in hyph-utf8: feedback?

Claudio Beccari claudio.beccari at gmail.com
Tue Apr 12 00:15:44 CEST 2016

I am open to any naming policy, but I am not keen on deleting any pattern
file just because it does not fit into the language-country template. Let's
not speak of the Latin pattern files, on which I have already spent several
gallons of elbow grease, and for which there is no country to match the
language-country pattern (*); let's not speak of the Greek pattern files,
on which Apostolos has already spent his gallons of elbow grease. Let's
speak about English, for which there are several pattern files, all
under the same roof "English": we have hyphen.tex, the
original Knuthian pattern file, for the equivalent language names
english, american, usenglish, USenglish; ushyphenmax.tex (renamed
hyph-en-us.tex); and hyph-en-uk.tex (for the equivalent language names
british, ukenglish, UKenglish, australian, newzealand).
Which is the incorrectly named pattern file to eliminate? The Knuthian one?


(*) The official language of the Vatican City State is Italian; Latin is
the official language of the Holy See, which is not a state, although it
resides in the Vatican City State.

On 11/04/2016 23:34, Luis Bernardo wrote:
> Changes to OFFO will be confined to a single file, so this won't be a
> problem.
> On 4/11/16 12:18 AM, Mojca Miklavec wrote:
>> Hi,
>> Arthur and I have been discussing changing the names and locations of
>> files in our tex-hyphen repository. We would only change the organization
>> of the plain-text files that are currently used only by LuaTeX (and
>> by external projects). To be precise, the following files:
>> http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/?pathrev=743 
>> The following projects might be affected:
>> - potentially babel, polyglossia
>> - ConTeXt (requires just a change in a script that generates the 
>> internal files)
>> - MikTeX
>> - TeX Live's language.dat.lua (we have that under control)
>> - external projects taking patterns from us, like OFFO, Mozilla etc.
>> Below is our proposal, but we would be grateful for further feedback
>> and suggestions. In any case we would like to avoid problems at all
>> costs. If there is any fear that removing the files might break
>> functionality, we could of course keep the old files (or at least
>> symlinks?), even though I'm afraid that duplicated files would
>> introduce even more confusion.
>> Major changes would be the following:
>> - Every language would get an additional folder named after
>> that language (patterns for English, Greek, German, Latin, Mongolian,
>> Serbian would each end up in a single per-language folder; not sure about
>> Norwegian, which ships the same patterns for two language codes). We also
>> need a reasonable name for the complete collection of those patterns.
>> Something like "plain"?
>> - We would add a file hyph-<lang>.LICENSE with just the copyright
>> holder and text of the licence.
>> - We would replace a file hyph-<lang>.lic.txt (which currently
>> contains a literal copy of everything above \patterns{}) with
>> hyph-<lang>.yaml that would contain a consistent human-readable and
>> computer-parsable information about those patterns.
>> - hyph-<lang>.pat.txt -> hyph-<lang>.patterns
>> - hyph-<lang>.hyp.txt -> hyph-<lang>.exceptions
>> I would be particularly grateful for feedback about that. Arthur is
>> convinced (and I agree) that ".pat.txt" and ".hyp.txt" are completely
>> confusing file endings. On the other hand, files ending in ".patterns"
>> and ".exceptions" might not open by double-clicking, as the file
>> ending would not be recognized. It is not *absolutely*
>> necessary that we change that, but if we ever change it, this would
>> probably be the best moment for the change. We are both in favour of
>> changing it. I'm just not 100% convinced by the proposed new endings.
>> - Empty files with no hyphenation exceptions would be gone. They
>> caused problems for some people (for example, we recently got a
>> complaint from someone packaging for Arch Linux, but I'm unable to
>> find that email).
>> - I don't know whether ".chr.txt" is useful at all (it contains a
>> list of characters used in the file with patterns and exceptions). The
>> list of characters could be part of the yaml file in case anyone would
>> find it useful. I'm tempted to remove the file itself, but we could
>> just as well keep it if others find it useful (even if it can easily
>> be generated). It would actually be way more useful to list all
>> equivalent characters there (say that there is no difference between
>> "a" and "á": then they could be treated as equal characters by some
>> software). Feedback welcome.
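
As a rough illustration of the remark that the character list "can easily
be generated", here is a minimal sketch. It assumes that patterns and
exceptions are plain whitespace-separated text in which digits, "." and
"-" are markup rather than letters. The function name used_characters is
hypothetical, and the output does not try to reproduce the exact layout
of the existing .chr.txt files.

```python
from typing import List


def used_characters(patterns_text: str, exceptions_text: str = "") -> List[str]:
    """Return the sorted distinct characters used in patterns and exceptions."""
    chars = set()
    for ch in patterns_text + exceptions_text:
        # Skip whitespace, the digits that encode hyphenation weights,
        # the '.' word-boundary marker and the '-' used in exceptions.
        if ch.isspace() or ch.isdigit() or ch in ".-":
            continue
        chars.add(ch)
    return sorted(chars)
```

Such a list could be written into the yaml file instead of being kept as
a separate file, if anyone finds it useful.
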
>> ---------
>> Just to make it clear: it is unlikely that this would affect any
>> regular TeX (or LuaTeX) user that enables hyphenation via \usepackage
>> (either with Babel or Polyglossia). The names of files would be
>> changed in language.dat.lua used by LuaTeX and the change should be
>> transparent for all users except for those doing some non-conventional
>> things.
>> The best time to make any such changes is right before the TL
>> release. That means: right now. Or one year from now. But given that
>> we have already made some changes and that we plan to do further
>> things to simplify life for "downstream" packagers of patterns and
>> users like LibreOffice, Android, ... it makes more sense to change
>> things now than one year from now.
>> We will also try to do something to enable easier navigation through
>> our files for the sake of CTAN. And support for LibreOffice will most
>> likely require yet further post-processing steps (some compression of
>> patterns).
>> ---------
>> (Un)related: we are thinking of switching from SVN to Git.
>> ---------
>> Here's the concrete example of proposed changes:
>> Current:
>> tex/generic/hyph-utf8/patterns/
>>      txt/hyph-el-monoton.chr.txt
>>      txt/hyph-el-monoton.hyp.txt
>>      txt/hyph-el-monoton.lic.txt
>>      txt/hyph-el-monoton.pat.txt
>>      txt/hyph-el-polyton.chr.txt
>>      txt/hyph-el-polyton.hyp.txt
>>      txt/hyph-el-polyton.lic.txt
>>      txt/hyph-el-polyton.pat.txt
>>      txt/hyph-en-gb.chr.txt
>>      txt/hyph-en-gb.hyp.txt
>>      txt/hyph-en-gb.lic.txt
>>      txt/hyph-en-gb.pat.txt
>>      txt/hyph-en-us.chr.txt
>>      txt/hyph-en-us.hyp.txt
>>      txt/hyph-en-us.lic.txt
>>      txt/hyph-en-us.pat.txt
>> Proposed:
>> tex/generic/hyph-utf8/patterns/
>>      plain/el/hyph-el-monoton.patterns
>>      plain/el/hyph-el-monoton.LICENSE
>>      plain/el/hyph-el-monoton.yaml
>>      plain/el/hyph-el-polyton.patterns
>>      plain/el/hyph-el-polyton.LICENSE
>>      plain/el/hyph-el-polyton.yaml
>>      plain/en/hyph-en-gb.patterns
>>      plain/en/hyph-en-gb.exceptions
>>      plain/en/hyph-en-gb.LICENSE
>>      plain/en/hyph-en-gb.yaml
>>      plain/en/hyph-en-us.patterns
>>      plain/en/hyph-en-us.exceptions
>>      plain/en/hyph-en-us.LICENSE
>>      plain/en/hyph-en-us.yaml
>> Mojca
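
The renaming listed in the quoted example can be sketched as a small
mapping from current file names to proposed paths. This is only an
illustration under assumptions: the function name proposed_path is
hypothetical, the folder is assumed to be the first component after
"hyph-" (as in the el and en examples), and ".lic.txt" and ".chr.txt"
are treated as having no direct successor, since the proposal replaces
the former with .LICENSE and .yaml files and leaves the latter open.

```python
from pathlib import PurePosixPath
from typing import Optional

# Suffix changes taken from the proposal; other suffixes are not simple renames.
SUFFIX_MAP = {
    ".pat.txt": ".patterns",
    ".hyp.txt": ".exceptions",
}


def proposed_path(old: str, collection: str = "plain") -> Optional[str]:
    """Map e.g. 'txt/hyph-en-gb.pat.txt' to 'plain/en/hyph-en-gb.patterns'.

    Returns None for files that are not simply renamed.
    """
    name = PurePosixPath(old).name
    if not name.startswith("hyph-"):
        return None
    for old_suffix, new_suffix in SUFFIX_MAP.items():
        if name.endswith(old_suffix):
            stem = name[: -len(old_suffix)]  # e.g. 'hyph-en-gb'
            # Assumption: the folder is the first component after 'hyph-'.
            lang = stem.split("-")[1]        # e.g. 'en'
            return f"{collection}/{lang}/{stem}{new_suffix}"
    return None
```

The sketch ignores two points from the proposal: empty exceptions files
would be dropped rather than renamed, and a language that ships under
more than one code (Norwegian is mentioned) would need an explicit table
instead of the first-component rule.
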
