[tex-hyphen] Renaming plain pattern files in hyph-utf8: feedback?

Luis Bernardo lmpmbernardo at gmail.com
Mon Apr 11 23:34:53 CEST 2016

Changes to OFFO will be contained to a single file, so this won't be a 

On 4/11/16 12:18 AM, Mojca Miklavec wrote:
> Hi,
> Arthur and I have been discussing changing the names and locations of
> files in our tex-hyphen repository. We would only change the organization
> of the plain-text files that are currently used only by LuaTeX (and
> by external projects). The following files, to be precise:
> http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/?pathrev=743
> The following projects might be affected:
> - potentially babel, polyglossia
> - ConTeXt (requires just a change in a script that generates the internal files)
> - MikTeX
> - TeX Live's language.dat.lua (we have that under control)
> - external projects taking patterns from us, like OFFO, Mozilla etc.
> Below is our proposal, but we would be grateful for further feedback
> and suggestions. In any case we would like to avoid problems at all
> costs. If there is any fear that removing the files might break
> functionality, we could of course keep the old files (or at least
> symlinks?), even though I'm afraid that duplicated files would
> introduce even more confusion.
> Major changes would be the following:
> - Every language would get its own folder, named after that language
> (the several pattern sets for English, Greek, German, Latin, Mongolian
> and Serbian would each end up in one folder per language; not sure
> about Norwegian, which ships the same patterns for two language codes).
> We also need a reasonable name for the complete collection of those
> patterns. Something like "plain"?
> - We would add a file hyph-<lang>.LICENSE with just the copyright
> holder and text of the licence.
> - We would replace the file hyph-<lang>.lic.txt (which currently
> contains a literal copy of everything above \patterns{}) with
> hyph-<lang>.yaml, which would contain consistent human-readable and
> machine-parsable information about those patterns.
> - hyph-<lang>.pat.txt -> hyph-<lang>.patterns
> - hyph-<lang>.hyp.txt -> hyph-<lang>.exceptions
> I would be particularly grateful for feedback about that. Arthur is
> convinced (and I agree) that ".pat.txt" and ".hyp.txt" are completely
> confusing file endings. On the other hand, files ending in ".patterns"
> and ".exceptions" might not open when double-clicked,
> as the file ending would not be recognized. It is not *absolutely*
> necessary that we change that, but if we ever change it, this would
> probably be the best moment for the change. We are both in favour of
> changing it. I'm just not 100% convinced by the proposed names.
> - Empty files with no hyphenation exceptions would be gone. Those
> files caused problems for some people (for example we recently got a
> complaint from someone packaging for Arch Linux, but I'm unable to
> find that email).
> - I don't know whether ".chr.txt" is useful at all (it contains a
> list of characters used in the file with patterns and exceptions). The
> list of characters could be part of the yaml file in case anyone would
> find it useful. I'm tempted to remove the file itself, but we could
> just as well keep it if others find it useful (even if it can easily
> be generated). It would actually be way more useful to list all
> equivalent characters there (say that there is no difference between
> "a" and "á": then they could be treated as equal characters by some
> software). Feedback welcome.
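
On the remark that the ".chr.txt" list "can easily be generated": a minimal
Python sketch of one way to do that is below. It assumes that digits and dots
in the pattern text and hyphens in the exception text are markup rather than
letters, which may not hold for every language. The function name and the
sorted-list output are hypothetical and do not reproduce the actual ".chr.txt"
layout.

```python
def used_characters(patterns_text, exceptions_text=""):
    """Collect the distinct letters used in pattern and exception text.

    Assumption: digits and '.' (patterns) and '-' (exceptions) are markup.
    """
    markup = set("0123456789.-")
    return sorted(
        {
            ch
            for ch in patterns_text + exceptions_text
            if not ch.isspace() and ch not in markup
        }
    )
```

The same set could be written into the proposed yaml file instead of a
separate file, if anyone finds it useful there.
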
> ---------
> Just to make it clear: it is unlikely that this would affect any
> regular TeX (or LuaTeX) user who enables hyphenation via \usepackage
> (either with Babel or Polyglossia). The names of files would be
> changed in language.dat.lua used by LuaTeX and the change should be
> transparent for all users except for those doing some non-conventional
> things.
> The best time to make any such changes is the time before the TL
> release. That means: right now. Or one year from now. But given that
> we have already made some changes and that we plan to do further
> things to simplify the life of "downstream" packagers of patterns and
> users like LibreOffice, Android, ... it makes more sense to change
> things now than one year from now.
> We will also try to do something to enable easier navigation through
> our files for the sake of CTAN. And support for LibreOffice will most
> likely require yet further post-processing steps (some compression of
> patterns).
> ---------
> (Un)related: we are thinking of switching from SVN to Git.
> ---------
> Here's the concrete example of proposed changes:
> Current:
> tex/generic/hyph-utf8/patterns/
>      txt/hyph-el-monoton.chr.txt
>      txt/hyph-el-monoton.hyp.txt
>      txt/hyph-el-monoton.lic.txt
>      txt/hyph-el-monoton.pat.txt
>      txt/hyph-el-polyton.chr.txt
>      txt/hyph-el-polyton.hyp.txt
>      txt/hyph-el-polyton.lic.txt
>      txt/hyph-el-polyton.pat.txt
>      txt/hyph-en-gb.chr.txt
>      txt/hyph-en-gb.hyp.txt
>      txt/hyph-en-gb.lic.txt
>      txt/hyph-en-gb.pat.txt
>      txt/hyph-en-us.chr.txt
>      txt/hyph-en-us.hyp.txt
>      txt/hyph-en-us.lic.txt
>      txt/hyph-en-us.pat.txt
> Proposed:
> tex/generic/hyph-utf8/patterns/
>      plain/el/hyph-el-monoton.patterns
>      plain/el/hyph-el-monoton.LICENSE
>      plain/el/hyph-el-monoton.yaml
>      plain/el/hyph-el-polyton.patterns
>      plain/el/hyph-el-polyton.LICENSE
>      plain/el/hyph-el-polyton.yaml
>      plain/en/hyph-en-gb.patterns
>      plain/en/hyph-en-gb.exceptions
>      plain/en/hyph-en-gb.LICENSE
>      plain/en/hyph-en-gb.yaml
>      plain/en/hyph-en-us.patterns
>      plain/en/hyph-en-us.exceptions
>      plain/en/hyph-en-us.LICENSE
>      plain/en/hyph-en-us.yaml
> Mojca
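
For downstream scripts that refer to the old names, the two explicit renames
in the proposal (".pat.txt" to ".patterns", ".hyp.txt" to ".exceptions") and
the new "plain/<lang>/" layout could be sketched in Python roughly as follows.
The function name is hypothetical, and taking the folder name from the first
component after "hyph-" is an assumption based only on the el/en examples
above; it says nothing about the open Norwegian question. Files without a
direct successor (".chr.txt", ".lic.txt") return None here, and the sketch
does not check whether an exceptions file is empty and would therefore be
dropped.

```python
from pathlib import PurePosixPath

# Old suffix -> new suffix, limited to the two renames listed in the proposal.
SUFFIX_MAP = {
    ".pat.txt": ".patterns",
    ".hyp.txt": ".exceptions",
}


def proposed_path(old_path):
    """Map an old txt/ file name to its proposed plain/<lang>/ location."""
    name = PurePosixPath(old_path).name
    for old_suffix, new_suffix in SUFFIX_MAP.items():
        if name.endswith(old_suffix):
            stem = name[: -len(old_suffix)]  # e.g. "hyph-el-monoton"
            lang = stem.split("-")[1]        # e.g. "el" (assumed folder rule)
            return str(PurePosixPath("plain") / lang / (stem + new_suffix))
    return None  # no one-to-one rename in the proposal


print(proposed_path("txt/hyph-en-gb.hyp.txt"))
```
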
