[tex-hyphen] Renaming plain pattern files in hyph-utf8: feedback?

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Apr 11 00:18:46 CEST 2016


Hi,

Arthur and I have been discussing changing the names and locations of
files in our tex-hyphen repository. We would only reorganize the
plain-text files, which are currently used by LuaTeX only (but also by
external projects). To be precise, the following files:

http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/?pathrev=743


The following projects might be affected:
- potentially babel, polyglossia
- ConTeXt (requires just a change in a script that generates the internal files)
- MikTeX
- TeX Live's language.dat.lua (we have that under control)
- external projects taking patterns from us, like OFFO, Mozilla etc.

Below is our proposal, but we would be grateful for further feedback
and suggestions. In any case we would like to avoid problems at all
costs. If there is any fear that removing the files might break
functionality, we could of course keep the old files (or at least
symlinks?), even though I'm afraid that duplicated files would
introduce even more confusion.


Major changes would be the following:

- Every language would get its own folder named after that language
(the different pattern sets for each of English, Greek, German, Latin,
Mongolian and Serbian would end up together in that language's folder;
not sure about Norwegian, which ships the same patterns under two
language codes). We also need a reasonable name for the complete
collection of those patterns. Something like "plain"?

- We would add a file hyph-<lang>.LICENSE with just the copyright
holder and text of the licence.

- We would replace the file hyph-<lang>.lic.txt (which currently
contains a literal copy of everything above \patterns{}) with
hyph-<lang>.yaml, which would contain consistent, human-readable and
machine-parsable information about those patterns.

- hyph-<lang>.pat.txt -> hyph-<lang>.patterns
- hyph-<lang>.hyp.txt -> hyph-<lang>.exceptions
I would be particularly grateful for feedback on this. Arthur is
convinced (and I agree) that ".pat.txt" and ".hyp.txt" are completely
confusing file endings. On the other hand, files ending in ".patterns"
and ".exceptions" might not open on double-click, as the file ending
would not be recognized. It is not *absolutely* necessary to change
this, but if we ever do, now would probably be the best moment. We are
both in favour of a change; I'm just not 100% convinced by the
proposed names.

- Empty files with no hyphenation exceptions would be gone. They have
caused problems for some people (for example, we recently got a
complaint from someone packaging for Arch Linux, but I'm unable to
find that email).

- I don't know whether ".chr.txt" is useful at all (it contains a
list of characters used in the file with patterns and exceptions). The
list of characters could become part of the yaml file in case anyone
finds it useful. I'm tempted to remove the file itself, but we could
just as well keep it if others find it useful (even if it can easily
be generated). It would actually be way more useful to list all
equivalent characters there (say that there is no difference between
"a" and "á": then they could be treated as equal characters by some
software). Feedback welcome.
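
To illustrate the remark that the character list "can easily be
generated", here is a minimal Python sketch. It is only an
illustration, not how the existing ".chr.txt" files are produced, and
it does not reproduce their exact format. The function name
used_characters is hypothetical, and the sketch assumes that digits,
"." and "-" are pattern/exception markup rather than letters.

```python
def used_characters(patterns_text, exceptions_text=""):
    """Return the sorted distinct characters used in patterns and exceptions.

    Assumption: digits, "." (word boundary) and "-" (hyphenation point
    in exceptions) are markup and do not belong to the character list.
    """
    markup = set("0123456789.-")
    chars = set()
    # Patterns and exceptions are treated as whitespace-separated tokens.
    for token in (patterns_text + " " + exceptions_text).split():
        chars.update(ch for ch in token if ch not in markup)
    return sorted(chars)


if __name__ == "__main__":
    # Made-up sample input, not taken from any real pattern file.
    print(used_characters(".ach4 1ba", "as-so-ciate"))
```

A list of equivalent characters, as suggested above, could not be
derived this way and would have to be supplied by hand.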

---------

Just to make it clear: it is unlikely that this would affect any
regular TeX (or LuaTeX) user who enables hyphenation via \usepackage
(either with Babel or Polyglossia). The names of the files would be
changed in language.dat.lua used by LuaTeX, and the change should be
transparent for all users except for those doing unconventional
things.

The best time to make any such changes is just before the TL
release. That means: right now. Or one year from now. But given that
we have already made some changes and that we plan to do more to
simplify life for "downstream" packagers of patterns and for users
like LibreOffice, Android, ... it makes more sense to change things
now than one year from now.

We will also try to do something to enable easier navigation through
our files for the sake of CTAN. And support for LibreOffice will most
likely require yet further post-processing steps (some compression of
patterns).

---------

(Un)related: we are thinking of switching from SVN to Git.

---------

Here's a concrete example of the proposed changes:

Current:

tex/generic/hyph-utf8/patterns/
    txt/hyph-el-monoton.chr.txt
    txt/hyph-el-monoton.hyp.txt
    txt/hyph-el-monoton.lic.txt
    txt/hyph-el-monoton.pat.txt

    txt/hyph-el-polyton.chr.txt
    txt/hyph-el-polyton.hyp.txt
    txt/hyph-el-polyton.lic.txt
    txt/hyph-el-polyton.pat.txt

    txt/hyph-en-gb.chr.txt
    txt/hyph-en-gb.hyp.txt
    txt/hyph-en-gb.lic.txt
    txt/hyph-en-gb.pat.txt

    txt/hyph-en-us.chr.txt
    txt/hyph-en-us.hyp.txt
    txt/hyph-en-us.lic.txt
    txt/hyph-en-us.pat.txt


Proposed:

tex/generic/hyph-utf8/patterns/
    plain/el/hyph-el-monoton.patterns
    plain/el/hyph-el-monoton.LICENSE
    plain/el/hyph-el-monoton.yaml

    plain/el/hyph-el-polyton.patterns
    plain/el/hyph-el-polyton.LICENSE
    plain/el/hyph-el-polyton.yaml

    plain/en/hyph-en-gb.patterns
    plain/en/hyph-en-gb.exceptions
    plain/en/hyph-en-gb.LICENSE
    plain/en/hyph-en-gb.yaml

    plain/en/hyph-en-us.patterns
    plain/en/hyph-en-us.exceptions
    plain/en/hyph-en-us.LICENSE
    plain/en/hyph-en-us.yaml
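
For anyone who needs to adapt a script to the new layout, the mapping
above could be sketched in Python roughly as follows. This is an
illustration under stated assumptions, not a tool from the repository:
the function name proposed_path is hypothetical, "plain" is only the
tentative collection name, and the language folder is assumed to be
the first part of the language tag (cases such as Serbian or Norwegian
would need an explicit table instead). The .LICENSE and .yaml files
are new files and are not produced by renaming.

```python
import re

# Tentative name of the collection; the proposal has not settled on it.
COLLECTION = "plain"

# Old middle part of the file name -> proposed new file ending.
NEW_SUFFIX = {"pat": "patterns", "hyp": "exceptions"}

OLD_NAME = re.compile(
    r"hyph-(?P<tag>[A-Za-z0-9-]+)\.(?P<kind>pat|hyp|lic|chr)\.txt"
)


def proposed_path(old_name, content):
    """Return the proposed relative path, or None if there is no renamed file."""
    match = OLD_NAME.fullmatch(old_name)
    if match is None:
        raise ValueError(f"unexpected file name: {old_name}")
    kind = match["kind"]
    if kind not in NEW_SUFFIX:
        # .lic.txt and .chr.txt have no one-to-one successor.
        return None
    if kind == "hyp" and not content.strip():
        # Empty exception files would be gone.
        return None
    tag = match["tag"]
    # Assumption: the folder name is the first subtag of the language tag.
    language = tag.split("-")[0]
    return f"{COLLECTION}/{language}/hyph-{tag}.{NEW_SUFFIX[kind]}"


if __name__ == "__main__":
    # The file contents here are made-up placeholders.
    print(proposed_path("hyph-el-monoton.pat.txt", "placeholder"))
    print(proposed_path("hyph-el-monoton.hyp.txt", ""))
```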

Mojca


