[tex-hyphen] luatex and file names

Arthur Reutenauer arthur.reutenauer at normalesup.org
Tue Aug 11 18:51:48 CEST 2015


> Well, what I was wondering is if we need to load any
> data at all (except english for compatibility).

  I don't think we need to, it's just the way it was done.  I didn't
consider it until you brought it up, but it would make sense to leave
the format alone and load everything at runtime depending on what's
available.
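
  A minimal sketch of that runtime approach, in Python for brevity
rather than in the Lua that LuaTeX would actually run.  The function
name `find_patterns`, the `hyph-<tag>.pat.txt` file naming and the
`en-us` fallback are assumptions made for illustration, not the actual
luatex-hyphen interface: nothing is read from the format, and the
decision depends only on which files are present when the document is
compiled.

```python
from pathlib import Path


def find_patterns(lang, search_dirs, fallback="en-us"):
    """Return the first pattern file found for `lang`, else for `fallback`.

    Hypothetical helper: the file naming scheme and the fallback tag are
    assumptions for this sketch. Returns None when nothing is available.
    """
    for tag in (lang, fallback):
        for directory in search_dirs:
            candidate = Path(directory) / f"hyph-{tag}.pat.txt"
            if candidate.is_file():
                # Decide at run time, based only on what is installed.
                return candidate
    return None
```

  Because the lookup happens at runtime, local patterns only need to be
dropped into one of the searched directories; no format rebuild is
involved.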

  Reading between the lines of luatex-hyphen.pdf, I'm guessing the
authors were envisaging a transition period where installations both
with and without the patterns in the format could exist; that was at a
time when packages were being actively developed for LuaTeX
(luatex-hyphen was started in 2010).  This didn't happen; when I ported
Polyglossia to LuaTeX two years later I didn't consider the case where
patterns were available in the format and expected to have to load them
on demand anyway.  I suppose you did the same for Babel.

>                                            Furthermore,
> I think this improves compatibility, because we never
> know how the format was built (and also think of local
> patterns).

  I was thinking along those lines too: having metadata in the format
makes the system *less* portable since we then need to have at runtime
the exact data the metadata was referring to.

>            This also open hyphenation to packages (patterns
> based on rules are easy to program even in TeX, as mkpattern
> does, for example).

  That's certainly useful for some applications.

>>	http://tug.org/TUGboat/tb29-3/tb93miklavec.pdf
> 
> Well, but it's not CTAN :-).

  It is quite easy to reach from CTAN if you look for documentation.
The top-level README for hyph-utf8 mentions the hyphenation page on
tug.org, which has a link to the TUGboat article.  This of course demands
a small amount of research, and a little bit of navigation, but that's
always going to be the case for any package -- although a tiny
improvement is now possible: CTAN has very recently started allowing
package writers to put their README in Markdown format in order to
display it directly on the package page.  Until now there was only a
link to it from http://www.ctan.org/pkg/hyph-utf8 -- I've now converted
our README to Markdown in order to remove one level of indirection.

  More generally, I think that this kind of information is best written
up as an article and published in a place such as TUGboat, especially
considering the very low level of awareness of BCP 47.  This is a
pity because it's a really useful standard, and particularly well suited
to identifying languages in the TeX world -- so yes, I'd much rather
have people ask questions about it on mailing lists in order to spread
the word :-)

	Best,

		Arthur

