[tex-hyphen] Patgen
Arthur Reutenauer
arthur.reutenauer at normalesup.org
Wed May 15 19:53:56 CEST 2019
Hi Keno,
On Tue, May 14, 2019 at 10:55:32PM +0200, Keno Wehr wrote:
> Is it possible to adapt patgen for such huge lists?
If you’re able to compile patgen yourself, it should be enough to
change trie_size and triec_size in patgen.ch, currently set to
10,000,000 and 5,000,000 respectively. It is possible that the
percentages will still look silly because they’re computed as
    100 * good_count / ((double) good_count + miss_count)
so the numerator could overflow given the orders of magnitude we’re
talking about: with 11 million entries, good_count could easily be over
22 million, which multiplied by a hundred is more than fits in a signed
32-bit integer (maximum 2,147,483,647). I am, however, not able to test
this myself because the public repository for Classical Latin
hyphenation currently only produces a list of a little
over 2 million entries (I suppose you’re running patgen from the script
in https://github.com/wehro/hyphen-la/tree/master/patterns/generation).
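
To make the arithmetic concrete, here is a minimal C sketch. It is not
taken from patgen: the function names are hypothetical, and it assumes
the counters are signed 32-bit integers, which I have not checked
against the generated C code. The first function checks whether
100 * good_count still fits in 32 bits (using a 64-bit product, so the
check itself cannot overflow); the second shows one possible way to
sidestep the issue by doing the multiplication in floating point.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if 100 * good_count fits in a signed
   32-bit integer, 0 otherwise. The product is computed in 64 bits so
   that the check itself cannot overflow. */
int product_fits_int32(int32_t good_count)
{
    int64_t product = (int64_t)100 * good_count;
    return product <= INT32_MAX;
}

/* Hypothetical alternative: 100.0 is a double, so good_count is
   converted to double before the multiplication and no integer
   overflow can occur. Assumes good_count + miss_count > 0. */
double safe_percentage(int32_t good_count, int32_t miss_count)
{
    return 100.0 * good_count / ((double)good_count + miss_count);
}
```

With these definitions, product_fits_int32(2000000) returns 1, whereas
product_fits_int32(22000000) returns 0, since 2,200,000,000 exceeds
2,147,483,647.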
Best,
Arthur