Patgen

Wed May 15 22:40:24 CEST 2019

On 15.05.19 at 19:53, Arthur Reutenauer wrote:
> On Tue, May 14, 2019 at 10:55:32PM +0200, Keno Wehr wrote:
>> Is it possible to adapt patgen for such huge lists?
>    If you’re able to compile patgen yourself, it should be enough to
> change trie_size and triec_size in patgen.ch, currently set to
> 10,000,000 and 5,000,000 respectively.  It is possible that the
> percentages still will look silly because they’re computed as
>
> 	100 * good_count / ((double) good_count + miss_count)
>
> so that the numerator could result in an integer overflow considering
> the orders of magnitude we’re talking about: with 11 million entries,
> good_count could easily be over 22 million, which multiplied by a
> hundred will be more than can fit in a signed 32-bit integer.

Thank you for your advice. I will give it a try.
It may be better to add parentheses to the calculation, so that the
division is carried out in floating point before the multiplication by
100, which avoids the overflow:

	100 * (good_count / ((double) good_count + miss_count))
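To illustrate the arithmetic, here is a minimal C sketch. It is not taken from patgen; the function names and the use of int32_t for the counters are assumptions made for the example. It checks whether the product 100 * good_count would exceed a signed 32-bit integer, and computes the percentage with the division done first.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if 100 * good_count would not fit in a
   signed 32-bit integer. The product is formed in 64 bits so that the
   check itself cannot overflow. Counts are assumed to be non-negative. */
static int numerator_overflows(int32_t good_count)
{
    return (int64_t) 100 * good_count > INT32_MAX;
}

/* Hypothetical helper: percentage of good counts, with the division
   carried out in double before multiplying by 100, so that no integer
   product is ever formed. */
static double percent_bracketed(int32_t good_count, int32_t miss_count)
{
    return 100 * (good_count / ((double) good_count + miss_count));
}
```

With good_count = 22000000, numerator_overflows reports an overflow, because 2,200,000,000 is larger than INT32_MAX (2,147,483,647), whereas percent_bracketed never leaves floating point once the division has been done.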

> I am
> however not able to test it myself because the public repository for
> Classical Latin hyphenation currently only produces a list of a little
> over 2 million entries (I suppose you’re running patgen from the script
> in https://github.com/wehro/hyphen-la/tree/master/patterns/generation).

The correct location is 
https://github.com/gregorio-project/hyphen-la/tree/master/patterns/generation
All you need is the script "generate-patterns.sh" (and lua5.3 installed).
Unfortunately, I had not yet pushed the most recent change, which extends
the list by a factor of 4. I have done that now.

Keno