[tex-hyphen] Procedure for adding alternative patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Tue Sep 26 10:27:21 CEST 2017


Dear Stojan,

On 26 September 2017 at 09:44, Стоян Димитров wrote:
> Here is the information I could gather so far:
>
> The author of the proposed patterns is the renowned Dr. Anton Zinoviev [1]. In
> a private communication I was assured that his work covers the official
> hyphenation algorithm as published by the Institute for Bulgarian Language [2]
> in the Official Spelling Dictionary [3], which is the official normative
> reference book on Bulgarian spelling. Additionally, I was assured
> that full coverage of the algorithm is possible for Bulgarian without
> defects because of the simple nature of the hyphenation rules.
>
> Links to sources:
>
> upstream (supplied by the author) [4]
> converted dictionary for Open/LibreOffice [5]
>
> ___
> [1] http://lml.bas.bg/staff.html
> [2] http://ibl.bas.bg/en/
> [3] http://ibl.bas.bg/en/struktura/savremenen-balgarski-ezik/publikatsii/
> [4] http://logic.fmi.uni-sofia.bg/zinoviev/bgtex-v3.tgz
> [5] https://sourceforge.net/p/bgoffice/code/HEAD/tree/trunk/OOo-hyph-bg/

Thank you very much. I took a quick look. Both sets of patterns (the
ones we currently use and the ones from your link) seem to have been
generated by a script rather than by patgen. That should make
it much easier to compare them.

For the patterns from your link it would help to:
- convert them to UTF-8
- create a script that generates those patterns (we should be able to
help with that if help is needed; but I assume the author already has
some script unless this was assembled manually; ideally this
could/should be done for the existing patterns as well)
- (Ideally, get the author's agreement to release them under the MIT licence?)
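
For the UTF-8 conversion step, a minimal sketch could look like the
following. The source encoding of the upstream files is not stated in
this thread; cp1251 is used here purely as an assumption and should be
replaced by whatever the files actually use. The function name
reencode is hypothetical as well.

```python
def reencode(data: bytes, source_encoding: str = "cp1251") -> bytes:
    """Decode bytes from a legacy 8-bit encoding and re-encode them as UTF-8.

    The default source encoding is an assumption, not a known fact about
    the upstream pattern files.
    """
    return data.decode(source_encoding).encode("utf-8")


# A made-up pattern, encoded in the assumed legacy encoding.
sample = "а1б".encode("cp1251")
converted = reencode(sample)
```

Decoding with the wrong source encoding would normally not raise an
error for 8-bit Cyrillic encodings; it would silently produce wrong
letters, so the result should be checked by eye.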

Once we get a generating script (or rather: the exact rules), it should be
straightforward to compare both sets and point out the exact
differences. I would then suggest contacting both authors to comment
on the differences and, ideally, agree on which patterns are better and
why. We could then "discard" one of the two sets and keep the better one.
Given that this is not a random set from patgen, this should be
doable.
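
As a rough illustration of such a comparison, the sketch below treats
each pattern set as a plain set of strings and reports what is unique
to each side. It assumes both sets are already in UTF-8, that patterns
are separated by whitespace, that "%" starts a TeX comment, and that
any \patterns{...} wrapper has been stripped beforehand. The function
names and the sample patterns are made up for the example.

```python
def read_patterns(text: str) -> set:
    """Collect whitespace-separated patterns, ignoring TeX '%' comments."""
    patterns = set()
    for line in text.splitlines():
        line = line.split("%", 1)[0]  # drop everything after a comment sign
        patterns.update(line.split())
    return patterns


def compare(current: set, proposed: set) -> dict:
    """Return the patterns unique to each set and those they share."""
    return {
        "only_current": sorted(current - proposed),
        "only_proposed": sorted(proposed - current),
        "common": sorted(current & proposed),
    }


# Hypothetical sample data, not taken from either real pattern set.
current_text = "а1б % a comment\nв1г д1е\n"
proposed_text = "а1б\nв1г ж1з\n"

report = compare(read_patterns(current_text), read_patterns(proposed_text))
```

A literal diff like this only shows which patterns differ, not whether
the two sets hyphenate words differently; for that, both sets would
have to be applied to the same word list.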

(It would also be nice to publish an article in English describing those rules.)

Mojca


