[tex-hyphen] New Czechoslovak hyphenation patterns

Petr Sojka sojka at fi.muni.cz
Thu Jul 15 02:31:40 CEST 2021


Dear Arthur and Mojca, dear all,

attached is the draft of the paper we have offered for presentation
at TUG 2021. It discusses a new workflow, the new Czechoslovak patterns,
and new possibilities for hyphenation pattern development and merging.

When developing new Czech and Slovak patterns from word lists collected
from actual use on the web, we realized that, at least for closely
related syllable-based languages such as Czech and Slovak, a single
pattern file generated from the merged word list is enough: it still
covers more than 99% of hyphenation points without producing any errors.
Other languages might follow the same approach (bootstrapping quality
patterns from crawled and maintained word lists) and merge their word
lists with those of closely related languages, to minimize the number
of patterns and the need for explicit language switching in the future.
This is a challenging direction, though a somewhat esoteric and
complicated one.
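
As a rough illustration of the merging step only (not the workflow
described in the attached paper), the sketch below merges two
hyphenated word lists and reports words whose hyphenation differs
between them. The function names and the sample words with their
hyphenation points are hypothetical and chosen purely for
illustration; real input would be the crawled word lists.

```python
def strip_hyphens(word: str) -> str:
    """Return the word without its hyphenation marks."""
    return word.replace("-", "")


def merge_word_lists(first, second):
    """Merge two hyphenated word lists into one.

    Returns (merged, conflicts). `merged` maps each plain word to the
    first hyphenation seen; `conflicts` maps a plain word to the set of
    differing hyphenations found for it across the two lists.
    """
    merged = {}
    conflicts = {}
    for entry in list(first) + list(second):
        key = strip_hyphens(entry)
        previous = merged.setdefault(key, entry)
        if previous != entry:
            # Same word, different hyphenation points: record both.
            conflicts.setdefault(key, {previous}).add(entry)
    return merged, conflicts


# Hypothetical sample data; the hyphenation points are made up.
czech = ["ro-bot", "pi-vo", "se-stra"]
slovak = ["ro-bot", "pi-vo", "ses-tra"]

merged, conflicts = merge_word_lists(czech, slovak)
print(sorted(merged.values()))
print(conflicts)
```

Words that land in `conflicts` are the ones that would need manual
review before a single pattern file could be generated from the merged
list.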

We would be glad if you would evaluate the new patterns for inclusion in
hyph-utf8. We could possibly discuss this further at TUG 2021. All
comments, patches, and pull requests are greatly appreciated.

All the best,
Petr and Ondřej

-------------- next part --------------
A non-text attachment was scrubbed...
Name: New_Czechoslovak_Hyphenation_Patterns_4_TUG_2021.pdf
Type: application/pdf
Size: 342316 bytes
Desc: not available
URL: <https://tug.org/pipermail/tex-hyphen/attachments/20210715/37af4c61/attachment-0001.pdf>

