[tex-hyphen] Hyphenation patterns for Belarusian

Arthur Reutenauer arthur.reutenauer at normalesup.org
Tue Aug 30 15:52:45 CEST 2016


> I'll review patterns and return when xetex will not complain.

  Actually, making the patterns acceptable to TeX is easy, I can do that
for you.  I think it would be more interesting to analyse the logic
behind them, and hopefully fix them, because there seems to be something
seriously wrong.

  Apart from a number of very special patterns, I can describe the full
list as follows: a pattern is of the form

	   ANY 1 CMA
	or 1 CMA 2 V
	or V 1 V
	or A 3 V
	or . ANY 8
	or 8 ANY .
	or the pattern д8ж
	or the pattern д8з
	or V 8 K
	or C 8 MA
	or MA 8 MA

where

	ANY is any letter of the Belarusian alphabet
	C is any consonant
	V is any vowel
	A is an apostrophe (' ` ’)
	K is й or ў
	M is the soft sign (ь)
	MA is an M or an A
	CMA is a C, an M, or an A
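
  As a rough illustration, the rules above could be expanded
mechanically along the following lines.  This is only a sketch, not the
script that produced the LibreOffice file: the letter inventories, the
choice to count й and ў as consonants, the exclusion of apostrophes from
ANY, and the names CLASSES, RULES, expand and generate_patterns are all
assumptions made for the example.

```python
from itertools import product

# Assumed letter inventories for the 32-letter Belarusian alphabet.
VOWELS = "аеёіоуыэюя"
CONSONANTS = "бвгджзйклмнпрстўфхцчш"  # assumption: й and ў count as consonants
SOFT_SIGN = "ь"
APOSTROPHES = "'`’"

# Character classes named as in the description above.
CLASSES = {
    "ANY": VOWELS + CONSONANTS + SOFT_SIGN,  # assumption: no apostrophes in ANY
    "C": CONSONANTS,
    "V": VOWELS,
    "A": APOSTROPHES,
    "K": "йў",
    "M": SOFT_SIGN,
    "MA": SOFT_SIGN + APOSTROPHES,
    "CMA": CONSONANTS + SOFT_SIGN + APOSTROPHES,
}

# Each rule is a sequence of class names and literal symbols.
RULES = [
    ["ANY", "1", "CMA"],
    ["1", "CMA", "2", "V"],
    ["V", "1", "V"],
    ["A", "3", "V"],
    [".", "ANY", "8"],
    ["8", "ANY", "."],
    ["д8ж"],
    ["д8з"],
    ["V", "8", "K"],
    ["C", "8", "MA"],
    ["MA", "8", "MA"],
]

def expand(rule):
    """Return every pattern a single rule stands for."""
    # A class name expands to its characters; anything else is a literal.
    choices = [CLASSES.get(symbol, [symbol]) for symbol in rule]
    return ["".join(parts) for parts in product(*choices)]

def generate_patterns():
    """Expand all rules, in the order they are listed."""
    patterns = []
    for rule in RULES:
        patterns.extend(expand(rule))
    return patterns
```

  Comparing the output of generate_patterns() with the patterns in the
package would then show which entries the rules do not account for.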

  Using that formalism (it’s a context-free grammar if you’re familiar
with that), I can generate exactly the patterns contained in the
LibreOffice package ... except for a list of almost 2000 patterns that
seem rather strange (between the rule 8 ANY . and the pattern д8ж).
They're all anchored at the beginning or the end of the word (i.e.,
they start or end with a dot), they have the digit 8 in every position,
and they contain only consonants and different apostrophes.  The first
ones seem somewhat reasonable, but they then get increasingly strange
and we then have patterns such as (8’s omitted for legibility) .бльгг,
.бррр, .брьггв, .дззззз, and (my personal favourite) .нннннн
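
  To pull such patterns out of the list for inspection, a filter along
these lines might do.  It is a sketch under assumptions: the function
names are made up for the example, the soft sign is accepted alongside
consonants and apostrophes because the examples above contain it, and
the check only requires that every digit present is an 8, without
verifying where the digits sit.

```python
CONSONANTS = set("бвгджзйклмнпрстўфхцчш")  # assumption: й and ў included
SOFT_SIGN = {"ь"}
APOSTROPHES = set("'`’")
ALLOWED = CONSONANTS | SOFT_SIGN | APOSTROPHES

def strip_digits(pattern):
    """Drop the digits from a pattern, keeping dots and letters."""
    return "".join(ch for ch in pattern if not ch.isdigit())

def is_anchored_cluster(pattern):
    """True for dot-anchored, all-8 patterns made of consonants,
    soft signs and apostrophes, with at least two such characters."""
    if not (pattern.startswith(".") or pattern.endswith(".")):
        return False
    body = pattern.strip(".")
    digits = [ch for ch in body if ch.isdigit()]
    letters = [ch for ch in body if not ch.isdigit()]
    return (
        len(letters) >= 2  # excludes the one-letter rules . ANY 8 and 8 ANY .
        and len(digits) > 0
        and all(d == "8" for d in digits)
        and all(ch in ALLOWED for ch in letters)
    )
```

  For instance, is_anchored_cluster(".8б8р8р8р8") is true under these
assumptions, while is_anchored_cluster("д8ж") is false because that
pattern is not anchored.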

  I would really have a hard time accepting that these patterns make
much sense.  I can imagine a few explanations for why they’re there, but
I think it would be good to try and understand what the original author
meant by them.

	Best,

		Arthur

