[tex-hyphen] Hyphenation patterns for Belarusian

Arthur Reutenauer arthur.reutenauer at normalesup.org
Wed Aug 31 16:38:27 CEST 2016


> Here it is http://extensions.services.openoffice.org/en/project/dict-be-official

  Thanks.

> The file itself is in cp1251 and needs conversion to UTF-8
> iconv -f cp1251 -t UTF-8 < ./hyph_be_BY.dic > ./hyph_be_BY.txt
> + some hand editing to put the content inside \patterns{}

  Thanks, I know how to do that :-)
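  For the record, the conversion quoted above can also be scripted in one step. The Python sketch below assumes the usual libhyphen .dic layout: the first line names the charset, `%` starts a comment, and uppercase keyword lines such as LEFTHYPHENMIN carry settings rather than patterns. It also assumes that lines containing `/` are libhyphen's non-standard patterns, which TeX cannot express. The function names `dic_to_tex` and `convert_file` are hypothetical, and I have not checked hyph_be_BY.dic line by line against these assumptions.

```python
from pathlib import Path


def dic_to_tex(raw: bytes, encoding: str = "cp1251") -> str:
    """Turn the bytes of a libhyphen-style .dic file into a TeX \\patterns{} block."""
    lines = raw.decode(encoding).splitlines()
    patterns = []
    # Assumption: the first line only names the charset (e.g. "CP1251"), so skip it.
    for line in lines[1:]:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment
        first = line.split()[0]
        if first.isascii() and first.isupper():
            continue  # assumed keyword line, e.g. "LEFTHYPHENMIN 2"
        if "/" in line:
            continue  # assumed libhyphen non-standard pattern; no TeX equivalent
        patterns.append(line)
    return "\\patterns{\n" + "\n".join(patterns) + "\n}\n"


def convert_file(src: Path, dst: Path, encoding: str = "cp1251") -> None:
    """Read src in the given 8-bit encoding and write the UTF-8 TeX file dst."""
    dst.write_text(dic_to_tex(src.read_bytes(), encoding), encoding="utf-8")
```

  Something like convert_file(Path("hyph_be_BY.dic"), Path("hyph-be.tex")) would then replace the iconv call and the hand editing (the output file name is only an example).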

> According to the comment on line 1414, the intention behind such awkward patterns
> was to prohibit hyphenation if any part is composed solely of consonants.

  There’s something odd anyway.  I still suspect the actual list of
patterns does not reflect the intention of the author.

> Ok, I'll ask.

  Thanks.  I don’t mind being copied on the conversation, even if it is
in Belarusian.  You should contact Sviatlana Liasovich as well, since
she’s mentioned as having made corrections; in fact I think it would be
accurate to consider her as the sole author of the OpenOffice file,
since I can’t discern any trace of the original patterns.

>>   That’s correct, but actually I would just write
>> 
>> 	д2ж
>> 	д2з
>> 	.пад3
>> 
>>   Using lower numbers to begin with makes it easier to refine later.
>> 
>>   That being said, is пад really always a prefix?
> 
> This would make life too easy :) In some words it is a part of the root and is hyphenated differently.
> E.g.: па-да-ру-нак, па-дзел, вы-па-дак, па-да-плё-ка.

  OK, that’s what I suspected :-)  In that case it’s probably safer to
stick to

	д2ж
	д2з
	.па2д3ж
	.па2д3з

and input падзел as an exception: \hyphenation{па-дзел}.

  You need an even number after .па because of patterns of the type CVn,
with n an odd number to allow a break; the OpenOffice patterns have C8V3,
but I would recommend CV1.
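  To make the odd/even argument concrete, here is a minimal Python sketch of Liang-style pattern matching with the semantics TeX uses: every matching pattern contributes its values, the highest value wins at each inter-letter position, and an odd result allows a break. The helpers `parse_pattern` and `hyphenate` are hypothetical names; "па1" is a hypothetical stand-in for a CV1 pattern; minimum fragment lengths of 2 on each side are an assumption, not the Belarusian settings. The exception is entered as па-дзел, matching the hyphenation quoted above.

```python
def parse_pattern(pattern):
    """Split a pattern such as '.па2д3з' into its letters and the values between them."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # value sitting before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, exceptions=None, left_min=2, right_min=2):
    """Hyphenate word with Liang-style patterns; exceptions map a word to its fixed form."""
    if exceptions and word in exceptions:
        return exceptions[word]
    padded = "." + word + "."  # '.' marks the word boundaries, as in TeX patterns
    points = [0] * (len(padded) + 1)
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        for start in range(len(padded) - len(letters) + 1):
            if padded.startswith(letters, start):
                # The highest value seen at each inter-letter position wins.
                for offset, value in enumerate(values):
                    points[start + offset] = max(points[start + offset], value)
    pieces = []
    for k, ch in enumerate(word):
        # points[k + 1] is the value between word[k - 1] and word[k]; odd allows a break.
        if left_min <= k <= len(word) - right_min and points[k + 1] % 2 == 1:
            pieces.append("-")
        pieces.append(ch)
    return "".join(pieces)


# д2ж ... .па2д3з are from this thread; "па1" is a hypothetical CV1 pattern.
PATTERNS = ["д2ж", "д2з", ".па2д3ж", ".па2д3з", "па1"]
EXCEPTIONS = {"падзел": "па-дзел"}  # i.e. \hyphenation{па-дзел}

print(hyphenate("падзел", PATTERNS))
print(hyphenate("падзел", PATTERNS, EXCEPTIONS))
print(hyphenate("выпадак", PATTERNS))
```

  With these patterns падзел comes out as пад-зел: at the а|д position the 2 from .па2д3з overrides the 1 from па1, and at д|з the 3 beats the 2 from д2з. That is exactly why the word still needs the exception, while выпадак, where .па does not start the word, gets выпа-дак from па1 alone.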

> Hyphenation right before й or ў is prohibited at all times, no exceptions. So 8 will be just right, I believe.

  That sounds right.  It’s of course all right to use 8 when a break is
really prohibited, but the current files use way too many of them.
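  A quick way to see how heavily a pattern file leans on 8 is to count the values it uses and flag every 8 that does not sit directly before й or ў. The minimal Python sketch below does both; the function names and the sample patterns are hypothetical, with б8а3 standing in for the C8V3 type mentioned above.

```python
from collections import Counter


def value_histogram(patterns):
    """Count how often each inter-letter value (1-9) occurs in a pattern list."""
    return Counter(int(ch) for pattern in patterns for ch in pattern if ch.isdigit())


def suspicious_eights(patterns, protected="йў"):
    """Return the patterns that use 8 anywhere other than directly before й or ў."""
    flagged = []
    for pattern in patterns:
        for i, ch in enumerate(pattern):
            if ch != "8":
                continue
            following = pattern[i + 1] if i + 1 < len(pattern) else ""
            # An 8 at the end of a pattern, or before any other letter, is flagged.
            if not following or following not in protected:
                flagged.append(pattern)
                break
    return flagged


SAMPLE = ["8й", "8ў", "б8а3", "д2ж"]  # hypothetical sample patterns
print(value_histogram(SAMPLE))
print(suspicious_eights(SAMPLE))
```

  Run over the converted OpenOffice patterns, the second function would list the candidates for lowering to a smaller even value.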

	Best,

		Arthur
