[tex-hyphen] Hyphenation of Uyghur

Yannis Haralambous yannis1962 at gmail.com
Sat Feb 27 09:27:16 CET 2021


Here are some ideas about how to implement the needed features in the easiest possible way.

We need exactly two kinds of potential hyphenation points: one where a ZWJ has to be inserted at the start of the next line,
and one where a ZWNJ can be inserted at the start of the next line.

We should avoid changing the way patterns are stored in tables, otherwise it would imply too many changes.
Ideally it should be a generalization of the existing format.

Looking at §921 in TeX: The Program, I see that hyf_num is an array of small_number, and according to §101,
a small_number is an integer in the range 0..63. As I doubt anyone has ever used values higher than 20, I suggest
treating the stored values 63, 62, 61, etc. as -1, -2, -3, etc.
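
To make the encoding concrete, here is a minimal sketch in Python (not TeX's WEB code). The cut-off NEG_THRESHOLD and the function names are my own assumptions for illustration; the only things taken from the text above are the range 0..63 and the mapping 63, 62, 61 -> -1, -2, -3.

```python
SMALL_NUMBER_MAX = 63  # upper bound of small_number (range 0..63)
NEG_THRESHOLD = 32     # hypothetical cut-off: stored values >= 32 are read as negative


def encode(value: int) -> int:
    """Map a signed pattern value to the number stored in the table."""
    if value >= 0:
        if value >= NEG_THRESHOLD:
            raise ValueError("positive value collides with the negative range")
        return value
    stored = SMALL_NUMBER_MAX + 1 + value  # -1 -> 63, -2 -> 62, -3 -> 61
    if stored < NEG_THRESHOLD:
        raise ValueError("negative value collides with the positive range")
    return stored


def decode(stored: int) -> int:
    """Recover the signed pattern value from a stored number."""
    if not 0 <= stored <= SMALL_NUMBER_MAX:
        raise ValueError("not a small_number")
    if stored < NEG_THRESHOLD:
        return stored
    return stored - (SMALL_NUMBER_MAX + 1)
```

Any cut-off above the largest positive value in real use would do; 32 simply splits the range in half.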

In the hyphenation patterns we will need an additional notation to distinguish "positive" from "negative" patterns
(for example a * in front of the number). "Positive" patterns will be stored as usual, and "negative" patterns
will be stored with the high values 63, 62, etc.
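
As an illustration of such a notation, here is a small Python sketch that reads a pattern where a * before a digit marks a "negative" value. The * syntax is only the example suggested above, and the function name and return shape are hypothetical; nothing here is existing patgen or TeX behaviour.

```python
def parse_pattern(pattern: str):
    """Split a pattern such as 'a*1b' into its letters and signed inter-letter values.

    values[i] is the value in front of letters[i]; the last entry follows the
    final letter. Positions without a digit get 0. A '.' is kept as a letter,
    as in ordinary patterns.
    """
    letters = []
    values = [0]
    negative = False
    for ch in pattern:
        if ch == "*":
            negative = True  # applies to the digit that follows
        elif ch.isdigit():
            values[-1] = -int(ch) if negative else int(ch)
            negative = False
        else:
            if negative:
                raise ValueError("'*' must be followed by a digit")
            letters.append(ch)
            values.append(0)
    if negative:
        raise ValueError("'*' must be followed by a digit")
    return "".join(letters), values
```

With this, parse_pattern("a*1b") gives ("ab", [0, -1, 0]), and the -1 would then be stored as 63.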

We need new primitives:

\neghyphenchar
\prehyphenchar
\posthyphenchar
\preneghyphenchar
\postneghyphenchar

where \post* are character nodes inserted on the next line.

Then we change the code of §923: it will first look for ordinary hyphen locations and then for "negative" hyphen
locations. We handle both in a similar way (replacing the high values 63, 62, ... by 1, 2, ...), but apply \prehyphenchar,
\hyphenchar and \posthyphenchar in the first case, and \preneghyphenchar, \neghyphenchar and \postneghyphenchar
in the second case. The rest is as usual.
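
A rough Python sketch of that two-pass lookup follows. It is not the WEB code of §923. The cut-off of 32, the two settings tables and the function name are assumptions for illustration; in particular, which characters each primitive would actually hold is up to the user, and the values below only mirror the "with and without ZWJ" idea from this thread.

```python
ZWJ = "\u200d"
NEG_THRESHOLD = 32  # hypothetical cut-off between "positive" and "negative" stored values

# Hypothetical settings for the proposed primitives (empty string = nothing inserted).
ORDINARY = {"pre": ZWJ, "hyphen": "-", "post": ZWJ}  # \prehyphenchar, \hyphenchar, \posthyphenchar
NEGATIVE = {"pre": "", "hyphen": "-", "post": ""}    # \preneghyphenchar, \neghyphenchar, \postneghyphenchar


def find_breaks(hyf):
    """Return (position, end of first line, start of next line) for each allowed break.

    hyf[j] is the stored value at position j. As usual, an odd value allows a break.
    """
    breaks = []
    # First pass: ordinary hyphen locations.
    for j, v in enumerate(hyf):
        if v < NEG_THRESHOLD and v % 2 == 1:
            breaks.append((j, ORDINARY["pre"] + ORDINARY["hyphen"], ORDINARY["post"]))
    # Second pass: "negative" locations, after mapping 63, 62, ... to 1, 2, ...
    for j, v in enumerate(hyf):
        if v >= NEG_THRESHOLD and (64 - v) % 2 == 1:
            breaks.append((j, NEGATIVE["pre"] + NEGATIVE["hyphen"], NEGATIVE["post"]))
    return sorted(breaks)
```

For hyf = [0, 1, 63, 2, 62] this yields an ordinary break at position 1 and a "negative" break at position 2; positions 3 and 4 carry even values and allow no break.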

I think this solves the Uyghur issue with a minimal amount of changes.

> On 27 Feb 2021, at 00:33, Yannis Haralambous <yannis1962 at gmail.com> wrote:
> 
> Oops, you’re right. We do need two kinds of hyphenchars, with and without ^^^^200d. 
> 
> Sent from my iPhone
> 
>> On 26 Feb 2021, at 23:52, Jonathan Kew <jfkthame at gmail.com> wrote:
>> 
>> On 26/02/2021 22:44, Yannis Haralambous wrote:
>>>>> On 26 Feb 2021, at 23:37, Jonathan Kew <jfkthame at gmail.com> wrote:
>>>> 
>>>> On 26/02/2021 22:00, Yannis Haralambous wrote:
>>>>> dear TeX-hyphen members,
>>>>> I'm new to this list (although not necessarily new to TeX hyphenation :-)
>>>>> Here is the problem: we are preparing hyphenation patterns for Uyghur, written in Arabic script.
>>>>> As letters must be in initial/medial form before the hyphen and in medial/final form at the beginning of the next line,
>>>>> I was wondering if we could change TeX internals so that instead of one, three hyphenchars are used:
>>>>> ^^^^200d and `-' on the upper line and ^^^^200d on the lower line, in order to obtain the equivalent
>>>>> of \discretionary{^^^^200d-}{^^^^200d}{}
>>> Hi Jonathan,
>>>> The problem with this is that it wouldn't be the appropriate \discretionary in the case where the letter before the hyphenation position is a right- (rather than dual-) joining character.
>>> Sorry I don't understand what you mean. You mean when it is a biform character like the waw or the ra? In that case the ZWJ will do no harm. It is an invisible character that does not affect glyphs of biform characters.
>> 
>> Yes, the ZWJ before the hyphen on the first line would be harmless. But the ZWJ after the break (at the beginning of the second line) will cause the following character to take on a medial or final form, whereas it should remain initial or medial when it's after alef/dal/re/waw.
>> 
>> JK

Yannis HARALAMBOUS
Professor
Computer Science Department
UMR CNRS 6285 Lab-STICC
Technopôle Brest-Iroise CS 83818
29238 Brest Cedex 3, France
A school of IMT <http://www.imt.fr/>
The history of linguistics is largely a history of misreadings,
of failed communication between authors and readers,
exacerbated by the illusion that communication has successfully occurred.     (John E. Joseph)


