[tex-hyphen] Hyphenation of Uyghur

Yannis Haralambous yannis1962 at gmail.com
Fri Feb 26 23:44:16 CET 2021



> On 26 Feb 2021, at 23:37, Jonathan Kew <jfkthame at gmail.com> wrote:
> 
> On 26/02/2021 22:00, Yannis Haralambous wrote:
>> dear TeX-hyphen members,
>> I'm new to this list (although not necessarily new to TeX hyphenation :-)
>> Here is the problem: we are preparing hyphenation patterns for Uyghur, written in Arabic script.
>> As letters must be in initial/medial form before the hyphen and medial/final form at the beginning of the next line,
>> I was wondering if we could change TeX internals so that instead of one, three hyphenchars are used:
>> ^^^^200d and `-' on the upper line and ^^^^200d on the lower line, in order to obtain the equivalent
>> of \discretionary{^^^^200d-}{^^^^200d}{}

Hi Jonathan,

> The problem with this is that it wouldn't be the appropriate \discretionary in the case where the letter before the hyphenation position is a right- (rather than dual-) joining character.

Sorry, I don't understand what you mean. Do you mean when it is a biform character like the waw or the ra? In that case the ZWJ will do no harm: it is an invisible character that does not affect the glyphs of biform characters.

> So it's not sufficient to just have an extended form of \hyphenchar; we would also need hyphenation patterns to record two different types of break position: one for a break between joined letters, and one for a break between non-joined letters.

Not at all. In Arabic-script Uyghur there is only one rule: the glyphs of quadriform characters have to take initial/medial form before the hyphen and medial/final form on the next line. Biform characters remain as they are, and the ZWJ doesn't change them at all.

> Or else the engine needs to know (perhaps from the Unicode properties of the adjacent characters) which form to use -- but if we accept that the engine can use knowledge of specific Unicode properties here, then it can take responsibility for inserting the ZWJs internally, without needing to change \hyphenchar.

It is precisely to avoid these complications that I'm proposing to use ZWJ: its normal behavior is to turn a preceding quadriform letter from isolated into initial or from final into medial, and a following one from isolated into final or from initial into medial. If we can introduce it into the character string, the rendering engine will do the right thing.
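
To make the intended effect concrete, here is a minimal Python sketch of the joining logic described above. It is only a toy model: the table JOINING_TYPE and the helpers positional_forms and split_at are hypothetical names invented for this illustration, the table covers just a handful of letters, and the rules are a simplified reading of Arabic cursive joining rather than anything a TeX engine is known to implement.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER, treated here as join-causing

# Hand-picked joining types, for illustration only (assumption: a tiny
# subset, not the full Unicode joining-type data).
# "D" = dual-joining (quadriform), "R" = right-joining (biform),
# "C" = join-causing.
JOINING_TYPE = {
    "\u0628": "D",  # BEH
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    ZWJ: "C",
}


def joining_type(ch):
    """Return the simplified joining type; anything unknown is non-joining."""
    return JOINING_TYPE.get(ch, "U")


def positional_forms(text):
    """List (letter, form) for each Arabic letter of a logical-order string."""
    forms = []
    for i, ch in enumerate(text):
        t = joining_type(ch)
        if t not in ("D", "R"):
            continue  # ZWJ, hyphen, etc. have no positional forms
        prev_t = joining_type(text[i - 1]) if i > 0 else "U"
        next_t = joining_type(text[i + 1]) if i + 1 < len(text) else "U"
        joins_prev = prev_t in ("D", "C")
        # Only quadriform letters can connect to what follows them.
        joins_next = t == "D" and next_t in ("D", "R", "C")
        if joins_prev and joins_next:
            form = "medial"
        elif joins_next:
            form = "initial"
        elif joins_prev:
            form = "final"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms


def split_at(word, pos, joiner=ZWJ):
    """Split a word at pos: joiner + hyphen end the upper line,
    joiner starts the lower line."""
    return word[:pos] + joiner + "-", joiner + word[pos:]


word = "\u0628\u0644\u0645"  # BEH LAM MEEM, all quadriform
for joiner in (ZWJ, ""):
    upper, lower = split_at(word, 2, joiner)
    print(repr(joiner), positional_forms(upper), positional_forms(lower))
```

Passing joiner="" shows the plain-hyphen case for comparison. The model only applies generic joining rules to each line fragment separately; it does not show how a TeX engine would actually shape the contents of a \discretionary.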

Am I wrong to think so?

Yannis

> 
> (On re-reading, perhaps that's more like what you meant anyway?)
> 
> JK
> 
>> Arthur said he would have a different solution.
>> I would personally play with the DVI (resp. XDV) file, even though the widths of initial/medial forms
>> are quite different from those of final/isolated forms, which would require a global redistribution of
>> space in the line.
>> Cheers,
>> Yannis

Yannis HARALAMBOUS
Professor
Computer Science Department
UMR CNRS 6285 Lab-STICC
Technopôle Brest-Iroise CS 83818
29238 Brest Cedex 3, France
A school of IMT <http://www.imt.fr/>
Tact in audacity is knowing how far one can go too far.     (Jean Cocteau)


