[XeTeX] Uppercase in Armenian

Jonathan Kew jfkthame at gmail.com
Sun May 1 14:10:50 CEST 2022


Hi Zdeněk,

Checking the Unicode character database[1], U+0587 is listed as having a 
*compatibility* decomposition to <0565,0582> (not 057E):

0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;

Likewise, the SpecialCasing.txt file[2] that defines case mappings other 
than simple 1:1 substitutions shows the same decomposition for the 
uppercase form:

0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN

So if I understand correctly, what \text_uppercase:n is doing is simply 
implementing what the Unicode standard defines.
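
As a quick cross-check outside TeX, here is a minimal Python sketch that 
reads the same data through the standard library (unicodedata, plus 
str.upper, which as far as I know applies the unconditional 
SpecialCasing entries). The helper name codepoints is only for 
illustration, and the results depend on the Unicode version bundled 
with your Python:

```python
import unicodedata

LIGATURE = "\u0587"  # ARMENIAN SMALL LIGATURE ECH YIWN


def codepoints(text: str) -> str:
    """Format a string as space-separated uppercase hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Decomposition field from UnicodeData.txt, as exposed by Python.
decomposition = unicodedata.decomposition(LIGATURE)

# Compatibility decomposition applied via NFKD normalization.
nfkd = codepoints(unicodedata.normalize("NFKD", LIGATURE))

# Full (possibly one-to-many) uppercase mapping.
upper = codepoints(LIGATURE.upper())

print(decomposition)
print(nfkd)
print(upper)
```

On a current Python I would expect the last line to read 0535 0552, 
i.e. capital ECH plus capital YIWN, matching the SpecialCasing entry.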

If this isn't the appropriate behavior, at least for some locales, I 
believe that will need custom programming at some level, but I don't 
know enough about it to get into any details.
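
Purely to illustrate what such a custom layer might look like (in 
Python, not expl3), here is a sketch that applies a per-variant override 
before falling back to the default Unicode mapping. The table and 
function names are hypothetical, and the Eastern mapping to 
U+0535 U+054E is simply the suggestion from the message quoted below; I 
can't vouch for it linguistically:

```python
# Hypothetical override table: code points whose uppercase form should
# differ from the Unicode default for a given orthography.
# Assumption (taken from the quoted message, not verified): Eastern
# Armenian uppercases U+0587 to ECH + VEW rather than ECH + YIWN.
EASTERN_UPPER_OVERRIDES = {"\u0587": "\u0535\u054E"}


def uppercase_armenian(text: str, eastern: bool = True) -> str:
    """Uppercase text, optionally applying the Eastern override first."""
    if eastern:
        # Replace overridden characters with their already-uppercase
        # forms; str.upper() then leaves those characters unchanged.
        text = "".join(EASTERN_UPPER_OVERRIDES.get(ch, ch) for ch in text)
    return text.upper()


word = "\u0561\u0580\u0587"  # "arew" in the Eastern spelling
print(uppercase_armenian(word, eastern=True))
print(uppercase_armenian(word, eastern=False))
```

With eastern=False this is just the default Unicode behaviour, so the 
override stays opt-in and every other character is unaffected.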

As for whether xelatex (or other engines) form a ligature from one (or 
other) of the decomposed sequences, that would be entirely in the hands 
of the font developer. I guess such ligatures are not implemented widely 
(if at all).

JK

[1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
[2] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt

On 01/05/2022 12:50, Zdenek Wagner wrote:
> Hi David,
> 
> when trying to explain it in greater detail I found that the situation 
> is even more complex. As I wrote, I follow Elena Yerevan on YouTube and 
> Facebook, so all that I know, I know from her videos, from her name 
> written in both alphabets, from Wikipedia and from 
> https://omniglot.com/writing/armenian.htm, which means that I know 
> practically nothing. We need clarification from people who know Armenian 
> (հայերէն and/or հայերեն), therefore I am sending Cc to Arthur and the 
> authors of the ArmTeX project (hopefully at least one of the addresses 
> still exists).
> 
> I will start with the typical use case. The title of a chapter in the 
> book class is written in lowercase and displayed that way in the chapter 
> title as well as in the table of contents but appears in uppercase in 
> the running head. This is why it should work.
> 
> The case of ligatures is different. My fonts have not only ff, fl, and 
> fi ligatures but even ffi and ffl. If I find the word "difficult" on a 
> web page using a serif font, I see the ffi ligature, but the source 
> shows that it has the individual characters f, f, i and that the 
> ligature was created by the shaping engine. If I copy it and paste it 
> into a text editor such as vim or notepad, I will get the three 
> characters. If I use it as a TeX source and typeset it with Computer 
> Modern or Latin Modern, I will get the ffi ligature and \uppercase will 
> work. If I copy U+0587 from a web page and paste it into a text editor, 
> I will get U+0587. I tried both U+0565 U+0582 (եւ) and U+0565 U+057E 
> (եվ) but neither of them forms the U+0587 (և) ligature in XeLaTeX.
> 
> I did not understand why the ligature is considered ECH and YIWN but it 
> seems that this is more historical and bound to the shape. If I 
> understand it correctly, "sun" is pronounced "arew" in Armenian; արև 
> (U+0561 U+0580 U+0587) is the Eastern spelling, while արեւ (U+0561 
> U+0580 U+0565 U+0582) is the classical spelling (as given in 
> Wiktionary) and probably also the Western one. As you can see on 
> Omniglot, the Armenian names of Eastern/Western Armenian start with 
> "arew" in these two spellings. Even "hayeren" (Armenian) has different 
> spellings in the Eastern/Western variants (I have included both at the 
> beginning of this mail). Having found the information on variants I saw 
> that polyglossia supports variant=western. I tried to specify 
> variant=eastern but it did not help.
> 
> If you look at ot6enc.def, it defines uppercase variants at the end of 
> the file, where the uppercase version of \armew is \Arm@yechvev, which 
> is \Armyech\Armvev. I cannot try it because I do not know the 
> transliteration, but just from the names of the characters it seems to 
> me that it works correctly while \text_uppercase:n does not. It should 
> know that U+0587 should be decomposed to U+0565 U+057E (not U+0582) and 
> then uppercased to U+0535 U+054E (not U+0552), at least for the Eastern 
> variant. I am not sure whether there are other issues and where exactly 
> to fix it.
> 
> Zdeněk Wagner
> http://ttsm.icpf.cas.cz/team/wagner.shtml
> 

