[XeTeX] Uppercase in Armenian

Peter von Kaehne refdoc at gmx.net
Sun May 1 20:29:30 CEST 2022


Generally speaking (and I am speaking here very much as a LaTeX novice with no knowledge of Armenian), one nearly always does better to decompose and normalise all UTF-8 text first, then work on it, and only then recompose. I think that approach would have worked out fine here.
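
For what it is worth, here is a minimal Python sketch of what decomposing and recomposing does to this one character. Python and its standard unicodedata module are my own choice for illustration; nobody in the thread used them, and this says nothing about what XeTeX or the LaTeX kernel do internally.

```python
import unicodedata

ECH_YIWN = "\u0587"  # և ARMENIAN SMALL LIGATURE ECH YIWN


def codepoints(s):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Canonical decomposition (NFD) leaves U+0587 untouched, because its
# decomposition in UnicodeData.txt is tagged <compat>.
nfd = unicodedata.normalize("NFD", ECH_YIWN)

# Compatibility decomposition (NFKD) splits it into ech + yiwn.
nfkd = unicodedata.normalize("NFKD", ECH_YIWN)

# Canonical recomposition (NFC) does not rebuild a compatibility character.
nfc_again = unicodedata.normalize("NFC", nfkd)

print(codepoints(nfd))
print(codepoints(nfkd))
print(codepoints(nfc_again))
```

Only the compatibility forms (NFKD/NFKC) touch the character, and they yield ech + yiwn (U+0565 U+0582), not ech + vew, so the round trip does not bring U+0587 back.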

Peter

Sent from my phone. Please forgive misspellings and weird “corrections”

> On 1 May 2022, at 19:13, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
> 
> շնորհակալություն – thank you for confirmation. I believe that there
> are people who know how to fix it.
> 
> Zdeněk Wagner
> http://ttsm.icpf.cas.cz/team/wagner.shtml
> 
> ne 1. 5. 2022 v 19:06 odesílatel DALALYAN Arnak
> <Arnak.Dalalyan at ensae.fr> napsal:
>> 
>> Dear All,
>> 
>> I confirm that there are two correct uppercase versions of և: the reformed spelling is ԵՎ, whereas the classical spelling is ԵՒ. Note that this has nothing to do with Eastern or Western Armenian; both variants of the language may use either spelling. But the official language in Armenia is Eastern Armenian, and the official spelling is the reformed one. Therefore, I believe a good way for the uppercase command to operate would be to output ԵՎ by default, with a "classical" option that outputs ԵՒ when activated.
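
To make the proposal above concrete, here is a rough Python sketch of the suggested behaviour. It is not LaTeX and not a description of any existing package: the helper name armenian_upper and its classical flag are hypothetical, invented only for this illustration.

```python
ECH_YIWN = "\u0587"               # և
REFORMED_UPPER = "\u0535\u054E"   # ԵՎ (ech + vew), reformed spelling
CLASSICAL_UPPER = "\u0535\u0552"  # ԵՒ (ech + yiwn), classical spelling


def armenian_upper(text, classical=False):
    """Hypothetical helper: uppercase text, choosing the form of և explicitly.

    U+0587 is replaced before the generic uppercasing step, so the
    default Unicode mapping for it is never consulted.
    """
    replacement = CLASSICAL_UPPER if classical else REFORMED_UPPER
    return text.replace(ECH_YIWN, replacement).upper()


print(armenian_upper("արև"))                  # reformed default
print(armenian_upper("արև", classical=True))  # classical option
```

Substituting before uppercasing keeps the choice of spelling in one place; everything else still goes through the ordinary case mapping.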
>> 
>> Just to dive a bit deeper into this topic: it is true that և was originally a ligature, but it is now a full letter in the reformed spelling.
>> 
>> The aim of my response was to confirm what was already more or less said in Zdeněk's messages. I am afraid I can't help with fixing what LaTeX is doing now.
>> 
>> Best regards,
>> Arnak
>> ________________________________________
>> From: Jonathan Kew [jfkthame at gmail.com]
>> Sent: Sunday, May 1, 2022 2:10 PM
>> To: XeTeX (Unicode-based TeX) discussion.; Zdenek Wagner
>> Cc: Serguei.Dachian at math.univ-bpclermont.fr; DALALYAN Arnak; vakopian at yahoo.com
>> Subject: Re: [XeTeX] Uppercase in Armenian
>> 
>> Hi Zdeněk,
>> 
>> Checking the Unicode character database[1], U+0587 is listed as having a
>> *compatibility* decomposition to <0565,0582> (not 057E):
>> 
>> 0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;
>> 
>> Likewise, the SpecialCasing.txt file[2] that defines case mappings other
>> than simple 1:1 substitutions shows the same decomposition for the
>> uppercase form:
>> 
>> 0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
>> 
>> So if I understand correctly, what \text_uppercase:n is doing is simply
>> implementing what the Unicode standard defines.
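
The same default can be observed outside TeX. As a hedged illustration (Python is my choice here, not something used in this thread), Python's built-in str.upper() applies the unconditional SpecialCasing.txt mappings and so produces the ech + yiwn form as well:

```python
ECH_YIWN = "\u0587"  # և ARMENIAN SMALL LIGATURE ECH YIWN


def codepoints(s):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Default, locale-independent Unicode uppercasing of the single character.
upper = ECH_YIWN.upper()

print(codepoints(ECH_YIWN))
print(codepoints(upper))
```

One input character becomes two on output, which is why a plain 1:1 mapping table cannot express this case.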
>> 
>> If this isn't the appropriate behavior, at least for some locales, I
>> believe that will need custom programming at some level, but I don't
>> know enough about it to get into any details.
>> 
>> As for whether xelatex (or other engines) form a ligature from one (or
>> other) of the decomposed sequences, that would be entirely in the hands
>> of the font developer. I guess such ligatures are not implemented widely
>> (if at all).
>> 
>> JK
>> 
>> [1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
>> [2] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt
>> 
>>> On 01/05/2022 12:50, Zdenek Wagner wrote:
>>> Hi David,
>>> 
>>> when trying to explain it in greater detail I found that the situation
>>> is even more complex. As I wrote, I follow Elena Yerevan on YouTube and
>>> Facebook, so all that I know, I know from her videos, from her name
>>> written in both alphabets, from Wikipedia and from
>>> https://omniglot.com/writing/armenian.htm which means that I know
>>> practically nothing. We need clarification from people who know Armenian
>>> (հայերէն and/or հայերեն), therefore I am sending Cc to Arthur and the
>>> authors of the ArmTeX project (hopefully at least one of the addresses
>>> still exists).
>>> 
>>> I will start with the typical use case. The title of a chapter in the
>>> book class is written in lowercase and displayed that way in the chapter
>>> title as well as in the table of contents but appears in uppercase in
>>> the running head. This is why it should work.
>>> 
>>> The case of ligatures is different. My fonts have not only ff, fl, and
>>> fi ligatures but even ffi and ffl. If I find the word "difficult" on a web
>>> page using a serif font, I see the ffi ligature, but the source shows that
>>> it has the individual characters f, f, i and the ligature was created by
>>> the shaping engine. If I copy it and paste it into a text editor such as
>>> vim or notepad, I will get the three characters. If I use it as a TeX
>>> source and typeset it with Computer Modern or Latin Modern, I will get
>>> the ffi ligature and \uppercase will work. If I copy U+0587 from a web
>>> page and paste it into a text editor, I will get U+0587. I tried both
>>> U+0565 U+0582 (եւ) and U+0565 U+057E (եվ) but neither of them forms the
>>> U+0587 (և) ligature in XeLaTeX.
>>> 
>>> I did not understand why the ligature is
>>> considered ECH and YIWN but it seems that it is more historical and
>>> bound to the shape. If I understand it well, sun is pronounced in
>>> Armenian as "arew", and արև (U+0561 U+0580 U+0587) is the Eastern
>>> spelling while արեւ (U+0561 U+0580 U+0565 U+0582) is the classical
>>> spelling (as given in Wiktionary) and probably also the one used in the
>>> Western variant. As you can see on Omniglot, the Armenian names of
>>> Eastern/Western Armenian start with "arew" with these two spellings.
>>> Even "hayeren" (Armenian) has different spellings in the Eastern/Western
>>> variants (I have included both at the beginning of this mail). Having
>>> found the information on variants I saw that polyglossia supports
>>> variant=western. I tried to specify variant=eastern but it did not help.
>>> 
>>> If you look at ot6enc.def, it defines uppercase variants at the end of
>>> the file, where the uppercase version of \armew is \Arm@yechvev, which is
>>> \Armyech\Armvev. I cannot try it because I do not know the transliteration,
>>> but just from the names of the characters it seems to me that it works
>>> correctly while \text_uppercase:n does not. It should know that U+0587
>>> should be decomposed to U+0565 U+057E (not U+0582) and then uppercased
>>> to U+0535 U+054E (not U+0552), at least for the Eastern variant. I am
>>> not sure whether there are other issues and where exactly to fix it.
>>> 
>>> Zdeněk Wagner
>>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>>> 
> 



