[XeTeX] Uppercase in Armenian

Zdenek Wagner zdenek.wagner at gmail.com
Sun May 1 13:50:48 CEST 2022


Hi David,

when trying to explain it in greater detail I found that the situation is
even more complex. As I wrote, I follow Elena Yerevan on YouTube and
Facebook, so everything I know comes from her videos, from her name written
in both alphabets, from Wikipedia and from
https://omniglot.com/writing/armenian.htm, which means that I know
practically nothing. We need clarification from people who know Armenian
(հայերէն and/or հայերեն), therefore I am sending Cc to Arthur and the
authors of the ArmTeX project (hopefully at least one of the addresses
still exists).

I will start with the typical use case. The title of a chapter in the book
class is written in lowercase and displayed that way in the chapter heading
as well as in the table of contents, but it appears in uppercase in the
running head. This is why uppercasing has to work correctly.
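
A minimal test file for this use case might look as follows. It is only an
untested sketch that reuses the preamble from the earlier examples in this
thread; the filler text and page breaks are my assumption, added only to
produce a page that carries the running head.

```latex
\documentclass{book}
\usepackage{polyglossia}
\setdefaultlanguage{armenian}
\setmainfont{DejaVu Sans}
\begin{document}
\tableofcontents
\chapter{Երևան}
% The first page of a chapter uses the plain page style, so add pages
% until one of them shows the running head with the chapter title.
Text\newpage Text\newpage Text
\end{document}
```

The chapter title is typed once; the class itself uppercases it for the
running head, so the author has no chance to type the uppercase form by hand.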

The case of ligatures is different. My fonts have not only the ff, fl, and
fi ligatures but even ffi and ffl. If I find the word "difficult" on a web
page set in a serif font, I see the ffi ligature, but the source shows that
it contains the individual characters f, f, i and that the ligature was
created by the shaping engine. If I copy it and paste it into a text editor
such as vim or Notepad, I get the three characters. If I use it as TeX
source and typeset it with Computer Modern or Latin Modern, I get the ffi
ligature and \uppercase works.

If I copy U+0587 from a web page and paste it into a text editor, I get
U+0587. I tried both U+0565 U+0582 (եւ) and U+0565 U+057E (եվ), but neither
of them forms the U+0587 (և) ligature in XeLaTeX. I did not understand why
the ligature is considered ECH and YIWN, but it seems that this is
historical and bound to the shape. If I understand correctly, sun is
pronounced in Armenian as "arew"; արև (U+0561 U+0580 U+0587) is the Eastern
spelling, while արեւ (U+0561 U+0580 U+0565 U+0582) is the classical spelling
(as given in Wiktionary) and probably also the one used in the Western
variant. As you can see on Omniglot, the Armenian names of Eastern/Western
Armenian start with "arew" in these two spellings. Even "hayeren"
(Armenian) has different spellings in the Eastern/Western variants (I have
included both at the beginning of this mail).

Having found the information on variants, I saw that polyglossia supports
variant=western. I tried to specify variant=eastern but it did not help. If
you look at ot6enc.def, it defines uppercase variants at the end of the
file, where the uppercase version of \armew is \Arm@yechvev, which is
\Armyech\Armvev. I cannot try it because I do not know the transliteration,
but just from the names of the characters it seems to me that it works
correctly while \text_uppercase:n does not. It should know that U+0587
should be decomposed to U+0565 U+057E (not U+0582) and then uppercased to
U+0535 U+054E (not U+0552), at least for the Eastern variant. I am not sure
whether there are other issues and where exactly to fix it.
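
Until this is fixed in the right place, the decomposition could be done by
hand before uppercasing. The following is only a sketch under the
assumption that XeLaTeX with a recent LaTeX kernel is used; \ArmUppercase
is a hypothetical name that I made up for illustration, it does not exist
in polyglossia, ArmTeX or expl3.

```latex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{armenian}
\setmainfont{DejaVu Sans}
\ExplSyntaxOn
% \ArmUppercase is a hypothetical helper, not part of any package.
\NewDocumentCommand \ArmUppercase { m }
  {
    \tl_set:Nn \l_tmpa_tl {#1}
    % Decompose U+0587 to U+0565 U+057E (Eastern convention) first,
    % so that the uppercasing only sees letters with uppercase forms.
    \tl_replace_all:Nnn \l_tmpa_tl { և } { եվ }
    \exp_args:NV \text_uppercase:n \l_tmpa_tl
  }
\ExplSyntaxOff
\pagestyle{empty}
\begin{document}
Երևան $\rightarrow$ \ArmUppercase{Երևան}
\end{document}
```

This does not help with the running head, where the class calls its own
uppercasing command, and as far as I know \tl_replace_all:Nnn does not look
inside brace groups, so it is a demonstration of the intended mapping
rather than a real fix.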

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml


On Sun, 1 May 2022 at 8:08, David Carlisle <d.p.carlisle at gmail.com>
wrote:

> The input uses a ligature character which has no corresponding uppercase.
> You need to decompose the ligature before uppercasing, which \uppercase
> can't do. You see the same with the Latin script ff ligature:
>
> \uppercase{diff diﬀ}
>
> looks like DIFF DIff as the second one uses U+FB00 which has no uppercase.
>
>  U+0587 ARMENIAN SMALL LIGATURE ECH YIWN
>
> would be better input as  U+0565 U+0582
>
> David
>
>
>
>
> On Sun, 1 May 2022 at 00:09, Zdenek Wagner <zdenek.wagner at gmail.com>
> wrote:
>
>> Yes, it looks better but the uppercase version should contain ԵՎ, not ԵՒ.
>> Վ is capital vew (U+054E) while Ւ is capital yiwn (U+0552).
>>
>> Zdeněk Wagner
>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>>
>>
>> On Sun, 1 May 2022 at 0:53, David Carlisle <d.p.carlisle at gmail.com>
>> wrote:
>>
>>> Something like this, I think.
>>>
>>> [image: image.png]
>>>
>>> \documentclass{article}
>>> \usepackage{polyglossia}
>>>
>>> \setdefaultlanguage{armenian}
>>> \setmainfont{DejaVu Sans}
>>> \ExplSyntaxOn
>>> \let\tuppercase\text_uppercase:n
>>> \ExplSyntaxOff
>>> \pagestyle{empty}
>>> \begin{document}
>>> Երևան $\rightarrow$ \uppercase{Երևան}
>>>
>>> Երևան $\rightarrow$ \tuppercase{Երևան}
>>>
>>> \end{document}
>>>
>>> David
>>>
>>>
>>> On Sat, 30 Apr 2022 at 22:15, Zdenek Wagner <zdenek.wagner at gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> first I should mention that I do not know Armenian at all and can just
>>>> recognize a few characters. Anyway I came across a problem which
>>>> probably cannot be solved by the standard \lccode / \uccode method.
>>>> What I mean is Yerevan which is written as Երևան but YEREVAN (all
>>>> caps) is ԵՐԵՎԱՆ because և has no uppercase variant and must be
>>>> replaced by two characters ԵՎ. At least, it is visible in Elena
>>>> Yerevan's songs shot in the city of Yerevan. The following file
>>>>
>>>> \documentclass{article}
>>>> \usepackage{polyglossia}
>>>> \setdefaultlanguage{armenian}
>>>> \setmainfont{DejaVu Sans}
>>>> \pagestyle{empty}
>>>> \begin{document}
>>>> Երևան $\rightarrow$ \uppercase{Երևան}
>>>> \end{document}
>>>>
>>>> shows that all characters are converted correctly to uppercase but և
>>>> remains in lowercase. Is there a solution?
>>>>
>>>> Zdeněk Wagner
>>>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>>>>
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 21403 bytes
Desc: not available
URL: <https://tug.org/pipermail/xetex/attachments/20220501/d721cdf7/attachment-0001.png>

