[XeTeX] how to customize sorting order of characters in index

Fri May 18 10:38:58 CEST 2018

2018-05-18 3:15 GMT+02:00 Kamal Abdali <k.abdali at acm.org>:
> Thanks, Dominik. I had seen your
> https://cikitsa.blogspot.ca/2016/07/getting-xindy-to-work-for-iast-encoded.html
> before during a Web search. But a member of this group has kindly
> promised to write a xindy piece for Urdu. So instead of trying to do the
> same on my own, I am still trying to make imakeidx/makeindex work. As
> far as I understand, Makeindex sorts non-Latin characters in the order of
> their Unicode numerical code. The following approach, awful and artificial
> though it is, seems to sort Urdu index entries correctly, essentially by
> using the makeindex command \index{string1@string2} where string2 is the
> actual Urdu index entry and string1 is its representation described below:
>
> Urdu letters appear in the Arabic script Unicode block 0600-06FF. For
> about 12 letters, the numerical order of the code points differs from the
> Urdu alphabetic order. For example, in the Urdu alphabet the letter "peh"
> comes between the letters "beh" and "teh", but in the Unicode table the
> code point of "peh" is larger than that of "teh". So what I am doing is to
> represent "peh" by two characters, "beh" followed by another one whose code
> point is larger than that of any Urdu letter. For these extra characters I
> use the so-called "Eastern Arabic Digits" 0,1,2, ... which I can type on my
> keyboard and don't need otherwise. Also, if two or more Urdu letters come
> alphabetically between two adjacent code points, digits serve as convenient
> and easily remembered extra characters for giving them properly ordered
> representations.
>
> Here, for example, is an \index command to get the Urdu word "ناکارگی"
> (pronounced "nākāragī" and meaning "entropy" in English) to show up in its
> correct position in the index:

Being engaged in thermodynamics, it is nice to know how entropy is said
in Urdu. My former boss has a large collection of "vapour-liquid
equilibrium" translated into many languages. What is it in Urdu?

>
> \index{
> ناق۰ارق۵ی
> @
> ناکارگی
> }
>
> Doing such extra work for many entries in the index is too cumbersome. But
> at least the index is prepared correctly. And my priority for the moment is
> to get something (in which the index is the only unfinished part) finished
> and delivered, before worrying about elegant solutions! Thanks to everyone
> who offered suggestions in response to my inquiries.
>
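The workaround quoted above can be automated so that the sort keys need not be typed by hand. The following is only a minimal sketch, not the method actually used in the document: the function names and the SUBSTITUTES table are hypothetical, the table is incomplete, and only the kaf and gaf rows follow the quoted example (assuming it uses U+06A9, U+06AF and the extended digits U+06F0, U+06F5). The digit for peh is chosen arbitrarily.

```python
# Sketch of the key-substitution workaround for makeindex.
# SUBSTITUTES is a hypothetical, incomplete table: the kaf and gaf rows follow
# the quoted example; the peh row uses an arbitrarily chosen digit.
SUBSTITUTES = {
    "\u06A9": "\u0642\u06F0",  # keheh (Urdu kaf) -> qaf + extended digit zero
    "\u06AF": "\u0642\u06F5",  # gaf -> qaf + extended digit five
    "\u067E": "\u0628\u06F0",  # peh -> beh + extended digit zero (assumed)
}


def sort_key(word: str) -> str:
    """Replace each out-of-order letter by its two-character stand-in."""
    return "".join(SUBSTITUTES.get(ch, ch) for ch in word)


def index_entry(word: str) -> str:
    """Build a makeindex entry of the form \\index{key@word}."""
    return "\\index{" + sort_key(word) + "@" + word + "}"


# noon, alef, keheh, alef, reh, gaf, Farsi yeh
word = "\u0646\u0627\u06A9\u0627\u0631\u06AF\u06CC"
print(index_entry(word))
```

The printed line can be pasted into the source as it stands. A complete table would need one row for every letter whose code point is out of alphabetic order.
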
There are many other problems. The Persian module is a good start
because Persian also uses a few non-Arabic letters such as peh,
tcheh, gaf. Urdu/Persian yeh has a different Unicode codepoint
because the isolated and final forms are dotless, so they are visually
indistinguishable from Arabic aleph maqsura. This is not a big problem
because, together with baree yeh, its codepoint is among the largest, so
both will be put at the end of the alphabet. However, Persian/Urdu kaf
has a different codepoint than the Arabic kaf, so the words will be in
the wrong place. Arabic heh is not the same as Urdu heh goal and
dochashmee heh (I do not know what is used in Persian, or whether
aspirated consonants are also used there). Another problem is caused by
the retroflex consonants, so there are too many letters that are
numerically out of order. By contrast, the Devanagari block
follows the Sanskrit sorting order numerically, so the situation there is
quite different.
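
To make the mismatch concrete, here is a small sketch that sorts five consecutive letters of the Urdu alphabet (beh, peh, teh, tteh, theh) once by code point and once by alphabet position. It assumes, as the quoted message does, that makeindex effectively orders entries by code point. The names URDU_SUBSET, RANK and alphabet_key are made up for this illustration, and the table covers only a subset of the alphabet.

```python
import unicodedata

# Five consecutive letters of the Urdu alphabet, in alphabetic order
# (a subset chosen for illustration): beh, peh, teh, tteh, theh.
URDU_SUBSET = ["\u0628", "\u067E", "\u062A", "\u0679", "\u062B"]

# Plain sorting compares code points, which is also what a byte-wise
# sort of UTF-8 text yields.
by_codepoint = sorted(URDU_SUBSET)

# Hypothetical alternative: rank letters by their position in the alphabet.
RANK = {ch: i for i, ch in enumerate(URDU_SUBSET)}


def alphabet_key(word):
    # Characters outside the table sort after all listed letters.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


by_alphabet = sorted(URDU_SUBSET, key=alphabet_key)

for label, letters in (("code point", by_codepoint), ("alphabet", by_alphabet)):
    names = [unicodedata.name(ch).replace("ARABIC LETTER ", "") for ch in letters]
    print(label + ":", ", ".join(names))
```

Sorted by code point, the non-Arabic letter peh and the retroflex tteh land after theh, whereas the rank table restores the order beh, peh, teh, tteh, theh.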

> Kamal Abdali

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz

>
> On Wed, May 16, 2018 at 3:27 PM, Dominik Wujastyk <wujastyk at gmail.com>
> wrote:
>>
>> A while back, I had a similar issue, and I made some notes about how I got
>> the sorting I needed.  See here:
>>
>>
>> https://cikitsa.blogspot.ca/2016/07/getting-xindy-to-work-for-iast-encoded.html
>>
>>
>> --
>> Professor Dominik Wujastyk,
>> Singhmar Chair in Classical Indian Society and Polity,
>> Department of History and Classics,
>> University of Alberta, Canada.
>>
>> South Asia at the U of A: sas.ualberta.ca
>>
>>
>> On 13 May 2018 at 17:04, Kamal Abdali <k.abdali at acm.org> wrote:
>>>
>>> I'm using polyglossia and imakeidx to produce an Urdu document. The
>>> sorting order of letters in the index is wrong. The order is according to
>>> Arabic, but Urdu has about 9 more letters which are being pushed after the
>>> Arabic letters in the index. I couldn't find an option to fix this in the
>>> imakeidx style file. It probably should be specified in polyglossia but I
>>> don't see a way there either. Any suggestions please?
>>>
>>> Kamal Abdali
>>>
>>>
>>>
>>> --------------------------------------------------
>>> Subscriptions, Archive, and List information, etc.:
>>>   http://tug.org/mailman/listinfo/xetex
>>>
>