[XeTeX] how to customize sorting order of characters in index

Zdenek Wagner zdenek.wagner at gmail.com
Sun May 20 14:54:24 CEST 2018


Hi,

I knew that I already had something on my computer. Many years ago
someone sent me some files for Arabic/Urdu/Farsi, and I made some tests
and modifications. Looking at the old Urdu files (saved as old-*), I
see that the letter ث was missing (I do not know the names of all the
letters, only some of them). Hamza was not present as a standalone
letter; it appeared only as hamza above waw, yeh, and baree yeh. This
means that, for instance, ماء would not be sorted correctly. He goal (ہ)
and dochashmee he (ھ) were defined as a group. This means that
dochashmee he would never be used as a heading, and that بھارت would
precede بہانا and پھل would precede پہلو, which is probably wrong, so I
have split them. If the original grouping is correct, I will revert it.
Similarly, yeh and baree yeh were defined as a group, which is probably
harmless: the index should contain چھہٹا but not چھہٹے and چھہٹی
(different grammatical forms). I have split them, as well as ن and ں,
which is most probably harmless because the dotless form occurs only at
the end of a word. Without splitting, however, میں would precede مینار,
and I do not know whether that is correct. Please test the attached
xindy module. Even with XeTeX it should be invoked as

texindy -I omega -L urdu
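To make the effect of grouping concrete, here is a small Python sketch. It is not part of the attached xindy module and does not reproduce its rules; it assumes a hypothetical, partial Urdu letter order in which ن precedes ں and ہ precedes ھ. Letters placed in one group share a single primary rank, which mimics the grouped definitions described above. With the groups, بھارت precedes بہانا, پھل precedes پہلو and میں precedes مینار; with the letters split, each of these pairs comes out in the opposite order.

```python
# Illustrative only: a HYPOTHETICAL, partial Urdu letter order.
# It is NOT the order defined in the attached xindy module.
NOON = "\u0646"             # ن
NOON_GHUNNA = "\u06ba"      # ں
HEH_GOAL = "\u06c1"         # ہ
HEH_DOACHASHMEE = "\u06be"  # ھ

URDU_ORDER = (
    "ابپتٹثجچحخدڈذرڑزژسشصضطظعغفقکگلم"
    + NOON + NOON_GHUNNA + "و" + HEH_GOAL + HEH_DOACHASHMEE + "ءیے"
)


def make_key(groups=()):
    """Return a sort key; each string in `groups` is a set of letters
    that share one primary rank (sorted as if they were one letter)."""
    rank = {}
    next_rank = 0
    for ch in URDU_ORDER:
        if ch in rank:
            continue  # already ranked as a member of an earlier group
        group = next((g for g in groups if ch in g), ch)
        for member in group:
            rank[member] = next_rank
        next_rank += 1
    unknown_base = next_rank  # letters missing from the table sort last
    return lambda word: [rank.get(c, unknown_base + ord(c)) for c in word]


BHARAT = "ب" + HEH_DOACHASHMEE + "ارت"        # بھارت
BAHANA = "ب" + HEH_GOAL + "ا" + NOON + "ا"    # بہانا
PHAL = "پ" + HEH_DOACHASHMEE + "ل"            # پھل
PAHLU = "پ" + HEH_GOAL + "لو"                 # پہلو
MAIN = "می" + NOON_GHUNNA                     # میں
MINAR = "می" + NOON + "ار"                    # مینار

# Grouped: ہ = ھ and ن = ں at the primary level; split: all distinct.
grouped_key = make_key([HEH_GOAL + HEH_DOACHASHMEE, NOON + NOON_GHUNNA])
split_key = make_key()

if __name__ == "__main__":
    words = [BAHANA, BHARAT, PAHLU, PHAL, MINAR, MAIN]
    print("grouped:", sorted(words, key=grouped_key))
    print("split:  ", sorted(words, key=split_key))
```

The sketch also shows why a missing letter such as ث or a standalone hamza matters: any character absent from the table falls back to a rank after all known letters, so words containing it land in the wrong place.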

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz


2018-05-19 0:13 GMT+02:00 Kamal Abdali <k.abdali at acm.org>:
>
>
> On Fri, May 18, 2018 at 4:38 AM, Zdenek Wagner <zdenek.wagner at gmail.com>
> wrote:
>>
>> ...
>> Being engaged in thermodynamics it is nice to know how entropy is said
>> in Urdu. My former boss has a large collection of "vapour-liquid
>> equilibrium" translated into many languages. How is it in Urdu?
>
> ...
>
> (Taking a brief thermodynamical break on this group :-) ), the equivalent
> Urdu term would be
> بخارمائع تعادل
> (pronounced "bukẖār-māʾiʿ taʿādul").
>
>> ...There are many other problems. The Persian module is a good
>> start because Persian also makes use of a few non-Arabic letters
>> ...
>>
>> ...
>>
>> Zdeněk Wagner
>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>> http://icebearsoft.euweb.cz
>>
> I have several comments about your suggestions. Let me write them
> carefully and send them to you a bit later.
>
> Thanks,
> Kamal
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: urdu.zip
Type: application/x-zip-compressed
Size: 11291 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20180520/6db3449b/attachment.bin>

