[XeTeX] New feature planned for xetex

Zdenek Wagner zdenek.wagner at gmail.com
Fri Feb 19 16:55:29 CET 2016

2016-02-19 16:26 GMT+01:00 Kamal Abdali <k.abdali at acm.org>:

> Hi Zdeněk,
> Kudos! You figured one out correctly, and got only close on the second one
> because I gave you the wrong clue! Sorry. The two word-separated parsings
> of the second text are:
> جمالو   ہار   گیا۔
> جما   لوہار   گیا۔

Thank you, now it is clear.
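
The ambiguity in both puzzles can be checked mechanically. What follows is only a minimal sketch in Python, not anything XeTeX does: the name strip_spaces and the dictionary keys are made up for illustration, and the Urdu strings are the readings quoted in this thread with the final comma of the first one left out. It shows that the two readings of each puzzle become the same sequence of letters once the spaces are dropped. In these examples each word break falls after a letter that does not join to the following one, so the letter shapes should give no hint of the break either.

```python
# Two word-separated readings per puzzle, taken from this thread.
# The keys are hypothetical labels used only for this illustration.
readings = {
    "eighty_four": ("وہ چوراسی سال کا ہے", "وہ چور اسی سال کا ہے"),
    "jamaloo": ("جمالو ہار گیا۔", "جما لوہار گیا۔"),
}


def strip_spaces(text: str) -> str:
    """Drop all whitespace, imitating interword-space suppression."""
    return "".join(text.split())


for label, (first, second) in readings.items():
    # True means the two readings collapse to one identical letter string.
    print(label, strip_spaces(first) == strip_spaces(second))
```

With the spaces kept, the strings in each pair differ, which is the whole argument for allowing an optional interword space.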

> meaning: "Jamaloo was defeated" and "The ironsmith Jumma has left". While
> we are on fun and games, it's worth mentioning an embarrassment related to
> the same Nastaleeq ambiguity that a Pakistani TV channel suffered
> recently.  An announcer pronounced the Urdu transliteration of the English
> phrase "motor vehicle" as "MoTroo Haikal", most likely thinking of it as a
> proper noun.

Yes, it is the problem of و and its four types of pronunciation. I know
another funny transliteration from Czech. Under socialism we had the Socialist
Youth Union. Union is "svaz" in Czech and a member of this union was
usually called "svazák" (the first vowel is short, the second vowel is
long). And if you transliterate "svazák" into Urdu without vocalization,
you get سوزاک

> Kamal Abdali

Zdeněk Wagner

> On Fri, Feb 19, 2016 at 5:18 AM, Zdenek Wagner <zdenek.wagner at gmail.com>
> wrote:
>> 2016-02-19 4:25 GMT+01:00 Kamal Abdali <k.abdali at acm.org>:
>>> On Thu, Feb 18, 2016 at 7:38 PM, Zdenek Wagner <zdenek.wagner at gmail.com>
>>> wrote:
>>>> I have compared both and personally I like Jonathan's version. Of
>>>> course, I am not an expert. I do not have any collection of high quality
>>>> Urdu documents. I have only seen Mirza Ghalib's manuscript in his museum in
>>>> New Delhi and some Urdu documents in the museum in Lal Qila. My knowledge
>>>> of Urdu is very weak. Spoken Urdu is basically the same language as Hindi
>>>> so that I can listen to BBC Urdu and understand almost everything but
>>>> reading is difficult for me and I know nothing about calligraphy. It will
>>>> take me hours to read the sample text; I can only recognize from the title
>>>> that it is the Universal Declaration of Human Rights. Anyway, the larger
>>>> interword spaces do not help me to read the text.
>>>> As an example I am attaching the text from the Jama Masjid in New
>>>> Delhi. Look at the beginning of the first line. There is a considerable
>>>> space between آ and پ although آپ is a single word. The interword space
>>>> between آپ and جامع is smaller than the space in the middle of جامع and
>>>> there is almost no space between جامع and مسجد. There is no space between
>>>> پر and زیارت but I still can see the words. In the third line the largest
>>>> space is in the middle of پرکشش. Of course, it helped me to see the same
>>>> text in Devanagari; I would probably be unable to read the Urdu text
>>>> without it.
>>> Zdeněk,
>>> If each word in Urdu (or in any language written using Arabic
>>> characters) formed a connected figure, then any amount of interword space
>>> (including zero) would be OK. But since some letters connect with the next
>>> letter and some do not, words often consist of two or more separate
>>> figures. Having interword spaces then helps to delimit each word. Stringing
>>> words together without any space between them is an incessant source of
>>> ambiguities and problems. That's why all scripts for the Arabic alphabet
>>> other than Nastaleeq now use interword spaces. This forum is not a place to
>>> go into more details, so I'll just give you two examples in the form of
>>> entertaining puzzles. Without interword spaces, you can read a certain Urdu
>>> text (word string) as:
>>> EITHER "He is eighty-four years old."
>>> OR "That thief is eighty years old."
>>> Another one can be read
>>> EITHER "Jamaloo was defeated."
>>> OR "Jumma went to Lahore."
>>> (Jamaloo and Jumma are both common nicknames.) New learners are
>>> constantly frustrated because the printed shapes in front of them provide
>>> no visual help in separating the words. Basically, the script assumes that
>>> you already know what you're trying to learn by reading!
>>> Again, I am not calling for a ban on tight kerning, but I am asking
>>> Jonathan to be flexible about interword spaces for anyone who wants it. At
>>> present most Urdu word processors make it very difficult to overcome
>>> interword space suppression in Nastaleeq fonts.
>>> Kamal Abdali
>> Hi Kamal,
>> thank you for the examples, I see the problem of چوراسی and چور اسی without
>> and with the interword space. The spaces will be needed especially in
>> textbooks of Urdu and in dictionaries.
>> Could you, please, send me the second example in Urdu? It is interesting
>> to me. I can guess that the second sentence ends with حلاحور گیا  and by
>> similarity with Hindi I could imagine the verb حارنا but then the first
>> sentence would end with حار گیا
>> The ending is thus different (حار versus حور) but as I wrote, I may be
>> mistaken.
>> I hope the first example in full is:
>> وہ چوراسی سال کا ہے،
>> وہ چور اسی سال کا ہے۔
>> Zdeněk Wagner
>> http://ttsm.icpf.cas.cz/team/wagner.shtml
>> http://icebearsoft.euweb.cz
>>> --------------------------------------------------
>>> Subscriptions, Archive, and List information, etc.:
>>>   http://tug.org/mailman/listinfo/xetex