[XeTeX] New feature REQUEST for xetex

ShreeDevi Kumar shreeshrii at gmail.com
Tue Feb 23 04:58:46 CET 2016


>> the problem is caused just by a few characters, especially the short
>> i-matra. It might be more difficult in other Indic scripts containing
>> two-part vowels.

The problem is more extensive: in addition to the short i-matra, it affects
most of the glyphs used for conjuncts. It also applies to other Indic
scripts and to other complex scripts.

The example below shows how the conjuncts are copied from the PDF and
displayed as square boxes. The result is also font-dependent.

नमऽे ुगेदूसाजु ंगाराः क ु ुराः वाः । अनािरराः ससाः िशवाा
भजािधपाीक ु ृताा भवि ॥ १॥

>> It might be useful to use ActualText only for selected words.

That might work for a predominantly English text with some Devanagari, but
not for texts that are entirely in Devanagari.

>> It is not only the problem of copy&paste, you will not be able to use
>> the search dialog in Acrobat. For instance, you will not be able to find
>> किताब.

Yes, you are right. Search does not work for complex-script text set in
Unicode fonts in the PDFs currently produced.

Hence the request ...

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
Hi all,

the problem is caused just by a few characters, especially the short
i-matra. It might be more difficult in other Indic scripts containing
two-part vowels. The reason is that visually they appear in a different
order from the one in which they are stored in the Unicode representation.
It can be solved by using ActualText. If all words were entered this way,
the size of the PDF would double. It might be useful to use ActualText only
for selected words.
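
As a rough illustration of what ActualText involves at the PDF level, here
is a small Python sketch. It is an assumption-laden example, not what XeTeX
or xdvipdfmx emits: the helper name actual_text_span is made up, and it only
builds the opening line of a marked-content span that carries the word as a
UTF-16BE hex string with a byte order mark.

```python
def actual_text_span(word: str) -> str:
    """Return the opening of a PDF marked-content span for `word`.

    Hypothetical helper for illustration only. The text is stored as a
    hex string: byte order mark FEFF followed by UTF-16BE code units.
    """
    hex_text = (b"\xfe\xff" + word.encode("utf-16-be")).hex().upper()
    return f"/Span << /ActualText <{hex_text}> >> BDC"


print(actual_text_span("किताब"))
# prints: /Span << /ActualText <FEFF0915093F0924093E092C> >> BDC
```

The glyph-showing operators and a closing EMC would follow the BDC line.
Each word is thus stored a second time, four hex digits per code unit, which
is why wrapping every word increases the file size noticeably.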

It is not only the problem of copy&paste, you will not be able to use the
search dialog in Acrobat. For instance, you will not be able to find किताब.
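
To make the search failure concrete, here is a minimal Python sketch. It
assumes a deliberately simplified model in which text extraction follows
the glyph order and the short i-matra (U+093F) is drawn before the single
preceding character; real shaping moves it before a whole consonant
cluster. The function name visual_order is made up for this example.

```python
I_MATRA = "\u093f"  # DEVANAGARI VOWEL SIGN I


def visual_order(text: str) -> str:
    """Move each short i-matra in front of the character before it.

    Simplified model of glyph order: conjunct clusters and other
    reordering vowels are ignored.
    """
    out = []
    for ch in text:
        if ch == I_MATRA and out:
            out.insert(len(out) - 1, ch)  # matra is drawn first
        else:
            out.append(ch)
    return "".join(out)


word = "किताब"                  # stored as KA, I-matra, TA, AA-matra, BA
extracted = visual_order(word)  # I-matra, KA, TA, AA-matra, BA
print(word in extracted)        # prints: False
```

Under the same model, कालिदास comes out as कािलदास, which matches the
copied text reported in the original request.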



Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz

2016-02-22 14:38 GMT+01:00 ShreeDevi Kumar <shreeshrii at gmail.com>:

> Hi Jonathan,
>
> I am using XeTeX/XeLaTeX to typeset Devanagari texts,
> e.g. http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.pdf
> http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.html?lang=sa
> (HTML TEXT version of the same)
>
> However, when the Devanagari text is copied from the PDF, it does not
> display correctly. As far as I know, this is the case for complex scripts
> with most PDF creators.
>
> e.g.
> ॥ गङ्गाष्टकं कालिदासकृतम् ॥
> is displayed as
> ॥ गाकं कािलदासकृतम ॥
>
> Is it possible to add a feature to XeTeX to support searching and copying
> of complex-script text in scripts such as Devanagari?
>
> It would really be great to have this "coming soon to a XeTeX near
> you"....... :-)
>
> Thanks.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Thu, Feb 18, 2016 at 4:28 PM, Jonathan Kew <jfkthame at gmail.com> wrote:
>
>> This is a pretty specialized feature, likely to be of interest only to a
>> small minority of users. But for those it concerns, here's something that
>> is "coming soon to a XeTeX near you".......
>>
>>
>> I've recently implemented a new feature, controlled by the integer
>> parameter \XeTeXinterwordspaceshaping. This will be available in the TL'16
>> release, if all goes well.
>>
>> This feature is relevant only when using OpenType/Graphite/AAT fonts, not
>> legacy .tfm-based fonts.
>>
>> When \XeTeXinterwordspaceshaping is greater than 0, XeTeX will attempt to
>> support fonts where the width of inter-word spaces may vary contextually,
>> depending on the preceding and following text. This is needed by fonts such
>> as SIL's Awami Nastaliq (in development) where words are expected to kern
>> together across spaces.
>>
>> The default behavior of xetex is to measure each word in isolation, and
>> simply string together a sequence of such word and space (glue) nodes to
>> form the horizontal list that is then line-broken to form a paragraph.
>> Normally, when inter-word spaces do not depend on the adjacent words, this
>> works fine; but in Awami the width of inter-word spaces may vary
>> drastically, even becoming negative in some cases.
>>
>> Setting \XeTeXinterwordspaceshaping=1 tells xetex to measure such spaces
>> "in context" and take account of the contextually-modified widths during
>> line breaking. This greatly improves the typeset result with such a font.
>> Each word is still shaped and rendered individually, but line-breaking and
>> word spacing respects the inter-word kerning.
>>
>> A further complication occurs when not only the width of the space but
>> also the glyphs of the adjacent words themselves may be subject to
>> contextual changes. An example of this would be a font that has OpenType
>> ligature rules that apply to multiple-word sequences; e.g. a symbol font
>> that ligates the text "credit card" to render a credit-card icon. Another
>> example is the word-final swash forms in Hoefler Italic, which are intended
>> to be used at end-of-line but NOT before word spaces within the line.
>>
>> These cases are addressed with \XeTeXinterwordspaceshaping=2. With this
>> value, not only are inter-word spaces measured in context, but also each
>> run of text (words and intervening spaces) in a single font will be
>> re-shaped as a unit at \shipout time. This allows full shaping (contextual
>> swashes, ligatures, etc.) to take effect across inter-word spaces.
>>
>> Currently, this feature is implemented only in the "contextual-space"
>> branch of the code at sourceforge; anyone interested in testing it will
>> need to check out and build the code from there. After some time, if no
>> major problems show up, I expect to merge it to the master branch, and then
>> to the TeXLive source tree.
>>
>> Feedback welcome..........
>>
>> JK
>>
>>
>>
>> --------------------------------------------------
>> Subscriptions, Archive, and List information, etc.:
>>  http://tug.org/mailman/listinfo/xetex
>>