[XeTeX] New feature REQUEST for xetex

Andrew Cunningham lang.support at gmail.com
Tue Feb 23 03:54:44 CET 2016

It would probably more than double. I was under the impression that
ActualText was a tag attribute, so extensive tagging would be needed, with
ActualText added to the tags.

But the question is how to make practical use of ActualText when there is
also a visible text layer.

PDF/UA, for instance, deliberately leaves the question ambiguous. ActualText
is the way to make the content accessible, but developers creating tools
for PDF are not actually required to process ActualText.

So, to index and search PDF files, you need to build a discovery system
utilising tools that let you specify that ActualText should be used in
preference to the visible text layer.
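A minimal sketch of the extraction problem discussed in this thread, under simplifying assumptions. It does not parse a real PDF. Each text run is modelled as a hypothetical `TextRun(visible_text, actual_text)` pair, where `visible_text` is what a naive extractor recovers by reading glyphs in visual order and `actual_text` is the optional ActualText replacement. For किताब the short i-matra (U+093F) is drawn to the left of KA but stored after it in Unicode, so the visual-order text does not match a search for the word. `TextRun` and `extract_text` are illustrative names, not the API of any PDF tool.

```python
import unicodedata
from typing import List, NamedTuple, Optional


class TextRun(NamedTuple):
    """Hypothetical model of one marked run of text in a PDF."""
    visible_text: str           # text recovered from glyphs in visual order
    actual_text: Optional[str]  # ActualText replacement, if the run has one


def extract_text(runs: List[TextRun], prefer_actual_text: bool = True) -> str:
    """Concatenate runs, optionally preferring ActualText over visible text."""
    parts = []
    for run in runs:
        if prefer_actual_text and run.actual_text is not None:
            parts.append(run.actual_text)
        else:
            parts.append(run.visible_text)
    return "".join(parts)


# Logical (Unicode) order of किताब: KA, VOWEL SIGN I, TA, VOWEL SIGN AA, BA.
logical = "\u0915\u093F\u0924\u093E\u092C"
# Visual order: the short i-matra glyph is drawn before (left of) KA.
visual = "\u093F\u0915\u0924\u093E\u092C"

runs = [TextRun(visible_text=visual, actual_text=logical)]

# Show the code points in logical order.
for ch in logical:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Searching the visual-order text fails; searching ActualText succeeds.
print(logical in extract_text(runs, prefer_actual_text=False))  # False
print(logical in extract_text(runs, prefer_actual_text=True))   # True
```

A discovery system of the kind described above would effectively run with `prefer_actual_text=True`, falling back to the visible text only for runs that carry no ActualText.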

On 23 Feb 2016 12:52 am, "Zdenek Wagner" <zdenek.wagner at gmail.com> wrote:

> Hi all,
> the problem is caused by just a few characters, especially the short
> i-matra. It might be more difficult in other Indic scripts containing
> two-part vowels. The reason is that these characters appear visually in a
> different order from their order in the Unicode representation. It can be
> solved by using ActualText. If all words were entered this way, the size of
> the PDF would double. It might be useful to use ActualText only for selected
> words.
> It is not only a copy & paste problem; you will not be able to use the
> search dialog in Acrobat either. For instance, you will not be able to find
> किताब.
> Zdeněk Wagner
> http://ttsm.icpf.cas.cz/team/wagner.shtml
> http://icebearsoft.euweb.cz
> 2016-02-22 14:38 GMT+01:00 ShreeDevi Kumar <shreeshrii at gmail.com>:
>> Hi Jonathan,
>> I am using xetex/xelatex to typeset Devanagari texts.
>> eg. http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.pdf
>> http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.html?lang=sa
>> (HTML TEXT version of the same)
>> However, when Devanagari text is copied from the PDF, it does not
>> display correctly, which (as far as I know) is the case for complex
>> scripts with most PDF creators.
>> eg.
>> ॥ गङ्गाष्टकं कालिदासकृतम् ॥
>> is displayed as
>> ॥ गाकं कािलदासकृतम ॥
>> Is it possible to add a feature to xetex to support search and copy of
>> complex-script text in scripts such as Devanagari?
>> It would really be great to have this "coming soon to a XeTeX near
>> you"....... :-)
>> Thanks.
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>> On Thu, Feb 18, 2016 at 4:28 PM, Jonathan Kew <jfkthame at gmail.com> wrote:
>>> This is a pretty specialized feature, likely to be of interest only to a
>>> small minority of users. But for those it concerns, here's something that
>>> is "coming soon to a XeTeX near you".......
>>> I've recently implemented a new feature, controlled by the integer
>>> parameter \XeTeXinterwordspaceshaping. This will be available in the TL'16
>>> release, if all goes well.
>>> This feature is relevant only when using OpenType/Graphite/AAT fonts,
>>> not legacy .tfm-based fonts.
>>> When \XeTeXinterwordspaceshaping is greater than 0, XeTeX will attempt
>>> to support fonts where the width of inter-word spaces may vary
>>> contextually, depending on the preceding and following text. This is needed
>>> by fonts such as SIL's Awami Nastaliq (in development), where words are
>>> expected to kern together across spaces.
>>> The default behavior of xetex is to measure each word in isolation, and
>>> simply string together a sequence of such word and space (glue) nodes to
>>> form the horizontal list that is then line-broken to form a paragraph.
>>> Normally, when inter-word spaces do not depend on the adjacent words, this
>>> works fine; but in Awami the width of inter-word spaces may vary
>>> drastically, even becoming negative in some cases.
>>> Setting \XeTeXinterwordspaceshaping=1 tells xetex to measure such spaces
>>> "in context" and take account of the contextually-modified widths during
>>> line breaking. This greatly improves the typeset result with such a font.
>>> Each word is still shaped and rendered individually, but line-breaking and
>>> word spacing respect the inter-word kerning.
>>> A further complication occurs when not only the width of the space but
>>> also the glyphs of the adjacent words themselves may be subject to
>>> contextual changes. An example of this would be a font that has OpenType
>>> ligature rules that apply to multiple-word sequences; e.g. a symbol font
>>> that ligates the text "credit card" to render a credit-card icon. Another
>>> example is the word-final swash forms in Hoefler Italic, which are intended
>>> to be used at end-of-line but NOT before word spaces within the line.
>>> These cases are addressed with \XeTeXinterwordspaceshaping=2. With this
>>> value, not only are inter-word spaces measured in context, but also each
>>> run of text (words and intervening spaces) in a single font will be
>>> re-shaped as a unit at \shipout time. This allows full shaping (contextual
>>> swashes, ligatures, etc) to take effect across inter-word spaces.
>>> Currently, this feature is implemented only in the "contextual-space"
>>> branch of the code at sourceforge; anyone interested in testing it will
>>> need to check out and build the code from there. After some time, if no
>>> major problems show up, I expect to merge it to the master branch, and then
>>> to the TeXLive source tree.
>>> Feedback welcome..........
>>> JK
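The difference between measuring words and spaces in isolation and measuring inter-word spaces in context (the \XeTeXinterwordspaceshaping=1 behaviour described above) can be illustrated with a toy model. Everything in it is hypothetical: the word widths and the cross-space kerning table are invented numbers, not data from Awami Nastaliq or any real font, and this is not how XeTeX implements the feature internally.

```python
from typing import Dict, List, Tuple

# Hypothetical advance widths (arbitrary units) for whole words.
WORD_WIDTH: Dict[str, int] = {"alpha": 50, "beta": 40, "gamma": 60}

# Nominal width of a space measured in isolation.
SPACE_WIDTH = 10

# Hypothetical contextual adjustments to the space between two words,
# standing in for kerning that a font applies across the space.
SPACE_KERN: Dict[Tuple[str, str], int] = {
    ("alpha", "beta"): -14,  # space becomes negative: 10 - 14 = -4
    ("beta", "gamma"): -3,
}


def space_width(left: str, right: str, in_context: bool) -> int:
    """Width of the space between two words, optionally measured in context."""
    if not in_context:
        return SPACE_WIDTH
    return SPACE_WIDTH + SPACE_KERN.get((left, right), 0)


def line_width(words: List[str], in_context: bool) -> int:
    """Natural width of a line: word widths plus the spaces between them."""
    total = sum(WORD_WIDTH[w] for w in words)
    for left, right in zip(words, words[1:]):
        total += space_width(left, right, in_context)
    return total


line = ["alpha", "beta", "gamma"]
print(line_width(line, in_context=False))  # 170
print(line_width(line, in_context=True))   # 153
```

A line breaker working from the isolated widths would treat this line as 170 units wide when it actually occupies 153, which is why the contextual widths matter during line breaking. The model does not cover \XeTeXinterwordspaceshaping=2, where the glyphs of the words themselves are also re-shaped across spaces.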
