<p dir="ltr">The size would probably more than double. I was under the impression that ActualText is a tag attribute, so extensive tagging would be needed, with the actual text added to the tags.</p>
<p dir="ltr">But the question is how to practically make use of ActualText if there is a visible text layer.</p>
<p dir="ltr">PDF/UA, for instance, leaves the question deliberately ambiguous. ActualText is the way to make the content accessible, but developers creating tools for PDF do not actually have to process the ActualText.</p>
<p dir="ltr">So to index and search PDF files you need to build a discovery system utilising tools that allow you to specify the use of ActualText in preference to a visible text layer. </p>
<p dir="ltr">Andrew</p>
<div class="gmail_quote">On 23 Feb 2016 12:52 am, "Zdenek Wagner" <<a href="mailto:zdenek.wagner@gmail.com">zdenek.wagner@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi all,<br><br></div>the problem is caused by just a few characters, especially the short i-matra. It might be more difficult in other Indic scripts containing two-part vowels. The reason is that visually they appear in a different order from the one they have in the Unicode representation. It can be solved by using ActualText. If all words were entered this way, the size of the PDF would double. It might be useful to use ActualText only for selected words.<br><br></div>This is not only a copy-and-paste problem: you will also be unable to use the search dialog in Acrobat. For instance, you will not be able to find किताब.<br><br><br></div><div class="gmail_extra"><br clear="all"><div><div>Zdeněk Wagner<br><a href="http://ttsm.icpf.cas.cz/team/wagner.shtml" target="_blank">http://ttsm.icpf.cas.cz/team/wagner.shtml</a><br><a href="http://icebearsoft.euweb.cz" target="_blank">http://icebearsoft.euweb.cz</a></div></div>
<br><div class="gmail_quote">2016-02-22 14:38 GMT+01:00 ShreeDevi Kumar <span dir="ltr"><<a href="mailto:shreeshrii@gmail.com" target="_blank">shreeshrii@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:georgia,serif">Hi Jonathan,</div><div class="gmail_default" style="font-family:georgia,serif"><br></div><div class="gmail_default" style="font-family:georgia,serif">I am using XeTeX/XeLaTeX for typesetting Devanagari texts. </div><div class="gmail_default" style="font-family:georgia,serif">e.g. <a href="http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.pdf" target="_blank">http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.pdf</a></div><div class="gmail_default"><font face="georgia, serif"><a href="http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.html?lang=sa" target="_blank">http://sanskritdocuments.org/doc_devii/gangAShTakamkAlidAsa.html?lang=sa</a> (HTML text version of the same)</font><br></div><div class="gmail_default"><font face="georgia, serif"><br></font></div><div class="gmail_default" style="font-family:georgia,serif">However, when the Devanagari text is copied from the PDF, it does not display correctly, which is the case for complex scripts with most PDF creators (as far as I know).</div><div class="gmail_default" style="font-family:georgia,serif"><br></div><div class="gmail_default" style="font-family:georgia,serif">e.g. 
</div><div class="gmail_default"><font face="georgia, serif">॥ गङ्गाष्टकं कालिदासकृतम् ॥</font><br></div><div class="gmail_default" style="font-family:georgia,serif">is displayed as</div><div class="gmail_default" style="font-family:georgia,serif">॥ गाकं कािलदासकृतम ॥<br></div><div class="gmail_default" style="font-family:georgia,serif"><br></div><div class="gmail_default" style="font-family:georgia,serif">Is it possible to add a feature to xetex to support search and copy of complex script text in scripts such as devanagari? </div><div class="gmail_default" style="font-family:georgia,serif"><br></div><div class="gmail_default" style="font-family:georgia,serif">It would really be great to have this "coming soon to a XeTeX near you"....... :-)</div><div class="gmail_default" style="font-family:georgia,serif"><br></div><div class="gmail_default" style="font-family:georgia,serif">Thanks.</div><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr">ShreeDevi<br>____________________________________________________________<br>भजन - कीर्तन - आरती @ <a href="http://bhajans.ramparivar.com" target="_blank">http://bhajans.ramparivar.com</a><br></div><div dir="ltr"><br></div></div></div>
<br><div class="gmail_quote">On Thu, Feb 18, 2016 at 4:28 PM, <div class="gmail_default" style="font-family:georgia,serif;display:inline"></div>Jonathan Kew <span dir="ltr"><<a href="mailto:jfkthame@gmail.com" target="_blank">jfkthame@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">This is a pretty specialized feature, likely to be of interest only to a small minority of users. But for those it concerns, here's something that is <div class="gmail_default" style="font-family:georgia,serif;display:inline"></div>"coming soon to a XeTeX near you".......<br>
<br>
<br>
I've recently implemented a new feature, controlled by the integer parameter \XeTeXinterwordspaceshaping. This will be available in the TL'16 release, if all goes well.<br>
<br>
This feature is relevant only when using OpenType/Graphite/AAT fonts, not legacy .tfm-based fonts.<br>
<br>
When \XeTeXinterwordspaceshaping is greater than 0, XeTeX will attempt to support fonts where the width of inter-word spaces may vary contextually, depending on the preceding and following text. This is needed by fonts such as SIL's Awami Nastaliq (in development) where words are expected to kern together across spaces.<br>
<br>
The default behavior of xetex is to measure each word in isolation, and simply string together a sequence of such word and space (glue) nodes to form the horizontal list that is then line-broken to form a paragraph. Normally, when inter-word spaces do not depend on the adjacent words, this works fine; but in Awami the width of inter-word spaces may vary drastically, even becoming negative in some cases.<br>
<br>
Setting \XeTeXinterwordspaceshaping=1 tells xetex to measure such spaces "in context" and take account of the contextually-modified widths during line breaking. This greatly improves the typeset result with such a font. Each word is still shaped and rendered individually, but line-breaking and word spacing respects the inter-word kerning.<br>
<br>
A further complication occurs when not only the width of the space but also the glyphs of the adjacent words themselves may be subject to contextual changes. An example of this would be a font that has OpenType ligature rules that apply to multiple-word sequences; e.g. a symbol font that ligates the text "credit card" to render a credit-card icon. Another example is the word-final swash forms in Hoefler Italic, which are intended to be used at end-of-line but NOT before word spaces within the line.<br>
<br>
These cases are addressed with \XeTeXinterwordspaceshaping=2. With this value, not only are inter-word spaces measured in context, but also each run of text (words and intervening spaces) in a single font will be re-shaped as a unit at \shipout time. This allows full shaping (contextual swashes, ligatures, etc) to take effect across inter-word spaces.<br>
<br>
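As a rough illustration of the settings described above, a minimal plain XeTeX test file might look like the sketch below. The font name is only an assumption (any installed OpenType/Graphite/AAT font can be substituted), and the \ifdefined guard is there so the file still compiles on engines that lack the parameter:<br>
<pre>
% Hypothetical test file; the font name is an assumption, not a requirement.
\font\body="Hoefler Text/I" at 12pt
\body
% Set the parameter only if this engine provides it.
\ifdefined\XeTeXinterwordspaceshaping
  % 1 = measure inter-word spaces in context
  % 2 = additionally re-shape each single-font run at \shipout time
  \XeTeXinterwordspaceshaping=2
\fi
Some sample words to compare under each setting.
\bye
</pre>
Compiling the same file with the value changed to 0, 1 and 2 in turn should show whether the feature makes any visible difference for a given font.<br>
<br>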
Currently, this feature is implemented only in the "contextual-space" branch of the code at sourceforge; anyone interested in testing it will need to check out and build the code from there. After some time, if no major problems show up, I expect to merge it to the master branch, and then to the TeXLive source tree.<br>
<br>
Feedback welcome..........<br>
<br>
JK<br>
<br>
<br>
<br>
--------------------------------------------------<br>
Subscriptions, Archive, and List information, etc.:<br>
<a href="http://tug.org/mailman/listinfo/xetex" rel="noreferrer" target="_blank">http://tug.org/mailman/listinfo/xetex</a><br>
</blockquote></div><br></div></div>
<br><br>
<br>
<br></blockquote></div><br></div>
<br><br>
<br>
<br></blockquote></div>