[XeTeX] New feature REQUEST for xetex

ShreeDevi Kumar shreeshrii at gmail.com
Tue Feb 23 11:18:01 CET 2016


Wow! This is wonderful, Jonathan.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Feb 23, 2016 at 3:36 PM, Jonathan Kew <jfkthame at gmail.com> wrote:

> On 23/2/16 02:54, Andrew Cunningham wrote:
>
>> It would probably more than double. I was under the impression that
>> ActualText was a tag attribute, so extensive tagging would be needed,
>> and actual text added to the tags.
>>
>
> The ActualText tagging is highly compressible, so in practice the increase
> in overall PDF size is not all that great.
>
>
>> But the question is how to practically make use of ActualText if there
>> is a visible text layer.
>>
>> PDF/UA, for instance, leaves the question deliberately ambiguous.
>> ActualText is the way to make the content accessible, but developers
>> creating tools for PDF do not actually have to process the ActualText.
>>
>> So to index and search PDF files you need to build a discovery system
>> utilising tools that allow you to specify the use of ActualText in
>> preference to a visible text layer.
>>
>>
> Acrobat Reader uses it, if present, so that Copy/Paste from the PDF
> results in the correct Unicode text (more or less), and Find behaves as
> expected.
>
> Other PDF readers (such as Apple's Preview) may well ignore the ActualText
> tagging, in which case it doesn't help. I don't know whether tools like
> Evince or Okular handle it....
>
>
> I'm attaching two sample PDFs with a simple chunk of Hindi text (from the
> Unicode web site). The first, dev-old.pdf, is what XeTeX currently
> generates (using the "Annapurna SIL" OpenType font). In general, Copy/Paste
> and text search don't work very well -- a few characters may be OK, but
> others are junk.
>
> The second sample, dev-actualtext.pdf, was generated with an experimental
> new \XeTeXgenerateactualtext feature, which automatically "tags" each word
> with an ActualText representation.
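
For illustration, per-word tagging of the kind described above could look
roughly like the output of the sketch below. It assumes the standard PDF
marked-content form (a /Span with an /ActualText entry between BDC and EMC)
and a UTF-16BE hex string with a leading byte-order mark; the function name
and the glyph operators passed in are hypothetical placeholders, and this is
not necessarily what XeTeX itself writes.

```python
def actualtext_span(word: str, glyph_ops: str) -> str:
    """Wrap glyph-drawing operators in a marked-content span with ActualText.

    `glyph_ops` is a placeholder for whatever text-showing operators draw
    the word; it is passed through unchanged.
    """
    # Unicode text strings in PDF are UTF-16BE, prefixed with the BOM FEFF.
    hex_text = (b"\xfe\xff" + word.encode("utf-16-be")).hex().upper()
    return f"/Span << /ActualText <{hex_text}> >> BDC\n{glyph_ops}\nEMC"


# Hypothetical glyph IDs; a real content stream would use the font's own.
print(actualtext_span("\u0939\u093f\u0928\u094d\u0926\u0940", "[<012A00F3>] TJ"))
```

A reader that honours ActualText can then return the tagged Unicode string
for Copy/Paste and Find instead of trying to map glyph IDs back to
characters.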
>
> Some points to note:
>
> - The file size is 24662 bytes, while dev-old was 22875 bytes. Not too
> bad. Of course, a lot of that is the embedded font data; with longer
> documents that have lots of text but only a few fonts, the relative
> difference would presumably be somewhat greater.
>
> - Copy/Paste and Search work pretty well in Acrobat Reader. Not in
> Preview.app.
>
> - Highlighting of selected text (in Acrobat Reader) is somewhat broken,
> apparently due to the ActualText tagging (it looks better in dev-old). This
> may be fixable by tweaking exactly how the tagging is written into the PDF;
> I haven't investigated it further.
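
To put the two file sizes quoted above in proportion, here is a small
calculation; the variable names are only illustrative, and the figures are
the ones given for dev-old.pdf and dev-actualtext.pdf.

```python
old_size = 22875  # dev-old.pdf, bytes
new_size = 24662  # dev-actualtext.pdf, bytes

increase = new_size - old_size
percent = 100 * increase / old_size

print(f"{increase} bytes larger ({percent:.1f}% increase)")
```

For this sample the overhead is 1787 bytes, a bit under 8%. Much of the file
is embedded font data, so this is a single data point rather than a general
estimate.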
>
>
> No guarantees at this point as to whether/when this feature will actually
> be available. It was just a quick attempt to hack something up, to see how
> promising the results might be...
>
> JK